Note that the optional watermark extension is a small IPython notebook plugin that I developed to make the code reproducible. You can just skip the following line(s).
%load_ext watermark
%watermark -a 'Sebastian Raschka' -u -d -v -p numpy,matplotlib,theano,keras
Sebastian Raschka Last updated: 08/27/2015 CPython 3.4.3 IPython 4.0.0 numpy 1.9.2 matplotlib 1.4.3 theano 0.7.0 keras 0.1.2
# to install watermark just uncomment the following line:
#%install_ext https://raw.githubusercontent.com/rasbt/watermark/master/watermark.py
from IPython.display import Image
Depending on your system setup, it is typically sufficient to install Theano via
pip install Theano
For more help with the installation, please see: http://deeplearning.net/software/theano/install.html
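To quickly verify the installation, you can print the installed Theano version from Python (a minimal sanity check, not part of the original code):
import theano
print(theano.__version__)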
Image(filename='./images/13_01.png', width=500)
...
Introducing TensorType variables. For a complete list, see http://deeplearning.net/software/theano/library/tensor/basic.html#all-fully-typed-constructors
import theano
from theano import tensor as T
# initialize
x1 = T.scalar()
w1 = T.scalar()
w0 = T.scalar()
z1 = w1 * x1 + w0
# compile
net_input = theano.function(inputs=[w1, x1, w0], outputs=z1)
# execute
net_input(2.0, 1.0, 0.5)
array(2.5)
Configuring Theano. For more options, see http://deeplearning.net/software/theano/library/config.html
print(theano.config.floatX)
float64
theano.config.floatX = 'float32'
To change the float type globally, execute
export THEANO_FLAGS=floatX=float32
in your bash shell. Or run a Python script as
THEANO_FLAGS=floatX=float32 python your_script.py
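Alternatively, you can set the flag programmatically via the THEANO_FLAGS environment variable before theano is imported (a minimal sketch, not part of the original code):
import os
os.environ['THEANO_FLAGS'] = 'floatX=float32'  # must be set before importing theano
import theano
print(theano.config.floatX)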
Running Theano on GPU(s). For prerequisites, please see: http://deeplearning.net/software/theano/tutorial/using_gpu.html
Note that float32 is recommended for GPUs; float64 on GPUs is currently still relatively slow.
print(theano.config.device)
cpu
You can run a Python script on the CPU via:
THEANO_FLAGS=device=cpu,floatX=float64 python your_script.py
or on the GPU via:
THEANO_FLAGS=device=gpu,floatX=float32 python your_script.py
It may also be convenient to create a .theanorc file in your home directory to make those configurations permanent. For example, to always use float32, execute
echo -e "\n[global]\nfloatX=float32\n" >> ~/.theanorc
Or, create a .theanorc file manually with the following contents:
[global]
floatX = float32
device = gpu
import numpy as np
# initialize
x = T.fmatrix(name='x')
x_sum = T.sum(x, axis=0)
# compile
calc_sum = theano.function(inputs=[x], outputs=x_sum)
# execute (Python list)
ary = [[1, 2, 3], [1, 2, 3]]
print('Column sum:', calc_sum(ary))
# execute (NumPy array)
ary = np.array([[1, 2, 3], [1, 2, 3]], dtype=theano.config.floatX)
print('Column sum:', calc_sum(ary))
Column sum: [ 2. 4. 6.] Column sum: [ 2. 4. 6.]
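As a small variation on the example above (reusing the x variable and the ary array defined there), you can compute row sums or the total sum simply by changing the axis argument of T.sum:
# row sums instead of column sums
x_row_sum = T.sum(x, axis=1)
calc_row_sum = theano.function(inputs=[x], outputs=x_row_sum)
print('Row sum:', calc_row_sum(ary))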
Updating shared arrays. More info about memory management in Theano can be found here: http://deeplearning.net/software/theano/tutorial/aliasing.html
# initialize
x = T.fmatrix(name='x')
w = theano.shared(np.asarray([[0.0, 0.0, 0.0]],
dtype=theano.config.floatX))
z = x.dot(w.T)
update = [[w, w + 1.0]]
# compile
net_input = theano.function(inputs=[x],
updates=update,
outputs=z)
# execute
data = np.array([[1, 2, 3]], dtype=theano.config.floatX)
for i in range(5):
print('z%d:' % i, net_input(data))
z0: [[ 0.]] z1: [[ 6.]] z2: [[ 12.]] z3: [[ 18.]] z4: [[ 24.]]
We can use the givens parameter to insert values into the graph before compiling it. Using this approach, we can reduce the number of transfers from RAM (via the CPU) to the GPU and thus speed up learning with shared variables. If we use inputs, the dataset is transferred from the CPU to the GPU multiple times, for example, if we iterate over a dataset multiple times (epochs) during gradient descent. Via givens, we can keep the dataset in GPU memory if it fits (e.g., a mini-batch).
# initialize
data = np.array([[1, 2, 3]],
dtype=theano.config.floatX)
x = T.fmatrix(name='x')
w = theano.shared(np.asarray([[0.0, 0.0, 0.0]],
dtype=theano.config.floatX))
z = x.dot(w.T)
update = [[w, w + 1.0]]
# compile
net_input = theano.function(inputs=[],
updates=update,
givens={x: data},
outputs=z)
# execute
for i in range(5):
print('z:', net_input())
z: [[ 0.]] z: [[ 6.]] z: [[ 12.]] z: [[ 18.]] z: [[ 24.]]
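Because w is a shared variable, its state persists outside the compiled function; you can inspect or reset it at any time (a short aside, not part of the original code):
# inspect the shared variable after the five updates above
print(w.get_value())
# reset it to zeros to rerun the experiment from scratch
w.set_value(np.zeros((1, 3), dtype=theano.config.floatX))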
Creating some training data.
import numpy as np
X_train = np.asarray([[0.0], [1.0], [2.0], [3.0], [4.0],
[5.0], [6.0], [7.0], [8.0], [9.0]],
dtype=theano.config.floatX)
y_train = np.asarray([1.0, 1.3, 3.1, 2.0, 5.0,
6.3, 6.6, 7.4, 8.0, 9.0],
dtype=theano.config.floatX)
Implementing the training function.
import theano
from theano import tensor as T
import numpy as np
def train_linreg(X_train, y_train, eta, epochs):
costs = []
# Initialize arrays
eta0 = T.fscalar('eta0')
y = T.fvector(name='y')
X = T.fmatrix(name='X')
w = theano.shared(np.zeros(
shape=(X_train.shape[1] + 1),
dtype=theano.config.floatX),
name='w')
# calculate cost
net_input = T.dot(X, w[1:]) + w[0]
errors = y - net_input
cost = T.sum(T.pow(errors, 2))
# perform gradient update
gradient = T.grad(cost, wrt=w)
update = [(w, w - eta0 * gradient)]
# compile model
train = theano.function(inputs=[eta0],
outputs=cost,
updates=update,
givens={X: X_train,
y: y_train,})
for _ in range(epochs):
costs.append(train(eta))
return costs, w
Plotting the sum of squared errors cost vs epochs.
%matplotlib inline
import matplotlib.pyplot as plt
costs, w = train_linreg(X_train, y_train, eta=0.001, epochs=10)
plt.plot(range(1, len(costs)+1), costs)
plt.tight_layout()
plt.xlabel('Epoch')
plt.ylabel('Cost')
plt.tight_layout()
# plt.savefig('./figures/cost_convergence.png', dpi=300)
plt.show()
Making predictions.
def predict_linreg(X, w):
Xt = T.matrix(name='X')
net_input = T.dot(Xt, w[1:]) + w[0]
predict = theano.function(inputs=[Xt], givens={w: w}, outputs=net_input)
return predict(X)
plt.scatter(X_train, y_train, marker='s', s=50)
plt.plot(range(X_train.shape[0]),
predict_linreg(X_train, w),
color='gray',
marker='o',
markersize=4,
linewidth=3)
plt.xlabel('x')
plt.ylabel('y')
plt.tight_layout()
# plt.savefig('./figures/linreg.png', dpi=300)
plt.show()
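As an aside, since train_linreg returns w as a Theano shared variable, you could also pull out its current value with get_value and compute the same predictions directly in NumPy (a minimal sketch, not part of the original code):
w_vals = w.get_value()  # NumPy array of shape (2,): bias and slope
y_pred = X_train.dot(w_vals[1:]) + w_vals[0]
print(y_pred)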
...
The logistic function, often just called the "sigmoid function", is in fact a special case of a sigmoid function.
Net input $z$: $$z = w_1x_{1} + \dots + w_mx_{m} = \sum_{j=1}^{m} x_{j}w_{j} \\ = \mathbf{w}^T\mathbf{x}$$
Logistic activation function:
$$\phi_{logistic}(z) = \frac{1}{1 + e^{-z}}$$
Output range: (0, 1)
# note that the first element (X[0, 0] = 1) denotes the bias unit
X = np.array([[1, 1.4, 1.5]])
w = np.array([0.0, 0.2, 0.4])
def net_input(X, w):
z = X.dot(w)
return z
def logistic(z):
return 1.0 / (1.0 + np.exp(-z))
def logistic_activation(X, w):
z = net_input(X, w)
return logistic(z)
print('P(y=1|x) = %.3f' % logistic_activation(X, w)[0])
P(y=1|x) = 0.707
Now, imagine an MLP with 3 hidden units plus 1 bias unit in the hidden layer. The output layer consists of 3 output units.
# W : array, shape = [n_output_units, n_hidden_units+1]
# Weight matrix for hidden layer -> output layer.
# note that the first column of W corresponds to the bias unit (whose activation A[0] = 1)
W = np.array([[1.1, 1.2, 1.3, 0.5],
[0.1, 0.2, 0.4, 0.1],
[0.2, 0.5, 2.1, 1.9]])
# A : array, shape = [n_hidden+1, n_samples]
# Activation of hidden layer.
# note that the first element (A[0, 0] = 1) is the bias unit
A = np.array([[1.0],
[0.1],
[0.3],
[0.7]])
# Z : array, shape = [n_output_units, n_samples]
# Net input of output layer.
Z = W.dot(A)
y_probas = logistic(Z)
print('Probabilities:\n', y_probas)
Probabilities: [[ 0.87653295] [ 0.57688526] [ 0.90114393]]
y_class = np.argmax(Z, axis=0)
print('predicted class label: %d' % y_class[0])
predicted class label: 2
The softmax function is a generalization of the logistic function and allows us to compute meaningful class probabilities in multi-class settings (multinomial logistic regression).
The input to the function is the result of K distinct linear functions, and the predicted probability for the j-th class given a sample vector x is:
$$P(y=j|z) = \phi_{softmax}(z) = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}$$
Output range: (0, 1)
def softmax(z):
return np.exp(z) / np.sum(np.exp(z))
def softmax_activation(X, w):
    z = net_input(X, w)
    return softmax(z)
y_probas = softmax(Z)
print('Probabilities:\n', y_probas)
Probabilities: [[ 0.40386493] [ 0.07756222] [ 0.51857284]]
y_probas.sum()
1.0
y_class = np.argmax(Z, axis=0)
y_class
array([2])
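Note that because the softmax function is monotonic, the argmax over the softmax probabilities yields the same class label as the argmax over the net input Z computed earlier:
# same result as np.argmax(Z, axis=0)
np.argmax(y_probas, axis=0)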
The hyperbolic tangent (tanh) is another special case of a sigmoid function; it can be interpreted as a rescaled version of the logistic function.
$$\phi_{tanh}(z) = \frac{e^{z}-e^{-z}}{e^{z}+e^{-z}}$$
Output range: (-1, 1)
def tanh(z):
e_p = np.exp(z)
e_m = np.exp(-z)
return (e_p - e_m) / (e_p + e_m)
import matplotlib.pyplot as plt
%matplotlib inline
z = np.arange(-5, 5, 0.005)
log_act = logistic(z)
tanh_act = tanh(z)
# alternatives:
# from scipy.special import expit
# log_act = expit(z)
# tanh_act = np.tanh(z)
plt.ylim([-1.5, 1.5])
plt.xlabel('net input $z$')
plt.ylabel('activation $\phi(z)$')
plt.axhline(1, color='black', linestyle='--')
plt.axhline(0.5, color='black', linestyle='--')
plt.axhline(0, color='black', linestyle='--')
plt.axhline(-1, color='black', linestyle='--')
plt.plot(z, tanh_act,
linewidth=2,
color='black',
label='tanh')
plt.plot(z, log_act,
linewidth=2,
color='lightgreen',
label='logistic')
plt.legend(loc='lower right')
plt.tight_layout()
# plt.savefig('./figures/activation.png', dpi=300)
plt.show()
Image(filename='./images/13_05.png', width=700)
Download the MNIST dataset from http://yann.lecun.com/exdb/mnist/, unzip the four files, and copy the unzipped files to a directory ./mnist
import os
import struct
import numpy as np
def load_mnist(path, kind='train'):
"""Load MNIST data from `path`"""
labels_path = os.path.join(path,
'%s-labels-idx1-ubyte'
% kind)
images_path = os.path.join(path,
'%s-images-idx3-ubyte'
% kind)
with open(labels_path, 'rb') as lbpath:
magic, n = struct.unpack('>II',
lbpath.read(8))
labels = np.fromfile(lbpath,
dtype=np.uint8)
with open(images_path, 'rb') as imgpath:
magic, num, rows, cols = struct.unpack(">IIII",
imgpath.read(16))
images = np.fromfile(imgpath,
dtype=np.uint8).reshape(len(labels), 784)
return images, labels
X_train, y_train = load_mnist('mnist', kind='train')
print('Rows: %d, columns: %d' % (X_train.shape[0], X_train.shape[1]))
Rows: 60000, columns: 784
X_test, y_test = load_mnist('mnist', kind='t10k')
print('Rows: %d, columns: %d' % (X_test.shape[0], X_test.shape[1]))
Rows: 10000, columns: 784
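As a quick sanity check (not part of the original code), you can visualize one of the training digits by reshaping its 784-dimensional row vector back into the original 28x28 pixel image:
import matplotlib.pyplot as plt
img = X_train[0].reshape(28, 28)
plt.imshow(img, cmap='Greys', interpolation='nearest')
plt.title('Label: %d' % y_train[0])
plt.show()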
Once you have Theano installed, Keras can be installed via
pip install Keras
In order to run the following code on the GPU, you can execute the Python script that was placed in this directory via
THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python mnist_keras_mlp.py
import theano
theano.config.floatX = 'float32'
X_train = X_train.astype(theano.config.floatX)
X_test = X_test.astype(theano.config.floatX)
One-hot encoding of the class variable:
from keras.utils import np_utils
print('First 3 labels: ', y_train[:3])
y_train_ohe = np_utils.to_categorical(y_train)
print('\nFirst 3 labels (one-hot):\n', y_train_ohe[:3])
First 3 labels: [5 0 4] First 3 labels (one-hot): [[ 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.] [ 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.] [ 0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]]
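For illustration only, here is a minimal NumPy equivalent of what np_utils.to_categorical does; the helper name one_hot is made up for this sketch:
def one_hot(y, num_classes):
    """Return a one-hot encoded matrix of shape [n_samples, num_classes]."""
    onehot = np.zeros((y.shape[0], num_classes), dtype=theano.config.floatX)
    onehot[np.arange(y.shape[0]), y] = 1.0
    return onehot

print(one_hot(y_train[:3], 10))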
from keras.models import Sequential
from keras.layers.core import Dense
from keras.optimizers import SGD
np.random.seed(1)
model = Sequential()
model.add(Dense(input_dim=X_train.shape[1],
output_dim=50,
init='uniform',
activation='tanh'))
model.add(Dense(input_dim=50,
output_dim=50,
init='uniform',
activation='tanh'))
model.add(Dense(input_dim=50,
output_dim=y_train_ohe.shape[1],
init='uniform',
activation='softmax'))
sgd = SGD(lr=0.001, decay=1e-7, momentum=.9)
model.compile(loss='categorical_crossentropy', optimizer=sgd)
model.fit(X_train, y_train_ohe,
nb_epoch=50,
batch_size=300,
verbose=1,
validation_split=0.1,
show_accuracy=True)
Train on 54000 samples, validate on 6000 samples Epoch 0 54000/54000 [==============================] - 1s - loss: 2.2290 - acc: 0.3592 - val_loss: 2.1094 - val_acc: 0.5342 Epoch 1 54000/54000 [==============================] - 1s - loss: 1.8850 - acc: 0.5279 - val_loss: 1.6098 - val_acc: 0.5617 Epoch 2 54000/54000 [==============================] - 1s - loss: 1.3903 - acc: 0.5884 - val_loss: 1.1666 - val_acc: 0.6707 Epoch 3 54000/54000 [==============================] - 1s - loss: 1.0592 - acc: 0.6936 - val_loss: 0.8961 - val_acc: 0.7615 Epoch 4 54000/54000 [==============================] - 1s - loss: 0.8528 - acc: 0.7666 - val_loss: 0.7288 - val_acc: 0.8290 Epoch 5 54000/54000 [==============================] - 1s - loss: 0.7187 - acc: 0.8191 - val_loss: 0.6122 - val_acc: 0.8603 Epoch 6 54000/54000 [==============================] - 1s - loss: 0.6278 - acc: 0.8426 - val_loss: 0.5347 - val_acc: 0.8762 Epoch 7 54000/54000 [==============================] - 1s - loss: 0.5592 - acc: 0.8621 - val_loss: 0.4707 - val_acc: 0.8920 Epoch 8 54000/54000 [==============================] - 1s - loss: 0.4978 - acc: 0.8751 - val_loss: 0.4288 - val_acc: 0.9033 Epoch 9 54000/54000 [==============================] - 1s - loss: 0.4583 - acc: 0.8847 - val_loss: 0.3935 - val_acc: 0.9035 Epoch 10 54000/54000 [==============================] - 1s - loss: 0.4213 - acc: 0.8911 - val_loss: 0.3553 - val_acc: 0.9088 Epoch 11 54000/54000 [==============================] - 1s - loss: 0.3972 - acc: 0.8955 - val_loss: 0.3405 - val_acc: 0.9083 Epoch 12 54000/54000 [==============================] - 1s - loss: 0.3740 - acc: 0.9022 - val_loss: 0.3251 - val_acc: 0.9170 Epoch 13 54000/54000 [==============================] - 1s - loss: 0.3611 - acc: 0.9030 - val_loss: 0.3032 - val_acc: 0.9183 Epoch 14 54000/54000 [==============================] - 1s - loss: 0.3479 - acc: 0.9064 - val_loss: 0.2972 - val_acc: 0.9248 Epoch 15 54000/54000 [==============================] - 1s - loss: 0.3309 - acc: 0.9099 - val_loss: 0.2778 - val_acc: 0.9250 Epoch 16 54000/54000 [==============================] - 1s - loss: 0.3264 - acc: 0.9103 - val_loss: 0.2838 - val_acc: 0.9208 Epoch 17 54000/54000 [==============================] - 1s - loss: 0.3136 - acc: 0.9136 - val_loss: 0.2689 - val_acc: 0.9223 Epoch 18 54000/54000 [==============================] - 1s - loss: 0.3031 - acc: 0.9156 - val_loss: 0.2634 - val_acc: 0.9313 Epoch 19 54000/54000 [==============================] - 1s - loss: 0.2988 - acc: 0.9169 - val_loss: 0.2579 - val_acc: 0.9288 Epoch 20 54000/54000 [==============================] - 1s - loss: 0.2909 - acc: 0.9180 - val_loss: 0.2494 - val_acc: 0.9310 Epoch 21 54000/54000 [==============================] - 1s - loss: 0.2848 - acc: 0.9202 - val_loss: 0.2478 - val_acc: 0.9307 Epoch 22 54000/54000 [==============================] - 1s - loss: 0.2804 - acc: 0.9194 - val_loss: 0.2423 - val_acc: 0.9343 Epoch 23 54000/54000 [==============================] - 1s - loss: 0.2728 - acc: 0.9235 - val_loss: 0.2387 - val_acc: 0.9327 Epoch 24 54000/54000 [==============================] - 1s - loss: 0.2673 - acc: 0.9241 - val_loss: 0.2265 - val_acc: 0.9385 Epoch 25 54000/54000 [==============================] - 1s - loss: 0.2611 - acc: 0.9253 - val_loss: 0.2270 - val_acc: 0.9347 Epoch 26 54000/54000 [==============================] - 1s - loss: 0.2676 - acc: 0.9225 - val_loss: 0.2210 - val_acc: 0.9367 Epoch 27 54000/54000 [==============================] - 1s - loss: 0.2528 - acc: 0.9261 - val_loss: 0.2241 - val_acc: 0.9373 Epoch 28 
54000/54000 [==============================] - 1s - loss: 0.2511 - acc: 0.9264 - val_loss: 0.2170 - val_acc: 0.9403 Epoch 29 54000/54000 [==============================] - 1s - loss: 0.2433 - acc: 0.9293 - val_loss: 0.2165 - val_acc: 0.9412 Epoch 30 54000/54000 [==============================] - 1s - loss: 0.2465 - acc: 0.9279 - val_loss: 0.2135 - val_acc: 0.9367 Epoch 31 54000/54000 [==============================] - 1s - loss: 0.2383 - acc: 0.9306 - val_loss: 0.2138 - val_acc: 0.9427 Epoch 32 54000/54000 [==============================] - 1s - loss: 0.2349 - acc: 0.9310 - val_loss: 0.2066 - val_acc: 0.9423 Epoch 33 54000/54000 [==============================] - 1s - loss: 0.2301 - acc: 0.9334 - val_loss: 0.2054 - val_acc: 0.9440 Epoch 34 54000/54000 [==============================] - 1s - loss: 0.2371 - acc: 0.9317 - val_loss: 0.1991 - val_acc: 0.9480 Epoch 35 54000/54000 [==============================] - 1s - loss: 0.2256 - acc: 0.9352 - val_loss: 0.1982 - val_acc: 0.9450 Epoch 36 54000/54000 [==============================] - 1s - loss: 0.2313 - acc: 0.9323 - val_loss: 0.2092 - val_acc: 0.9403 Epoch 37 54000/54000 [==============================] - 1s - loss: 0.2230 - acc: 0.9341 - val_loss: 0.1993 - val_acc: 0.9445 Epoch 38 54000/54000 [==============================] - 1s - loss: 0.2261 - acc: 0.9336 - val_loss: 0.1891 - val_acc: 0.9463 Epoch 39 54000/54000 [==============================] - 1s - loss: 0.2166 - acc: 0.9369 - val_loss: 0.1943 - val_acc: 0.9452 Epoch 40 54000/54000 [==============================] - 1s - loss: 0.2128 - acc: 0.9370 - val_loss: 0.1952 - val_acc: 0.9435 Epoch 41 54000/54000 [==============================] - 1s - loss: 0.2200 - acc: 0.9351 - val_loss: 0.1918 - val_acc: 0.9468 Epoch 42 54000/54000 [==============================] - 2s - loss: 0.2107 - acc: 0.9383 - val_loss: 0.1831 - val_acc: 0.9483 Epoch 43 54000/54000 [==============================] - 1s - loss: 0.2020 - acc: 0.9411 - val_loss: 0.1906 - val_acc: 0.9443 Epoch 44 54000/54000 [==============================] - 1s - loss: 0.2082 - acc: 0.9388 - val_loss: 0.1838 - val_acc: 0.9457 Epoch 45 54000/54000 [==============================] - 1s - loss: 0.2048 - acc: 0.9402 - val_loss: 0.1817 - val_acc: 0.9488 Epoch 46 54000/54000 [==============================] - 1s - loss: 0.2012 - acc: 0.9417 - val_loss: 0.1876 - val_acc: 0.9480 Epoch 47 54000/54000 [==============================] - 1s - loss: 0.1996 - acc: 0.9423 - val_loss: 0.1792 - val_acc: 0.9502 Epoch 48 54000/54000 [==============================] - 1s - loss: 0.1921 - acc: 0.9430 - val_loss: 0.1791 - val_acc: 0.9505 Epoch 49 54000/54000 [==============================] - 1s - loss: 0.1907 - acc: 0.9432 - val_loss: 0.1749 - val_acc: 0.9482
<keras.callbacks.History at 0x10df582e8>
y_train_pred = model.predict_classes(X_train, verbose=0)
print('First 3 predictions: ', y_train_pred[:3])
First 3 predictions: [5 0 4]
train_acc = np.sum(y_train == y_train_pred, axis=0) / X_train.shape[0]
print('Training accuracy: %.2f%%' % (train_acc * 100))
Training accuracy: 94.51%
y_test_pred = model.predict_classes(X_test, verbose=0)
test_acc = np.sum(y_test == y_test_pred, axis=0) / X_test.shape[0]
print('Test accuracy: %.2f%%' % (test_acc * 100))
Test accuracy: 94.39%
...