Theano Machine Learning on a GPU on Windows 10

Dino Konstantopoulos

Nov 29, 2016

CPOL

Running Theano with an Nvidia 1070 GPU on Windows 10, with CUDA 8 and Visual Studio 2015

Introduction

There are four main Machine Learning (ML) frameworks out there: the University of Montreal's Theano, Facebook's Torch, Google's TensorFlow, and Berkeley's Caffe (Microsoft's Cognitive Toolkit, CNTK, is a bit more specialized). There are others, but these are the four on which most university Machine Learning research is conducted. The problem with Torch, arguably the most used of all ML frameworks, is that it really prefers to run on Linux and you need to learn a new language (Lua). I already speak C#, Java, Python, and JavaScript, so my brain is already kinda full. The problem with TensorFlow is that you have to learn a new graph-based language and, Google being Google, they tend not to like Windows platforms very much, especially now that most elementary schools are busy upgrading their Chromebooks' OS to Windows 10.

I don't have much experience with Caffe, but there is nothing not to love about Theano. First, it's essentially a graph language, but it's in Python, and I already speak Python. Second, it's cross-platform, with researchers running it on Linux, OSX, and Windows. Third, there are other frameworks, like Keras, built on top of Theano with the goal of simplifying the building of neural networks. Whenever I see "simple", my heart lights up.

There is, however, one huge issue with Theano, and that is the amount of junk advice out there about how to install and run it with a GPU. Well-intentioned advice, mind you, but hopelessly outdated and convoluted. So I set out on a mini-odyssey to make it run on Windows 10, with the latest Visual Studio (2015 CE), the latest CUDA toolkit from Nvidia (CUDA 8), and the latest everything-related. I bought myself an ASUS laptop equipped with an Nvidia 1070 GPU, and started installing, prototyping, breaking, fixing, breaking again, and fixing again, till I got everything working.

I need to say that we are experiencing a planet alignment phenomenon right now, because everything seems to be working fine with the latest from Microsoft and Nvidia, and that is a rare phenomenon. So I'm writing this article to get you ramped up to Machine Learning on Windows with the minimum amount of pain. I am sure that Theano works equally well on Linux and OSX, but Windows is where I have the most experience, so Windows it is for this article. Along the way, we'll also learn a lot about GPUs, so this is a good introduction for GPU neophytes, along with GPU-equipped laptop recommendations. We'll also learn how to test Theano with Keras, a very simple deep learning framework built on top of Theano. This is a love story of software meeting hardware.

Background

I'll assume you have a computer running Windows 10, equipped with an Nvidia GPU. Asus, MSI, and Alienware build some great laptops along this line. That is the starting block. The steps outlined in this article will get your computer up to speed for GPU-assisted Machine Learning with Theano on Windows 10. Another option is to spin up a GPU-equipped Amazon EC2 instance from an Amazon Machine Image (AMI). Amazon offers an EC2 instance that provides access to the GPU for General Purpose GPU computing (GPGPU). This instance, named g2.2xlarge, costs about $0.65 per hour and includes 4GB of video memory and 1,536 CUDA cores on a K520 graphics card. That is a reasonable option. If you want to upgrade, you can move to the g2.8xlarge instance in order to obtain four K520 GPUs, for a grand total of 16GB of video memory, at about $2 more per hour.

Python Distribution and your "theano" environment

We start by installing Python. I recognize you have some options here, but for Machine Learning, Anaconda Python ought to be your top choice. It is an enterprise-ready Python distribution, with packages for Big Data processing, predictive analytics, and scientific computing. Download it from https://www.continuum.io/anaconda. Realize that it is a big distribution, close to a third of a Gig. There is an option to download a barebones version called miniconda, but if you don't have enough hard disk space on your computer, better invest in a new hard drive or a USB3 external, because Machine Learning entails Big Data.

Test drive anaconda by running steps outlined in http://conda.pydata.org/docs/test-drive.html

You now have two options for installing python packages. Your first option should always be:

conda install <packagename>

When that fails, because anaconda does not recognize the package, use:

pip install <packagename>

We will see lots of mixes of conda and pip installs in these instructions, so get ready for this.

Next step is to create a Python environment custom-tailored for Theano ML. That way, if you need to do any other kind of Python work on your machine and need to change the configuration, you can switch to another environment in order to leave your environment dedicated to ML with Theano in a pristine condition. Open up a command console with Administrative rights, and type:

conda create --name theano python=3.4

This will create your theano environment, based on Python 3.4 (not the latest, 3.5), because that is the version of Python that Theano requires. Now, activate your theano environment like so:

activate theano

Now your command console prompt should be preceded by "(theano) ". Make sure all your subsequent installs are done on a command console within the theano environment. Learn how to list available environments and switch from one to another. That's all well explained in the Anaconda test-drive page. All prerequisite packages, the Theano package, and the Keras package will be installed in the "theano" environment in the next sections.
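Before installing anything else, it can help to confirm that the console really is running the interpreter from the new environment. The following is a minimal sketch and not part of the original setup: the helper name describe_interpreter is made up for illustration, and it assumes that conda exposes the active environment through the CONDA_DEFAULT_ENV variable.

```python
import os
import sys


def describe_interpreter(expected=(3, 4)):
    """Report the interpreter version, its install prefix, and whether the
    version matches the one the environment was created with."""
    version = tuple(sys.version_info[:2])
    return {
        "version": version,
        "prefix": sys.prefix,
        # Assumption: conda sets CONDA_DEFAULT_ENV when an environment is active.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        "matches": version == tuple(expected),
    }


if __name__ == "__main__":
    info = describe_interpreter()
    print("Python %d.%d at %s" % (info["version"] + (info["prefix"],)))
    print("conda environment:", info["conda_env"])
    print("matches expected version:", info["matches"])
```

If the prefix does not point inside the envs\theano folder of your Anaconda installation, the environment is probably not activated in that console.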

An Introduction to CUDA and Nvidia GPUs

A Graphics Processing Unit, or GPU, is a specialized chip designed to accelerate image creation in a frame buffer which is then projected onto your display. Its highly parallel structure makes it very efficient for any algorithm where data is processed in parallel and in large blocks. Between ten and five years ago, most Big Data processing was done with distributed disk-based frameworks like Hadoop. Now, much of that, and especially regression analysis, is performed with a GPU. As I said, Asus, MSI, and Alienware make some of the best GPU-equipped laptops, but you can also buy desktops with GPUs, even though it is generally much cheaper (and more fun) to build them yourself. Type dxdiag in your Cortana box to find out if your computer has a GPU. This is what I get when I do this on mine:

As you can see, my laptop is equipped with Nvidia's GeForce GTX 1070 GPU. In the last few years, I have been quite satisfied with Asus laptops, and the Republic Of Gamers (ROG) Series from Asus has gotten good reviews.

So I set out to buy an Asus laptop at the sweet spot of GPU performance and portability (i.e. weight). The Asus ROG GL502VS fit the bill, and I can vouch for it so far.

Nvidia's Compute Unified Device Architecture (CUDA) was the first parallel computing platform and API model for GPUs, allowing software developers to use a GPU for general purpose processing. CUDA can be accessed directly as an API for Nvidia GPUs, but also supports programming frameworks like OpenACC (http://www.openacc.org), a set of compiler directives to specify loops and regions of code in standard C/C++ to be offloaded from a host CPU to an attached GPU, and OpenCL (https://www.khronos.org/opencl/), a framework for writing programs that execute across heterogeneous platforms like CPUs, GPUs, and FPGAs. As you can surmise, C/C++ is the main language for GPU programming, but there is also PyCUDA, a set of Python bindings that allow you to access the CUDA API straight from Python, and PyOpenCL, which is essentially PyCUDA for OpenCL. We don't have to worry about all of that, though, because we are going to leverage Theano in Python to do all our machine learning, delegating responsibility to Theano for CUDA API integration.

Other GPU manufacturers, like Intel and AMD, have their own SDKs. Nvidia, however, was the first to produce a chip capable of programmable shading (the GeForce 3), which was used in the Xbox console and competed equitably with a custom vector DSP on the PlayStation 2. By 2002, ATI's Radeon 9700 (a.k.a. R300) was the first Direct3D 9.0 accelerator. The CUDA SDK, specifically targeting Nvidia GPUs, was introduced in 2007, and OpenCL, based on CUDA, followed suit with a general-purpose API designed to work across CPUs, GPUs, and DSPs. Nvidia's Kepler line of GPUs was followed by the Maxwell line, and then by the Pascal line, the current generation of graphics cards, released this year. The GeForce 10 series of cards, a very popular line, belongs to this generation. It is widely recognized that the Nvidia 1080, with 7 billion transistors, 8GB of GDDR5X memory, and 2560 CUDA cores (a performance score of sorts), although not the most powerful, is the best graphics card ever built from a price/performance standpoint. My Asus laptop has an Nvidia 1070 graphics card, with 8GB of memory as well (GDDR5 rather than GDDR5X), but reduced to 1920 CUDA cores by shedding one of the four Graphics Processing Clusters (GPCs) on the 1080. This is a comparison of the two GPUs, under their shrouds:

As you can see, pretty similar! There is some difference in their cooling circuitry, as the GTX 1080 is more next-generation, with a vapor chamber for cooling, while the GTX 1070 uses an aluminum heat sink with embedded copper heat pipes. In my opinion, the GTX 1070 is a great compromise for mobile General Purpose GPU computing. I think of it as my Porsche 911 (if I had one, of course) of computing: I can still drive it everywhere (i.e. lug it in my backpack), even though it is not as much of a thoroughbred as, say, a Porsche 911 GT3, which would require me to rent time on a racetrack (i.e. the machine would be too heavy to lug around in a backpack and would be confined to my de facto laboratory). Prices are coming down. Right now, the top of the line in performance, the Nvidia Titan X, hovers at $1,200, the GTX 1080 at $700, and the GTX 1070 at $500. These prices will come down in 2017, and luggable 1080-equipped laptops are due to appear in 2017. That would be a great time to buy.

Setting up CUDA on Windows 10 with all prerequisites

The first step is to install Visual Studio 2015 Community Edition (CE). That will take some time, and it will eat up a sizable chunk of your hard disk. My laptop has a small SSD and a much bigger mechanical hard drive. I installed Visual Studio on the mechanical hard drive, reserving my SSD for pure Machine Learning packages (and Anaconda). Make sure your WiFi has reserve capacity. Don't attempt this in a coffee shop. I also recommend you install the mingw compiler suite. You never know when you might need it, and it's an easy install with either a conda install or a pip install mingw.

If you have an Nvidia GPU, the next step is to install two libraries:

The Nvidia CUDA SDK and toolkit: a development environment for building GPU-accelerated applications, including a compiler specifically designed for Nvidia GPUs, and now, with the latest version (8.0), working with Visual Studio 2015 CE. That is a huge download, too, at 1.2 Gigs! https://developer.nvidia.com/cuda-downloads Make sure you pick version 8, 64-bit, for Windows.
Nvidia's cuDNN library: a GPU-accelerated library of primitives for the deep neural networks we can author with Theano. It provides highly tuned implementations for standard ML routines such as forward and backward convolution. It requires membership in the accelerated computing developer program (free). Nvidia advertises an increase in network training speeds of upwards of 44%, although I cannot say I have witnessed this type of speedup. https://developer.nvidia.com/cudnn

The next step is to install CUDA's Visual Studio 2015 bindings: https://developer.nvidia.com/nvidia-nsight-visual-studio-edition

For reference, the bible is at http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-microsoft-windows/index.html#abstract

The download of Nsight Visual Studio requires you to join Nvidia's developer program (free): https://developer.nvidia.com/gameworksdownload#?dn=nsight-visual-studio-edition-5-2-0

Note that the CUDA 8 SDK for Windows downloads stuff both to your Program Files and Program Files (x86) folders, and also to your ProgramData folder, which is invisible on your C:\ drive unless you make it visible through folder options and show hidden files/folders (you can also see the folder in a command console). That is an important note because the CUDA SDK downloads all sample programs to that folder.

CUDA 8 also installs the GeForce driver version 369.30, which is not the latest version!

The latest version is 375.95, so to download that driver, you need to get it from http://www.nvidia.com/Download/index.asp. Note that if you are happy with the resolution of your computer, you may want to download the driver package but not upgrade your current display driver (I upgraded mine, which demoted the factory resolution on my ASUS ROG: bad for gaming, but good for readability and Machine Learning).

Now, let's do some testing: Open C:\ProgramData\NVIDIA Corporation\CUDA Samples\v8.0\0_Simple\matrixMul_vs2015.sln in Visual Studio 2015. Compile in debug mode, go to a command line at C:\ProgramData\NVIDIA Corporation\CUDA Samples\v8.0\bin\win64\Debug and run matrixMul.exe

You should pass the test.

Now, note that cuDNN has specific installation instructions per platform. For Windows, it says you need to add the cuDNN install path to your PATH environment variable, and make various other modifications to your Visual Studio projects for Include and Library folders. Make a note of these (http://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/#axzz4Qa4i6r8L). Since we're focusing on Theano, it is simpler to actually take the cuDNN binaries and copy them over to the CUDA SDK folders:

Copy cudnn64_5.dll to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin
Copy cudnn.h to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\include
Copy cudnn.lib to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\lib\x64
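The three copies above can also be written down as a small script, which is handy if you rebuild the machine later. This is only a sketch under assumptions: the helper name plan_cudnn_copies is made up, the unzip location is hypothetical, and the bin, include, and lib\x64 layout inside the unzipped cuDNN folder is an assumption about how the archive unpacks. The function only computes the source and destination paths; the actual copy is left as a comment because it needs an administrative console.

```python
from pathlib import PureWindowsPath

# Destination root taken from the article; adjust if CUDA lives elsewhere.
CUDA_ROOT = PureWindowsPath(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0")

# (path inside the unzipped cuDNN folder, destination subfolder under CUDA_ROOT)
# The source layout is an assumption about how the cuDNN archive unpacks.
CUDNN_FILES = [
    (r"bin\cudnn64_5.dll", "bin"),
    (r"include\cudnn.h", "include"),
    (r"lib\x64\cudnn.lib", r"lib\x64"),
]


def plan_cudnn_copies(cudnn_root, cuda_root=CUDA_ROOT):
    """Return (source, destination) path pairs; nothing is copied here."""
    cudnn_root = PureWindowsPath(cudnn_root)
    cuda_root = PureWindowsPath(cuda_root)
    return [
        (cudnn_root / src, cuda_root / dst / PureWindowsPath(src).name)
        for src, dst in CUDNN_FILES
    ]


if __name__ == "__main__":
    # Hypothetical unzip location; replace with wherever you extracted cuDNN.
    for src, dst in plan_cudnn_copies(r"C:\Downloads\cudnn\cuda"):
        print("copy", src, "->", dst)
        # On the real machine: shutil.copy2(str(src), str(dst))
```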

Next, we need to install the Windows 10 SDK from https://developer.microsoft.com/en-us/windows/downloads/windows-10-sdk There must be a reason why that download does not ship with Windows by default nor installs with Visual Studio. Maybe someone can tell me.

Next we install the Microsoft Visual C++ Compiler for Python 2.7. Yup, 2.7, even though we are going to use Python 3.4 on Theano. That is because they are used in different layers in the Theano-to-GPU toolchain. Download from https://www.microsoft.com/en-us/download/details.aspx?id=44266

And now we are finally ready to modify the Nvidia CUDA profile at C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin\nvcc.profile. This is the new content, specialized for Windows 10, CUDA 8, and Visual Studio 2015:

TOP              = $(_HERE_)/..

NVVMIR_LIBRARY_DIR = $(TOP)/nvvm/libdevice

PATH            += $(TOP)/open64/bin;$(TOP)/nvvm/bin;$(_HERE_);$(TOP)/lib;

INCLUDES        +=  "-I$(TOP)/include" "-I$(TOP)/include/cudart" "-IC:/Program Files (x86)/Microsoft Visual Studio 12.0/VC/include" "-IC:\Program Files (x86)\Microsoft SDKs\Windows\v7.1A\Include" $(_SPACE_)

LIBRARIES        =+ $(_SPACE_) "/LIBPATH:$(TOP)/lib/$(_WIN_PLATFORM_)" "/LIBPATH:C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0/lib/x64" "/LIBPATH:C:/Program Files (x86)/Common Files/Microsoft/Visual C++ for Python/9.0/VC/lib/amd64" "/LIBPATH:C:\Program Files (x86)\Microsoft SDKs\Windows\v7.1A\Lib\x64"

CUDAFE_FLAGS    +=
PTXAS_FLAGS     +=

And with that, we should be done with Visual Studio, CUDA, cuDNN, and GPU setup (we should be, but we'll find out soon enough that we are not). On to Theano for now.

Setting up Theano

Theano is one of the great Machine Learning frameworks, together with Facebook's Torch, Google's TensorFlow, UC Berkeley's Caffe, and Microsoft's CNTK. Keras is an awesome deep learning framework, too, but it's more of a wrapper over Theano, simplifying Theano neural network programming for us. Theano is brought to us by Yoshua Bengio and his ML group at Université de Montréal (http://deeplearning.net/software/theano/install.htm). Why Canada? Because their equivalent of our National Science Foundation was more forward-thinking than our NSF, as it extended research grants to artificial neural network (ANN) researchers in the 80s and 90s when our NSF cut them off. And so they moved to Canada. Now they're back, running research at Google, Facebook, etc. Theano is really a graph-based language over Python, where symbolic mathematical computations are represented as graphs, built to compile and run efficiently on both CPU and GPU architectures. So even if you skip all the sections above, you will still be able to use Theano (and Keras) to build neural networks, albeit not very large ones, since network training performance on a CPU is abysmal compared to that of the GPU. By the way, the origin of the name is Theano of Croton, a Pythagorean philosopher and wife of Pythagoras, that famous mathematician and philosopher who gave us the Pythagorean theorem and "no one is free who has not obtained the empire of himself. No man is free who cannot command himself".
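To make the idea of "symbolic computations represented as graphs" concrete, here is a toy sketch in plain Python. It is emphatically not Theano's API: the Node class and the var, exp, and evaluate helpers are invented for illustration. It only shows the two-phase pattern of first building an expression graph and later evaluating it with concrete numbers.

```python
import math


class Node:
    """A node in a toy expression graph (illustrative only, not Theano's API)."""

    def __init__(self, op, inputs=(), name=None):
        self.op = op          # "var", "add", "mul", or "exp"
        self.inputs = inputs  # parent nodes this node is computed from
        self.name = name      # only used by variables

    def __add__(self, other):
        return Node("add", (self, other))

    def __mul__(self, other):
        return Node("mul", (self, other))


def var(name):
    return Node("var", name=name)


def exp(node):
    return Node("exp", (node,))


def evaluate(node, values):
    """Walk the graph recursively, substituting numbers for variables."""
    if node.op == "var":
        return values[node.name]
    args = [evaluate(parent, values) for parent in node.inputs]
    if node.op == "add":
        return args[0] + args[1]
    if node.op == "mul":
        return args[0] * args[1]
    if node.op == "exp":
        return math.exp(args[0])
    raise ValueError("unknown op: %s" % node.op)


# Building the expression performs no arithmetic; it only records the graph.
x, y = var("x"), var("y")
z = exp(x * y) + x

# Numbers are supplied only at evaluation time.
print(evaluate(z, {"x": 2.0, "y": 0.0}))  # exp(0) + 2 = 3.0
```

Because the whole expression is known before any number flows through it, a real framework can optimize the graph and compile it for a CPU or a GPU, which a plain interpreter like this sketch does not attempt.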

A few prerequisites before installing Theano (make sure you are in your "theano" Anaconda environment, tailored to Python 3.4). In a command line:

conda install matplotlib
conda install numpy
conda install six
conda install scipy
pip install atlas

matplotlib is a plotting library, numpy a package for mathematical numerical recipes, scipy a library of scientific tools, six a package with tools for wrapping over differences between Python2 and Python 3, and atlas is a build tool.

Next we can pip install theano, but I would recommend installing the bleeding edge of Theano straight from GitHub. For this, we need to install git source control from https://git-scm.com/book/en/v2/Getting-Started-Installing-Git. Accept all default install options. Then, to install the latest Theano (Theano-0.9.0.dev4 as of this writing):

pip install --upgrade --no-deps git+https://github.com/Theano/Theano.git

Always make sure your command console is in Administrative mode when installing python packages, and in the "theano" environment. You can list the version of theano installed with

pip show theano

And list all installed packages in your "theano" environment with

pip list
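As a complement to pip show and pip list, a few lines of Python can tell you which of the prerequisites are importable from the interpreter you are actually running. This is a minimal sketch: the helper name missing_packages is made up, and the list of names simply mirrors the packages installed above.

```python
import importlib.util

# Packages installed in the steps above, plus Theano itself.
REQUIRED = ["matplotlib", "numpy", "six", "scipy", "theano"]


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("not importable:", ", ".join(missing))
    else:
        print("all prerequisites are importable")
```

A package that shows up in pip list but is reported here as not importable usually means the script was run outside the "theano" environment.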

To configure Theano to use Nvidia's CUDA SDK: http://deeplearning.net/software/theano/install_windows.html#gpu-windows

In short, you will need to create in your C:\Users\<UserName> folder a configuration file called .theanorc with the following contents:

[global]
device = gpu
floatX = float32

[nvcc]
flags = --use-local-env --cl-version=2015

You cannot create a .anything file with Windows Explorer at all, so call it something else in Windows Explorer, then rename it in a command console as .theanorc
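Another way around the Explorer limitation is to let Python write the file. The sketch below is an assumption-laden convenience, not an official Theano tool: the helper name write_theanorc is made up, and it relies on os.path.expanduser("~") resolving to your C:\Users\<UserName> folder, which is the usual behavior on Windows. The demo at the bottom writes into a temporary folder so that it cannot clobber an existing configuration.

```python
import os
import tempfile

# Initial settings as listed in this section.
THEANORC_TEXT = """[global]
device = gpu
floatX = float32

[nvcc]
flags = --use-local-env --cl-version=2015
"""


def write_theanorc(folder=None, text=THEANORC_TEXT):
    """Write .theanorc into `folder` (default: the user's home directory)."""
    if folder is None:
        folder = os.path.expanduser("~")
    path = os.path.join(folder, ".theanorc")
    with open(path, "w") as handle:
        handle.write(text)
    return path


if __name__ == "__main__":
    # Demo only: write to a scratch folder and show where the file landed.
    scratch = tempfile.mkdtemp()
    print("wrote", write_theanorc(scratch))
```

Calling write_theanorc() with no arguments targets the home folder for real; back up any existing .theanorc first.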

And you'd think you're home free, but when you run your first recommended test program from http://deeplearning.net, listed here for your convenience, with python test.py on the command line:

import numpy as np
from theano import function, config, shared, sandbox
import theano.tensor as T
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], T.exp(x))
print(f.maker.fgraph.toposort())
t0 = time.time()
for i in range(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
if numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')

You find that you are not using the GPU at all! After a lot of hunting down errors and finding missing libraries (many of which you already installed in the previous section), here are the source code (!) modifications you need to perform, followed by the final contents of the .theanorc file. This is delicate surgery and should be performed by experienced developers, especially on the Python source file, where the indentation of each line is key.

First, navigate to C:\Program Files (x86)\Microsoft SDKs\Windows\v7.1A\Include\ObjBase.h and edit the file with Notepad++ or another serious editor (notepad won't do, it messes up carriage returns), and right before

extern "C++"
{
	template<typename T> void** IID_PPV_ARGS_Helper(T** pp)
	{
		static_cast<IUnknown*>(*pp);  // make sure everyone derives from IUnknown
		return reinterpret_cast<void**>(pp);
	}
}

add the following lines:

//~dk
typedef interface IUnknown IUnknown;

(the //~dk moniker is my initials, and it's just meant for me to be able to locate my hacks in case I need to revert; you can use any placeholder in its place). This required modification is likely related to the WIN32_LEAN_AND_MEAN directive. OK, first hack out of the way. On to the second one: navigate to C:\Program Files\Anaconda3\envs\theano\Lib\site-packages\theano\sandbox\cuda. Edit the file __init__.py with Notepad++ or another serious editor. Right after the line:

params = ["-l", "cudnn", "-I" + os.path.dirname(__file__)]

modify the following lines:

            if config.dnn.include_path:
                params.append("-I" + config.dnn.include_path)
            if config.dnn.library_path:
                params.append("-L" + config.dnn.library_path)
            if config.nvcc.compiler_bindir:
                params.extend(['--compiler-bindir',
                               config.nvcc.compiler_bindir])

            # Do not run here the test program. It would run on the
            # default gpu, not the one selected by the user. If mixed
            # GPU are installed or if the GPUs are configured in
            # exclusive mode, this cause bad detection.
            comp, out, err = nvcc_compiler.NVCC_compiler.try_flags(
                flag_list=params, preambule=preambule, body=body,
                try_run=False, output=True)

to:

            if config.dnn.include_path:
                params.append("-I" + config.dnn.include_path)
            if config.dnn.library_path:
                params.append("-L" + config.dnn.library_path)
                #~dk
                #bug: config.dnn.library_path = C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v8.0\\lib64
                params.append("-LC:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v8.0\\lib\\x64")
                #~dk this was the trick to enable cudnn:
                params.append("-LC:/Users/Dino/AppData/Local/Programs/Common/Microsoft/Visual C++ for Python/9.0/VC/lib/amd64")
            if config.nvcc.compiler_bindir:
                params.extend(['--compiler-bindir',
                               config.nvcc.compiler_bindir])
            #~dk      
            print(params)
            # Do not run here the test program. It would run on the
            # default gpu, not the one selected by the user. If mixed
            # GPU are installed or if the GPUs are configured in
            # exclusive mode, this cause bad detection.
            comp, out, err = nvcc_compiler.NVCC_compiler.try_flags(
                flag_list=params, preambule=preambule, body=body,
                try_run=False, output=True)

Keep in mind that this is a Python file, so all indentation preceding each line should consist of individual space characters, with no tab characters at all, and no line should wrap around as it may appear above. Also, please replace "Dino" in the C:/Users/Dino/AppData path with your own username. As you can see, we are adding folders to the library search list to include the right binaries in the CUDA SDK and Microsoft's Visual C++ Tools for Python. These folders should be harvested from the nvcc.profile configuration, but for some reason they weren't on my machine. It is possible that with the latest nvcc.profile listed in the previous section, all is well, but these source modifications worked for me, so I'm sticking with them!
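Since a stray tab is easy to introduce and hard to see, a quick scan of the edited file can save a confusing error later. The following is a small sketch; the function names are made up for illustration, and it flags only tabs that appear in the leading whitespace of a line.

```python
import io


def find_tab_indented_lines(lines):
    """Return 1-based line numbers whose leading whitespace contains a tab."""
    offenders = []
    for number, line in enumerate(lines, start=1):
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            offenders.append(number)
    return offenders


def check_file(path):
    """Scan a source file on disk, e.g. the edited __init__.py."""
    with io.open(path, encoding="utf-8") as handle:
        return find_tab_indented_lines(handle)


sample = [
    "if config.dnn.library_path:\n",
    "    params.append('-L' + config.dnn.library_path)\n",  # spaces: fine
    "\tprint(params)\n",                                     # tab: flagged
]
print(find_tab_indented_lines(sample))  # [3]
```

Running check_file on the modified __init__.py should return an empty list before you try the test program again.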

And now for the final contents of your .theanorc file in your C:\Users\<Username> folder:

[global]
device = gpu
floatX = float32
cuda.disable_gcc_cudnn_check=True
optimizer_including=cudnn

[cuda]
root = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0

[nvcc]
flags=-D_FORCE_INLINES
fastmath=True
compiler_bindir=C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\amd64
optimizer_including=cudnn

[dnn]
enabled = True

[lib]
cnmem=0.8

Let's test Theano now, by running the following test program: Save the following lines in a test.py file in a new folder:

from theano import function, config, shared, sandbox
import theano.tensor as T
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], T.exp(x))
print(f.maker.fgraph.toposort())
t0 = time.time()
for i in range(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
if numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')

Before you run python test.py from a command line within your "theano" anaconda environment, modify your .theanorc configuration with device = cpu in the [global] section of that file. You should get:

Now modify the file with device = gpu in [global], and you should see:

So you are using the GPU! And going from a 15 second computation to a third of a second computation is a great jump in performance indeed! Additionally, this output should be preceded by:

Specifically, you should see the lines:

using gpu device 0: GeForce GTX 1070 (CNMem is enabled with initial size 80.0% of memory, cuDNN 5105)
... UserWarning: Your cuDNN version is more recent that the one Theano officially supports. If you see
any problems, try updating Theano or downgrading cuDNN to version 5.

I don't have a problem with bleeding edge versions, if everything works ok!

A few words of advice for General Purpose GPU computing: only computations with float32 data types can be GPU accelerated at this point; aggregation operations over rows and columns of tensors can actually be slower on a GPU than on a CPU; copying a lot of data to and from the GPU may cancel out the performance advantage of computing on a GPU; and you should always use shared float32 variables (shared()) to store frequently accessed data.
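The float32 point is worth a concrete look, because NumPy creates float64 arrays by default. The sketch below uses only NumPy (no Theano) and a made-up helper name, as_float32, to show the conversion you would typically apply to data before handing it to a float32 computation.

```python
import numpy as np


def as_float32(data):
    """Return `data` as a float32 NumPy array (copying only if needed)."""
    return np.asarray(data, dtype=np.float32)


samples = np.linspace(0.0, 1.0, 5)   # NumPy defaults to float64
converted = as_float32(samples)

print(samples.dtype, "->", converted.dtype)                    # float64 -> float32
print(samples.nbytes, "bytes ->", converted.nbytes, "bytes")   # 40 bytes -> 20 bytes
```

Halving the size of each element also halves the amount of data that has to travel to and from the GPU, which ties in with the advice about transfer costs.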

Testing Theano and building networks with Keras

Keras is a great wrapper over Theano in that it allows us to build neural networks with but a few lines of code. Let's install it first. In a command line with Administrative rights:

pip install keras

All documentation and source code examples can be found at http://keras.io. Keras means horn in Greek, and is said to be a reference to a literary image from ancient Greek and Latin literature, first found in the Odyssey, where dream spirits (Oneiroi, singular Oneiros) are divided between those who deceive with false visions and arrive on Earth through a gate of ivory (blue pill in the Matrix®), and those who announce a future that will come to pass and arrive through a gate of horn (red pill in the Matrix®). Anyway, the bottom line here is that Greek literature rocks! Francois Chollet is the author of Keras.

You build a network by instantiating a Sequential model and adding layers to it: one layer for the neurons themselves, with the number of neurons and the number of inputs as parameters, followed by another layer for its activation function. Here is a layer of 32 neurons with 784 inputs, with a Rectified Linear Unit (ReLU) as its activation function:

from keras.models import Sequential
from keras.layers.core import Dense, Activation

model = Sequential()
model.add(Dense(32, input_dim=784))
model.add(Activation('relu'))

You then compile a model after you've added all the layers like so:

model.compile(loss='mean_squared_error', optimizer='sgd')

where the mean squared error between the label and the network's output is the metric to minimize, using the stochastic gradient descent (SGD) optimizer. In Machine Learning, a "label" is the dependent data when it is known, as it is with the training dataset, and the output is what is computed by the neural network given its current state with all its weights. The ultimate goal is for the output to be very close to the label, for each data item ("observation") in the dataset. When that is the case, we say that our network is trained, the weights are finalized, and we can now use the network to predict dependent variables given any set of independent variables we desire.
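To see what the loss and the optimizer are doing without any framework in the way, here is a plain-Python sketch for a single linear neuron. It is an illustration under stated assumptions, not what Keras executes internally: the function names, the tiny dataset, the learning rate, and the epoch count are all made up for the example.

```python
def mean_squared_error(labels, outputs):
    """Average of the squared differences between labels and outputs."""
    return sum((t - o) ** 2 for t, o in zip(labels, outputs)) / len(labels)


def sgd_step(w, b, x, label, learning_rate=0.1):
    """One stochastic gradient descent update for a linear neuron w*x + b,
    using the squared error (output - label)**2 on a single observation."""
    output = w * x + b
    error = output - label
    # d/dw (error**2) = 2*error*x and d/db (error**2) = 2*error
    return w - learning_rate * 2 * error * x, b - learning_rate * 2 * error


xs = [0.0, 1.0, 2.0, 3.0]
labels = [1.0, 3.0, 5.0, 7.0]   # generated by label = 2*x + 1

w, b = 0.0, 0.0
before = mean_squared_error(labels, [w * x + b for x in xs])
for epoch in range(200):
    for x, label in zip(xs, labels):
        w, b = sgd_step(w, b, x, label, learning_rate=0.05)
after = mean_squared_error(labels, [w * x + b for x in xs])

print("MSE before: %.3f, after: %.6f" % (before, after))
print("w = %.3f, b = %.3f" % (w, b))
```

Each pass over the data nudges the weight and the bias against the gradient of the squared error, so the output drifts toward the label; with this noise-free data the parameters settle close to w = 2 and b = 1.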

You then train your network on numpy arrays X_train (independent variables) and Y_train (dependent variables), for 2000 epochs, like so:

model.fit(X_train,
          Y_train,
          nb_epoch=2000,
          verbose=0)

You can then measure accuracy on earmarked test data, and predict dependent variables for synthetic independent variables.

Let's test our Theano installation with Keras. We'll use Keras to compute a single neuron's weights before and after training, on a dataset that consists of a Rectified Linear Unit-looking curve (ReLU). The ReLU is also the simplest (and one of the most effective) activation functions, so training a node that is ReLU-activated on a dataset that looks like a ReLU sounds like child's play! Here is the code for printing the weights (two parameters: the weight on the neuron's single input, and the neuron's bias) both before and after training. Save it in a file named one.py in a folder on your machine.

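import numpy as np
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers.core import Dense, Activation

# dataset: flat at 0 for the first half, then a straight ramp (ReLU-shaped)
n_points = 200
x = np.linspace(0, 2, n_points)
y = np.array([0] * int(n_points / 2) + list(x[:int(n_points / 2)])) * 2

plt.figure(figsize=(5, 2))
plt.plot(x, y, linewidth=2)
plt.title('ridiculously simple data')
plt.xlabel('a')
plt.ylabel('b')
plt.show()

# a single ReLU-activated neuron with one input
np.random.seed(0)
model = Sequential()
model.add(Dense(output_dim=1, input_dim=1, init="normal"))
model.add(Activation("relu"))
model.compile(loss='mean_squared_error', optimizer='sgd')

# print initial weights (w0: input weight, w1: bias)
weights = model.layers[0].get_weights()
w0 = weights[0][0][0]
w1 = weights[1][0]
print('neural net initialized with weights w0: {w0:.2f}, w1: {w1:.2f}'.format(**locals()))

# train; the original listing stops before this step, so the call and
# the epoch count below are assumptions
model.fit(x, y, nb_epoch=10, verbose=0)

# print trained weights
weights = model.layers[0].get_weights()
w0 = weights[0][0][0]
w1 = weights[1][0]
print('neural net weights after training w0: {w0:.2f}, w1: {w1:.2f}'.format(**locals()))
print('done.')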

You can now run python one.py in a command console and observe the data being modelled. Once you close the plot, the initial weights are displayed, training happens on the GPU, and the final weights are displayed, with CNMem and cuDNN enabled and active.

Conclusion

Keras and Theano are a great 1-2 punch for ramping up to Deep Learning, and CUDA is a great SDK for leveraging the parallel power of a GPU to accelerate computations. Theano is also a great cross-platform library, with documented success on Windows, Linux, and OSX. Today, we are experiencing an unprecedented alignment in that the latest compilers from Microsoft work with the latest compilers from Nvidia to simplify installation and configuration. Still, it's not for the faint of heart, as documentation and advice on the Web are hopelessly outdated. That is why I set out on this mini-odyssey to make it easy for you to build networks with Theano in order to model datasets and interpolate with predictions.

One last word about deep learning. It's been said that it scares some people, because the dynamics of the prediction taking place are not entirely clear: indeed, there is no clear procedural (imperative) algorithm that a human can follow to understand how a network makes a prediction. This has prompted some to declare that machines are the single greatest danger to humanity. Apart from the fact that this is a cry that some part of humanity has already uttered against another part of humanity with different traits, my opinion is that although deep learning is a very effective modelling and regression tool, it is one that is a bit overhyped. It is nothing but an algebraic technique to approximate a dataset in feature space by smoothing out the noise and capturing essential features. Multi-dimensional spline approximation accomplishes the same goal, and its geometric technique of following gradients along the curve to model is very reminiscent of the basic technique behind the backpropagation algorithm, the most used algorithm for modifying a neural network's weights in order to reduce the mean squared error between label and computed output. I've read on the Web that neural networks are an example of a tool that wags the analytics, and I would agree with that. Still, it's a powerful tool, and it's good to have it in one's toolset in pristine condition and ludicrous-performance mode, on a GPU-equipped laptop on Windows. I expect more and more laptops to ship equipped with a GPU, as General Purpose GPU computing (GPGPU) becomes more commonplace.

History

Version 1.0