Tag Archives: machine learning

TensorFlow 101

There’s a sort of “gold rush” between Machine Learning toolkits to grab the attention of developers. Caffe, Torch, Theano, and now TensorFlow, are only some of the competitors. Which one to choose?

Hard to know for sure. There are the usuall technical trade-offs (see https://github.com/zer0n/deepframeworks), but for the user, besides technical capabilities, often times the choice comes down to which one has the best documentation (i.e., which one is easier to use).

So far, given the power of it’s sponsor, TensorFlow seems to be the one with a more serious approach to documentation. Still, the MNIST and CNN tutorials could be simpler.

Introducing: TensorFlow 101.

This project has two main files (nn_shallow.py and cnn_deep.py), and two sample datasets (subsets of the MNIST and CIFAR10 databases). The Python routines are modified from the “MNIST For ML Beginners” and “Deep MNIST for Experts” (from https://www.tensorflow.org/versions/0.6.0/tutorials/index.html).

The main diference is that nn_shallow.py and cnn_deep.py contain code to read and build train/test sets from regular image files, and therefore can be more easily deployed to other databases (which, ultimately, is the goal of the user). Notice that the folders MNIST and CIFAR10 are organized in subfolders Train and Test, and in these subfolders each class has a separate folder (0, 1, etc).

Therefore, one simple way to deploy nn_shallow.py and cnn_deep.py to your own custom database is to organize your database in the same hierarchy as MNIST and CIFAR10 in this project, and modify the variable “path” on the .py routines to point to your dataset. Notice that your dataset doesn’t have to have 10 classes; however, all images in the provided sample datasets are grayscale and have size 28×28, hence non-trivial modifications to the code should be performed in order to deal with other types of images.

See also: Udacity Deep Learning Course.

The State and Future of Artificial Intelligence, Part 1

(Appeared in Washington Square News.)

It’s an early morning of the just-arrived winter. People I can see on the street from my window wear heavy coats, but it’s unclear how cold it is. I can open the window and let my built-in skin sensors grab an approximate measurement, but I realize a much more accurate value can be obtained by pressing a button. “Siri, what’s the temperature outside?,” I ask, with a brazilian accent most americans I talk with think is russian. “Brr! It’s 32 degrees outside,” answers the piece of rectangular glass I hold. It’s a female voice, with an accent of her own. Artificial. That’s probably how I’d describe it.

The application, acronym for Speech Integration and Recognition Interface, encountered a wave of sarcastic, philosophical, flirtatious, and mundane questions, since it was made natively available to certain iOS devices in October 2011. Countless jokes featuring Siri made their way through the nodes of the social-media graph, and books about her witty personality have been printed. But if you could take Siri in a time trip back to when your grandmother was 10 (fear not, the time travel paradox involves your grandfather), she would definitely fulfill Clarke’s third law and qualify your talking device as “magic”. Perhaps she would even call Siri “intelligent.”

We’ll skip the fact that Siri is intelligent, indeed, according to the definition she grabs from Wolfram Alpha when asked, for there’s no consensus about what it means to be intelligent (nor for what “meaning” means, as a matter of fact; but enough about metalinguistics). In the following, I’ll put Siri in context with recent developments in artificial intelligence. But first, come back from your time travel and book a trip into your brain. This one is easier: simply think of your grandmother.

By doing so, a specific area in the back of your brain, responsible for face recognition, activates. Moreover, it has been conjectured that a single neuron “fires” when you think of her. It’s the “grandmother neuron” and, as the hypothesis goes, there’s one for any particular object you are able to identify. While the existence of such particular neurons is just a conjecture, at least two things about the architecture of the visual cortex have been figured. One, functional specialization: there are areas designated to recognize specific categories of objects (such as faces and places). Two, hierarchical processing: visual information is analyzed in layers, with the level of abstraction increasing as the signal travels deeper in the architecture.

Computational implementations of so-called “deep learning” algorithms have been around for decades, but they were usually outperformed by “shallow” architectures (architectures with only one or two layers). Since 2006, new techniques have been discovered for training deep architectures, and substantial improvements happened in algorithms aimed to tasks that are easily performed by humans, like recognizing objects, voice, faces, and handwritten digits.

Deep learning is used in Siri for speech recognition, and in Google’s Street View for the identification of specific addresses. In 2011, an implementation running in 1,000 computers was able to identify objects from 20,000 different categories with record (despite poor) accuracy. In 2012, a deep learning algorithm won a competition for designing drug agents using a database of molecules and their chemical structures.

These achievements relighted public interest in Artificial Intelligence (well, not as public as in vampire literature, but definitely among computer scientists). Now, while the improvements are substantial, specially when compared to what happened (or didn’t happen) in previous decades, AI still remains in the future. As professor Andrew Ng pointed out, “my gut feeling is that we still don’t quite have the right algorithm yet.”

The reason is that these algorithms are still, in general, severely outperformed by humans. You can recognize your grandmother, for instance, just with a side look, by the way she walks. Computers can barely detect and recognize frontal faces. Similarly for recognizing songs, identifying objects in pictures and movies, and a whole range of other tasks.

Wether comparison with human performance is a good criterion for intelligence is debatable. But I’ll leave that discussion for part 2.