(Appeared in Washington Square News.)
It’s early morning in the just-arrived winter. The people I can see on the street from my window wear heavy coats, but it’s unclear how cold it is. I could open the window and let my built-in skin sensors take an approximate measurement, but I realize a much more accurate value can be obtained by pressing a button. “Siri, what’s the temperature outside?” I ask, with a Brazilian accent that most Americans I talk to mistake for Russian. “Brr! It’s 32 degrees outside,” answers the rectangular piece of glass in my hand. It’s a female voice, with an accent of her own. Artificial. That’s probably how I’d describe it.
The application, whose name is popularly but unofficially read as an acronym for “Speech Interpretation and Recognition Interface,” has faced a wave of sarcastic, philosophical, flirtatious, and mundane questions since it was made natively available on certain iOS devices in October 2011. Countless jokes featuring Siri have made their way through the nodes of the social-media graph, and books have been published about her witty personality. But if you could take Siri on a trip back in time to when your grandmother was 10 (fear not: the famous time-travel paradox involves your grandfather), she would surely confirm Clarke’s third law (“any sufficiently advanced technology is indistinguishable from magic”) and describe your talking device as “magic.” Perhaps she would even call Siri “intelligent.”
We’ll set aside the fact that Siri is, indeed, intelligent according to the definition she pulls from Wolfram Alpha when asked, since there’s no consensus about what it means to be intelligent (nor about what “meaning” means, for that matter; but enough metalinguistics). In what follows, I’ll place Siri in the context of recent developments in artificial intelligence. But first, come back from your time travel and book a trip into your brain. This one is easier: simply think of your grandmother.
When you do, a specific area in the back of your brain, responsible for face recognition, becomes active. It has even been conjectured that a single neuron “fires” when you think of her. This is the “grandmother neuron,” and, as the hypothesis goes, there is one for every particular object you are able to identify. While the existence of such neurons remains a conjecture, at least two features of the visual cortex’s architecture are well established. One, functional specialization: there are areas dedicated to recognizing specific categories of objects (such as faces and places). Two, hierarchical processing: visual information is analyzed in layers, with the level of abstraction increasing as the signal travels deeper into the architecture.
Computational implementations of so-called “deep learning” algorithms, which stack many such layers, have been around for decades, but they were usually outperformed by “shallow” architectures (those with only one or two layers). Since 2006, new techniques for training deep architectures have been discovered, leading to substantial improvements in algorithms aimed at tasks humans perform easily, like recognizing objects, voices, faces, and handwritten digits.
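To make the “shallow” versus “deep” distinction concrete, here is a minimal Python sketch of a feedforward network in which each layer sees only the output of the layer before it, loosely mirroring the hierarchical processing described above. It is an illustration under stated assumptions, not how Siri or any of the systems mentioned here actually work. The helper names (make_layer, forward, build_network), the sigmoid activation, and the layer sizes are all hypothetical choices. The weights are random rather than learned, so the sketch says nothing about the post-2006 training techniques themselves.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Create a fully connected layer: one weight row and one bias per output unit.

    Assumption: weights are random placeholders; a real network would learn them.
    """
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def sigmoid(x):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(layers, inputs):
    """Pass a signal through each layer in turn.

    Each layer sees only the previous layer's output, so later layers can only
    combine features computed by earlier ones (the hierarchical idea).
    """
    activations = list(inputs)
    for weights, biases in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations


def build_network(sizes, seed=0):
    """sizes = [inputs, hidden..., outputs]; yields len(sizes) - 1 layers."""
    rng = random.Random(seed)
    return [make_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]


# Hypothetical "shallow" network: one hidden layer, two layers of weights in total.
shallow = build_network([16, 32, 2])
# Hypothetical "deep" network: three hidden layers stacked on top of each other.
deep = build_network([16, 32, 16, 8, 2])

# Stand-in for a tiny 4x4 grayscale image, flattened into 16 pixel values.
pixel_rng = random.Random(1)
pixels = [pixel_rng.random() for _ in range(16)]

print(len(shallow), forward(shallow, pixels))
print(len(deep), forward(deep, pixels))
```

Training, the hard part, would adjust the weights so that the final layer’s outputs match labeled examples. The deep network’s extra layers are what, in principle, give it room to build increasingly abstract features.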
Deep learning is used in Siri for speech recognition, and in Google’s Street View to identify specific addresses. In 2011, an implementation running on 1,000 computers was able to identify objects from 20,000 different categories with record (if still poor) accuracy. In 2012, a deep learning algorithm won a competition to identify promising drug candidates from a database of molecules and their chemical structures.
These achievements rekindled public interest in artificial intelligence (well, not as much interest as vampire literature gets, but definitely among computer scientists). Still, while the improvements are substantial, especially compared to what happened (or didn’t happen) in previous decades, AI remains a thing of the future. As Professor Andrew Ng put it, “my gut feeling is that we still don’t quite have the right algorithm yet.”
The reason is that these algorithms are still, in general, severely outperformed by humans. You can recognize your grandmother from a sideways glance, for instance, or just by the way she walks; computers can barely detect and recognize faces seen head-on. The same goes for recognizing songs, identifying objects in pictures and movies, and a whole range of other tasks.
Whether comparison with human performance is a good criterion for intelligence is debatable. But I’ll leave that discussion for part 2.