Software systems could one day analyze everything from blurry war-zone footage to the subtle sarcasm in a written paragraph, thanks to two unassuming scientists who are inspired by biology to make revolutionary strides in intelligent computing.
Yann LeCun and Rob Fergus, both computer science professors at New York University, are the brains behind “Deep Learning,” a program sponsored by Darpa, the Pentagon’s blue-sky research agency. The idea, ultimately, is to develop code that can teach itself to spot objects in a picture, actions in a video, or voices in a crowd. LeCun and Fergus have $2 million and four years to make it happen.
Existing software programs rely heavily on human assistance to identify objects. A user extracts key feature sets, like edge statistics (how many edges an object has, and where they are) and then feeds the data into a running algorithm, which uses the feature sets to recognize the visual input.
“People spend huge amounts of time building these feature sets, figuring out which are better or more accurate, and then refining them,” LeCun told Danger Room. “The question we’re asking is whether we can create computers that automatically learn feature sets from data. The brain can do it, so why not machines?”
The computer systems will be inspired by biology, but not modeled after it. That’s because researchers still aren’t entirely sure how animals turn inputs (an object, a movement, a sound) into usable information. Ten years ago, a study at MIT helped answer the question. Researchers rewired ferret brains, so that the optic nerve fed into the auditory cortex, and vice versa. But the ferrets still saw and heard normally, leading the team to conclude that brain function depends on the signal, not the area.
Brains also display plenty of abstraction when it comes to identifying specific inputs: LeCun was inspired to create his algorithmic layering approach, called a “convolutional network,” by the 1960s research of David Hubel and Torsten Wiesel. The two used cats to demonstrate how the brain’s visual cortex relies on abstractions to create complex representations of a given visual input.
In other words, LeCun said, “There’s some sort of learning algorithm within the brain. We just don’t know what it is.”
But the algorithmic talents of the mind, along with its ability to identify visual data by abstraction, will be the key components of the NYU team’s new system. Right now, an algorithm learns to recognize objects in one of two ways. In one, it is shown representative examples that a person has already labeled as, say, a horse. Then the code tries to match any new creature to the ur-stallion. (That’s called “supervised” learning.) In the other way, the software is shown lots and lots of unlabeled horses, and it builds its own model of what a horse is supposed to resemble. (That’s “unsupervised” learning.)
What LeCun and Fergus are trying to do is make code that can get it right on a first, unsupervised example, using layer after layer of code to abstract the essential attributes of an object. The first step is to turn an image into numbers: For a 100 x 100 pixel image, the software produces a grid of 10,000 numbers; 9 x 9 “masks” are then applied to that grid, to uncover attributes of the image. The first feature spotted is an object’s edge. (The human brain makes a similar initial pass.)
Several more “masks” follow. The final output? A series of 256 numbers that identifies the input.
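To make that first pass concrete, here is a minimal sketch of one 9 x 9 mask sliding over a 100 x 100 grid of numbers. It is an illustration only, not the NYU team's code: the function name `apply_mask`, the synthetic half-dark, half-bright image and the hand-written vertical-edge mask are all assumptions made for the example. In a convolutional network the mask values would be learned from data rather than typed in, and the later layers that boil everything down to 256 numbers are not shown.

```python
def apply_mask(image, mask):
    """Slide `mask` over `image` (both lists of rows) and return the grid of responses.

    Each output value is the sum of pixel * mask weight over one window.
    No padding is used, so the output is smaller than the input.
    """
    img_h, img_w = len(image), len(image[0])
    mask_h, mask_w = len(mask), len(mask[0])
    out_h, out_w = img_h - mask_h + 1, img_w - mask_w + 1

    output = []
    for top in range(out_h):
        row = []
        for left in range(out_w):
            total = 0.0
            for i in range(mask_h):
                for j in range(mask_w):
                    total += image[top + i][left + j] * mask[i][j]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 100 x 100 image (10,000 numbers): dark left half, bright right half.
image = [[0.0 if x < 50 else 1.0 for x in range(100)] for _ in range(100)]

# Hypothetical 9 x 9 mask that responds to dark-to-bright vertical edges:
# four columns of -1, one column of 0, four columns of +1.
mask = [[-1.0] * 4 + [0.0] + [1.0] * 4 for _ in range(9)]

response = apply_mask(image, mask)
print(len(response), len(response[0]))  # size of the response grid
print(response[0][0], response[0][45])  # flat region vs. window centered on the edge
```

The response is zero wherever the window sits on a uniform patch and peaks where the window straddles the boundary between dark and bright, which is the sense in which the first mask "spots" an edge.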
The two are only six weeks into the project, but they’ve already got demos up and running.
The Deep Learning algorithm and I had never met, but with a quick shot by a small webcam on LeCun’s laptop, the layers of code captured my features and could immediately distinguish me from other objects and people in LeCun’s office. The same thing happens when LeCun introduces the system to two different coffee mugs — it takes mere seconds for the computer to acquaint itself with each, then distinguish one from the other.
And this is only the beginning. Darpa also wants a system that can spot activities, like running, jumping or getting out of a car. The final version will operate unsupervised, by being programmed to hold itself accountable for errors — and then auto-correct them at each algorithmic layer.
It should also be able to apply the layered algorithmic technique to text. Right now, computer systems can parse sentences to categorize them as positive or negative, based on how often different words appear in the text. By applying layers of analysis, the Deep Learning machine will — LeCun and Fergus hope — spot sarcasm and irony too.
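The word-counting approach described above can be sketched in a few lines. This is a toy under stated assumptions, not any particular product's method: the word lists `POSITIVE` and `NEGATIVE`, the function name `classify` and the "neutral" fallback for ties are all invented for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny word lists; a real system would use far larger ones.
POSITIVE = {"great", "good", "love", "wonderful"}
NEGATIVE = {"bad", "terrible", "hate", "awful"}


def classify(sentence):
    """Label a sentence by counting how often positive and negative words appear."""
    counts = Counter(re.findall(r"[a-z']+", sentence.lower()))
    positive_hits = sum(n for word, n in counts.items() if word in POSITIVE)
    negative_hits = sum(n for word, n in counts.items() if word in NEGATIVE)
    if positive_hits > negative_hits:
        return "positive"
    if negative_hits > positive_hits:
        return "negative"
    return "neutral"


print(classify("I love this wonderful phone"))
print(classify("Oh great, another terrible update. Just great."))
```

The second sentence is plainly sarcastic, yet it counts two uses of "great" against one "terrible" and comes out positive. That blind spot is the gap the layered approach is meant to close.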
“Ideally, what we’ll come away with is a ‘generic learning box’ that can identify every data cue,” Fergus tells Danger Room.