[Image: Demis Hassabis]
Demis became a chess master at 13, began designing games @ 17, and was co-designer and lead programmer on the classic Theme Park, which sold several million copies, won him a Golden Joystick Award, and inspired a whole genre of management-simulation games.
Demis then got a Double First in Computer Science @ Cambridge and ran several companies before getting a Ph.D. @ University College London in cognitive neuroscience and artificial intelligence.
His combination of skills led to important cognitive-neuroscience breakthroughs, centered on damage to the hippocampus, the brain's memory center, and how the resulting amnesia left patients unable to imagine themselves in new experiences. This established a critical link between the constructive process of imagination and the reconstructive process of episodic memory recall.
[Image: Hassabis winning @ poker]
Demis was also the "best all-around games player in the world," having won the world games championship a record five times before "retiring" in 2003. He has cashed six times at the World Series of Poker, including in the Main Event.
[Image: Golden Joystick award (you gamers know you want one)]
He was elected a Fellow of the Royal Society of Arts in 2009, received a major award from them in 2014, and is a visiting scientist @ Harvard and MIT.
Hassabis and his collaborators recently published "Human-Level Control Through Deep Reinforcement Learning" in Nature, a leading journal, showing just how far AI has come.
There are three principal components to these "human-level control" algorithms, each modeled on how our brains work: a) reinforcement learning, b) deep convolutional networks, and c) selective memory recall.
[Image: dog learning]
In Breakout (yes, Demis knows it's an old game) the score is the reward, and the algorithm makes a small adjustment based on what it did and how well that worked, over, and over, and over...
This works best when little time and few intervening actions separate the critical behavior from the score, as in Breakout. In a complex behavior or game like chess, with many intervening moves, the benefit of any particular move toward the ultimate goal is much harder to discern.
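For the curious, here is a tiny sketch of that trial-and-error update. It is a plain tabular Q-learning rule in Python, not DeepMind's actual code, and every name and number in it is invented for illustration:

```python
# Toy Q-learning update: after each action, nudge our estimate of
# "how good was that action in that situation" toward the reward we saw
# plus the (discounted) value of the best follow-up action.
# All names, states, and values here are invented for illustration.

alpha = 0.1    # learning rate: how big a "small adjustment" we make
gamma = 0.99   # discount: how much future reward counts vs. immediate reward

# Q[(state, action)] = current estimate of long-run score from that action
Q = {}

def q_update(state, action, reward, next_state, actions):
    """Nudge Q toward reward + discounted value of the best next action."""
    best_next = max((Q.get((next_state, a), 0.0) for a in actions), default=0.0)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)

# In Breakout the score changes almost immediately after the paddle hit,
# so each update is informative.  In chess, the reward (win or lose) is
# dozens of moves away and gets discounted many times over, which is why
# crediting any single move is so much harder.
```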
[Image: cat learning]
Deep convolutional networks, which guide this reinforcement-learning process by predicting what the joystick should try next for the best results, were borrowed from the Nobel Prize-winning work of Torsten Wiesel and David H. Hubel.
Wiesel and Hubel showed how the brain processes visual images: it extracts edges, motion, stereoscopic depth, and color, then decides what to use, and how much weight to give each, to build a useful picture. As "visual-priority" primates, we dedicate about 30% of our brain to this complex process.
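To make "deep convolutional network" concrete, here is a minimal sketch in Python using PyTorch. The layer sizes roughly follow those reported in the Nature paper, but the framework choice and the details are mine; treat it as an illustration, not DeepMind's code:

```python
import torch
import torch.nn as nn

class AtariQNetwork(nn.Module):
    """Maps a stack of recent game frames to a score estimate for each
    joystick action.  Layer sizes roughly follow the Nature paper, but
    this is an illustrative sketch rather than DeepMind's code."""

    def __init__(self, num_actions: int):
        super().__init__()
        self.features = nn.Sequential(
            # Early layers play the Hubel-and-Wiesel role: each filter
            # responds to a simple local pattern (edges, blobs, motion
            # across the stacked frames).
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, num_actions),  # one predicted score per action
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: a batch of 4 stacked 84x84 grayscale screens
        return self.head(self.features(frames))
```

The key design point is the last layer: the network does not output "cat" or "dog" labels but one predicted long-run score per joystick action, which is exactly the evaluation the reinforcement-learning update above keeps adjusting.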
[Image: Christof Koch, Allen Institute for Brain Science]
Selective memory recall, or "hippocampal replay," is modeled on how this memory center operates. When the patterns of nerve-cell activity associated with a particular experience recur during replay, they run at a much faster pace than the original experience.
This is the familiar experience of going somewhere new and feeling the trip took a long time, yet the return seems to take much less. Replay makes it possible to quickly revisit earlier experiences when updating the evaluation function, i.e., deciding where we should put the game paddle on the next try.
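Here is a minimal sketch of that replay idea in code, again a Python illustration rather than DeepMind's implementation. The Nature paper samples uniformly from stored experience, which is what this toy buffer does:

```python
import random
from collections import deque, namedtuple

# One remembered moment of play: what we saw, what we did, what happened.
Transition = namedtuple("Transition", "state action reward next_state done")

class ReplayMemory:
    """A rough analogue of "hippocampal replay": store recent experiences
    and replay random batches of them later, much faster than real time,
    to update the evaluation of where the paddle should go next."""

    def __init__(self, capacity: int = 100_000):
        self.buffer = deque(maxlen=capacity)  # old memories fall out first

    def remember(self, *args) -> None:
        self.buffer.append(Transition(*args))

    def replay(self, batch_size: int = 32):
        # Revisit a random batch of past moments; mixing old and new
        # experiences keeps learning from fixating on the latest game.
        return random.sample(self.buffer, batch_size)
```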
When Hassabis and his Google DeepMind AI team combine these three brain-based elements, they have great success with many Atari 2600 games, as shown at right in this simplified graphic from SA Mind.
The "human tester" red line shows what a professional human games tester could do. On the 49 games evaluated, the algorithm was "super-human" on 29.
The games at which it was less successful were those with a longer "time to reward after behavior", like Ms. Pac-Man, as the "gobbling ghost" may be a dozen moves in the future.
[Image: Learning and beating Breakout in 12 seconds]
This same approach is now being applied to more complex, contemporary, first-person shooter or strategy games like Doom, Halo or StarCraft.
With the rapid evolution of these learning algorithms, based on, and in many cases exceeding, our own brain functions, when and for what functions will "we" still be needed?
As Christof Koch says in his summary to the SA Mind article, "Perhaps these learning algorithms are the dark clouds on humanity's horizon. Perhaps they will be our final invention."
Another question is whether these learning algorithms apply to decisions in "our lives." Is our daily life more like Ms. Pac-Man, where we can anticipate a dozen moves into the future, or more like Breakout, where the world is too massively complex to anticipate "a dozen moves in the future" and we can only react to what is right in front of us? If our decisions are more like Breakout, the algorithms are already far ahead.