Saturday, June 20, 2015

Where/how AI is beating our problem-solving ability...

Demis Hassabis
Major advances in the fast-moving world of artificial intelligence (AI) promise to change what roles "we" play in the future.  A major player in this is a British chess wunderkind, highly skilled in gaming as well as computational neuroscience - Demis Hassabis.

Demis became a chess master at 13, began designing games @ 17, and was co-designer and lead programmer on the classic Theme Park, which sold several million copies, won him a Golden Joystick Award, and inspired a whole genre of sim management games.

Demis then got a Double First from the Computer Laboratory @ Cambridge and ran several companies before getting a Ph.D. @ University College London in cognitive neuroscience, working at its intersection with artificial intelligence.

His combination of skills led to important cognitive neuroscience breakthroughs centered on damage to the hippocampus, the brain's memory center: patients with hippocampal amnesia were also unable to imagine themselves in new experiences.  This established a critical link between the constructive process of imagination and the reconstructive process of episodic memory recall.

Hassabis winning @ poker
This gave Hassabis an insight into scene construction, the generation and online maintenance of a complex and coherent scene, as a process underlying both memory recall and imagination.  His paper "Deconstructing episodic memory with construction," w/E. A. Maguire, was named one of the top 10 scientific breakthroughs of the year by Science.

Demis was also rated the "best all-around games player in the world," having won the Mind Sports Olympiad's world games championship a record 5 times before "retiring" in 2003.   He has cashed at the World Series of Poker six times, including in the Main Event.

Golden Joystick award (you gamers know you want one)

Hassabis then founded DeepMind Technologies, whose goal is to "solve intelligence."  Google bought the company in 2014 for a reported $600MM; he is now Google's V.P. of Engineering for general AI projects.

He was elected a Fellow of the Royal Society of Arts in 2009, received a major award from them in 2014, and is a visiting scientist @ Harvard and MIT.

Hassabis and his collaborators recently published "Human-Level Control Through Deep Reinforcement Learning" in Nature, a leading journal; it shows just how far AI has come.

There are three principal components to these "human-level control" algorithms, based on how our brains work: a) reinforcement learning, b) deep convolutional networks, and c) selective memory recall.

dog learning
Reinforcement learning comes from behaviorism, the controversial school of psychology from B. F. Skinner and others built on "operant conditioning": a reward (or a punishment) is coupled to a desired (or undesired) behavior asap after the behavior.   This works on dogs often, kids sometimes, us/partners sometimes, and cats, well...it's complicated.

In Breakout (yes, Demis knows it's an old game) the score is the reward, and the algorithm makes a small adjustment based on what it did and how well, over, and over, and over...

This works best when little time and few intervening actions separate the critical behavior from the score, as in Breakout.  In a complex behavior/game like chess, w/many intervening moves, the benefit of any particular move toward the ultimate goal is much harder to discern.
cat learning
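
For readers who want a taste of the mechanics, here is a minimal sketch of the classic "Q-learning" flavor of reinforcement learning that the DeepMind work builds on.  The states, actions, and parameter values below are hypothetical stand-ins for illustration, not DeepMind's code:

import random
from collections import defaultdict

ACTIONS = ["left", "stay", "right"]
Q = defaultdict(float)   # keyed by (state, action): "how good is action a in state s"

alpha, gamma, epsilon = 0.1, 0.99, 0.1   # learning rate, discount, exploration

def choose_action(state):
    # Mostly pick the best-known action, but occasionally explore at random.
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    # The "small adjustment": nudge the estimate toward the reward just
    # received plus the best we expect from the resulting position.
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])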

Deep convolutional networks, which supervise this reinforcement learning process (i.e. predict what the joystick should try next for the best results), were borrowed from the Nobel Prize-winning work of Torsten Wiesel and David H. Hubel.

Wiesel and Hubel showed how the brain processes visual images: it extracts edges, motion, stereoscopic depth, and color, then decides what to use, and how much weight to give each, to build a useful picture.   As we are "visual-priority" primates, about 30% of our cortex is dedicated to this complex process.
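
To give a rough flavor of what a single "edge detector" in such a network computes, here is a toy convolution in NumPy; the 6x6 "image" and the kernel are made up for illustration:

import numpy as np

def convolve2d(image, kernel):
    # Slide a small kernel over the image; each output pixel is the
    # weighted sum of the patch under the kernel (no padding, stride 1).
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# A classic vertical-edge kernel: bright-to-dark transitions fire strongly,
# much like the orientation-selective cells Hubel and Wiesel found.
edge_kernel = np.array([[1, 0, -1],
                        [1, 0, -1],
                        [1, 0, -1]])

image = np.zeros((6, 6))
image[:, :3] = 1.0                        # left half bright, right half dark
print(convolve2d(image, edge_kernel))     # strong response at the boundary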

Christof Koch, Allen Institute for Brain Science
How the DeepMind team used this in developing their Q-learning, and how deep convolutional networks operate, are, to quote Wikipedia, "...too technical for most readers to understand."   The Hassabis, et al. paper describes it, and a recent Scientific American Mind (SA Mind) article by Christof Koch gives an overview.
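
At the risk of proving Wikipedia's point, the central idea can at least be sketched: the network is trained so that its predicted value for a move matches a "one step better informed" target.  The q_network below is a hypothetical placeholder (imagine the convolutional network above returning one score per joystick action), not the paper's actual code:

gamma = 0.99   # discount: how much future reward counts relative to now

def q_learning_target(reward, next_state, done, q_network):
    # What the network *should* have predicted for the move it just made:
    # the reward actually received, plus the discounted best value the
    # network itself assigns to the resulting screen.
    if done:                     # game over: no future left to look ahead to
        return reward
    return reward + gamma * max(q_network(next_state))

# Training then simply pushes the prediction toward this target, e.g.
#   error = q_network(state)[action] - q_learning_target(...)
# squared and minimized by gradient descent over millions of frames of play.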

Selective memory recall, or "hippocampal replay," is modeled after how this memory center operates: when the patterns of nerve cells associated with a particular experience reoccur on replay, they run at a faster pace.

This is the familiar experience of traveling somewhere new and feeling the trip took a long time, while the return trip seems to take much less.  Replay makes possible quick reconsideration of earlier experiences when updating the evaluation function, i.e. deciding where we should put the game paddle on the next try.
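
Here is a minimal sketch of such an "experience replay" memory, again with made-up names rather than DeepMind's actual implementation: the agent stores what just happened and later relearns from random samples of it.

import random
from collections import deque

class ReplayMemory:
    # A fixed-size memory of past (state, action, reward, next_state, done)
    # experiences; the oldest memories fall off the end as new ones arrive.
    def __init__(self, capacity=100000):
        self.buffer = deque(maxlen=capacity)

    def remember(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def replay(self, batch_size=32):
        # Re-experience a random batch of the past -- like the hippocampus
        # replaying old episodes -- to update the evaluation function.
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))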


When Hassabis and his/Google's DeepMind AI team combine these three brain-based elements, they have great success with many Atari 2600 games, as shown at right in this simplified graphic from SA Mind.

The "human tester" red line shows what a professional human games tester could do.  On the 49 games evaluated, the algorithm was "super-human" on 29.


The games at which it was less successful were those with a longer "time to reward after behavior," like Ms. Pac-Man, where the payoff for "gobbling" a ghost may be a dozen moves in the future.
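
A quick back-of-the-envelope illustration of why that matters: under a typical "discount factor," a reward a dozen moves away contributes a much weaker learning signal to the current move than an immediate one (the 0.9 here is purely illustrative):

gamma = 0.9                              # illustrative discount factor
for steps_away in (1, 6, 12):
    print(steps_away, round(gamma ** steps_away, 3))
# prints: 1 0.9, 6 0.531, 12 0.282 -- the farther off the reward,
# the fainter the credit assigned to the move being made right now.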

Learning and beating Breakout in 12 seconds

To see the algorithm in action, watch this short (smartphone) video of it learning how to play Breakout @ a 2014 technical conference, where, after 12 seconds, a clever approach had folk cheering.  (graphic from Hassabis' paper at left)

This same approach is now being applied to more complex, contemporary first-person shooter and strategy games like Doom, Halo, or StarCraft.

With the rapid evolution of these learning algorithms, based on, and in many cases exceeding, our own brain functions, when, and for what functions, will "we" still be needed?

As Christof Koch says in his summary to the SA Mind article, "Perhaps these learning algorithms are the dark clouds on humanity's horizon.  Perhaps they will be our final invention."

Another question is whether these learning algorithms apply to decisions in "our lives."   Is our daily life more like Ms. Pac-Man, where we can usefully anticipate a dozen moves into the future, or more like Breakout, where the world is too massively complex to anticipate "a dozen moves in the future" and we must react to immediate feedback?   If our decisions are more like Breakout, the algorithms are already far ahead.







