The mobile game Flappy Bird was pulled from app stores in 2014 at the request of its creator, who felt it had become too addictive. But IBM has found a way to use it for deep learning research.
This week, the company's researchers presented a study on how machines could learn various skills – including playing Flappy Bird – continually, improving their performance rather than stalling when faced with a very difficult level. This approach is called continual learning and, despite decades of research, it remains a challenge, according to ZDNet.
The problem of lifelong learning was formulated in 1987 by Gail Carpenter and Stephen Grossberg, who called it the stability-plasticity dilemma.
Artificial intelligence, they wrote, should be “plastic to learn about important new developments, but should remain stable in response to irrelevant or frequently repeated events.”
In other words, the neural network must be designed to preserve and build on what it has already learned at every point in time. Its goal is to minimize interference – new learning overwriting old knowledge – while maximizing transfer to future learning, re-weighting priorities as new information arrives.
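One common way to formalize interference and transfer in continual-learning work of this kind is the dot product between the loss gradients of different tasks: aligned gradients mean learning one task helps the other, while opposed gradients mean the new update undoes old learning. A minimal NumPy sketch, using made-up gradient vectors purely for illustration:

```python
import numpy as np

def gradient_alignment(grad_old, grad_new):
    """Dot product of loss gradients from an old and a new task.

    Positive -> the tasks reinforce each other (transfer).
    Negative -> the new update undoes old learning (interference).
    """
    return float(np.dot(grad_old, grad_new))

# Hypothetical gradients, for illustration only.
g_old = np.array([1.0, -0.5, 0.2])
g_aligned = np.array([0.8, -0.4, 0.1])   # points the same way: transfer
g_opposed = np.array([-1.0, 0.5, -0.2])  # points the opposite way: interference

print(gradient_alignment(g_old, g_aligned) > 0)   # prints True
print(gradient_alignment(g_old, g_opposed) < 0)   # prints True
```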
To do this, the researchers combined two prior approaches to balancing these priorities: GEM (Gradient Episodic Memory), developed at Facebook in 2017, and Reptile, created last year by researchers at OpenAI. The latter algorithm helps a model learn new tasks by drawing on the experience of past learning.
The researchers concluded that GEM and Reptile are both limited: each algorithm "looks" in only one direction along the arrow of time. GEM works to preserve the past, while Reptile adjusts weights only at the moment of learning something new.
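GEM's backward-looking focus can be sketched via its core mechanism (shown here in the simplified, single-constraint form popularized by the follow-up method A-GEM): when the gradient on new data conflicts with the gradient on remembered past data, the conflicting component is projected away. The vectors below are made up for illustration:

```python
import numpy as np

def gem_project(grad, grad_mem):
    """Single-constraint GEM-style projection (as in A-GEM):
    if the proposed update would increase loss on remembered data
    (negative dot product), remove the conflicting component."""
    dot = np.dot(grad, grad_mem)
    if dot >= 0:
        return grad                      # no interference, keep as-is
    return grad - (dot / np.dot(grad_mem, grad_mem)) * grad_mem

g_new = np.array([1.0, -2.0])            # hypothetical gradient on new data
g_mem = np.array([1.0, 1.0])             # hypothetical gradient on stored past data

g_proj = gem_project(g_new, g_mem)
print(np.dot(g_proj, g_mem) >= 0)        # prints True: the conflict is removed
```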
Instead, symmetry is required: the weighting should be refined in both directions over time.
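Reptile's forward-looking side can be sketched just as briefly: take a few ordinary SGD steps on the new task, then move the shared weights only part of the way toward the adapted weights. The toy quadratic task below is an illustrative assumption, not from the paper:

```python
import numpy as np

def inner_sgd(theta, grad_fn, steps=5, lr=0.1):
    """A few ordinary SGD steps on one task (the inner loop)."""
    theta = theta.copy()
    for _ in range(steps):
        theta -= lr * grad_fn(theta)
    return theta

def reptile_update(theta, grad_fn, meta_lr=0.5):
    """Reptile's outer update: nudge the shared weights part of the
    way toward the weights found after adapting to the new task."""
    adapted = inner_sgd(theta, grad_fn)
    return theta + meta_lr * (adapted - theta)

# Toy quadratic task with optimum at [1, 2]; gradient of 0.5*||theta - opt||^2.
opt = np.array([1.0, 2.0])
grad_fn = lambda th: th - opt

theta = np.zeros(2)
for _ in range(30):
    theta = reptile_update(theta, grad_fn)

print(np.allclose(theta, opt, atol=1e-2))  # prints True: weights converge toward the optimum
```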
The Flappy Bird game became the main test of the new tool's capabilities. In it, the player must guide the bird safely to its destination while avoiding obstacles, the pipes. The researchers treated each change to an aspect of the game, such as the height of the pipes, as a new task.
The neural network then had to carry what it learned from one task over to the next, maximizing the benefit of information it had already studied and processed.
The authors evaluated their approach on two different benchmarks and in both cases obtained results that surpassed both GEM and Reptile.
Separately, IBM and MIT specialists are teaching AI agents to ask for help and to help each other. The collective learning strategy they proposed mimics how people acquire new information: not only from direct observation, but also from other people.