New algorithm makes up for intermittent reinforcement.
In most games, a reinforcement learning algorithm has something like a steadily changing score to sustain learning. In some games, however, such as Montezuma’s Revenge and Pitfall!, the reinforcement arrives only intermittently. What does the algorithm do during the stretches of the game where no reward is available? With no signal to learn from, it stops developing and plateaus. One way to handle this is to add a random search function so that the algorithm keeps exploring anyway. Uber’s researchers have gone one stage further and added a memory function as well. This takes the algorithm back to areas of the search space that did not yield a reward the first time but might do so on a second or third visit. The result reinforces the point that human brains have developed a number of techniques for exploring the world around them. AI will doubtless need to adopt a similar approach.
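To make the idea of random search plus memory concrete, here is a minimal Python sketch. It is not Uber's implementation. The toy corridor game, the names GOAL, step and explore_with_memory, and all parameter values are assumptions chosen purely for illustration. The game gives a reward only at the far end of a corridor, so an algorithm that waits for feedback gets none along the way. The sketch remembers every position it has reached, goes back to one of them (preferring those it has returned to least often), and searches randomly from there.

```python
import random

GOAL = 20  # hypothetical toy value: the only position that yields a reward


def step(pos, action):
    """Toy deterministic corridor. action is -1 or +1; reward only at GOAL."""
    new_pos = max(0, min(GOAL, pos + action))
    reward = 1 if new_pos == GOAL else 0
    return new_pos, reward


def explore_with_memory(iterations=5000, explore_steps=5, seed=0):
    """Return a list of actions that reaches GOAL, or None if none was found."""
    rng = random.Random(seed)
    # Memory: position -> (actions that reach it from the start, times selected)
    archive = {0: ([], 0)}
    for _ in range(iterations):
        # Prefer remembered positions that have been selected least often.
        cells = list(archive)
        weights = [1.0 / (1 + archive[c][1]) for c in cells]
        start = rng.choices(cells, weights=weights, k=1)[0]
        path, visits = archive[start]
        archive[start] = (path, visits + 1)

        # Go back to the remembered position by replaying its actions.
        pos = 0
        for action in path:
            pos, _reward = step(pos, action)

        # Search randomly from there, remembering anything new or shorter.
        trajectory = list(path)
        for _ in range(explore_steps):
            action = rng.choice((-1, 1))
            pos, reward = step(pos, action)
            trajectory.append(action)
            known = archive.get(pos)
            if known is None or len(trajectory) < len(known[0]):
                archive[pos] = (list(trajectory), known[1] if known else 0)
            if reward:
                return archive[pos][0]
    return None


if __name__ == "__main__":
    actions = explore_with_memory()
    print("found a rewarding path" if actions else "no reward found")
```

Replaying stored actions works here only because the toy game is deterministic. A game with randomness would need a different way of getting back to a remembered position.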
Link to article: https://www.technologyreview.com/s/612470/uber-has-cracked-two-classic-80s-video-games-by-giving-an-ai-algorithm-a-new-type-of-memory/