AlphaGo Zero needs no human training at all.

AlphaGo Zero is the latest variant of AlphaGo. It has soundly beaten its predecessors, and with just three days’ training. There were a number of changes from AlphaGo Fan (and from AlphaGo Lee, which followed AlphaGo Fan). AlphaGo Zero was simpler in many respects and accordingly used much less computing power. The main reason was that the policy and value networks were combined into a single neural network. Moreover, the tree search was simplified.
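The combined network idea can be sketched in a few lines: one shared trunk feeds two heads, a policy head (a distribution over candidate moves) and a value head (a scalar estimate of the expected outcome). This is a minimal illustration with made-up layer sizes and plain matrix multiplications, not the actual AlphaGo Zero architecture (which is a deep residual convolutional network).

```python
import numpy as np

rng = np.random.default_rng(0)

def dual_head_forward(x, params):
    """Illustrative forward pass of a combined policy/value network."""
    # Shared trunk: one linear layer with a ReLU nonlinearity.
    h = np.maximum(0.0, x @ params["W_trunk"])
    # Policy head: softmax over candidate moves.
    logits = h @ params["W_policy"]
    p = np.exp(logits - logits.max())
    p /= p.sum()
    # Value head: scalar in (-1, 1), i.e. predicted game outcome.
    v = np.tanh(h @ params["W_value"]).item()
    return p, v

# Toy dimensions, chosen only for the sketch.
n_features, n_hidden, n_moves = 8, 16, 5
params = {
    "W_trunk": 0.1 * rng.normal(size=(n_features, n_hidden)),
    "W_policy": 0.1 * rng.normal(size=(n_hidden, n_moves)),
    "W_value": 0.1 * rng.normal(size=(n_hidden, 1)),
}
p, v = dual_head_forward(rng.normal(size=n_features), params)
```

Both heads share the trunk's representation, which is one reason the combined design needs less computation than two separate networks.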

Most surprising, though, was that AlphaGo Fan’s supervised learning stage was dropped entirely. Instead, AlphaGo Zero used reinforcement learning over millions of games of self-play, starting from scratch. The one aspect of AlphaGo Fan that was recognisably human, aside from its design, was that supervised learning stage: AlphaGo Fan needed to learn from human expert games in order to orient itself towards promising strategies, and could thereby escape the unmanageably large number of move combinations inherent in Go when searching for the best next move. AlphaGo Zero needs no such human input at all.
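The self-play idea can be illustrated by how training targets arise without any human data: the program plays a game against itself, and every recorded position is labelled with the final outcome from the perspective of the player to move. The sketch below uses a toy fixed-length game with placeholder states; the state encoding and the game itself are invented for illustration only.

```python
import random

random.seed(0)

def self_play_episode(n_moves=6):
    """Play one toy self-play game and return (state, value-target) pairs."""
    # Placeholder states standing in for board positions.
    states = [f"s{t}" for t in range(n_moves)]
    # Final outcome from player 0's perspective: +1 win, -1 loss.
    z = random.choice([1, -1])
    # Each position gets the outcome from the viewpoint of the player
    # whose turn it is there, so the sign alternates move by move.
    return [(s, z if t % 2 == 0 else -z) for t, s in enumerate(states)]

episode = self_play_episode()
```

In the real system these targets, together with the search-improved move probabilities, are what the single neural network is trained on, so no human games are ever needed.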

Link to paper:

You may also like to browse other AI papers: