Mastering Atari, Go, chess and shogi by planning with a learned model



  • 1.

    Campbell, M., Hoane, A. J. Jr & Hsu, F.-h. Deep Blue. Artif. Intell. 134, 57–83 (2002).

  • 2.

    Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).

  • 3.

    Bellemare, M. G., Naddaf, Y., Veness, J. & Bowling, M. The Arcade Learning Environment: an evaluation platform for general agents. J. Artif. Intell. Res. 47, 253–279 (2013).

  • 4.

    Machado, M. et al. Revisiting the Arcade Learning Environment: evaluation protocols and open problems for general agents. J. Artif. Intell. Res. 61, 523–562 (2018).

  • 5.

    Silver, D. et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 362, 1140–1144 (2018).

  • 6.

    Schaeffer, J. et al. A world championship caliber checkers program. Artif. Intell. 53, 273–289 (1992).

  • 7.

    Brown, N. & Sandholm, T. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science 359, 418–424 (2018).

  • 8.

    Moravčík, M. et al. DeepStack: expert-level artificial intelligence in heads-up no-limit poker. Science 356, 508–513 (2017).

  • 9.

    Vlahavas, I. & Refanidis, I. Planning and Scheduling. Technical Report (EETN, 2013).

  • 10.

    Segler, M. H., Preuss, M. & Waller, M. P. Planning chemical syntheses with deep neural networks and symbolic AI. Nature 555, 604–610 (2018).

  • 11.

    Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction 2nd edn (MIT Press, 2018).

  • 12.

    Deisenroth, M. & Rasmussen, C. PILCO: a model-based and data-efficient approach to policy search. In Proc. 28th International Conference on Machine Learning, ICML 2011 465–472 (Omnipress, 2011).

  • 13.

    Heess, N. et al. Learning continuous control policies by stochastic value gradients. In NIPS’15: Proc. 28th International Conference on Neural Information Processing Systems Vol. 2 (eds Cortes, C. et al.) 2944–2952 (MIT Press, 2015).

  • 14.

    Levine, S. & Abbeel, P. Learning neural network policies with guided policy search under unknown dynamics. Adv. Neural Inf. Process. Syst. 27, 1071–1079 (2014).

  • 15.

    Hafner, D. et al. Learning latent dynamics for planning from pixels. Preprint at https://arxiv.org/abs/1811.04551 (2018).

  • 16.

    Kaiser, L. et al. Model-based reinforcement learning for Atari. Preprint at https://arxiv.org/abs/1903.00374 (2019).

  • 17.

    Buesing, L. et al. Learning and querying fast generative models for reinforcement learning. Preprint at https://arxiv.org/abs/1802.03006 (2018).

  • 18.

    Espeholt, L. et al. IMPALA: scalable distributed deep-RL with importance weighted actor-learner architectures. In Proc. International Conference on Machine Learning, ICML Vol. 80 (eds Dy, J. & Krause, A.) 1407–1416 (2018).

  • 19.

    Kapturowski, S., Ostrovski, G., Dabney, W., Quan, J. & Munos, R. Recurrent experience replay in distributed reinforcement learning. In International Conference on Learning Representations (2019).

  • 20.

    Horgan, D. et al. Distributed prioritized experience replay. In International Conference on Learning Representations (2018).

  • 21.

    Puterman, M. L. Markov Decision Processes: Discrete Stochastic Dynamic Programming 1st edn (John Wiley & Sons, 1994).

  • 22.

    Coulom, R. Efficient selectivity and backup operators in Monte-Carlo tree search. In International Conference on Computers and Games 72–83 (Springer, 2006).

  • 23.

    Wahlström, N., Schön, T. B. & Deisenroth, M. P. From pixels to torques: policy learning with deep dynamical models. Preprint at http://arxiv.org/abs/1502.02251 (2015).

  • 24.

    Watter, M., Springenberg, J. T., Boedecker, J. & Riedmiller, M. Embed to control: a locally linear latent dynamics model for control from raw images. In NIPS’15: Proc. 28th International Conference on Neural Information Processing Systems Vol. 2 (eds Cortes, C. et al.) 2746–2754 (MIT Press, 2015).

  • 25.

    Ha, D. & Schmidhuber, J. Recurrent world models facilitate policy evolution. In NIPS’18: Proc. 32nd International Conference on Neural Information Processing Systems (eds Bengio, S. et al.) 2455–2467 (Curran Associates, 2018).

  • 26.

    Gelada, C., Kumar, S., Buckman, J., Nachum, O. & Bellemare, M. G. DeepMDP: learning continuous latent space models for representation learning. In Proc. 36th International Conference on Machine Learning: Vol. 97 of Proc. Machine Learning Research (eds Chaudhuri, K. & Salakhutdinov, R.) 2170–2179 (PMLR, 2019).

  • 27.

    van Hasselt, H., Hessel, M. & Aslanides, J. When to use parametric models in reinforcement learning? Preprint at https://arxiv.org/abs/1906.05243 (2019).

  • 28.

    Tamar, A., Wu, Y., Thomas, G., Levine, S. & Abbeel, P. Value iteration networks. Adv. Neural Inf. Process. Syst. 29, 2154–2162 (2016).

  • 29.

    Silver, D. et al. The predictron: end-to-end learning and planning. In Proc. 34th International Conference on Machine Learning Vol. 70 (eds Precup, D. & Teh, Y. W.) 3191–3199 (JMLR, 2017).

  • 30.

    Farahmand, A. M., Barreto, A. & Nikovski, D. Value-aware loss function for model-based reinforcement learning. In Proc. 20th International Conference on Artificial Intelligence and Statistics: Vol. 54 of Proc. Machine Learning Research (eds Singh, A. & Zhu, J.) 1486–1494 (PMLR, 2017).

  • 31.

    Farahmand, A. Iterative value-aware model learning. Adv. Neural Inf. Process. Syst. 31, 9090–9101 (2018).

  • 32.

    Farquhar, G., Rocktaeschel, T., Igl, M. & Whiteson, S. TreeQN and ATreeC: differentiable tree planning for deep reinforcement learning. In International Conference on Learning Representations (2018).

  • 33.

    Oh, J., Singh, S. & Lee, H. Value prediction network. Adv. Neural Inf. Process. Syst. 30, 6118–6128 (2017).

  • 34.

    Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 25, 1097–1105 (2012).

  • 35.

    He, K., Zhang, X., Ren, S. & Sun, J. Identity mappings in deep residual networks. In 14th European Conference on Computer Vision 630–645 (2016).

  • 36.

    Hessel, M. et al. Rainbow: combining improvements in deep reinforcement learning. In Thirty-Second AAAI Conference on Artificial Intelligence (2018).

  • 37.

    Schmitt, S., Hessel, M. & Simonyan, K. Off-policy actor-critic with shared experience replay. Preprint at https://arxiv.org/abs/1909.11583 (2019).

  • 38.

    Azizzadenesheli, K. et al. Surprising negative results for generative adversarial tree search. Preprint at http://arxiv.org/abs/1806.05780 (2018).

  • 39.

    Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).

  • 40.

    OpenAI. OpenAI Five. OpenAI https://blog.openai.com/openai-five/ (2018).

  • 41.

    Vinyals, O. et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 350–354 (2019).

  • 42.

    Jaderberg, M. et al. Reinforcement learning with unsupervised auxiliary tasks. Preprint at https://arxiv.org/abs/1611.05397 (2016).

  • 43.

    Silver, D. et al. Mastering the game of Go without human knowledge. Nature 550, 354–359 (2017).

  • 44.

    Kocsis, L. & Szepesvári, C. Bandit based Monte-Carlo planning. In European Conference on Machine Learning 282–293 (Springer, 2006).

  • 45.

    Rosin, C. D. Multi-armed bandits with episode context. Ann. Math. Artif. Intell. 61, 203–230 (2011).

  • 46.

    Schadd, M. P., Winands, M. H., van den Herik, H. J., Chaslot, G. M.-B. & Uiterwijk, J. W. Single-player Monte-Carlo tree search. In International Conference on Computers and Games 1–12 (Springer, 2008).

  • 47.

    Pohlen, T. et al. Observe and look further: achieving consistent performance on Atari. Preprint at https://arxiv.org/abs/1805.11593 (2018).

  • 48.

    Schaul, T., Quan, J., Antonoglou, I. & Silver, D. Prioritized experience replay. In International Conference on Learning Representations (2016).

  • 49.

    Cloud TPU. Google Cloud https://cloud.google.com/tpu/ (2019).

  • 50.

    Coulom, R. Whole-history rating: a Bayesian rating system for players of time-varying strength. In International Conference on Computers and Games 113–124 (2008).

  • 51.

    Nair, A. et al. Massively parallel methods for deep reinforcement learning. Preprint at https://arxiv.org/abs/1507.04296 (2015).

  • 52.

    Lanctot, M. et al. OpenSpiel: a framework for reinforcement learning in games. Preprint at http://arxiv.org/abs/1908.09453 (2019).
