Mastering Atari, Go, chess and shogi by planning with a learned model



  • 1.

    Campbell, M., Hoane, A. J. Jr & Hsu, F.-h. Deep Blue. Artif. Intell. 134, 57–83 (2002).

  • 2.

    Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).

  • 3.

    Bellemare, M. G., Naddaf, Y., Veness, J. & Bowling, M. The Arcade Learning Environment: an evaluation platform for general agents. J. Artif. Intell. Res. 47, 253–279 (2013).

  • 4.

    Machado, M. et al. Revisiting the Arcade Learning Environment: evaluation protocols and open problems for general agents. J. Artif. Intell. Res. 61, 523–562 (2018).

  • 5.

    Silver, D. et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 362, 1140–1144 (2018).

  • 6.

    Schaeffer, J. et al. A world championship caliber checkers program. Artif. Intell. 53, 273–289 (1992).

  • 7.

    Brown, N. & Sandholm, T. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science 359, 418–424 (2018).

  • 8.

    Moravčík, M. et al. DeepStack: expert-level artificial intelligence in heads-up no-limit poker. Science 356, 508–513 (2017).

  • 9.

    Vlahavas, I. & Refanidis, I. Planning and Scheduling. Technical Report (EETN, 2013).

  • 10.

    Segler, M. H., Preuss, M. & Waller, M. P. Planning chemical syntheses with deep neural networks and symbolic AI. Nature 555, 604–610 (2018).

  • 11.

    Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction 2nd edn (MIT Press, 2018).

  • 12.

    Deisenroth, M. & Rasmussen, C. PILCO: a model-based and data-efficient approach to policy search. In Proc. 28th International Conference on Machine Learning, ICML 2011 465–472 (Omnipress, 2011).

  • 13.

    Heess, N. et al. Learning continuous control policies by stochastic value gradients. In NIPS’15: Proc. 28th International Conference on Neural Information Processing Systems Vol. 2 (eds Cortes, C. et al.) 2944–2952 (MIT Press, 2015).

  • 14.

    Levine, S. & Abbeel, P. Learning neural network policies with guided policy search under unknown dynamics. Adv. Neural Inf. Process. Syst. 27, 1071–1079 (2014).

  • 15.

    Hafner, D. et al. Learning latent dynamics for planning from pixels. Preprint at https://arxiv.org/abs/1811.04551 (2018).

  • 16.

    Kaiser, L. et al. Model-based reinforcement learning for Atari. Preprint at https://arxiv.org/abs/1903.00374 (2019).

  • 17.

    Buesing, L. et al. Learning and querying fast generative models for reinforcement learning. Preprint at https://arxiv.org/abs/1802.03006 (2018).

  • 18.

    Espeholt, L. et al. IMPALA: scalable distributed deep-RL with importance weighted actor-learner architectures. In Proc. International Conference on Machine Learning, ICML Vol. 80 (eds Dy, J. & Krause, A.) 1407–1416 (2018).

  • 19.

    Kapturowski, S., Ostrovski, G., Dabney, W., Quan, J. & Munos, R. Recurrent experience replay in distributed reinforcement learning. In International Conference on Learning Representations (2019).

  • 20.

    Horgan, D. et al. Distributed prioritized experience replay. In International Conference on Learning Representations (2018).

  • 21.

    Puterman, M. L. Markov Decision Processes: Discrete Stochastic Dynamic Programming 1st edn (John Wiley & Sons, 1994).

  • 22.

    Coulom, R. Efficient selectivity and backup operators in Monte-Carlo tree search. In International Conference on Computers and Games 72–83 (Springer, 2006).

  • 23.

    Wahlström, N., Schön, T. B. & Deisenroth, M. P. From pixels to torques: policy learning with deep dynamical models. Preprint at http://arxiv.org/abs/1502.02251 (2015).

  • 24.

    Watter, M., Springenberg, J. T., Boedecker, J. & Riedmiller, M. Embed to control: a locally linear latent dynamics model for control from raw images. In NIPS’15: Proc. 28th International Conference on Neural Information Processing Systems Vol. 2 (eds Cortes, C. et al.) 2746–2754 (MIT Press, 2015).

  • 25.

    Ha, D. & Schmidhuber, J. Recurrent world models facilitate policy evolution. In NIPS’18: Proc. 32nd International Conference on Neural Information Processing Systems (eds Bengio, S. et al.) 2455–2467 (Curran Associates, 2018).

  • 26.

    Gelada, C., Kumar, S., Buckman, J., Nachum, O. & Bellemare, M. G. DeepMDP: learning continuous latent space models for representation learning. In Proc. 36th International Conference on Machine Learning: Vol. 97 of Proc. Machine Learning Research (eds Chaudhuri, K. & Salakhutdinov, R.) 2170–2179 (PMLR, 2019).

  • 27.

    van Hasselt, H., Hessel, M. & Aslanides, J. When to use parametric models in reinforcement learning? Preprint at https://arxiv.org/abs/1906.05243 (2019).

  • 28.

    Tamar, A., Wu, Y., Thomas, G., Levine, S. & Abbeel, P. Value iteration networks. Adv. Neural Inf. Process. Syst. 29, 2154–2162 (2016).

  • 29.

    Silver, D. et al. The predictron: end-to-end learning and planning. In Proc. 34th International Conference on Machine Learning Vol. 70 (eds Precup, D. & Teh, Y. W.) 3191–3199 (JMLR, 2017).

  • 30.

    Farahmand, A. M., Barreto, A. & Nikovski, D. Value-aware loss function for model-based reinforcement learning. In Proc. 20th International Conference on Artificial Intelligence and Statistics: Vol. 54 of Proc. Machine Learning Research (eds Singh, A. & Zhu, J.) 1486–1494 (PMLR, 2017).

  • 31.

    Farahmand, A. Iterative value-aware model learning. Adv. Neural Inf. Process. Syst. 31, 9090–9101 (2018).

  • 32.

    Farquhar, G., Rocktaeschel, T., Igl, M. & Whiteson, S. TreeQN and ATreeC: differentiable tree planning for deep reinforcement learning. In International Conference on Learning Representations (2018).

  • 33.

    Oh, J., Singh, S. & Lee, H. Value prediction network. Adv. Neural Inf. Process. Syst. 30, 6118–6128 (2017).

  • 34.

    Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 25, 1097–1105 (2012).

  • 35.

    He, K., Zhang, X., Ren, S. & Sun, J. Identity mappings in deep residual networks. In 14th European Conference on Computer Vision 630–645 (2016).

  • 36.

    Hessel, M. et al. Rainbow: combining improvements in deep reinforcement learning. In Thirty-Second AAAI Conference on Artificial Intelligence (2018).

  • 37.

    Schmitt, S., Hessel, M. & Simonyan, K. Off-policy actor-critic with shared experience replay. Preprint at https://arxiv.org/abs/1909.11583 (2019).

  • 38.

    Azizzadenesheli, K. et al. Surprising negative results for generative adversarial tree search. Preprint at http://arxiv.org/abs/1806.05780 (2018).

  • 39.

    Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).

  • 40.

    OpenAI. OpenAI Five. OpenAI https://blog.openai.com/openai-five/ (2018).

  • 41.

    Vinyals, O. et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 350–354 (2019).

  • 42.

    Jaderberg, M. et al. Reinforcement learning with unsupervised auxiliary tasks. Preprint at https://arxiv.org/abs/1611.05397 (2016).

  • 43.

    Silver, D. et al. Mastering the game of Go without human knowledge. Nature 550, 354–359 (2017).

  • 44.

    Kocsis, L. & Szepesvári, C. Bandit based Monte-Carlo planning. In European Conference on Machine Learning 282–293 (Springer, 2006).

  • 45.

    Rosin, C. D. Multi-armed bandits with episode context. Ann. Math. Artif. Intell. 61, 203–230 (2011).

  • 46.

    Schadd, M. P., Winands, M. H., van den Herik, H. J., Chaslot, G. M.-B. & Uiterwijk, J. W. Single-player Monte-Carlo tree search. In International Conference on Computers and Games 1–12 (Springer, 2008).

  • 47.

    Pohlen, T. et al. Observe and look further: achieving consistent performance on Atari. Preprint at https://arxiv.org/abs/1805.11593 (2018).

  • 48.

    Schaul, T., Quan, J., Antonoglou, I. & Silver, D. Prioritized experience replay. In International Conference on Learning Representations (2016).

  • 49.

    Cloud TPU. Google Cloud https://cloud.google.com/tpu/ (2019).

  • 50.

    Coulom, R. Whole-history rating: a Bayesian rating system for players of time-varying strength. In International Conference on Computers and Games 113–124 (2008).

  • 51.

    Nair, A. et al. Massively parallel methods for deep reinforcement learning. Preprint at https://arxiv.org/abs/1507.04296 (2015).

  • 52.

    Lanctot, M. et al. OpenSpiel: a framework for reinforcement learning in games. Preprint at http://arxiv.org/abs/1908.09453 (2019).
