Reinforcement Learning

Math, Software and Application

Athena Scientific

Athena Scientific is a small publisher specializing in textbooks written by professors at the Massachusetts Institute of Technology and used in their courses.

http://www.athenasc.com/ordering.html

Special discount: Order three or more different titles (i.e., ISBN numbers) in a single order directly from Athena Scientific (electronically, by email, by mail, or by fax) and you will receive an automatic discount of 10% off the list prices.

Neuro-Dynamic Programming, Dimitri Bertsekas, John N. Tsitsiklis. Publisher: Athena Scientific; 1st edition (May 1, 1996). ISBN: 1-886529-10-8 Publication: September 1996, 512 pages, hardcover.
Reinforcement Learning and Optimal Control, Dimitri Bertsekas. Publisher: Athena Scientific. ISBN: 978-1-886529-39-7 Publication: 2019, 388 pages, hardcover.
Stochastic Optimal Control: The Discrete-Time Case, Dimitri Bertsekas and Steven E. Shreve. Publisher: Athena Scientific. ISBN: 1-886529-03-5 Publication: 1996, 330 pages, softcover.
Dynamic Programming and Optimal Control, Dimitri Bertsekas. Publisher: Athena Scientific. ISBNs: 1-886529-43-4 (Vol. I, 4th Edition), 1-886529-44-2 (Vol. II, 4th Edition), 1-886529-08-6 (Two-Volume Set, i.e., Vol. I, 4th Edition and Vol. II, 4th Edition). Vol. I, 4th Edition, 2017, 576 pages, hardcover; Vol. II, 4th Edition: Approximate Dynamic Programming, 2012, 712 pages, hardcover.

Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto. ISBN: 978-0-262-19398-6. 2nd edition 2018.
Reinforcement Learning with Soft State Aggregation, Satinder P. Singh, Tommi Jaakkola, Michael I. Jordan, MIT.
Policy Gradient Methods for Reinforcement Learning with Function Approximation, Richard S. Sutton, David McAllester, Satinder Singh, Yishay Mansour AT&T Labs – Research, 180 Park Avenue, Florham Park, NJ 07932.
Actor-Critic Algorithms, Vijay R. Konda, John N. Tsitsiklis, Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, MA, 02139.
Hierarchical Actor-Critic, Andrew Levy¹, Robert Platt², Kate Saenko¹, ¹Department of Computer Science, Boston University, Boston, MA, USA, ²College of Information and Computer Science, Northeastern University, Boston, MA, USA.
Hierarchical Policy Gradient Algorithms, Mohammad Ghavamzadeh, Sridhar Mahadevan, Department of Computer Science, University of Massachusetts Amherst, Amherst, MA 01003-4610, USA. 20th International Conference on Machine Learning (ICML-2003), Washington DC, 2003.
Decentralized Stabilization for a Class of Continuous-Time Nonlinear Interconnected Systems Using Online Learning Optimal Approach, Derong Liu, Fellow, IEEE, Ding Wang, and Hongliang Li. IEEE Transactions on Neural Networks and Learning Systems, Vol. 25, No. 2, February 2014.
Neural-network-based decentralized control of continuous-time nonlinear interconnected systems with unknown dynamics, Derong Liu, Chao Li, Hongliang Li, Ding Wang, Hongwen Ma, Neurocomputing 165 90-98 2015.
Reinforcement Learning is Direct Adaptive Optimal Control, Richard S. Sutton, Andrew G. Barto, and Ronald J. Williams, IEEE Control Systems, April 1992.
Decentralized Optimal Control of Distributed Interdependent Automata With Priority Structure, Olaf Stursberg, Member, IEEE, and Christian Hillmann, IEEE Transactions on Automation Science and Engineering, Vol. 14, No. 2, April 2017.
Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation, Tejas D. Kulkarni, DeepMind, London, Karthik R. Narasimhan, CSAIL, MIT, Ardavan Saeedi, CSAIL, MIT, Joshua B. Tenenbaum, BCS, MIT. 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.
Meta Learning Shared Hierarchies, Kevin Frans, Henry M. Gunn High School, work done as an intern at OpenAI, Jonathan Ho, Xin Chen, Pieter Abbeel, UC Berkeley, Department of Electrical Engineering and Computer Science, John Schulman, OpenAI. ICLR 2018.
Actor-critic Algorithm for Hierarchical Markov Decision Processes, Shalabh Bhatnagar, Department of Computer Science and Automation, Indian Institute of Science, Bangalore, India, J. Ranjan Panigrahi, SoftJin Technologies Private Limited, India. 2005.
Analysis II: Metric Spaces, Continuous functions on metric spaces, Uniform convergence. Terence Tao, UCLA.
Feature-Based Aggregation and Deep Reinforcement Learning: A Survey and Some New Implementations. Dimitri P. Bertsekas, MIT.
Hierarchical Apprenticeship Learning, with Application to Quadruped Locomotion, J. Zico Kolter, Pieter Abbeel, Andrew Y. Ng, Department of Computer Science, Stanford University.
The Asymptotic Convergence-Rate of Q-learning, Cs. Szepesvari, Research Group on Artificial Intelligence, “Jozsef Attila” University, Szeged, Aradi vrt. tere 1, Hungary, H-6720. 1998.
Randomized Linear Programming Solves the Discounted Markov Decision Problem In Nearly-Linear (Sometimes Sublinear) Run Time, Mengdi Wang, Department of Operations Research and Financial Engineering, Princeton University, 2017.
Solving H-horizon, Stationary Markov Decision Problems In Time Proportional To Log(H), Paul Tseng, Laboratory for Information and Decision Systems, MIT. Operations Research Letters 9 (1990) 287-297.
Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms, Michael Kearns and Satinder Singh, AT&T Labs, 180 Park Avenue, Florham Park, NJ 07932.

Other useful RL references