A Review of Intrinsic Motivation Models in Machine Learning
Authors:
Saeed Jamali 1, Saeed Setayeshi 2, Mohsen Jahanshahi 3, Sajjad Taghvaei 4
1 - Faculty of Engineering, Islamic Azad University, Central Tehran Branch, Tehran, Iran
2 - Faculty of Physics and Energy Engineering, Amirkabir University of Technology, Tehran, Iran
3 - Faculty of Engineering, Islamic Azad University, Central Tehran Branch, Tehran, Iran
4 - School of Mechanical Engineering, Shiraz University, Shiraz, Iran
Keywords: cognitive science, intrinsic motivation, machine learning, reinforcement learning, sparse environments, exploration
Abstract:
Intrinsic motivation has attracted increasing attention in recent years, because it allows a living organism or a robot to acquire knowledge and skills cumulatively and fully autonomously in the absence of guidance from extrinsic motivation (reward provided by the environment). Beyond psychology and neuroscience, these models have opened a new perspective in artificial intelligence. Algorithmic architectures for intrinsically motivated exploration make it possible to acquire effective motor skills in such problems. Moreover, many real-world problems have exactly this property: over large portions of the environment, no reward is returned at all. Consequently, the topic is not only of considerable theoretical importance for improving artificial intelligence algorithms, particularly with respect to exploration, but can also be exploited in practice, in real or near-real-world applications. This paper discusses the importance of intrinsic motivation and takes a brief look at its place in psychology. Research on intrinsic motivation in artificial intelligence is then categorized and reviewed. Reinforcement learning is also examined as a successful approach for incorporating intrinsic motivation. Finally, some practical applications of intrinsic motivation, its limitations, and directions for future research are outlined.
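To make the reinforcement-learning angle concrete, the following is a minimal illustrative sketch, not taken from the paper: the toy chain environment, the count-based bonus r_int = beta / sqrt(N(s)), and all hyperparameters are assumptions made here purely for demonstration. It shows how an intrinsic reward can be added to a sparse extrinsic reward so that a tabular Q-learning agent keeps exploring even when the environment itself returns nothing for long stretches.

# Illustrative sketch (author's own example, not from the reviewed paper):
# tabular Q-learning on a sparse-reward chain, with a count-based intrinsic
# bonus r_int = beta / sqrt(N(s)) added to the extrinsic reward.
import random
from collections import defaultdict

class SparseChain:
    """Toy 1-D chain: extrinsic reward only at the far-right state."""
    def __init__(self, n=20):
        self.n = n
        self.s = 0
    def reset(self):
        self.s = 0
        return self.s
    def step(self, a):  # a: 0 = move left, 1 = move right
        self.s = max(0, min(self.n - 1, self.s + (1 if a == 1 else -1)))
        r_ext = 1.0 if self.s == self.n - 1 else 0.0  # sparse extrinsic reward
        done = self.s == self.n - 1
        return self.s, r_ext, done

def train(episodes=200, alpha=0.5, gamma=0.95, eps=0.1, beta=0.5):
    env = SparseChain()
    Q = defaultdict(lambda: [0.0, 0.0])   # Q[state] = [value(left), value(right)]
    counts = defaultdict(int)             # state visit counts N(s)
    for _ in range(episodes):
        s, done = env.reset(), False
        for _ in range(100):
            a = random.randrange(2) if random.random() < eps else max((0, 1), key=lambda x: Q[s][x])
            s2, r_ext, done = env.step(a)
            counts[s2] += 1
            r_int = beta / counts[s2] ** 0.5      # novelty bonus, decays with visits
            r = r_ext + r_int                     # combined reward drives the update
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
            if done:
                break
    return Q

if __name__ == "__main__":
    Q = train()
    print("Greedy action per state:", [max((0, 1), key=lambda a: Q[s][a]) for s in range(20)])

The count-based bonus is only one family of intrinsic signals discussed in this review; prediction-error (curiosity), empowerment, or goal-generation signals could be substituted for r_int at the same point in the update.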