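Unity of Knowledge and Action: From Deep Learning to Deep Reinforcement Learning, Founder & CEO, Horizon Robotics
Deep Learning
From an online public discussion on artificial intelligence in June 2015: "I believe deep learning is the only direction for machine learning"
"Another exciting application of deep learning is learning to control. I believe robot control will be transformed by DNN reinforcement learning methods."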
Deep Learning Since 2006
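From neurons to deep learning
Why has deep learning attracted so much attention?
• Mimics the behavior of the brain
• Especially well suited to big data
• End-to-end learning
• Provides a modeling language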
Deep learning and big data
End-to-end Learning
• Most critical for accuracy
• Account for most of the computation for testing
• Most time-consuming in development cycle
• Often hand-crafted in practice
Deep Learning
Deep convolutional neural networks
ImageNet image classification error rate
Speech recognition
End-to-end learning for speech recognition
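Over the past 10 years, deep learning has driven major breakthroughs in perception
Reinforcement learning: learning how to make decisions
Viewed as a game between a decision system and its environment: make a sequence of decisions to optimize long-term return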
Reinforcement learning: how to act and make decisions
- At each step t, the agent:
• receives state s_t
• receives scalar reward r_t
• executes action a_t
- The environment:
• receives action a_t
• emits state s_{t+1}
• emits scalar reward r_{t+1}
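The agent/environment exchange above can be sketched as a simple loop. This is a minimal illustration only: `CoinFlipEnv`, its `reset`/`step` methods, and `run_episode` are hypothetical names invented for this example, not something taken from the slides.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: the state is a visible bit, and the
    agent earns reward 1 when its action matches the current state."""

    def __init__(self, horizon=10, seed=0):
        self.horizon = horizon
        self.rng = random.Random(seed)
        self.t = 0
        self.state = 0

    def reset(self):
        self.t = 0
        self.state = self.rng.randint(0, 1)
        return self.state

    def step(self, action):
        # The environment receives a_t, then emits s_{t+1} and r_{t+1}.
        reward = 1.0 if action == self.state else 0.0
        self.t += 1
        self.state = self.rng.randint(0, 1)
        done = self.t >= self.horizon
        return self.state, reward, done


def run_episode(env, policy):
    """Run one episode and return the sum of rewards collected."""
    state = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = policy(state)                   # agent executes a_t
        state, reward, done = env.step(action)   # agent receives s_{t+1}, r_{t+1}
        total_reward += reward
    return total_reward
```

A policy that copies the visible state collects every reward, while one that always does the opposite collects none, which makes the loop easy to sanity-check.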
- Reinforcement Learning (RL) is a general-purpose framework for artificial intelligence
• RL gives an agent the capacity to act
• each action influences the agent's future state
• success is measured by a scalar reward signal
- RL in a nutshell
• select actions to maximize future reward
- We seek a single agent which can solve any human-level task
• The essence of an intelligent agent
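"Future reward" is commonly made precise as a discounted sum of the rewards that follow. The sketch below assumes a discount factor `gamma`, which the slides do not state; `discounted_return` is a hypothetical helper name.

```python
def discounted_return(rewards, gamma=0.99):
    """Return r_1 + gamma * r_2 + gamma^2 * r_3 + ... for a reward sequence.

    gamma is an assumed discount factor in [0, 1]; gamma = 1 gives the
    plain (undiscounted) sum of rewards.
    """
    g = 0.0
    # Accumulate from the last reward backwards: G_t = r_{t+1} + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

For example, `discounted_return([1, 1, 1], gamma=0.5)` evaluates to 1 + 0.5 + 0.25 = 1.75.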
Application areas of reinforcement learning
- Control physical systems: walk, fly, drive, swim
- Interact with users: retain customers, personalize channel, optimize user experience
- Solve logical problems: scheduling, bandwidth allocation
- Play games: chess, checkers, Go, Atari games
- Learn sequential algorithms: attention, memory, conditional computation, activations
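The reinforcement learning framework: policy function and value function
Unity of knowledge and action: AlphaGo and deep reinforcement learning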
- Apply deep learning to RL?
- Use deep network to represent value function/policy/model
- Optimize value function/policy/model end-to-end
- Using stochastic gradient descent
The REINFORCE algorithm
- The simplest possible reinforcement learning algorithm to maximize total reward
- Use policy gradient in a whole episode to update parameters
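A minimal sketch of the whole-episode policy-gradient update, under stated assumptions: a single-state problem (a multi-armed bandit played for a few steps per episode), a softmax policy over a list of logits, and no baseline. The names `softmax` and `reinforce_bandit` and all hyperparameters are hypothetical choices for illustration, not the setup used in the talk.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def reinforce_bandit(arm_rewards, episodes=2000, steps=5, lr=0.05, seed=0):
    """Train softmax logits with REINFORCE; return the final action probabilities."""
    rng = random.Random(seed)
    n = len(arm_rewards)
    theta = [0.0] * n
    for _ in range(episodes):
        probs = softmax(theta)
        grad_log_pi = [0.0] * n
        total_reward = 0.0
        for _ in range(steps):
            a = rng.choices(range(n), weights=probs)[0]
            total_reward += arm_rewards[a]
            # For a softmax policy, d log pi(a) / d theta_i = 1[i == a] - pi_i.
            for i in range(n):
                grad_log_pi[i] += (1.0 if i == a else 0.0) - probs[i]
        # One update per episode: total reward times the summed log-prob gradient.
        for i in range(n):
            theta[i] += lr * total_reward * grad_log_pi[i]
    return softmax(theta)
```

With `arm_rewards=[0.0, 1.0]`, the returned probabilities should shift strongly toward the second arm. In practice a baseline is usually subtracted from the episode reward to reduce variance; it is left out here to keep the sketch close to the simplest form.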
Deep neural network reinforcement learning application: Go (Google DeepMind's AlphaGo)
- Training Steps
1. train a policy network using previous matches (SL network)
2. train a policy network by playing with itself (RL network)
3. train a value network using matches played by the RL network.
Bellman equation
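The slide's own formula did not survive extraction. For reference, the Bellman optimality equation for the action-value function is usually written as follows, using the state, action, and reward symbols from the earlier slides. The discount factor \gamma is an assumption, since the slides do not show one.

```latex
Q^{*}(s, a) = \mathbb{E}\left[\, r_{t+1} + \gamma \max_{a'} Q^{*}(s_{t+1}, a') \;\middle|\; s_t = s,\ a_t = a \,\right]
```

In words: the value of taking action a in state s equals the expected immediate reward plus the discounted value of acting optimally from the next state onward.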
Deep Q-learning
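In Q-learning, the regression target for each transition is the reward plus the discounted maximum Q-value of the next state. Deep Q-learning fits a network to such targets. The sketch below computes only the targets from already-evaluated next-state Q-values. The network itself is omitted, and `q_learning_targets` and `gamma` are hypothetical names and assumptions for illustration.

```python
def q_learning_targets(rewards, next_q_values, dones, gamma=0.99):
    """Compute y = r + gamma * max_a' Q(s', a') for a batch of transitions.

    rewards:       list of floats, one per transition
    next_q_values: list of lists, Q(s', a') for every action a'
    dones:         list of bools, True if s' is terminal (no bootstrap)
    """
    targets = []
    for r, q_next, done in zip(rewards, next_q_values, dones):
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(r + bootstrap)
    return targets
```

A network's prediction Q(s, a) would then be trained toward these targets, for example by minimizing squared error with stochastic gradient descent.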
What does this human-versus-machine Go match mean?
Deep reinforcement learning: taking autonomous driving from perception to control