
Yu Kai: From Deep Learning to Deep Reinforcement Learning

Author: 帷幄咨询 official website

Unity of Knowledge and Action: From Deep Learning to Deep Reinforcement Learning. Yu Kai, Founder & CEO, Horizon Robotics


Deep Learning





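From a public online discussion on artificial intelligence in June 2015: "I believe deep learning is the only direction for machine learning."


"Deep learning has another exciting application: learning to control. I believe robot control will be changed by DNN reinforcement learning methods."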



Deep Learning Since 2006


From Neurons to Deep Learning


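Why does deep learning get so much attention?

• It mimics the behavior of the brain
• It is especially well suited to big data
• End-to-end learning
• It provides a modeling language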


Deep Learning and Big Data


End-to-end Learning

• Most critical for accuracy
• Accounts for most of the computation at test time
• Most time-consuming part of the development cycle
• Often hand-crafted in practice


Deep Learning


Deep Convolutional Neural Networks


ImageNet image classification error rate


Speech recognition


End-to-end learning for speech recognition


Over the past 10 years, deep learning has driven major breakthroughs in perception


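Reinforcement Learning: Learning How to Make Decisions

Viewed as a game between a decision-making system and its environment: make a sequence of decisions to optimize long-term return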


Reinforcement Learning (Agent and Environment): How to Act and Decide

- At each step t, the agent:
• receives state s_t
• receives scalar reward r_t
• executes action a_t
- The environment:
• receives action a_t
• emits state s_{t+1}
• emits scalar reward r_{t+1}
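
The interaction loop above can be sketched in a few lines of Python. This is only an illustration: `CoinFlipEnv`, `run_episode`, the `reset`/`step` method names, and the fixed-length episode are hypothetical choices made for this sketch and do not come from the slides.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: the state is a random bit, and the
    agent is rewarded when its action equals the current state."""

    def __init__(self, horizon=5, seed=0):
        self.rng = random.Random(seed)
        self.horizon = horizon
        self.t = 0
        self.state = 0

    def reset(self):
        # Start a new episode and emit the first state.
        self.t = 0
        self.state = self.rng.randint(0, 1)
        return self.state

    def step(self, action):
        # Receive action a_t, then emit state s_{t+1} and reward r_{t+1}.
        reward = 1.0 if action == self.state else 0.0
        self.t += 1
        self.state = self.rng.randint(0, 1)
        done = self.t >= self.horizon
        return self.state, reward, done


def run_episode(env, policy):
    """Run one episode and return the total reward collected."""
    state = env.reset()
    total, done = 0.0, False
    while not done:
        action = policy(state)                  # agent executes a_t
        state, reward, done = env.step(action)  # environment responds
        total += reward
    return total


# A policy that copies the observed state is rewarded at every step.
total = run_episode(CoinFlipEnv(horizon=5), lambda s: s)
```

The agent here is just a function from state to action; a learning agent would additionally use the rewards to change that function over time.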


- Reinforcement learning (RL) is a general-purpose framework for artificial intelligence
• RL is for an agent with the capacity to act
• each action influences the agent’s future state
• success is measured by a scalar reward signal
- RL in a nutshell
• select actions to maximize future reward
- We seek a single agent that can solve any human-level task
• the essence of an intelligent agent


Application Areas of Reinforcement Learning

- Control physical systems: walk, fly, drive, swim
- Interact with users: retain customers, personalize channel, optimize user experience
- Solve logical problems: scheduling, bandwidth allocation
- Play games: chess, checkers, Go, Atari games
- Learn sequential algorithms: attention, memory, conditional computation, activations


The Reinforcement Learning Framework: Policy Functions and Value Functions



Unity of Knowledge and Action: AlphaGo and Deep Reinforcement Learning

- Apply deep learning to RL?
- Use a deep network to represent the value function, policy, or model
- Optimize the value function, policy, or model end-to-end
- Use stochastic gradient descent


The REINFORCE Algorithm

- The simplest possible reinforcement learning algorithm to maximize total reward
- Use the policy gradient over a whole episode to update the parameters
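
As a rough illustration of the idea, the sketch below applies a REINFORCE-style update to a softmax policy over two actions. The two-armed setup, the reward values, the learning rate, and the function names are all assumptions made for this example. It also omits a baseline and uses a plain table of preferences instead of a deep network.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into probabilities (numerically stable)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def reinforce(n_episodes=2000, steps=5, lr=0.1, seed=0):
    rng = random.Random(seed)
    theta = [0.0, 0.0]       # policy parameters: one preference per action
    arm_reward = [0.0, 1.0]  # hypothetical rewards: action 1 is better
    for _ in range(n_episodes):
        probs = softmax(theta)
        # Sample a whole episode from the current policy.
        actions = [0 if rng.random() < probs[0] else 1 for _ in range(steps)]
        episode_return = sum(arm_reward[a] for a in actions)
        # For a softmax policy, d/d theta_k of log pi(a) is 1[a == k] - pi(k).
        grad = [0.0, 0.0]
        for a in actions:
            for k in range(2):
                grad[k] += (1.0 if a == k else 0.0) - probs[k]
        # Gradient ascent on expected return, weighted by the episode return.
        theta = [theta[k] + lr * episode_return * grad[k] for k in range(2)]
    return softmax(theta)


final_probs = reinforce()
```

Because every action in the episode is weighted by the same total return, the estimate is noisy. Practical variants usually subtract a baseline to reduce that variance.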


Deep Neural Network Reinforcement Learning Application: Google DeepMind’s AlphaGo

- Training Steps
1. train a policy network using previous matches (SL network)
2. train a policy network by playing with itself (RL network)
3. train a value network using matches played by the RL network


Bellman equation
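
For reference, the Bellman optimality equation for the action-value function is commonly written as below. The symbols are the usual textbook notation and are assumed here, not taken from the slides: Q* is the optimal action-value function, r the immediate reward, γ a discount factor between 0 and 1, and s', a' the next state and next action.

```latex
Q^{*}(s, a) = \mathbb{E}_{s'}\left[\, r + \gamma \max_{a'} Q^{*}(s', a') \;\middle|\; s, a \right]
```

In words: the value of taking action a in state s equals the expected immediate reward plus the discounted value of acting optimally from the next state onward.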


Deep Q-learning
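
The sketch below shows the Q-learning update in its tabular form, as a simplified stand-in for the deep version. Deep Q-learning would replace the table `q` with a neural network trained by stochastic gradient descent toward the same target. The chain environment, the hyperparameters, and the function name are assumptions made for this example.

```python
import random


def q_learning(n_states=5, episodes=300, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning on a hypothetical chain: states 0..n_states-1,
    actions 0 (left) and 1 (right), reward 1 on reaching the last state."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy action selection, with random tie-breaking.
            if rng.random() < eps or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0
            # Bellman target: reward plus discounted best next value
            # (no bootstrapping from the terminal state).
            target = r if s_next == goal else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q


q_table = q_learning()
greedy_policy = [0 if qs[0] > qs[1] else 1 for qs in q_table[:-1]]
```

After training, `greedy_policy` lists the preferred action in each non-terminal state. On this chain, moving right is the only way to reach the reward.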


What does this human-versus-machine Go match mean?


Deep Reinforcement Learning: Taking Autonomous Driving from Perception to Control


