Update notes
This file records all major updates and new features, starting from version 0.5. Since Tensorforce is still under active development, ongoing updates and bug fixes to the internal architecture are not tracked here in detail.
### Latest

- Added optional `memory` argument to various agents
- Improved summary labels, particularly `"entropy"` and `"kl-divergence"`
- `linear` layer now accepts tensors of rank 1 to 3
- Network output / distribution input does not need to be a vector anymore
### Version 0.5.2

- Improved unittest performance
- Added `updates` and renamed `timesteps`/`episodes` counter for agents and runners
- Renamed `critic_{network,optimizer}` argument to `baseline_{network,optimizer}`
- Added Actor-Critic (`ac`), Advantage Actor-Critic (`a2c`) and Dueling DQN (`dueling_dqn`) agents
- Improved `"same"` baseline optimizer mode and added optional weight specification
- Reuse layer now global for parameter sharing across modules
- New block layer type (`block`) for easier sharing of layer blocks
- Renamed `PolicyAgent/-Model` to `TensorforceAgent/-Model`
- New `Agent.load(...)` function; saving includes the agent specification
- Removed `PolicyAgent` argument `(baseline-)network`
- Added policy argument `temperature`
- Removed `"same"` and `"equal"` options for `baseline_*` arguments and changed internal baseline handling
- Combined `state_value`/`action_value` into a `value` objective with argument `value` either `"state"` or `"action"`
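The `critic_*` to `baseline_*` change above is a pure key rename in agent specifications. A minimal, hypothetical migration sketch over a plain spec dict (the keys and values shown are illustrative, not a complete agent specification):

```python
def migrate_spec(spec):
    # Rename 0.5.1-style critic_* keys to the 0.5.2 baseline_* names;
    # all other keys pass through unchanged.
    renames = {
        'critic_network': 'baseline_network',
        'critic_optimizer': 'baseline_optimizer',
    }
    return {renames.get(key, key): value for key, value in spec.items()}

old_spec = {
    'critic_network': 'auto',
    'critic_optimizer': {'type': 'adam', 'learning_rate': 1e-3},
    'batch_size': 10,
}
new_spec = migrate_spec(old_spec)
# new_spec uses 'baseline_network' and 'baseline_optimizer' instead
```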
### Version 0.5.1

- Fixed `setup.py` packages value
### Version 0.5.0

##### Agent:
- DQFDAgent removed (temporarily)
- DQNNstepAgent and NAFAgent part of DQNAgent
- Agents need to be initialized via `agent.initialize()` before application
- States/actions of type `int` require an entry `num_values` (instead of `num_actions`)
- `Agent.from_spec()` changed and renamed to `Agent.create()`
- `Agent.act()` argument `fetch_tensors` changed and renamed to `query`, `index` renamed to `parallel`, `buffered` removed
- `Agent.observe()` argument `index` renamed to `parallel`
- `Agent.atomic_observe()` removed
- `Agent.save/restore_model()` renamed to `Agent.save/restore()`
##### Agent arguments:
- `update_mode` renamed to `update`
- `states_preprocessing` and `reward_preprocessing` changed and combined to `preprocessing`
- `actions_exploration` changed and renamed to `exploration`
- `execution` entry `num_parallel` replaced by a separate argument `parallel_interactions`
- `batched_observe` and `batching_capacity` replaced by argument `buffer_observe`
- `scope` renamed to `name`
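For code migrating across these renames, a small lookup table can flag outdated arguments. A hypothetical helper (the table only covers the top-level renames listed above; entries mapped to `None` were not simply renamed and need manual attention):

```python
# Hypothetical mapping of renamed top-level agent arguments (0.4 -> 0.5).
AGENT_ARG_CHANGES = {
    'update_mode': 'update',
    'actions_exploration': 'exploration',
    'scope': 'name',
    'states_preprocessing': None,   # combined with reward_preprocessing into 'preprocessing'
    'reward_preprocessing': None,
    'batched_observe': None,        # replaced, together with batching_capacity, by 'buffer_observe'
    'batching_capacity': None,
}

def outdated_arguments(kwargs):
    # Return the 0.4-style argument names present in the given kwargs.
    return sorted(name for name in kwargs if name in AGENT_ARG_CHANGES)
```

For example, `outdated_arguments({'update_mode': {}, 'name': 'agent'})` reports only `'update_mode'`, since `name` is already the new spelling.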
##### DQNAgent arguments:
- `update_mode` replaced by `batch_size`, `update_frequency` and `start_updating`
- `optimizer` removed, implicitly defined as `'adam'`; `learning_rate` added
- `memory` defines the capacity of the implicitly defined memory `'replay'`
- `double_q_model` removed (temporarily)
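Taken together, these changes flatten the DQN specification: the nested `update_mode` dict becomes scalar arguments, the optimizer is implicit, and `memory` shrinks to a capacity. A hypothetical before/after sketch with illustrative values (not complete agent specs):

```python
# 0.4-style DQN configuration fragment (illustrative values)
dqn_old = {
    'update_mode': {'unit': 'timesteps', 'batch_size': 32, 'frequency': 4},
    'optimizer': {'type': 'adam', 'learning_rate': 1e-3},
    'memory': {'type': 'replay', 'capacity': 10000},
}

# 0.5-style equivalent: update_mode split into scalars, optimizer implicit
# ('adam' with explicit learning_rate), memory reduced to the capacity of
# the implicit 'replay' memory
dqn_new = {
    'batch_size': 32,
    'update_frequency': 4,
    'start_updating': 1000,   # new explicit warm-up threshold (illustrative value)
    'learning_rate': 1e-3,
    'memory': 10000,
}
```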
##### Policy gradient agent arguments:
- New mandatory argument `max_episode_timesteps`
- `update_mode` replaced by `batch_size` and `update_frequency`
- `memory` removed
- `baseline_mode` removed
- `baseline` argument changed and renamed to `critic_network`
- `baseline_optimizer` renamed to `critic_optimizer`
- `gae_lambda` removed (temporarily)
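The baseline-to-critic change can be sketched with plain configuration fragments. All keys and values below are illustrative (in particular, the layer-spec entries under `critic_network` are assumed, not taken from this changelog):

```python
# 0.4-style baseline configuration fragment (illustrative)
pg_old = {
    'baseline_mode': 'states',
    'baseline': {'type': 'mlp', 'sizes': [32, 32]},
    'baseline_optimizer': {'type': 'adam', 'learning_rate': 1e-3},
}

# 0.5-style equivalent: baseline_mode gone, baseline becomes critic_network
# (a network spec), baseline_optimizer becomes critic_optimizer, and
# max_episode_timesteps is now mandatory
pg_new = {
    'max_episode_timesteps': 500,  # illustrative value
    'critic_network': [{'type': 'dense', 'size': 32}, {'type': 'dense', 'size': 32}],
    'critic_optimizer': {'type': 'adam', 'learning_rate': 1e-3},
}
```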
##### PPOAgent arguments:
- `step_optimizer` removed, implicitly defined as `'adam'`; `learning_rate` added
##### TRPOAgent arguments:
- `cg_*` and `ls_*` arguments removed
##### VPGAgent arguments:
- `optimizer` removed, implicitly defined as `'adam'`; `learning_rate` added
##### Environment:
- Environment properties `states` and `actions` are now functions `states()` and `actions()`
- States/actions of type `int` require an entry `num_values` (instead of `num_actions`)
- New function `Environment.max_episode_timesteps()`
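The resulting environment interface looks roughly as follows. This is a minimal sketch written as a plain class for illustration; a real environment would subclass `tensorforce.environments.Environment`, and the spec values shown are assumptions:

```python
class CustomEnvironment:
    # Sketch of the 0.5-style interface shape (plain class for illustration).

    def states(self):
        # Formerly the property `states`, now a function returning the state spec.
        return {'type': 'float', 'shape': (4,)}

    def actions(self):
        # Formerly the property `actions`; note num_values instead of num_actions
        # for int-type actions.
        return {'type': 'int', 'num_values': 2}

    def max_episode_timesteps(self):
        # New in 0.5: the episode length limit (illustrative value).
        return 500
```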
##### Contrib environments:
- ALE, MazeExp, OpenSim, Gym, Retro, PyGame and ViZDoom moved to `tensorforce.environments`
- Other environment implementations removed (may be upgraded in the future)
##### Runners:
- Improved `run()` API for `Runner` and `ParallelRunner`
- `ThreadedRunner` removed
##### Other:
- `examples` folder (including `configs`) removed, apart from `quickstart.py`
- New `benchmarks` folder to replace parts of old `examples` folder