Update notes
This file records all major updates and new features, starting from version 0.5. Since Tensorforce is still under active development, ongoing updates and bug fixes to the internal architecture are not tracked here in detail.
### Latest

- Added optional `memory` argument to various agents
- Improved summary labels, particularly `"entropy"` and `"kl-divergence"`
- `linear` layer now accepts tensors of rank 1 to 3
- Network output / distribution input does not need to be a vector anymore
### Version 0.5.2

- Improved unittest performance
- Added `updates` and renamed `timesteps`/`episodes` counter for agents and runners
- Renamed `critic_{network,optimizer}` argument to `baseline_{network,optimizer}`
- Added Actor-Critic (`ac`), Advantage Actor-Critic (`a2c`) and Dueling DQN (`dueling_dqn`) agents
- Improved `"same"` baseline optimizer mode and added optional weight specification
- Reuse layer now global for parameter sharing across modules
- New block layer type (`block`) for easier sharing of layer blocks
- Renamed `PolicyAgent/-Model` to `TensorforceAgent/-Model`
- New `Agent.load(...)` function; saving includes the agent specification
- Removed `PolicyAgent` argument `(baseline-)network`
- Added policy argument `temperature`
- Removed `"same"` and `"equal"` options for `baseline_*` arguments and changed internal baseline handling
- Combined `state_value`/`action_value` into a `value` objective with argument `value` either `"state"` or `"action"`
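The `critic_*` to `baseline_*` change above is a pure key rename in agent specifications. A minimal, hypothetical migration sketch over a plain spec dict (the keys and values shown are illustrative, not a complete agent specification):

```python
def migrate_spec(spec):
    # Rename 0.5.1-style critic_* keys to the 0.5.2 baseline_* names;
    # all other keys pass through unchanged.
    renames = {
        'critic_network': 'baseline_network',
        'critic_optimizer': 'baseline_optimizer',
    }
    return {renames.get(key, key): value for key, value in spec.items()}

old_spec = {
    'critic_network': 'auto',
    'critic_optimizer': {'type': 'adam', 'learning_rate': 1e-3},
    'batch_size': 10,
}
new_spec = migrate_spec(old_spec)
# new_spec uses 'baseline_network' and 'baseline_optimizer' instead
```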
### Version 0.5.1

- Fixed `setup.py` packages value
### Version 0.5.0

##### Agent:
- DQFDAgent removed (temporarily)
- DQNNstepAgent and NAFAgent part of DQNAgent
- Agents need to be initialized via `agent.initialize()` before application
- States/actions of type `int` require an entry `num_values` (instead of `num_actions`)
- `Agent.from_spec()` changed and renamed to `Agent.create()`
- `Agent.act()` argument `fetch_tensors` changed and renamed to `query`, `index` renamed to `parallel`, `buffered` removed
- `Agent.observe()` argument `index` renamed to `parallel`
- `Agent.atomic_observe()` removed
- `Agent.save/restore_model()` renamed to `Agent.save/restore()`
##### Agent arguments:
- `update_mode` renamed to `update`
- `states_preprocessing` and `reward_preprocessing` changed and combined to `preprocessing`
- `actions_exploration` changed and renamed to `exploration`
- `execution` entry `num_parallel` replaced by a separate argument `parallel_interactions`
- `batched_observe` and `batching_capacity` replaced by argument `buffer_observe`
- `scope` renamed to `name`
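For code migrating across these renames, a small lookup table can flag outdated arguments. A hypothetical helper (the table only covers the top-level renames listed above; entries mapped to `None` were not simply renamed and need manual attention):

```python
# Hypothetical mapping of renamed top-level agent arguments (0.4 -> 0.5).
AGENT_ARG_CHANGES = {
    'update_mode': 'update',
    'actions_exploration': 'exploration',
    'scope': 'name',
    'states_preprocessing': None,   # combined with reward_preprocessing into 'preprocessing'
    'reward_preprocessing': None,
    'batched_observe': None,        # replaced, together with batching_capacity, by 'buffer_observe'
    'batching_capacity': None,
}

def outdated_arguments(kwargs):
    # Return the 0.4-style argument names present in the given kwargs.
    return sorted(name for name in kwargs if name in AGENT_ARG_CHANGES)
```

For example, `outdated_arguments({'update_mode': {}, 'name': 'agent'})` reports only `'update_mode'`, since `name` is already the new spelling.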
##### DQNAgent arguments:
- `update_mode` replaced by `batch_size`, `update_frequency` and `start_updating`
- `optimizer` removed, implicitly defined as `'adam'`; `learning_rate` added
- `memory` defines the capacity of the implicitly defined memory `'replay'`
- `double_q_model` removed (temporarily)
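Taken together, these changes flatten the DQN specification: the nested `update_mode` dict becomes scalar arguments, the optimizer is implicit, and `memory` shrinks to a capacity. A hypothetical before/after sketch with illustrative values (not complete agent specs):

```python
# 0.4-style DQN configuration fragment (illustrative values)
dqn_old = {
    'update_mode': {'unit': 'timesteps', 'batch_size': 32, 'frequency': 4},
    'optimizer': {'type': 'adam', 'learning_rate': 1e-3},
    'memory': {'type': 'replay', 'capacity': 10000},
}

# 0.5-style equivalent: update_mode split into scalars, optimizer implicit
# ('adam' with explicit learning_rate), memory reduced to the capacity of
# the implicit 'replay' memory
dqn_new = {
    'batch_size': 32,
    'update_frequency': 4,
    'start_updating': 1000,   # new explicit warm-up threshold (illustrative value)
    'learning_rate': 1e-3,
    'memory': 10000,
}
```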
##### Policy gradient agent arguments:
- New mandatory argument `max_episode_timesteps`
- `update_mode` replaced by `batch_size` and `update_frequency`
- `memory` removed
- `baseline_mode` removed
- `baseline` argument changed and renamed to `critic_network`
- `baseline_optimizer` renamed to `critic_optimizer`
- `gae_lambda` removed (temporarily)
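The baseline-to-critic change can be sketched with plain configuration fragments. All keys and values below are illustrative (in particular, the layer-spec entries under `critic_network` are assumed, not taken from this changelog):

```python
# 0.4-style baseline configuration fragment (illustrative)
pg_old = {
    'baseline_mode': 'states',
    'baseline': {'type': 'mlp', 'sizes': [32, 32]},
    'baseline_optimizer': {'type': 'adam', 'learning_rate': 1e-3},
}

# 0.5-style equivalent: baseline_mode gone, baseline becomes critic_network
# (a network spec), baseline_optimizer becomes critic_optimizer, and
# max_episode_timesteps is now mandatory
pg_new = {
    'max_episode_timesteps': 500,  # illustrative value
    'critic_network': [{'type': 'dense', 'size': 32}, {'type': 'dense', 'size': 32}],
    'critic_optimizer': {'type': 'adam', 'learning_rate': 1e-3},
}
```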
##### PPOAgent arguments:
- `step_optimizer` removed, implicitly defined as `'adam'`; `learning_rate` added
##### TRPOAgent arguments:
- `cg_*` and `ls_*` arguments removed
##### VPGAgent arguments:
- `optimizer` removed, implicitly defined as `'adam'`; `learning_rate` added
##### Environment:
- Environment properties `states` and `actions` are now functions `states()` and `actions()`
- States/actions of type `int` require an entry `num_values` (instead of `num_actions`)
- New function `Environment.max_episode_timesteps()`
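The resulting environment interface looks roughly as follows. This is a minimal sketch written as a plain class for illustration; a real environment would subclass `tensorforce.environments.Environment`, and the spec values shown are assumptions:

```python
class CustomEnvironment:
    # Sketch of the 0.5-style interface shape (plain class for illustration).

    def states(self):
        # Formerly the property `states`, now a function returning the state spec.
        return {'type': 'float', 'shape': (4,)}

    def actions(self):
        # Formerly the property `actions`; note num_values instead of num_actions
        # for int-type actions.
        return {'type': 'int', 'num_values': 2}

    def max_episode_timesteps(self):
        # New in 0.5: the episode length limit (illustrative value).
        return 500
```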
##### Contrib environments:
- ALE, MazeExp, OpenSim, Gym, Retro, PyGame and ViZDoom moved to `tensorforce.environments`
- Other environment implementations removed (may be upgraded in the future)
##### Runners:
- Improved `run()` API for `Runner` and `ParallelRunner`
- `ThreadedRunner` removed
##### Other:
- `examples` folder (including `configs`) removed, apart from `quickstart.py`
- New `benchmarks` folder to replace parts of old `examples` folder