[Optimization control] Policy iteration algorithm in MATLAB for the fault-tolerant tracking control optimization problem of a reconfigurable manipulator [MATLAB source code 2682 included]

1. Policy iteration algorithm

There are two common iterative training algorithms in reinforcement learning: the policy iteration algorithm and the value iteration algorithm. This article focuses on the policy iteration algorithm.

Let us start with a simple example. The figure below shows a square grid with four positions; the state space is {1, 2, 3, 4}. Position 3 is a trap, and position 4 holds a gold coin. A robot starts from state 1 and searches for the gold coin. The reward for falling into the trap is -1, the reward for finding the gold coin is 1, and the reward for moving between the other positions is 0. The available action space is {up, down, left, right}. This simple problem illustrates the basic principles of reinforcement learning.
[Figure: 2x2 grid world with states 1-4, a trap at state 3 and a gold coin at state 4]
My personal understanding of the reinforcement learning process is this: through repeated trials, the value function of each state is updated (the value of a state measures how good it is to be in that state; if a state's value is large, then choosing an action that moves into this state from another state is a good choice). The policy is then adjusted based on the updated value function, and the value function is updated again under the adjusted policy. Iterating in this way eventually yields a policy that meets the requirements. Two main steps are abstracted from this process: the first is called policy evaluation, and the second is called policy improvement.

For the simple problem above, a few basic concepts are explained first:

The value function of each state:
represents how good or bad it is for the robot to be in that state (a larger value means a better state).

The current policy for the problem:
represents the next action chosen when the robot is in a given state. The chosen action can be deterministic; for example, when the robot is in position 1 it always chooses to go right. It can also be probabilistic; for example, it goes right with probability 0.5 and down with probability 0.5. A deterministic policy is a special case of a probabilistic policy. The discussion below uses deterministic policies (a small MATLAB sketch of both representations follows these definitions).

Policy evaluation:
Policy evaluation computes the value function of every state in the state space in some way. Because of the many transition relations between states, it is difficult to compute the value function of a state directly, so an iterative method is generally used.

Policy improvement:
Policy improvement optimizes the current policy, that is, it modifies the current policy using the information available so far (in particular, the current value function).
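
As a concrete illustration of the two kinds of policies (a hypothetical sketch, not part of the attached source code), they could be stored in MATLAB as follows, assuming the actions are numbered 1 to 4 for up, down, left and right:

% Hypothetical MATLAB representation of the two policy types for the grid world.
% Action numbering 1..4 = up, down, left, right is an assumed convention.
det_policy = [4; 2; 1; 1];            % deterministic: one action per state,
                                      % e.g. state 1 -> right, state 2 -> down
                                      % (entries for terminal states 3,4 are arbitrary)
stoch_policy = [0    0.5  0    0.5;   % probabilistic: pi(a|s) stored row by row,
                0    1    0    0  ;   % e.g. from state 1 go down or right with
                0.25 0.25 0.25 0.25;  % probability 0.5 each
                0.25 0.25 0.25 0.25]; % every row sums to 1

A deterministic policy is then just a special stochastic policy whose rows contain a single 1.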

################################ The process of policy evaluation under the current policy
[Figure: value function obtained by policy evaluation for the grid-world example]
For this simple example, a stable value function is obtained in a single step of computation, but for most problems multiple iterations are needed before the value function stabilizes.
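
To make the policy evaluation step concrete, here is a minimal MATLAB sketch (not part of the attached source code) that iteratively evaluates an equiprobable random policy on the grid world above. The grid layout (1 2 on top, 3 4 below), the deterministic moves, the treatment of states 3 and 4 as absorbing, and the discount factor gamma = 0.9 are all assumptions made only for this illustration.

% Iterative policy evaluation for the 2x2 grid world (illustrative sketch).
% Assumed layout:  1 2    states 3 (trap) and 4 (coin) are absorbing,
%                  3 4    moving into a wall leaves the state unchanged.
nS = 4; nA = 4; gamma = 0.9; tol = 1e-6;
next = [1 3 1 2;                 % next(s,a): successor of state s under action a
        2 4 1 2;                 % actions: 1=up, 2=down, 3=left, 4=right
        3 3 3 3;
        4 4 4 4];
r = zeros(nS,nA);
r(1,2) = -1;                     % stepping down from state 1 into the trap
r(2,2) = +1;                     % stepping down from state 2 onto the gold coin
pi_sa = ones(nS,nA)/nA;          % equiprobable random policy, pi(a|s) = 0.25

V = zeros(nS,1);                 % value function, initialised to zero
while true                       % sweep until the values stop changing
    Vold = V;
    for s = 1:nS
        V(s) = 0;
        for a = 1:nA
            V(s) = V(s) + pi_sa(s,a)*(r(s,a) + gamma*Vold(next(s,a)));
        end
    end
    if max(abs(V - Vold)) < tol, break; end
end
disp(V')                         % value of each state under the random policy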

############################### The process of policy improvement
For this simple example, the policy is improved greedily: using the stable value function computed during policy evaluation, each state chooses, as its next action, the action that maximizes the expected return.
[Figure: greedy policy improvement for the grid-world example]
Summary
The policy iteration algorithm in reinforcement learning repeatedly alternates between policy evaluation and policy improvement until the whole policy converges, i.e., the value function and the policy no longer change significantly.
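
Combining the two steps, here is a minimal, self-contained MATLAB sketch of the full policy iteration loop for the grid-world example, under the same assumptions as the previous sketch (grid layout, deterministic moves, absorbing states 3 and 4, gamma = 0.9). The attached source code 2682 applies the same evaluate-then-improve idea to the fault-tolerant tracking control of the manipulator rather than to this toy grid.

% Policy iteration on the 2x2 grid world (illustrative sketch, assumptions as above).
nS = 4; nA = 4; gamma = 0.9; tol = 1e-6;
next = [1 3 1 2; 2 4 1 2; 3 3 3 3; 4 4 4 4];   % next(s,a), actions 1..4 = up,down,left,right
r = zeros(nS,nA);
r(1,2) = -1;  r(2,2) = +1;                     % trap / gold-coin rewards

pol = ones(nS,1);                              % initial deterministic policy: always "up"
V   = zeros(nS,1);

stable = false;
while ~stable
    % ---- policy evaluation: Bellman backups under the fixed policy pol ----
    while true
        Vold = V;
        for s = 1:nS
            V(s) = r(s,pol(s)) + gamma*Vold(next(s,pol(s)));
        end
        if max(abs(V - Vold)) < tol, break; end
    end
    % ---- policy improvement: act greedily with respect to the current V ----
    stable = true;
    for s = 1:nS
        q = zeros(1,nA);
        for a = 1:nA
            q(a) = r(s,a) + gamma*V(next(s,a));
        end
        [~, best] = max(q);
        if q(best) > q(pol(s)) + tol           % change only on a strict improvement
            pol(s) = best;
            stable = false;                    % policy changed, evaluate again
        end
    end
end
disp(V');  disp(pol');                         % converged values and greedy policy

For this problem the loop converges after a few outer iterations: state 1 learns to go right, state 2 learns to go down, and the value function settles at V = [0.9 1 0 0] under the assumed discount factor.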

2. Part of the source code

clear all
close all
clc

global Q Q1 R l1 l2 wc0 lr P k k1 site;

k=5;
Q=k*eye(2);
Q1=k*eye(4);
R=0.1*eye(2);
sita=1;

lr=0.001;
l1=2;
l2=20*eye(4);
P=0.6*eye(2);
k1=1;
wc0=[20 30 40 20 30 40 50 40 50 55];

xw0=[1 1 0 0 2 -2 0 0 wc0 0 0];

options = odeset('OutputFcn',@odeplot);
[t,xw]= ode15s('plant2',[0 60],xw0,options);

U=[];
% % config a %%

% y1d=0.4*sin(0.3*t)-0.1*cos(0.5*t);
% y2d=0.3*cos(0.6*t)+0.6*sin(0.2*t);
% y3d=(3*cos((3*t)/10))/25 + sin(t/2)/20;
% y4d=(3*cos(t/5))/25-(9*sin((3*t)/5))/50;
%

% %  config b %%  

y1d=0.2*cos(0.5*t)+0.2*sin(0.4*t);
y2d=0.3*cos(0.2*t)-0.4*sin(0.6*t);
y3d=(2*cos((2*t)/5))/25 - sin(t/2)/10;
y4d=-(6*cos((3*t)/5))/25-(3*sin(t/5))/50;

U=[];
%
for i=1:size(xw,1)
% % config a %%
% x1d=0.4*sin(0.3*t(i))-0.1*cos(0.5*t(i));
% x2d=0.3*cos(0.6*t(i))+0.6*sin(0.2*t(i));
% x3d=(3*cos((3*t(i))/10))/25 + sin(t(i)/2)/20;
% x4d=(3*cos(t(i)/5))/25 - (9*sin((3*t(i))/5))/50;
%
% dx1d=(3*cos((3*t(i))/10))/25 + sin(t(i)/2)/20;
% dx2d=(3*cos(t(i)/5))/25 - (9*sin((3*t(i))/5))/50;
% dx3d=cos(t(i)/2)/40-(9*sin((3*t(i))/10))/250;
% dx4d=-(27*cos((3*t(i))/5))/250-(3*sin(t(i)/5))/125;

% % config b %%

x1d=0.2*cos(0.5*t(i))+0.2*sin(0.4*t(i));
x2d=0.3*cos(0.2*t(i))-0.4*sin(0.6*t(i));
x3d=(2*cos((2*t(i))/5))/25 - sin(t(i)/2)/10;
x4d=-(6*cos((3*t(i))/5))/25-(3*sin(t(i)/5))/50;

dx1d=0.3*cos(0.2*t(i))-0.4*sin(0.6*t(i));
dx2d=-(6*cos((3*t(i))/5))/25-(3*sin(t(i)/5))/50;
dx3d=-cos(t(i)/2)/20-(4*sin((2*t(i))/5))/125;
dx4d=(18*sin((3*t(i))/5))/125-(3*cos(t(i)/5))/250;

dxd=[dx1d;dx2d;dx3d;dx4d];

e1=xw(i,1)-x1d;
e2=xw(i,2)-x2d;
e3=xw(i,3)-x3d;
e4=xw(i,4)-x4d;
e=[e1 e2 e3 e4];

% quadratic basis of the tracking error and its gradient (10 terms for the 4 error states)
sigma=[e1^2 e1*e2 e1*e3 e1*e4 e2^2 e2*e3 e2*e4 e3^2 e3*e4 e4^2];
d_sigma=[2*e1 0 0 0;e2 e1 0 0;e3 0 e1 0;e4 0 0 e1;0 2*e2 0 0;0 e3 e2 0;0 e4 0 e2;0 0 2*e3 0;0 0 e4 e3;0 0 0 2*e4];

% config a %
%
% Md=[0.36*cos(x2d)+0.6066 0.18*cos(x2d)+0.1233;
%     0.18*cos(x2d)+0.1233 0.1233];
% Cd=[-0.36*sin(x2d)*x4d -0.18*sin(x2d)*x4d;
%     0.18*sin(x2d)*(x3d-x4d) 0.18*sin(x2d)*x3d];
% Nd=[-5.88*sin(x1d+x2d)-17.64*sin(x1d);
%     -5.88*sin(x1d+x2d)];

3. Running results

[Figures: running results of the simulation]

4. MATLAB version and references

1 MATLAB version
2014a

2 References
[1] Reinforcement Learning Notes (2) - Policy Iteration Algorithm

3 Remarks
This introduction is taken from the Internet and is for reference only. If there is any infringement, please contact the author to delete it.

Origin blog.csdn.net/TIQCmatlab/article/details/131125602