After working through jcjohnson's pytorch-examples, I am writing down some notes on building simple neural networks with PyTorch.
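PyTorch Module
In PyTorch, the nn package defines a set of Modules, which can be treated as the basic layers of a neural network. A Module receives input Variables and computes output Variables. Modules can contain other Modules (for example, a torch.nn.Sequential can contain torch.nn.Linear layers), and the nesting is stored as a tree. The Sequential container, Linear layers, Convolution layers, Dropout layers and the other building blocks commonly used in neural networks are all subclasses of Module.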
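To make the tree structure concrete, here is a minimal sketch that walks a small Sequential container. The layer sizes (4, 3, 2) are arbitrary values chosen for illustration and are not taken from the examples in this post.

```python
import torch

# A small container holding three child Modules (sizes are arbitrary).
model = torch.nn.Sequential(
    torch.nn.Linear(4, 3),
    torch.nn.ReLU(),
    torch.nn.Linear(3, 2),
)

# children() yields only the direct children of a Module.
child_types = [type(m).__name__ for m in model.children()]
print(child_types)

# named_modules() walks the whole tree, starting with the root itself,
# whose name is the empty string.
module_names = [name for name, _ in model.named_modules()]
print(module_names)

# The container and every layer inside it are instances of torch.nn.Module.
all_modules = all(isinstance(m, torch.nn.Module) for m in model.modules())
print(all_modules)
```

Sequential names its children by position, which is why the child names are the strings "0", "1" and "2".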
PyTorch nn
Here we use PyTorch's nn package to implement a two-layer neural network.
First, define the dimensions of the variables.
import torch
from torch.autograd import Variable
# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10
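Next, use randomly initialized Tensors to hold the inputs and outputs, and wrap them in Variables. (Since PyTorch 0.4 a Variable is simply a Tensor, so the wrapper is optional there.)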
# Create random Tensors to hold inputs and outputs, and wrap them in Variables.
x = Variable(torch.randn(N, D_in))
y = Variable(torch.randn(N, D_out), requires_grad=False)
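Put the network's Modules into a torch.nn.Sequential container, which applies them in order. Each Linear Module computes its output from its input with a linear function and holds its weight and bias in internal Variables.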
# Use the nn package to define our model as a sequence of layers. nn.Sequential
# is a Module which contains other Modules, and applies them in sequence to
# produce its output. Each Linear Module computes output from input using a
# linear function, and holds internal Variables for its weight and bias.
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)
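As a quick check of what the Linear layers hold, the sketch below rebuilds the same model and lists its parameters. It assumes the dimensions defined earlier (64, 1000, 100, 10). The dictionary name shapes is introduced only for this illustration.

```python
import torch

N, D_in, H, D_out = 64, 1000, 100, 10
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)

# Linear stores weight with shape (out_features, in_features)
# and bias with shape (out_features,).
shapes = {name: tuple(p.shape) for name, p in model.named_parameters()}
for name, shape in shapes.items():
    print(name, shape)

# A batch of N inputs produces a batch of N outputs of size D_out.
out = model(torch.randn(N, D_in))
print(tuple(out.shape))
```

ReLU has no learnable parameters, so only the layers at positions 0 and 2 show up in the listing.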
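Next we need a loss function to measure how far the model's predictions are from the targets. The nn package provides a number of commonly used loss functions; this example uses mean squared error (MSE).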
# The nn package also contains definitions of popular loss functions; in this
# case we will use Mean Squared Error (MSE) as our loss function.
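# reduction='sum' adds up the squared errors instead of averaging them; it
# replaces the deprecated size_average=False argument.
loss_fn = torch.nn.MSELoss(reduction='sum')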
learning_rate = 1e-4
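Note that the loss configured above sums the squared errors rather than averaging them. The following sketch checks this on small made-up tensors, using the reduction argument available in recent PyTorch versions. The names sum_loss, mean_loss and manual_sum are introduced only for this illustration.

```python
import torch

torch.manual_seed(0)
y_pred = torch.randn(4, 3)
y = torch.randn(4, 3)

# Summed and averaged variants of the same squared-error loss.
sum_loss = torch.nn.MSELoss(reduction='sum')(y_pred, y)
mean_loss = torch.nn.MSELoss(reduction='mean')(y_pred, y)

# The same quantity computed by hand.
manual_sum = ((y_pred - y) ** 2).sum()

print(sum_loss.item(), manual_sum.item())
print(mean_loss.item(), (manual_sum / y.numel()).item())
```

Because the summed loss grows with the batch size and output dimension, its gradients are larger than those of the averaged loss, which is one reason a small learning rate such as 1e-4 works here.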
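With that in place, we can train the parameters. In each iteration we first compute the prediction y_pred from the input, then compute the loss between the prediction and the true value. We then zero the parameter gradients, run the backward pass to compute the gradient of the loss with respect to each learnable parameter, and finally update the parameters using the learning rate.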
for t in range(500):
    # Forward pass: compute predicted y by passing x to the model. Module objects
    # override the __call__ operator so you can call them like functions. When
    # doing so you pass a Variable of input data to the Module and it produces
    # a Variable of output data.
    y_pred = model(x)

    # Compute and print loss. We pass Variables containing the predicted and true
    # values of y, and the loss function returns a Variable containing the loss.
    # item() returns the scalar loss as a Python float.
    loss = loss_fn(y_pred, y)
    print(t, loss.item())

    # Zero the gradients before running the backward pass.
    model.zero_grad()

    # Backward pass: compute gradient of the loss with respect to all the learnable
    # parameters of the model. Internally, the parameters of each Module are stored
    # in Variables with requires_grad=True, so this call will compute gradients for
    # all learnable parameters in the model.
    loss.backward()

    # Update the weights using gradient descent. Each parameter is a Variable, so
    # we can access its data and gradients like we did before.
    for param in model.parameters():
        param.data -= learning_rate * param.grad.data
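The reason for zeroing the gradients in every iteration is that backward() adds to the stored gradient instead of overwriting it. A minimal sketch of that behaviour on a single made-up tensor w (not part of the model above):

```python
import torch

w = torch.ones(3, requires_grad=True)

# loss = sum(2 * w), so d(loss)/dw is 2 for every element.
(2 * w).sum().backward()
first = w.grad.clone()

# A second backward pass without zeroing adds to the stored gradient.
(2 * w).sum().backward()
accumulated = w.grad.clone()

# Zeroing the gradient in place gives a clean slate for the next pass.
w.grad.zero_()
(2 * w).sum().backward()
after_zero = w.grad.clone()

print(first, accumulated, after_zero)
```

Without the zeroing step, each update in the training loop would use the sum of all gradients computed so far.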
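The code above updates the learnable parameters by hand using the learning rate. PyTorch's optim package wraps a number of optimization algorithms behind a common interface. The code below uses the Adam algorithm from the optim package to optimize the model.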
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for t in range(500):
    # Forward pass: compute predicted y by passing x to the model.
    y_pred = model(x)

    # Compute and print loss. item() returns the scalar loss as a Python float.
    loss = loss_fn(y_pred, y)
    print(t, loss.item())

    # Before the backward pass, use the optimizer object to zero all of the
    # gradients for the variables it will update (which are the learnable weights
    # of the model)
    optimizer.zero_grad()

    # Backward pass: compute gradient of the loss with respect to model parameters
    loss.backward()

    # Calling the step function on an Optimizer makes an update to its parameters
    optimizer.step()
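To see how the optim interface relates to the hand-written update, the sketch below compares one manual gradient-descent step with one step of torch.optim.SGD on two identical copies of a small layer. It uses plain SGD rather than Adam because plain SGD has the same update rule as the manual loop. The layer sizes, the learning rate and the names model_a and model_b are assumptions made for this illustration.

```python
import copy
import torch

torch.manual_seed(0)
x = torch.randn(8, 5)
y = torch.randn(8, 2)
lr = 1e-2

model_a = torch.nn.Linear(5, 2)
model_b = copy.deepcopy(model_a)  # identical starting weights
loss_fn = torch.nn.MSELoss(reduction='sum')

# One manual gradient-descent step on model_a.
model_a.zero_grad()
loss_fn(model_a(x), y).backward()
with torch.no_grad():
    for param in model_a.parameters():
        param -= lr * param.grad

# The same step through the optim interface on model_b.
optimizer = torch.optim.SGD(model_b.parameters(), lr=lr)
optimizer.zero_grad()
loss_fn(model_b(x), y).backward()
optimizer.step()

# Both models should now hold the same parameters.
same = all(
    torch.allclose(pa, pb)
    for pa, pb in zip(model_a.parameters(), model_b.parameters())
)
print(same)
```

Swapping torch.optim.SGD for torch.optim.Adam changes only the constructor line; the zero_grad, backward, step sequence stays the same.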
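Custom nn Modules
Sometimes we want to build layers that are more complex than the existing ones. In that case we can write our own subclass of nn.Module and define a forward method. This method receives input Variables, performs arbitrary operations defined on Modules or Variables, and returns the output Variables.
Below is an example of a custom Module: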
import torch
from torch.autograd import Variable
class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        """
        In the constructor we instantiate two nn.Linear modules and assign them as
        member variables.
        """
        super(TwoLayerNet, self).__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        """
        In the forward function we accept a Variable of input data and we must return
        a Variable of output data. We can use Modules defined in the constructor as
        well as arbitrary operators on Variables.
        """
        h_relu = self.linear1(x).clamp(min=0)
        y_pred = self.linear2(h_relu)
        return y_pred
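Two details of this class are worth checking: assigning a Module as an attribute registers its parameters with the parent, and clamp(min=0) computes the same thing as ReLU. The sketch below repeats a compact version of the class so that it runs on its own. The sizes (6, 4, 2) and the test tensor h are made up for illustration.

```python
import torch

class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super().__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        h_relu = self.linear1(x).clamp(min=0)
        return self.linear2(h_relu)

model = TwoLayerNet(6, 4, 2)

# Parameters of both Linear members are visible through the parent Module,
# which is what lets an optimizer find them via model.parameters().
param_names = [name for name, _ in model.named_parameters()]
print(param_names)

# clamp(min=0) replaces negative entries with zero, exactly like ReLU.
h = torch.tensor([-1.5, 0.0, 2.0])
clamped = h.clamp(min=0)
print(clamped, torch.relu(h))
```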
TwoLayerNet wraps two Linear layers, and its forward method applies ordinary operations defined on Linear and Variable.
We can then use TwoLayerNet to construct the model.
# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10
# Create random Tensors to hold inputs and outputs, and wrap them in Variables
x = Variable(torch.randn(N, D_in))
y = Variable(torch.randn(N, D_out), requires_grad=False)
# Construct our model by instantiating the class defined above
model = TwoLayerNet(D_in, H, D_out)
# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters of the two
# nn.Linear modules which are members of the model.
# reduction='sum' replaces the deprecated size_average=False argument.
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
for t in range(500):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print loss. item() returns the scalar loss as a Python float.
    loss = criterion(y_pred, y)
    print(t, loss.item())

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()