[Pytorch]Autograd

Date: 2023.03.03 Updated: 2023.03.03

카테고리: Pytorch

1. 자동미분(Autograd)

1) AUTOGRAD (자동 미분)

autograd 패키지는 Tensor의 모든 연산에 대해 자동 미분 제공
이는 코드를 어떻게 작성하여 실행하느냐에 따라 역전파가 정의된다는 뜻
backprop를 위한 미분값을 자동으로 계산

2) Tensor

data: tensor형태의 데이터
grad: data가 겨쳐온 layer에 대한 미분값 저장
grad_fn: 미분값을 계산한 함수에 대한 정보 저장 (어떤 함수에 대해서 backprop 했는지)
requires_grad 속성을 True로 설정하면, 해당 텐서에서 이루어지는 모든 연산들을 추적하기 시작
계산이 완료된 후, .backward()를 호출하면 자동으로 gradient를 계산할 수 있으며, .grad 속성에 누적됨
기록을 추적하는 것을 중단하게 하려면, .detach()를 호출하여 연산기록으로부터 분리
기록을 추적하는 것을 방지하기 위해 코드 블럭을 with torch.no_grad():로 감싸면 gradient는 필요없지만, requires_grad=True로 설정되어 학습 가능한 매개변수를 갖는 모델을 평가(evaluate)할 때 유용
Autograd 구현에서 매우 중요한 클래스 : Function 클래스

Autograd 예시 1)

import torch

# requires_grad를 설정할 때만 기울기 추적
x = torch.tensor([3.0,4.0], requires_grad = True)
y = torch.tensor([1.0,2.0], requires_grad = True)
z = x + y

print(z) 
print(z.grad_fn)

tensor([4., 6.], grad_fn=<AddBackward0>)
<AddBackward0 object at 0x7f6f4369b6d0>

out = z.mean()
print(out)
print(out.grad_fn)

tensor(5., grad_fn=<MeanBackward0>)
<MeanBackward0 object at 0x7f6f43654790>

out.backward()
print(x.grad)
print(y.grad)
print(z.grad)

tensor([0.5000, 0.5000])
tensor([0.5000, 0.5000])
None

import torch

# requires_grad를 설정할 때만 기울기 추적
x = torch.tensor([3.0, 4.0], requires_grad=True)
y = torch.tensor([1.0, 2.0], requires_grad=True)
z = x + y

print(z) # [4.0, 6.0]
print(z.grad_fn) # 더하기(add)

out = z.mean()
print(out) # 5.0
print(out.grad_fn) # 평균(mean)

out.backward() # scalar에 대하여 가능
print(x.grad)
print(y.grad)
print(z.grad) # leaf variable에 대해서만 gradient 추적이 가능하다. 따라서 None. 즉, x,y 에 대해서만 gradient 값 추적 가능

tensor([4., 6.], grad_fn=<AddBackward0>)
<AddBackward0 object at 0x7f6f43654640>

tensor(5., grad_fn=<MeanBackward0>)
<MeanBackward0 object at 0x7f6f43654790>

tensor([0.5000, 0.5000])
tensor([0.5000, 0.5000])
None

일반적으로 모델을 학습할 때는 기울기(gradient)를 추적한다.
하지만, 학습된 모델을 사용할 때는 파라미터를 업데이트하지 않으므로, 기울기를 추적하지 않는 것이 일반적이다.

temp = torch.tensor([3.0, 4.0], requires_grad = True)
print(temp.requires_grad)
print((temp**2).requires_grad)

True
True

with torch.no_grad():
    temp = torch.tensor([3.0,4.0], requires_grad = True)
    print(temp.requires_grad)
    print((temp**2).requires_grad)

True
False

temp = torch.tensor([3.0, 4.0], requires_grad=True)
print(temp.requires_grad)
print((temp ** 2).requires_grad)

# 기울기 추적을 하지 않기 때문에 계산 속도가 더 빠르다.
with torch.no_grad():
    temp = torch.tensor([3.0, 4.0], requires_grad=True)
    print(temp.requires_grad)
    print((temp ** 2).requires_grad)

True
True
True
False

Autograd 예시 2)

import torch

x = torch.ones(3,3, requires_grad = True)
print(x)

y = x + 5
print(y)
print(y.grad_fn) ## +5 연산에 대해서 back ward할 수 있는 오브젝트가 있다: AddBackward0

z = y*y*2
out = z.mean()

print(z, out) ## z는 더하기 연산에 대한 addbackward에 더하여 Mulbackward가 붙었고,
              ## out은 평균을 구해줬으므로 72, out에도 gradient fucntion이 붙었는데 meanbackward가 붙음)

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], requires_grad=True)
        
tensor([[6., 6., 6.],
        [6., 6., 6.],
        [6., 6., 6.]], grad_fn=<AddBackward0>)
<AddBackward0 object at 0x7f6f4366f4f0>

tensor([[72., 72., 72.],
        [72., 72., 72.],
        [72., 72., 72.]], grad_fn=<MulBackward0>) tensor(72., grad_fn=<MeanBackward0>)

requires_grad_(...)는 기존 텐서의 requires_grad값을 바꿔치기(in-place)하여 변경

a = torch.randn(3,3)
a = ((a * 3)/(a - 1))
print(a.requires_grad)

a.requires_grad_(True)
print(a.requires_grad)

b = (a * a).sum()
print(b.grad_fn) ## b가 a*ad의 sum연산이니 SumBackward가 리턴됨

False
True
<SumBackward0 object at 0x7f6f43671400>

이제 기울기 즉, Gradient 값을 계산해야한다. requires_grad = True 로 설정되어 있으니 .backward()를 통해 역전파 계산이 가능하다.(Backpropagation)

out.backward()
print(x.grad)

tensor([[2.6667, 2.6667, 2.6667],
        [2.6667, 2.6667, 2.6667],
        [2.6667, 2.6667, 2.6667]])

이처럼. 역전파를 진행하는 명령어를 통해서 실행했고, \(\frac{\partial (out)}{\partial x}\) 값이 바로 x.grad이다.

x = torch.rand(3, requires_grad = True)

y = x * 2
while y.data.norm() < 1000:  ## x값에서 2를 곱한 y값에 2를 계속 곱해서 norm이 1000미만일때까지 곱함함
    y = y * 2

print(y)

tensor([ 589.7665,  299.5266, 1531.2843], grad_fn=<MulBackward0>)

v = torch.tensor([0.1, 1.0, 0.0001], dtype = torch.float)
y.backward(v)

print(x.grad)

tensor([2.0480e+02, 2.0480e+03, 2.0480e-01])

Training을 하는 것이 아닌 Evaluation(평가)를 할 때는, gradient값이 업데이트되면 안된다. 즉, Test set을 이용하는데 있어서 미분 값은 업데이트하지 않고 유지시켜야 한다. with torch.no_grad() 메서드를 사용하여 gradient의 업데이트를 하지 않는다.

print(x.requires_grad)          ## True
print((x**2).requires_grad)     ## True

with torch.no_grad():
    print((x**2).requires_grad) ## False

True
True
False

detach(): 내용물(contant)은 같지만, reuquire_grad가 다른 새로운 Tensor를 가져올 때 사용하는 메서드이다.

print(x.requires_grad)
y = x.detach()
print(y.requires_grad)
print(x.eq(y).all())

True
False
tensor(True)

Autograd 흐름 다시 보기(1)

계산 흐름
1. forward방향은 \(a \rightarrow b \rightarrow c \rightarrow out\) 같다.
2. 그럼 \(\quad \frac{\partial out}{\partial a}\) = ?
backward()를 통해
1. \(a \leftarrow b \leftarrow c \leftarrow out\)을 계산하면
2. \(\frac{\partial out}{\partial a}\)값이 a.grad에 채워짐

import torch

a = torch.ones(2,2)
print(a)

tensor([[1., 1.],
        [1., 1.]])

a = torch.ones(2,2, requires_grad = True)
print(a)

print("a:data:", a.data)
print("a.grad:", a.grad)
print("a.grad_fn", a.grad_fn)

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
a:data: tensor([[1., 1.],
        [1., 1.]])
a.grad: None
a.grad_fn None

\(b = a + 2\) 계산

b = a + 2
print(b)

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)

\(c = b^2\) 계산

c = b**2
print(c)

tensor([[9., 9.],
        [9., 9.]], grad_fn=<PowBackward0>)

마지막으로 out을 계산

out = c.sum()
print(out)

tensor(36., grad_fn=<SumBackward0>)

역전파를 위해 .backward()진행

print(out)
out.backward()

tensor(36., grad_fn=<SumBackward0>)

미분값들 확인
- a의 `grad_fn이 None인 이유: 직접적으로 계산한 부분이 없었기 때문

a 확인

print("a:data:", a.data)
print("a.grad:", a.grad)  ## 바뀐걸 볼 수 있음음
print("a.grad_fn", a.grad_fn)

a:data: tensor([[1., 1.],
        [1., 1.]])
a.grad: tensor([[6., 6.],
        [6., 6.]])
a.grad_fn None

b 확인

print("b:data:", b.data)
print("b.grad:", b.grad)
print("b.grad_fn", b.grad_fn)

b:data: tensor([[3., 3.],
        [3., 3.]])
b.grad: None
b.grad_fn <AddBackward0 object at 0x7f6f4366fb80>

c 확인

print("c:data:", c.data)
print("c.grad:", c.grad)
print("c.grad_fn", c.grad_fn)

print("c:data:", c.data)
print("c.grad:", c.grad)
print("c.grad_fn", c.grad_fn)

c:data: tensor([[9., 9.],
        [9., 9.]])
c.grad: None
c.grad_fn <PowBackward0 object at 0x7f6f43642e50>

out 확인

print("out:data:", out.data)
print("out.grad:", out.grad)
print("out.grad_fn", out.grad_fn)

out:data: tensor(36.)
out.grad: None
out.grad_fn <SumBackward0 object at 0x7f6f43698ee0>

Autograd 흐름 다시 보기(2)

grad값을 넣어서 backward
아래의 코드에서 .grad값이 None은 gradient값이 필요하지 않기 때문

x = torch.ones(3, requires_grad = True)
y = (x**2)
z = y**2 + x
out = z.sum()
print(out)

tensor(6., grad_fn=<SumBackward0>)

grad = torch.Tensor([0.1,1,100])
z.backward(grad)

print("x:data:", x.data)
print("x.grad:", x.grad)
print("x.grad_fn", x.grad_fn)

x:data: tensor([1., 1., 1.])
x.grad: tensor([  0.5000,   5.0000, 500.0000])
x.grad_fn None

print("y:data:", y.data)
print("y.grad:", y.grad)
print("y.grad_fn", y.grad_fn)

y:data: tensor([1., 1., 1.])
y.grad: None
y.grad_fn <PowBackward0 object at 0x7f6f436b35b0>

print("z:data:", z.data)
print("z.grad:", z.grad)
print("z.grad_fn", z.grad_fn)

z:data: tensor([2., 2., 2.])
z.grad: None
z.grad_fn <AddBackward0 object at 0x7f6f43642c10>

3) nn & nn.functional

두 패키지가 같은 기능이지만 방식이 조금 다름
위의 autograd 관련 작업들을 두 패키지를 통해 진행할 수 있음
텐서를 직접 다룰 때 requires_grad와 같은 방식으로 진행할 수 있음
결론적으로, torch.nn은 attribute를 활용해 state를 저장하고 활용하고,
torch.nn.functional로 구현한 함수의 경우에는 인스턴스화 시킬 필요 없이 사용이 가능

nn패키치

주로 가중치(weights), 편향(bias)값들이 내부에서 자동으로 생성되는 레이어들을 사용할 때
- 따라서, weight값들을 직접 선언 안함
예시
- Containers
- Convolution Layers
- Pooling layers
- Padding Layers
- Non-linear Activations (weighted sum, nonlinearity)
- Non-linear Activations (other)
- Normalization Layers
- Recurrent Layers
- Transformer Layers
- Linear Layers
- Dropout Layers
- Sparse Layers
- Distance Functions
- Loss Functions
- ..
https://pytorch.org/docs/stable/nn.html

import torch
import torch.nn as nn

# torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', device=None, dtype=None)
m1 = nn.Conv2d(16, 33, 3, stride = 2) ## in channel수, out channel수, kernel사이즈, stride
m2 = nn.Conv2d(in_channels = 16, out_channels = 33, kernel_size = 3, stride = 2)
m3 = nn.Conv2d(in_channels = 16, out_channels = 33, kernel_size = (3,5), stride = (2,1), padding = (4,2))
m4 = nn.Conv2d(in_channels = 16, out_channels = 33, kernel_size = (3,5), stride = (2,1), padding = (4,2), dilation = (3,1)) # 딜레이션. 간격(3,1)사이즈로 간격을 벌린상태로 진행 -> 다소 spars함

inp = torch.randn(20,16,50,100)

out1 = m1(inp)
out2 = m2(inp)
out3 = m3(inp)
out4 = m4(inp)

print(out1.size())
print(out2.size())
print(out3.size())
print(out4.size())

torch.Size([20, 33, 24, 49])
torch.Size([20, 33, 24, 49])
torch.Size([20, 33, 28, 100])
torch.Size([20, 33, 26, 100])

# With square kernels and equal stride
m = nn.Conv2d(16, 33, 3, stride=2)
# non-square kernels and unequal stride and with padding
m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2))
# non-square kernels and unequal stride and with padding and dilation
m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1))
input = torch.randn(20, 16, 50, 100)
output = m(input)

nn.functional 패키지

가중치를 직접 선언하여 인자로 넣어줘야 함.
예시)
- Convolution functions
- Pooling functions
- Non-linear activation functions
- Normalization functions
- Linear functions
- Dropout functions
- Sparse functions
- Distance functions
- Loss functions
- ..
https://pytorch.org/docs/stable/nn.functional.html

import torch
import torch.nn.functional as F

filters = torch.randn(8,4,3,3)

inputs = torch.randn(1,4,5,5)
conv = F.conv2d(inputs, filters, padding = 1)
print(conv.shape) ## filter 통과후 1,4,5,5 -> 1,8,5,5가 됨

torch.Size([1, 8, 5, 5])

Torchvision

transforms: 전처리할 때 사용하는 메소드
transforms에서 제공하는 클래스 이외에 일반적으로 클래스를 따로 만들어 전처리 단계를 진행
- 아래의 코드에서 다양한 전처리 기술 확인
- https://pytorch.org/docs/stable/torchvision/transforms.html

import torch
import torchvision.transforms as transforms

예시)
- DataLoader의 인자로 들어갈 transform을 미리 정의할 수 있음
- Compose를 통해 리스트 안에 순서대로 전처리 진행
- 대표적인 예로, ToTensor()를 하는 이유는 torchvision이 PIL Image형태로만 입력을 받기 때문에 데이터 처리를 위해서 Tensor형으로 변환해야함

transforms = transforms.Compose([transforms.ToTensor(),
                                 transforms.Normalize(mean = (0.5,), std = (0.5,))])

utils.data

Dataset에는 다양한 데이터셋이 존재
- MNIST, CIFAR10, …
DataLoader, Dataset을 통해 batch_size, train여부, transform등을 인자로 넣어 데이터를 어떻게 load할 것인지 정해줄 수 있음

import torch
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Define the transformations to apply to the MNIST data
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

# Download and load the MNIST training dataset
trainset = datasets.MNIST('data', 
                          train=True, 
                          download=True, 
                          transform=transform)

# Download and load the MNIST test dataset
testset = datasets.MNIST('data', 
                         train=False, 
                         download=True, 
                         transform=transform)

# Create a data loader for the training set
train_loader = DataLoader(trainset, batch_size=32, shuffle=True, num_workers=2)

# Create a data loader for the test set
test_loader = DataLoader(testset, batch_size=32, shuffle=False, num_workers=2)

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to data/MNIST/raw/train-images-idx3-ubyte.gz
100%
9912422/9912422 [00:00<00:00, 19825012.40it/s]
Extracting data/MNIST/raw/train-images-idx3-ubyte.gz to data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to data/MNIST/raw/train-labels-idx1-ubyte.gz
100%
28881/28881 [00:00<00:00, 254300.83it/s]
Extracting data/MNIST/raw/train-labels-idx1-ubyte.gz to data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to data/MNIST/raw/t10k-images-idx3-ubyte.gz
100%
1648877/1648877 [00:00<00:00, 5650441.89it/s]
Extracting data/MNIST/raw/t10k-images-idx3-ubyte.gz to data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to data/MNIST/raw/t10k-labels-idx1-ubyte.gz
100%
4542/4542 [00:00<00:00, 36172.35it/s]
Extracting data/MNIST/raw/t10k-labels-idx1-ubyte.gz to data/MNIST/raw

Batch_size만큼 데이터를 하나씩 가져옴

# Load a batch of images and labels from the training loader
images, labels = next(iter(train_loader))

# Print the size of the batch
print(images.size())  # should output torch.Size([32, 1, 28, 28])
                                 # 32는 batch size
                                 # 1은 channel 수
                                 # 28,28은 image의 width와 height임
                                 
print(labels.size())  # should output torch.Size([32])

torch.Size([32, 1, 28, 28])
torch.Size([32])

(중요) torch에서는 channel(채널)이 앞에 옴

channel first
tensorflow, keras 등에서는 channel이 뒤에 옴(channel last)

데이터 확인(Visualization)

import matplotlib.pyplot as plt
plt.style.use('seaborn-white')

torch_image = torch.squeeze(images[0]) # squeeze로 차원하나 없앰
print(torch_image.shape) # torch.Size([28, 28])

image = torch_image.numpy()
print(image.shape) # (28, 28)

label = labels[0].numpy()
print(label.shape) # ()

print(label) # array(1)

plt.title(label)
plt.imshow(image, 'gray')
plt.show()

torch.Size([28, 28])
(28, 28)
()
array(1)

4) 여러 가지 Layer 구현하기

먼저 여러 가지 패키지들을 import해 줘야 한다.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

nn.Conv2d

in_channels: channel의 갯수
out_channels: 출력 채널의 갯수
kernel_size: 커널(필터) 사이즈
텐서플로우, 케라스와 다르게 레이어의 input인자에도 값을 집어 넣어줘야함

nn.Conv2d(in_channels = 1, out_channels = 20, kernel_size = 5, stride = 1)

Conv2d(1, 20, kernel_size=(5, 5), stride=(1, 1))

layer = nn.Conv2d(1,20,5,1).to(torch.device('cpu'))
print(layer)

Conv2d(1, 20, kernel_size=(5, 5), stride=(1, 1))

Weight 확인

weight = layer.weight
print(weight.shape)

torch.Size([20, 1, 5, 5])

weight는 detach()를 통해 꺼내줘야 numpy()변환이 가능

weight = weight.detach()

weight = weight.numpy()
print(weight.shape)

(20, 1, 5, 5)

Weight를 하나 뽑아서 시각화

plt.imshow(weight[0,0,:,:], 'jet')
plt.colorbar()
plt.show()

print(images.shape)

input_image = torch.unsqueeze(images[0], dim = 0)
output_data = layer(input_image)
output = output_data.data
output_arr = output.numpy()
print(output_arr.shape)

torch.Size([32, 1, 28, 28])
(1, 20, 24, 24)

plt.figure(figsize = (15,30))
plt.subplot(131)
plt.title("Input")
plt.imshow(image, 'gray')

plt.subplot(132)
plt.title('Weight')
plt.imshow(weight[0,0,:,:], 'jet')

plt.subplot(133)
plt.title("Output")
plt.imshow(output_arr[0,0,:,:], 'gray')

Pooling

F.max_pool2d
- stride
- kernel_size
torch.nn.MaxPool2d 도 많이 사용

print(image.shape)

pool = F.max_pool2d(output, 2, 2)
print(pool.shape)

pool_arr = pool.numpy()
print(pool_arr.shape)

plt.figure(figsize =(10,15))

plt.subplot(121)
plt.title('Input')
plt.imshow(image, 'gray')
plt.show

plt.subplot(122)
plt.title('Output')
plt.imshow(pool_arr[0,0,:,:], 'gray')
plt.show

(28, 28) # image.shape
torch.Size([1, 20, 12, 12]) # pool.shape
(1, 20, 12, 12) # pool_arr.shape

Linear Layer

1d만 가능, .view()를 통해 1D로 펼쳐줘야함

image = torch.from_numpy(image)
print(image.shape)

flatten = image.view(1,28*28)
print(flatten.shape)

lin = nn.Linear(784,10)(flatten)
print(lin.shape)

print(lin)

plt.imshow(lin.detach().numpy(), 'jet')
plt.colorbar()
plt.show()

torch.Size([1, 784]) # image.shape
torch.Size([1, 10]) # lin.shape
tensor([[-0.0548, -0.7653,  0.5531, -0.4567,  0.6094, -0.4746, -0.6054, -0.1089,
          0.2765, -0.4428]], grad_fn=<AddmmBackward0>) # lin

Softmax

with torch.no_grad():
    flatten = image.view(1,28*28)
    lin = nn.Linear(784,10)(flatten)
    softmax = F.softmax(lin, dim = 1)
    
print(softmax)  
print(np.sum(softmax.numpy()))

tensor([[0.0602, 0.1017, 0.0979, 0.0664, 0.1015, 0.0915, 0.1041, 0.1077, 0.1192,
         0.1498]])
         
1.0000001         

ReLU

ReLU 함수를 적용하는 레이어
nn.ReLU로도 사용 가능

inputs = torch.randn(4,3,28,28).to(device)
print(inputs.shape)

layer = nn.Conv2d(3,20,5,1).to(device)
output = F.relu(layer(inputs))
print(output.shape)

torch.Size([4, 3, 28, 28])
torch.Size([4, 20, 24, 24])

Reference

pytorch 튜토리얼

Meaningful