This post is a write-up of Assignment 2: Dropout from Stanford's CS231n course.
Inline Question 1
What happens if we do not divide the values being passed through inverse dropout by p in the dropout layer? Why does that happen?
Your Answer : If we do not divide by p, the expected value of the layer's output during training shrinks to p times what it is at test time (each unit survives with probability p). Dividing the kept values by p compensates for this, so the expected activation is the same at train and test time and the test-time pass can simply return the input unchanged.
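A quick numerical sketch of this argument (illustrative only; the array x and keep probability p below are hypothetical values):

import numpy as np

np.random.seed(0)
p = 0.5                                 # keep probability (hypothetical value)
x = np.random.randn(1000, 1000) + 1.0   # activations with mean ~1 (hypothetical)

mask = np.random.random_sample(x.shape) < p

print((mask * x).mean())      # ~0.5: without rescaling the mean shrinks to p * E[x]
print((mask * x / p).mean())  # ~1.0: dividing by p restores E[x]
print(x.mean())               # ~1.0: what the test-time forward pass sees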
Inline Question 2
Compare the validation and training accuracies with and without dropout -- what do your results suggest about dropout as a regularizer?
Your Answer :
Train Accuracy: The model without dropout converges faster on the training set. This suggests that training is too easy and the model can overfit the training data.
Validation Accuracy: The model with dropout achieves higher validation accuracy. Because dropout makes training harder and keeps the model from overfitting the training data, the model learns the general features of the data better. This is exactly the behavior we expect from a regularizer.
Code
First, we implement the forward pass of dropout. We generate uniform random numbers in [0, 1) with the same shape as x, build a mask that keeps (sets to a nonzero value) the positions where the random number is less than the hyperparameter p (the keep probability) and zeroes out the rest, then multiply it element-wise with the input. The mask is also divided by p so that the expected value of the output matches the no-dropout case. At test time, dropout is not applied.
def dropout_forward(x, dropout_param):
    """
    Performs the forward pass for (inverted) dropout.

    Inputs:
    - x: Input data, of any shape
    - dropout_param: A dictionary with the following keys:
      - p: Dropout parameter. We keep each neuron output with probability p.
      - mode: 'test' or 'train'. If the mode is train, then perform dropout;
        if the mode is test, then just return the input.
      - seed: Seed for the random number generator. Passing seed makes this
        function deterministic, which is needed for gradient checking but not
        in real networks.

    Outputs:
    - out: Array of the same shape as x.
    - cache: tuple (dropout_param, mask). In training mode, mask is the dropout
      mask that was used to multiply the input; in test mode, mask is None.

    NOTE: Please implement **inverted** dropout, not the vanilla version of dropout.
    See http://cs231n.github.io/neural-networks-2/#reg for more details.

    NOTE 2: Keep in mind that p is the probability of **keeping** a neuron
    output; this might be contrary to some sources, where it is referred to
    as the probability of dropping a neuron output.
    """
    p, mode = dropout_param["p"], dropout_param["mode"]
    if "seed" in dropout_param:
        np.random.seed(dropout_param["seed"])

    mask = None
    out = None

    if mode == "train":
        #######################################################################
        # TODO: Implement training phase forward pass for inverted dropout.   #
        # Store the dropout mask in the mask variable.                        #
        #######################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        # Keep each unit with probability p, rescaling the kept units by 1/p
        # (inverted dropout) so the expected output matches test time.
        mask = (np.random.random_sample(x.shape) < p) / p
        out = mask * x

        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        #######################################################################
        #                           END OF YOUR CODE                          #
        #######################################################################
    elif mode == "test":
        #######################################################################
        # TODO: Implement the test phase forward pass for inverted dropout.   #
        #######################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        # Thanks to the 1/p rescaling at train time, test time is a no-op.
        out = x

        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        #######################################################################
        #                           END OF YOUR CODE                          #
        #######################################################################

    cache = (dropout_param, mask)
    out = out.astype(x.dtype, copy=False)

    return out, cache
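As a quick sanity check, we can compare the train-mode and test-mode output statistics (a minimal sketch, assuming dropout_forward above and numpy are in scope; the input values and the list of p values are arbitrary):

import numpy as np

np.random.seed(231)
x = np.random.randn(500, 500) + 10  # arbitrary input with mean ~10

for p in [0.25, 0.4, 0.7]:
    out_train, _ = dropout_forward(x, {"mode": "train", "p": p})
    out_test, _ = dropout_forward(x, {"mode": "test", "p": p})

    # The 1/p rescaling keeps the train-time mean close to the test-time mean,
    # and roughly a (1 - p) fraction of the activations is zeroed out.
    print(p, x.mean(), out_train.mean(), out_test.mean(), (out_train == 0).mean())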
The backward pass is as follows. Inputs that were zeroed out by dropout are excluded from backpropagation, since their gradient is zero; multiplying the upstream gradient by the cached mask handles both the zeroing and the 1/p rescaling. At test time dropout is not used, so the upstream gradient passes through unchanged.
def dropout_backward(dout, cache):
    """
    Perform the backward pass for (inverted) dropout.

    Inputs:
    - dout: Upstream derivatives, of any shape
    - cache: (dropout_param, mask) from dropout_forward.
    """
    dropout_param, mask = cache
    mode = dropout_param["mode"]

    dx = None
    if mode == "train":
        #######################################################################
        # TODO: Implement training phase backward pass for inverted dropout   #
        #######################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        # Gradients flow only through the kept units, scaled by the same 1/p.
        dx = dout * mask

        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        #######################################################################
        #                           END OF YOUR CODE                          #
        #######################################################################
    elif mode == "test":
        dx = dout

    return dx
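To verify dropout_backward, a centered-difference gradient check works well; the seed in dropout_param makes the mask reproducible across calls, so the forward pass sees the same mask each time. The helper numerical_gradient below is a self-contained stand-in I wrote for the eval_numerical_gradient_array utility the assignment provides (a sketch under that assumption, not the assignment's exact code):

import numpy as np

def numerical_gradient(f, x, dout, h=1e-5):
    # Centered-difference approximation of d(sum(f(x) * dout)) / dx.
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=["multi_index"], op_flags=["readwrite"])
    while not it.finished:
        i = it.multi_index
        old = x[i]
        x[i] = old + h
        pos = f(x).copy()
        x[i] = old - h
        neg = f(x).copy()
        x[i] = old
        grad[i] = np.sum((pos - neg) * dout) / (2 * h)
        it.iternext()
    return grad

np.random.seed(231)
x = np.random.randn(10, 10) + 10
dout = np.random.randn(*x.shape)

dropout_param = {"mode": "train", "p": 0.2, "seed": 123}
out, cache = dropout_forward(x, dropout_param)
dx = dropout_backward(dout, cache)

dx_num = numerical_gradient(lambda xx: dropout_forward(xx, dropout_param)[0], x, dout)
print(np.max(np.abs(dx - dx_num)))  # should be ~1e-10 or smaller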