This post is a write-up of Assignment 2: Dropout from Stanford's CS231n course.
Inline Question 1
What happens if we do not divide the values being passed through inverse dropout by p in the dropout layer? Why does that happen?
Your Answer : If we do not divide by p, the expected value of the layer's output during training shrinks to p times what it is at test time (each unit survives with probability p). Dividing the kept values by p compensates for this, so the expected activation is the same at train and test time and the test-time pass can simply return the input unchanged.
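A quick numerical sketch of this argument (illustrative only; the array x and keep probability p below are hypothetical values):

import numpy as np

np.random.seed(0)
p = 0.5                                 # keep probability (hypothetical value)
x = np.random.randn(1000, 1000) + 1.0   # activations with mean ~1 (hypothetical)

mask = np.random.random_sample(x.shape) < p

print((mask * x).mean())      # ~0.5: without rescaling the mean shrinks to p * E[x]
print((mask * x / p).mean())  # ~1.0: dividing by p restores E[x]
print(x.mean())               # ~1.0: what the test-time forward pass sees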
Inline Question 2
Compare the validation and training accuracies with and without dropout -- what do your results suggest about dropout as a regularizer?
Your Answer :
Train Accuracy: The model without dropout converges faster on the training set. This suggests that training is too easy and the model can overfit the training data.
Validation Accuracy: The model with dropout achieves higher validation accuracy. Because dropout makes training harder and keeps the model from overfitting the training data, the model learns the general features of the data better. This is exactly the behavior we expect from a regularizer.
Code
First, we implement the forward pass of dropout. We generate uniform random numbers in [0, 1) with the same shape as x, build a mask that keeps (sets to a nonzero value) the positions where the random number is less than the hyperparameter p (the keep probability) and zeroes out the rest, then multiply it element-wise with the input. The mask is also divided by p so that the expected value of the output matches the no-dropout case. At test time, dropout is not applied.
def dropout_forward(x, dropout_param):
    """
    Performs the forward pass for (inverted) dropout.

    Inputs:
    - x: Input data, of any shape
    - dropout_param: A dictionary with the following keys:
      - p: Dropout parameter. We keep each neuron output with probability p.
      - mode: 'test' or 'train'. If the mode is train, then perform dropout;
        if the mode is test, then just return the input.
      - seed: Seed for the random number generator. Passing seed makes this
        function deterministic, which is needed for gradient checking but not
        in real networks.

    Outputs:
    - out: Array of the same shape as x.
    - cache: tuple (dropout_param, mask). In training mode, mask is the dropout
      mask that was used to multiply the input; in test mode, mask is None.

    NOTE: Please implement **inverted** dropout, not the vanilla version of dropout.
    See http://cs231n.github.io/neural-networks-2/#reg for more details.

    NOTE 2: Keep in mind that p is the probability of **keeping** a neuron
    output; this might be contrary to some sources, where it is referred to
    as the probability of dropping a neuron output.
    """
    p, mode = dropout_param["p"], dropout_param["mode"]
    if "seed" in dropout_param:
        np.random.seed(dropout_param["seed"])

    mask = None
    out = None

    if mode == "train":
        #######################################################################
        # TODO: Implement training phase forward pass for inverted dropout.   #
        # Store the dropout mask in the mask variable.                        #
        #######################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        # Keep each unit with probability p, rescaling the kept units by 1/p
        # (inverted dropout) so the expected output matches test time.
        mask = (np.random.random_sample(x.shape) < p) / p
        out = mask * x

        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        #######################################################################
        #                           END OF YOUR CODE                          #
        #######################################################################
    elif mode == "test":
        #######################################################################
        # TODO: Implement the test phase forward pass for inverted dropout.   #
        #######################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        # Thanks to the 1/p rescaling at train time, test time is a no-op.
        out = x

        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        #######################################################################
        #                           END OF YOUR CODE                          #
        #######################################################################

    cache = (dropout_param, mask)
    out = out.astype(x.dtype, copy=False)

    return out, cache
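As a quick sanity check, we can compare the train-mode and test-mode output statistics (a minimal sketch, assuming dropout_forward above and numpy are in scope; the input values and the list of p values are arbitrary):

import numpy as np

np.random.seed(231)
x = np.random.randn(500, 500) + 10  # arbitrary input with mean ~10

for p in [0.25, 0.4, 0.7]:
    out_train, _ = dropout_forward(x, {"mode": "train", "p": p})
    out_test, _ = dropout_forward(x, {"mode": "test", "p": p})

    # The 1/p rescaling keeps the train-time mean close to the test-time mean,
    # and roughly a (1 - p) fraction of the activations is zeroed out.
    print(p, x.mean(), out_train.mean(), out_test.mean(), (out_train == 0).mean())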
The backward pass is as follows. Inputs that were zeroed out by dropout are excluded from backpropagation, since their gradient is zero; multiplying the upstream gradient by the cached mask handles both the zeroing and the 1/p rescaling. At test time dropout is not used, so the upstream gradient passes through unchanged.
def dropout_backward(dout, cache):
    """
    Perform the backward pass for (inverted) dropout.

    Inputs:
    - dout: Upstream derivatives, of any shape
    - cache: (dropout_param, mask) from dropout_forward.
    """
    dropout_param, mask = cache
    mode = dropout_param["mode"]

    dx = None
    if mode == "train":
        #######################################################################
        # TODO: Implement training phase backward pass for inverted dropout   #
        #######################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        # Gradients flow only through the kept units, scaled by the same 1/p.
        dx = dout * mask

        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        #######################################################################
        #                           END OF YOUR CODE                          #
        #######################################################################
    elif mode == "test":
        dx = dout

    return dx
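To verify dropout_backward, a centered-difference gradient check works well; the seed in dropout_param makes the mask reproducible across calls, so the forward pass sees the same mask each time. The helper numerical_gradient below is a self-contained stand-in I wrote for the eval_numerical_gradient_array utility the assignment provides (a sketch under that assumption, not the assignment's exact code):

import numpy as np

def numerical_gradient(f, x, dout, h=1e-5):
    # Centered-difference approximation of d(sum(f(x) * dout)) / dx.
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=["multi_index"], op_flags=["readwrite"])
    while not it.finished:
        i = it.multi_index
        old = x[i]
        x[i] = old + h
        pos = f(x).copy()
        x[i] = old - h
        neg = f(x).copy()
        x[i] = old
        grad[i] = np.sum((pos - neg) * dout) / (2 * h)
        it.iternext()
    return grad

np.random.seed(231)
x = np.random.randn(10, 10) + 10
dout = np.random.randn(*x.shape)

dropout_param = {"mode": "train", "p": 0.2, "seed": 123}
out, cache = dropout_forward(x, dropout_param)
dx = dropout_backward(dout, cache)

dx_num = numerical_gradient(lambda xx: dropout_forward(xx, dropout_param)[0], x, dout)
print(np.max(np.abs(dx - dx_num)))  # should be ~1e-10 or smaller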