CS231n Assignment2(4) : Convolutional Neural Networks

2024. 12. 18. 17:43·Stanford CS231n

Code

This function implements the forward pass of a convolution layer in a naive way.

We loop over the filters and fill in each activation map one value at a time with a double for-loop over its height and width. This loop structure also shows how to count the work done by a convolution layer (= filter size (HH x WW x C) x number of filters x output size).

def conv_forward_naive(x, w, b, conv_param):
    """
    A naive implementation of the forward pass for a convolutional layer.

    The input consists of N data points, each with C channels, height H and
    width W. We convolve each input with F different filters, where each filter
    spans all C channels and has height HH and width WW.

    Input:
    - x: Input data of shape (N, C, H, W)
    - w: Filter weights of shape (F, C, HH, WW)
    - b: Biases, of shape (F,)
    - conv_param: A dictionary with the following keys:
      - 'stride': The number of pixels between adjacent receptive fields in the
        horizontal and vertical directions.
      - 'pad': The number of pixels that will be used to zero-pad the input.


    During padding, 'pad' zeros should be placed symmetrically (i.e. equally on both
    sides) along the height and width axes of the input. Be careful not to modify
    the original input x directly.

    Returns a tuple of:
    - out: Output data, of shape (N, F, H', W') where H' and W' are given by
      H' = 1 + (H + 2 * pad - HH) / stride
      W' = 1 + (W + 2 * pad - WW) / stride
    - cache: (x, w, b, conv_param)
    """
    out = None
    ###########################################################################
    # TODO: Implement the convolutional forward pass.                         #
    # Hint: you can use the function np.pad for padding.                      #
    ###########################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    N, C, H, W = x.shape
    F, C, HH, WW = w.shape
    stride = conv_param['stride']
    pad = conv_param['pad']

    out_h = 1 + (H + 2 * pad - HH) // stride
    out_w = 1 + (W + 2 * pad - WW) // stride
    
    # Padding
    x_pad = np.pad(x, [(0,0), (0,0), (pad, pad), (pad, pad)])
    out = np.zeros(shape=(N, F, out_h, out_w))

    # Sliding the filter
    for sample in range(N):
      for fil in range(F):
        for h in range(out_h):
          start_h = stride * h
          end_h = stride * h + HH
          for wi in range(out_w):
            start_w = stride * wi
            end_w = stride * wi + WW
            x_conv = x_pad[sample,:,start_h:end_h,start_w:end_w]
            out[sample,fil,h,wi] = np.sum(x_conv * w[fil]) + b[fil]

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################
    cache = (x, w, b, conv_param)
    return out, cache
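
A quick shape and cost sanity check for the function above (my own sketch, not part of the assignment notebook; the sizes here are arbitrary):

import numpy as np

# Tiny random batch: N=2 samples, C=3 channels, 8x8 spatial size,
# convolved with F=4 filters of size 3x3 and 'same' padding.
x = np.random.randn(2, 3, 8, 8)
w = np.random.randn(4, 3, 3, 3)
b = np.random.randn(4)
conv_param = {'stride': 1, 'pad': 1}

out, _ = conv_forward_naive(x, w, b, conv_param)
print(out.shape)  # (2, 4, 8, 8), matching 1 + (8 + 2*1 - 3) // 1 = 8

# Multiply count per the formula above: filter size (3*3*3) per output value,
# times the number of output values -> 27 * 2 * 4 * 8 * 8 = 13824 multiplies.
print((3 * 3 * 3) * out.size)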

 

 

 

This function implements the backward pass of a convolution layer in a naive way. dx is computed by multiplying the upstream gradient dout with the local gradient, which is w; dw is computed by multiplying dout with x. Finally, dx is obtained from dx_pad (the gradient buffer padded by pad pixels on each side) by slicing the padding back off.

def conv_backward_naive(dout, cache):
    """
    A naive implementation of the backward pass for a convolutional layer.

    Inputs:
    - dout: Upstream derivatives.
    - cache: A tuple of (x, w, b, conv_param) as in conv_forward_naive

    Returns a tuple of:
    - dx: Gradient with respect to x
    - dw: Gradient with respect to w
    - db: Gradient with respect to b
    """
    dx, dw, db = None, None, None
    ###########################################################################
    # TODO: Implement the convolutional backward pass.                        #
    ###########################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    x, w, _, conv_param = cache
    stride = conv_param['stride']
    pad = conv_param['pad']
    _, _, H, W = x.shape
    _, _, HH, WW = w.shape
    N, F, out_h, out_w = dout.shape

    x_pad = np.pad(x, [(0,0), (0,0), (pad, pad), (pad,pad)])
    dw = np.zeros(shape = w.shape)
    dx = np.zeros(shape = x.shape)
    dx_pad = np.pad(dx, [(0,0), (0,0), (pad,pad), (pad,pad)])

    db = np.sum(dout, axis=(0,2,3))
    for sample in range(N):
      for fil in range(F):
        for h in range(out_h):
          start_h = stride * h
          end_h = stride * h + HH
          for wi in range(out_w):
            start_w = stride * wi
            end_w = stride * wi + WW
            x_conv = x_pad[sample, :, start_h:end_h, start_w:end_w]
            dx_pad[sample, :, start_h:end_h, start_w:end_w] += dout[sample, fil, h, wi] * w[fil]
            dw[fil] += dout[sample, fil, h, wi] * x_conv

    dx = dx_pad[:, :, pad:pad+H, pad:pad+W]

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################
    return dx, dw, db
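
A rough finite-difference check of the backward pass (my own sketch; the assignment notebook uses its eval_numerical_gradient_array helper for the same purpose, the stand-alone helper below is just a stand-in):

def numeric_grad(f, var, dout, h=1e-5):
    # Centered finite differences of sum(f() * dout) w.r.t. each entry of var.
    grad = np.zeros_like(var)
    it = np.nditer(var, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        old = var[idx]
        var[idx] = old + h
        pos = f()
        var[idx] = old - h
        neg = f()
        var[idx] = old
        grad[idx] = np.sum((pos - neg) * dout) / (2 * h)
        it.iternext()
    return grad

x = np.random.randn(2, 3, 5, 5)
w = np.random.randn(2, 3, 3, 3)
b = np.random.randn(2)
conv_param = {'stride': 1, 'pad': 1}

out, cache = conv_forward_naive(x, w, b, conv_param)
dout = np.random.randn(*out.shape)
dx, dw, db = conv_backward_naive(dout, cache)

dw_num = numeric_grad(lambda: conv_forward_naive(x, w, b, conv_param)[0], w, dout)
print(np.max(np.abs(dw - dw_num)))  # should be on the order of 1e-8 or smaller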

 

 

Next is the forward function for max pooling. It downsamples the input of shape (N, C, H, W) to (N, C, H/pool_height, W/pool_width) (when the stride equals the pooling size). We iterate over (N, C) to access each feature map, slide over the pooling regions, extract the maximum of each region with np.max, and fill it into the output matrix. The output size is computed the same way as the convolution output size, just without padding.

def max_pool_forward_naive(x, pool_param):
    """
    A naive implementation of the forward pass for a max-pooling layer.

    Inputs:
    - x: Input data, of shape (N, C, H, W)
    - pool_param: dictionary with the following keys:
      - 'pool_height': The height of each pooling region
      - 'pool_width': The width of each pooling region
      - 'stride': The distance between adjacent pooling regions

    No padding is necessary here, eg you can assume:
      - (H - pool_height) % stride == 0
      - (W - pool_width) % stride == 0

    Returns a tuple of:
    - out: Output data, of shape (N, C, H', W') where H' and W' are given by
      H' = 1 + (H - pool_height) / stride
      W' = 1 + (W - pool_width) / stride
    - cache: (x, pool_param)
    """
    out = None
    ###########################################################################
    # TODO: Implement the max-pooling forward pass                            #
    ###########################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    pool_height = pool_param['pool_height']
    pool_width = pool_param['pool_width']
    stride = pool_param['stride']

    N, C, H, W = x.shape
    out_h = 1 + (H-pool_height) // stride
    out_w = 1 + (W-pool_width) // stride

    out = np.zeros(shape = (N, C, out_h, out_w))

    for sample in range(N):
      for channel in range(C):
        for h in range(out_h):
          start_h = stride * h
          end_h = stride * h + pool_height
          for w in range(out_w):
            start_w = stride * w
            end_w = stride * w + pool_width

            x_pool = x[sample, channel, start_h:end_h, start_w:end_w]
            out[sample, channel, h, w] = np.max(x_pool)

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################
    cache = (x, pool_param)
    return out, cache
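
A tiny worked example of the pooling forward pass (my own sketch): 2x2 max pooling with stride 2 over a single 4x4 feature map keeps the largest value of each window.

x = np.arange(16, dtype=float).reshape(1, 1, 4, 4)
pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}

out, _ = max_pool_forward_naive(x, pool_param)
print(out[0, 0])
# [[ 5.  7.]
#  [13. 15.]]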

 

 

Next is the backward function for max pooling. A pooling layer has no weights, so we only need to compute dx.

 

Max pooling over a PxP region produces a single output value. In other words, the gradient is 0 at the other PxP - 1 positions, and the upstream gradient is passed through unchanged to the one position that produced the maximum.

def max_pool_backward_naive(dout, cache):
    """
    A naive implementation of the backward pass for a max-pooling layer.

    Inputs:
    - dout: Upstream derivatives
    - cache: A tuple of (x, pool_param) as in the forward pass.

    Returns:
    - dx: Gradient with respect to x
    """
    dx = None
    ###########################################################################
    # TODO: Implement the max-pooling backward pass                           #
    ###########################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    x, pool_param = cache

    pool_height = pool_param['pool_height']
    pool_width = pool_param['pool_width']
    stride = pool_param['stride']

    N, C, _, _ = x.shape
    _, _, out_h, out_w = dout.shape

    dx = np.zeros(shape=x.shape)

    for sample in range(N):
      for channel in range(C):
        for h in range(out_h):
          start_h = stride * h
          end_h = stride * h + pool_height
          for w in range(out_w):
            start_w = stride * w
            end_w = stride * w + pool_width

            x_local = x[sample, channel, start_h:end_h, start_w:end_w]
            x_bool = x_local >= np.max(x_local)

            dx[sample, channel, start_h:end_h, start_w:end_w] += dout[sample, channel, h, w] * x_bool

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################
    return dx
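
Continuing the toy example from the forward pass (my own sketch): with an all-ones upstream gradient, dx is nonzero only at the argmax position of each 2x2 window.

x = np.arange(16, dtype=float).reshape(1, 1, 4, 4)
pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}

out, cache = max_pool_forward_naive(x, pool_param)
dout = np.ones_like(out)
dx = max_pool_backward_naive(dout, cache)
print(dx[0, 0])
# [[0. 0. 0. 0.]
#  [0. 1. 0. 1.]
#  [0. 0. 0. 0.]
#  [0. 1. 0. 1.]]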

 

 

Next is the spatial batchnorm function, which applies batch normalization to image data. Its input has shape (N, C, H, W). We reshape it to (N*H*W, C) and pass it to the batchnorm_forward function written earlier. Batch norm normalizes values that share the same channel dimension C across the batch, so it is fine to fold all of the remaining axes onto the channel axis.

def spatial_batchnorm_forward(x, gamma, beta, bn_param):
    """
    Computes the forward pass for spatial batch normalization.

    Inputs:
    - x: Input data of shape (N, C, H, W)
    - gamma: Scale parameter, of shape (C,)
    - beta: Shift parameter, of shape (C,)
    - bn_param: Dictionary with the following keys:
      - mode: 'train' or 'test'; required
      - eps: Constant for numeric stability
      - momentum: Constant for running mean / variance. momentum=0 means that
        old information is discarded completely at every time step, while
        momentum=1 means that new information is never incorporated. The
        default of momentum=0.9 should work well in most situations.
      - running_mean: Array of shape (D,) giving running mean of features
      - running_var Array of shape (D,) giving running variance of features

    Returns a tuple of:
    - out: Output data, of shape (N, C, H, W)
    - cache: Values needed for the backward pass
    """
    out, cache = None, None

    ###########################################################################
    # TODO: Implement the forward pass for spatial batch normalization.       #
    #                                                                         #
    # HINT: You can implement spatial batch normalization by calling the      #
    # vanilla version of batch normalization you implemented above.           #
    # Your implementation should be very short; ours is less than five lines. #
    ###########################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    N, C, H, W = x.shape
    x_vec = np.transpose(x, (0, 2, 3, 1)).reshape(-1, C)
    out, cache = batchnorm_forward(x_vec, gamma, beta, bn_param)
    out = np.transpose(np.reshape(out, (N,H,W,C)), (0, 3, 1, 2))

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################

    return out, cache
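
A quick statistics check (my own sketch, assuming the batchnorm_forward from the earlier assignment part is available): in 'train' mode with gamma=1 and beta=0, each channel of the output should have roughly zero mean and unit variance over the (N, H, W) axes.

N, C, H, W = 4, 3, 6, 6
x = 10 + 2 * np.random.randn(N, C, H, W)
gamma, beta = np.ones(C), np.zeros(C)

out, _ = spatial_batchnorm_forward(x, gamma, beta, {'mode': 'train'})
print(out.mean(axis=(0, 2, 3)))  # ~[0, 0, 0]
print(out.std(axis=(0, 2, 3)))   # ~[1, 1, 1]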

 

 

Next is the backward function for spatial batchnorm.

 

The learnable parameters of a batchnorm layer are beta and gamma. As in the forward pass, we reshape dout to (-1, C) and reuse the batchnorm_backward function written earlier as-is.

def spatial_batchnorm_backward(dout, cache):
    """
    Computes the backward pass for spatial batch normalization.

    Inputs:
    - dout: Upstream derivatives, of shape (N, C, H, W)
    - cache: Values from the forward pass

    Returns a tuple of:
    - dx: Gradient with respect to inputs, of shape (N, C, H, W)
    - dgamma: Gradient with respect to scale parameter, of shape (C,)
    - dbeta: Gradient with respect to shift parameter, of shape (C,)
    """
    dx, dgamma, dbeta = None, None, None

    ###########################################################################
    # TODO: Implement the backward pass for spatial batch normalization.      #
    #                                                                         #
    # HINT: You can implement spatial batch normalization by calling the      #
    # vanilla version of batch normalization you implemented above.           #
    # Your implementation should be very short; ours is less than five lines. #
    ###########################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    N, C, H, W = dout.shape
    dout_vec = np.transpose(dout, (0, 2, 3, 1)).reshape(-1, C)
    dx, dgamma, dbeta = batchnorm_backward(dout_vec, cache)
    dx = np.transpose(np.reshape(dx, (N,H,W,C)), (0,3,1,2))

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################

    return dx, dgamma, dbeta
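
A matching shape check for the backward pass (my own sketch, reusing the setup from the forward check above): after the reshape round-trip, the gradients come back in the original layouts.

dout = np.random.randn(N, C, H, W)
_, cache = spatial_batchnorm_forward(x, gamma, beta, {'mode': 'train'})
dx, dgamma, dbeta = spatial_batchnorm_backward(dout, cache)
print(dx.shape, dgamma.shape, dbeta.shape)  # (4, 3, 6, 6) (3,) (3,)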

 

 

Next is the forward function for spatial groupnorm. Group norm splits the C dimension of each sample into groups and normalizes within each group.

First, the input is reshaped from (N, C, H, W) to (N, G, C//G, H, W).

Then the mean and variance are computed over axes (2, 3, 4), giving statistics of shape (N, G, 1, 1, 1).

Finally, x is normalized with that mean and variance and reshaped back to the original shape, where the per-channel gamma and beta are applied.

def spatial_groupnorm_forward(x, gamma, beta, G, gn_param):
    """
    Computes the forward pass for spatial group normalization.
    In contrast to layer normalization, group normalization splits each entry
    in the data into G contiguous pieces, which it then normalizes independently.
    Per feature shifting and scaling are then applied to the data, in a manner identical to that of batch normalization and layer normalization.

    Inputs:
    - x: Input data of shape (N, C, H, W)
    - gamma: Scale parameter, of shape (1, C, 1, 1)
    - beta: Shift parameter, of shape (1, C, 1, 1)
    - G: Integer number of groups to split into, should be a divisor of C
    - gn_param: Dictionary with the following keys:
      - eps: Constant for numeric stability

    Returns a tuple of:
    - out: Output data, of shape (N, C, H, W)
    - cache: Values needed for the backward pass
    """
    out, cache = None, None
    eps = gn_param.get("eps", 1e-5)
    ###########################################################################
    # TODO: Implement the forward pass for spatial group normalization.       #
    # This will be extremely similar to the layer norm implementation.        #
    # In particular, think about how you could transform the matrix so that   #
    # the bulk of the code is similar to both train-time batch normalization  #
    # and layer normalization!                                                #
    ###########################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****


    N, C, H, W = x.shape
    x = np.reshape(x, [N, G, C // G, H, W])

    mean = np.mean(x, axis = (2,3,4), keepdims = True)
    var = np.var(x, axis = (2,3,4), keepdims = True)

    x_normal = (x - mean) / np.sqrt(var + eps)
    x_normal = np.reshape(x_normal, [N, C, H, W])
    out = x_normal * gamma + beta

    cache = (x, mean, var, gamma, G, eps)


    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################
    return out, cache
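
A quick per-group check (my own sketch): with gamma=1 and beta=0, every (sample, group) slice of the output should have roughly zero mean and unit variance.

N, C, H, W, G = 2, 6, 4, 4, 3
x = 5 + 3 * np.random.randn(N, C, H, W)
gamma = np.ones((1, C, 1, 1))
beta = np.zeros((1, C, 1, 1))

out, _ = spatial_groupnorm_forward(x, gamma, beta, G, {'eps': 1e-5})
grouped = out.reshape(N, G, C // G, H, W)
print(grouped.mean(axis=(2, 3, 4)))  # ~0 for every (sample, group) pair
print(grouped.std(axis=(2, 3, 4)))   # ~1 for every (sample, group) pair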

 

 

Next is the backward function for spatial groupnorm. Overall it is very similar to the batchnorm and layernorm backward passes.

def spatial_groupnorm_backward(dout, cache):
    """
    Computes the backward pass for spatial group normalization.

    Inputs:
    - dout: Upstream derivatives, of shape (N, C, H, W)
    - cache: Values from the forward pass

    Returns a tuple of:
    - dx: Gradient with respect to inputs, of shape (N, C, H, W)
    - dgamma: Gradient with respect to scale parameter, of shape (1, C, 1, 1)
    - dbeta: Gradient with respect to shift parameter, of shape (1, C, 1, 1)
    """
    dx, dgamma, dbeta = None, None, None

    ###########################################################################
    # TODO: Implement the backward pass for spatial group normalization.      #
    # This will be extremely similar to the layer norm implementation.        #
    ###########################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    
    x, mean, var, gamma, G, eps = cache
    N, C, H, W = dout.shape

    M = x.shape[2] * x.shape[3] * x.shape[4]
    x_normal = (x - mean) / np.sqrt(var + eps)

    dgamma = np.sum(dout * np.reshape(x_normal, dout.shape), axis = (0, 2, 3), keepdims = True)
    dbeta = np.sum(dout, axis = (0, 2, 3), keepdims = True)
    dx_normal = dout * gamma

    dlvar = np.sum(
        np.reshape(dx_normal, x.shape) * (x - mean) * -0.5 * (var + eps) ** -1.5,
        axis=(2, 3, 4), keepdims=True
    )

    dlmean = np.sum(
        np.reshape(dx_normal, x.shape) * -1 / np.sqrt(var + eps),
        axis=(2, 3, 4), keepdims=True
    ) + dlvar * np.sum(-2 * (x - mean), axis=(2, 3, 4), keepdims=True) / M

    dx = (np.reshape(dx_normal, x.shape) / np.sqrt(var + eps)
          + dlvar * 2 * (x - mean) / M
          + dlmean / M)
    
    dx = np.reshape(dx, dout.shape)


    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################
    return dx, dgamma, dbeta
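
A finite-difference check for the groupnorm backward pass (my own sketch, reusing the numeric_grad helper defined after the convolution backward function above):

N, C, H, W, G = 2, 6, 4, 4, 3
x = np.random.randn(N, C, H, W)
gamma = np.random.randn(1, C, 1, 1)
beta = np.random.randn(1, C, 1, 1)
gn_param = {'eps': 1e-5}

out, cache = spatial_groupnorm_forward(x, gamma, beta, G, gn_param)
dout = np.random.randn(*out.shape)
dx, dgamma, dbeta = spatial_groupnorm_backward(dout, cache)

f = lambda: spatial_groupnorm_forward(x, gamma, beta, G, gn_param)[0]
print(np.max(np.abs(dgamma - numeric_grad(f, gamma, dout))))  # expect ~1e-8 or less
print(np.max(np.abs(dbeta - numeric_grad(f, beta, dout))))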