  • CS231n assignment2 Q3 Dropout

    Dropout

    see Geoffrey E. Hinton et al., "Improving neural networks by preventing co-adaptation of feature detectors", arXiv 2012

    Completing the forward pass

    def dropout_forward(x, dropout_param):
        """
        Performs the forward pass for (inverted) dropout.
    
        Inputs:
        - x: Input data, of any shape
        - dropout_param: A dictionary with the following keys:
          - p: Dropout parameter. We drop each neuron output with probability p.
          - mode: 'test' or 'train'. If the mode is train, then perform dropout;
            if the mode is test, then just return the input.
          - seed: Seed for the random number generator. Passing seed makes this
            function deterministic, which is needed for gradient checking but not
            in real networks.
    
        Outputs:
        - out: Array of the same shape as x.
        - cache: tuple (dropout_param, mask). In training mode, mask is the dropout
          mask that was used to multiply the input; in test mode, mask is None.
    
        NOTE: Please implement **inverted** dropout, not the vanilla version of dropout.
        See http://cs231n.github.io/neural-networks-2/#reg for more details.
    
        NOTE 2: Keep in mind that here p is the probability of **dropping** a neuron
        output; this might be contrary to some sources, where it is referred to
        as the probability of keeping a neuron output.
        """
        p, mode = dropout_param['p'], dropout_param['mode']
        if 'seed' in dropout_param:
            np.random.seed(dropout_param['seed'])
    
        mask = None
        out = None
    
        if mode == 'train':
            #######################################################################
            # TODO: Implement training phase forward pass for inverted dropout.   #
            # Store the dropout mask in the mask variable.                        #
            #######################################################################
            keep_prob = 1 - p
            mask = (np.random.rand(*x.shape) < keep_prob) / keep_prob
            # np.random.rand(*x.shape) draws a uniform [0, 1) random matrix with the
            # same shape as the input x (the activations); comparing it against the
            # keep probability keep_prob yields a boolean array, the dropout mask.
            # Vanilla dropout would just compute out = mask * x, which lowers the
            # expected value of the activations by a factor of keep_prob, so at test
            # time the activations would have to be rescaled by keep_prob to match.
            # Inverted dropout instead divides the mask by keep_prob here, during
            # training, so the test-time pass can let the data through unchanged.
            out = mask * x
            #######################################################################
            #                           END OF YOUR CODE                          #
            #######################################################################
        elif mode == 'test':
            #######################################################################
            # TODO: Implement the test phase forward pass for inverted dropout.   #
            #######################################################################
            out = x
            #######################################################################
            #                            END OF YOUR CODE                         #
            #######################################################################
    
        cache = (dropout_param, mask)
        out = out.astype(x.dtype, copy=False)
    
        return out, cache
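
    The statistics printed further below ("Running tests with p = ...") come from a check of this kind: with inverted dropout, the mean of the train-time output should roughly match the mean of the input, the test-time output should be untouched, and the fraction of zeroed entries should be close to p. A minimal sketch of such a check (not the notebook's exact code), assuming numpy is imported as np and the dropout_forward above is in scope:

    x = 10 + np.random.randn(500, 500)       # activations centered around 10

    for p in [0.25, 0.4, 0.7]:               # p is the drop probability here
        out_train, _ = dropout_forward(x, {'mode': 'train', 'p': p})
        out_test, _ = dropout_forward(x, {'mode': 'test', 'p': p})
        print('p =', p)
        print('  mean of input:            ', x.mean())
        print('  mean of train-time output:', out_train.mean())        # ~ x.mean(), thanks to the / keep_prob
        print('  mean of test-time output: ', out_test.mean())         # test mode is a pass-through
        print('  fraction zeroed (train):  ', (out_train == 0).mean()) # ~ p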
    

    Running tests with p = 0.25
    Mean of input: 10.000207878477502
    Mean of train-time output: 9.998198947788465
    Mean of test-time output: 10.000207878477502
    Fraction of train-time output set to zero: 0.250168
    Fraction of test-time output set to zero: 0.0

    Running tests with p = 0.4
    Mean of input: 10.000207878477502
    Mean of train-time output: 9.976910758765856
    Mean of test-time output: 10.000207878477502
    Fraction of train-time output set to zero: 0.401368
    Fraction of test-time output set to zero: 0.0

    Running tests with p = 0.7
    Mean of input: 10.000207878477502
    Mean of train-time output: 9.98254739313744
    Mean of test-time output: 10.000207878477502
    Fraction of train-time output set to zero: 0.700496
    Fraction of test-time output set to zero: 0.0

    Completing the backward pass

    def dropout_backward(dout, cache):
        """
        Perform the backward pass for (inverted) dropout.
    
        Inputs:
        - dout: Upstream derivatives, of any shape
        - cache: (dropout_param, mask) from dropout_forward.
        """
        dropout_param, mask = cache
        mode = dropout_param['mode']
    
        dx = None
        if mode == 'train':
            #######################################################################
            # TODO: Implement training phase backward pass for inverted dropout   #
            #######################################################################
            dx = mask * dout
            # Backprop applies the same mask: gradients of dropped units are zeroed out,
            # and the surviving gradients carry the same 1/keep_prob scaling as the forward pass.
            #######################################################################
            #                          END OF YOUR CODE                           #
            #######################################################################
        elif mode == 'test':
            dx = dout
        return dx
    

    dx relative error: 5.445612718272284e-11
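
    The relative error above comes from comparing the analytic dx against a numerically evaluated gradient (the notebook uses its eval_numerical_gradient_array helper). A minimal self-contained sketch of such a check, assuming the dropout_forward and dropout_backward above are in scope; the fixed seed in dropout_param makes every forward call sample the same mask, which is what makes the check meaningful:

    np.random.seed(231)
    x = np.random.randn(10, 10) + 10
    dout = np.random.randn(*x.shape)
    dropout_param = {'mode': 'train', 'p': 0.2, 'seed': 123}

    out, cache = dropout_forward(x, dropout_param)
    dx = dropout_backward(dout, cache)

    # Numerical gradient of f(x) = sum(dropout_forward(x)[0] * dout), centered differences.
    h = 1e-5
    dx_num = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'])
    while not it.finished:
        ix = it.multi_index
        old = x[ix]
        x[ix] = old + h
        pos = dropout_forward(x, dropout_param)[0]
        x[ix] = old - h
        neg = dropout_forward(x, dropout_param)[0]
        x[ix] = old
        dx_num[ix] = np.sum((pos - neg) * dout) / (2 * h)
        it.iternext()

    rel_err = np.max(np.abs(dx - dx_num) / np.maximum(1e-8, np.abs(dx) + np.abs(dx_num)))
    print('dx relative error:', rel_err)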

    A fully-connected network with dropout:

    class FullyConnectedNet(object):
        """
        A fully-connected neural network with an arbitrary number of hidden layers,
        ReLU nonlinearities, and a softmax loss function. This will also implement
        dropout and batch/layer normalization as options. For a network with L layers,
        the architecture will be
    
        {affine - [batch/layer norm] - relu - [dropout]} x (L - 1) - affine - softmax
    
        where batch/layer normalization and dropout are optional, and the {...} block is
        repeated L - 1 times.
    
        Similar to the TwoLayerNet above, learnable parameters are stored in the
        self.params dictionary and will be learned using the Solver class.
        """
    
        def __init__(self, hidden_dims, input_dim=3*32*32, num_classes=10,
                     dropout=1, normalization=None, reg=0.0,
                     weight_scale=1e-2, dtype=np.float32, seed=None):
            """
            Initialize a new FullyConnectedNet.
    
            Inputs:
            - hidden_dims: A list of integers giving the size of each hidden layer.
            - input_dim: An integer giving the size of the input.
            - num_classes: An integer giving the number of classes to classify.
            - dropout: Scalar between 0 and 1 giving dropout strength. If dropout=1 then
              the network should not use dropout at all.
            - normalization: What type of normalization the network should use. Valid values
              are "batchnorm", "layernorm", or None for no normalization (the default).
            - reg: Scalar giving L2 regularization strength.
            - weight_scale: Scalar giving the standard deviation for random
              initialization of the weights.
            - dtype: A numpy datatype object; all computations will be performed using
              this datatype. float32 is faster but less accurate, so you should use
              float64 for numeric gradient checking.
            - seed: If not None, then pass this random seed to the dropout layers. This
              will make the dropout layers deterministic so we can gradient check the
              model.
            """
            self.normalization = normalization
            self.use_dropout = dropout != 1
            self.reg = reg
            self.num_layers = 1 + len(hidden_dims)
            self.dtype = dtype
            self.params = {}
    
            ############################################################################
            # TODO: Initialize the parameters of the network, storing all values in    #
            # the self.params dictionary. Store weights and biases for the first layer #
            # in W1 and b1; for the second layer use W2 and b2, etc. Weights should be #
            # initialized from a normal distribution centered at 0 with standard       #
            # deviation equal to weight_scale. Biases should be initialized to zero.   #
            #                                                                          #
            # When using batch normalization, store scale and shift parameters for the #
            # first layer in gamma1 and beta1; for the second layer use gamma2 and     #
            # beta2, etc. Scale parameters should be initialized to ones and shift     #
            # parameters should be initialized to zeros.                               #
            ############################################################################
            # Initialize the parameters of every hidden layer
            in_dim = input_dim  # D, the input dimension
            for i, h_dim in enumerate(hidden_dims):  # yields (0, H1), (1, H2), ...
                self.params['W%d' %(i+1,)] = weight_scale * np.random.randn(in_dim,h_dim)
                self.params['b%d' %(i+1,)] = np.zeros((h_dim,))
                if self.normalization=='batchnorm':
                    self.params['gamma%d' %(i+1,)] = np.ones((h_dim,))  # initialized to ones
                    self.params['beta%d' %(i+1,)] = np.zeros((h_dim,))  # initialized to zeros
                in_dim = h_dim  # this layer's output width becomes the next layer's input width
                
            # Initialize the parameters of the output layer
            self.params['W%d' %(self.num_layers,)] = weight_scale * np.random.randn(in_dim,num_classes)
            self.params['b%d' %(self.num_layers,)] = np.zeros((num_classes,))
            ############################################################################
            #                             END OF YOUR CODE                             #
            ############################################################################
    
            # When dropout is enabled, the same parameter dictionary self.dropout_param is
            # passed to every dropout layer, so that each layer knows the drop probability p
            # and the current mode of the network (train / test).
            self.dropout_param = {}
            if self.use_dropout:
                self.dropout_param = {'mode': 'train', 'p': dropout}
                if seed is not None:
                    self.dropout_param['seed'] = seed
    
            # When batch normalization is enabled, we keep a list of BN parameter
            # dictionaries, self.bn_params, to track each layer's running mean and variance.
            # self.bn_params[0] holds the parameters of the first BN layer in the forward
            # pass, self.bn_params[1] those of the second BN layer, and so on.
            self.bn_params = []
            if self.normalization=='batchnorm':
                self.bn_params = [{'mode': 'train'} for i in range(self.num_layers - 1)]
            if self.normalization=='layernorm':
                self.bn_params = [{} for i in range(self.num_layers - 1)]
    
            # Cast all parameters to the correct datatype
            for k, v in self.params.items():
                self.params[k] = v.astype(dtype)
    
    
        def loss(self, X, y=None):
            """
            Compute loss and gradient for the fully-connected net.
    
            Input / output: Same as TwoLayerNet above.
            """
            X = X.astype(self.dtype)
            mode = 'test' if y is None else 'train'
    
            # Set train/test mode for batchnorm params and dropout param since they
            # behave differently during training and testing.
            if self.use_dropout:
                self.dropout_param['mode'] = mode
            if self.normalization=='batchnorm':
                for bn_param in self.bn_params:
                    bn_param['mode'] = mode
            scores = None
            ############################################################################
            # TODO: Implement the forward pass for the fully-connected net, computing  #
            # the class scores for X and storing them in the scores variable.          #
            #                                                                          #
            # When using dropout, you'll need to pass self.dropout_param to each       #
            # dropout forward pass.                                                    #
            #                                                                          #
            # When using batch normalization, you'll need to pass self.bn_params[0] to #
            # the forward pass for the first batch normalization layer, pass           #
            # self.bn_params[1] to the forward pass for the second batch normalization #
            # layer, etc.                                                              #
            ############################################################################
            fc_mix_cache = {}  # caches from the forward pass of each hidden layer
            if self.use_dropout:  # if dropout is enabled, keep a separate cache for its layers
                dp_cache = {}
            # Loop over the hidden layers, passing the data forward as out and saving
            # each layer's cache.
            out = X
            for i in range(self.num_layers - 1):  # one pass per hidden layer
                w,b = self.params['W%d' %(i+1,)],self.params['b%d' %(i+1,)]
                if self.normalization == 'batchnorm':
                    gamma = self.params['gamma%d' %(i+1,)]
                    beta = self.params['beta%d' %(i+1,)]
                    out,fc_mix_cache[i] = affine_bn_relu_forward(out,w,b,gamma,beta,self.bn_params[i])
                else:
                    out,fc_mix_cache[i] = affine_relu_forward(out,w,b)
                if self.use_dropout:
                    out,dp_cache[i] = dropout_forward(out,self.dropout_param)
            # The final (output) affine layer
            w = self.params['W%d' %(self.num_layers,)]
            b = self.params['b%d' %(self.num_layers,)]
            out,out_cache = affine_forward(out,w,b)
            scores = out
            ############################################################################
            #                             END OF YOUR CODE                             #
            ############################################################################
    
            # If test mode return early
            if mode == 'test':
                return scores
    
            loss, grads = 0.0, {}
            ############################################################################
            # TODO: Implement the backward pass for the fully-connected net. Store the #
            # loss in the loss variable and gradients in the grads dictionary. Compute #
            # data loss using softmax, and make sure that grads[k] holds the gradients #
            # for self.params[k]. Don't forget to add L2 regularization!               #
            #                                                                          #
            # When using batch/layer normalization, you don't need to regularize the scale   #
            # and shift parameters.                                                    #
            #                                                                          #
            # NOTE: To ensure that your implementation matches ours and you pass the   #
            # automated tests, make sure that your L2 regularization includes a factor #
            # of 0.5 to simplify the expression for the gradient.                      #
            ############################################################################
            loss,dout = softmax_loss(scores,y)
            loss += 0.5 * self.reg * np.sum(self.params['W%d' %(self.num_layers,)] ** 2)
            # Backprop through the output layer, saving its gradients in the grads dictionary:
            dout,dw,db = affine_backward(dout,out_cache)
            grads['W%d' %(self.num_layers,)] = dw + self.reg * self.params['W%d' %(self.num_layers,)]
            grads['b%d' %(self.num_layers,)] = db
            # Backprop through each hidden layer, updating the grads dictionary and
            # accumulating each layer's L2 regularization term into the loss.
            for i in range(self.num_layers - 1):
                ri = self.num_layers - 2 - i  # 0-based index of the current hidden layer, walking from last to first
                loss += 0.5 * self.reg * np.sum(self.params['W%d' %(ri+1,)] ** 2)  # add this layer's L2 term to the loss
                if self.use_dropout:
                    dout = dropout_backward(dout,dp_cache[ri])
                if self.normalization == 'batchnorm':
                    dout,dw,db,dgamma,dbeta = affine_bn_relu_backward(dout,fc_mix_cache[ri])
                    grads['gamma%d' %(ri+1,)] = dgamma
                    grads['beta%d' %(ri+1,)] = dbeta
                else:
                    dout,dw,db = affine_relu_backward(dout,fc_mix_cache[ri])
                grads['W%d' %(ri+1,)] = dw + self.reg * self.params['W%d' %(ri+1,)]
                grads['b%d' %(ri+1,)] = db
            ############################################################################
            #                             END OF YOUR CODE                             #
            ############################################################################
    
            return loss, grads
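
    The loss method above calls affine_bn_relu_forward and affine_bn_relu_backward, which are never shown in this post. A minimal sketch of how such sandwich helpers are typically written, assuming the standard affine_forward/affine_backward, batchnorm_forward/batchnorm_backward and relu_forward/relu_backward from the assignment's layers.py:

    def affine_bn_relu_forward(x, w, b, gamma, beta, bn_param):
        """Convenience layer: affine -> batch norm -> ReLU."""
        a, fc_cache = affine_forward(x, w, b)
        a_norm, bn_cache = batchnorm_forward(a, gamma, beta, bn_param)
        out, relu_cache = relu_forward(a_norm)
        return out, (fc_cache, bn_cache, relu_cache)

    def affine_bn_relu_backward(dout, cache):
        """Backward pass for the affine -> batch norm -> ReLU convenience layer."""
        fc_cache, bn_cache, relu_cache = cache
        da_norm = relu_backward(dout, relu_cache)
        da, dgamma, dbeta = batchnorm_backward(da_norm, bn_cache)
        dx, dw, db = affine_backward(da, fc_cache)
        return dx, dw, db, dgamma, dbeta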
    

    Running check with dropout = 1
    Initial loss: 2.3004790897684924
    W1 relative error: 1.48e-07
    W2 relative error: 2.21e-05
    W3 relative error: 3.53e-07
    b1 relative error: 5.38e-09
    b2 relative error: 2.09e-09
    b3 relative error: 5.80e-11

    Running check with dropout = 0.75
    Initial loss: 2.2924325088330475
    W1 relative error: 2.74e-08
    W2 relative error: 2.98e-09
    W3 relative error: 4.29e-09
    b1 relative error: 7.78e-10
    b2 relative error: 3.36e-10
    b3 relative error: 1.65e-10

    Running check with dropout = 0.5
    Initial loss: 2.3042759220785896
    W1 relative error: 3.11e-07
    W2 relative error: 1.84e-08
    W3 relative error: 5.35e-08
    b1 relative error: 5.37e-09
    b2 relative error: 2.99e-09
    b3 relative error: 1.13e-10
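
    The relative errors above come from a numerical gradient check of the whole network, run once per dropout setting. A sketch of how such a check is typically set up; it assumes eval_numerical_gradient from the assignment's cs231n.gradient_check module, and the layer sizes used here (D=15, H1=20, H2=30, C=10) are illustrative assumptions rather than values taken from this post:

    from cs231n.gradient_check import eval_numerical_gradient

    def rel_error(x, y):
        # relative error helper, as defined in the assignment notebooks
        return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))

    np.random.seed(231)
    N, D, H1, H2, C = 2, 15, 20, 30, 10      # tiny sizes so the check runs quickly
    X = np.random.randn(N, D)
    y = np.random.randint(C, size=(N,))

    for dropout in [1, 0.75, 0.5]:
        print('Running check with dropout =', dropout)
        # float64 for accurate numeric gradients; the seed fixes the dropout masks
        model = FullyConnectedNet([H1, H2], input_dim=D, num_classes=C,
                                  weight_scale=5e-2, dtype=np.float64,
                                  dropout=dropout, seed=123)
        loss, grads = model.loss(X, y)
        print('Initial loss:', loss)
        for name in sorted(grads):
            f = lambda _: model.loss(X, y)[0]
            grad_num = eval_numerical_gradient(f, model.params[name], verbose=False, h=1e-5)
            print('%s relative error: %.2e' % (name, rel_error(grad_num, grads[name])))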

    Dropout can be viewed as a form of regularization
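
    The two runs below train the same small fully-connected network on 500 training examples, once with dropout disabled (dropout = 1) and once with a drop probability of 0.25. A sketch of how such a comparison is typically set up, assuming the assignment's Solver class and its usual CIFAR-10 data dictionary; the 500-example training set, 25 epochs, batch size of 100 and print_every=100 match the logs below, while the hidden-layer size and learning rate are illustrative assumptions:

    np.random.seed(231)
    num_train = 500
    small_data = {
        'X_train': data['X_train'][:num_train],   # data is the usual CIFAR-10 dict
        'y_train': data['y_train'][:num_train],
        'X_val': data['X_val'],
        'y_val': data['y_val'],
    }

    solvers = {}
    for dropout in [1, 0.25]:          # dropout=1 disables dropout; 0.25 drops 25% of units
        model = FullyConnectedNet([500], dropout=dropout)   # hidden size is an assumption
        print('dropout =', dropout)
        solver = Solver(model, small_data,
                        num_epochs=25, batch_size=100,
                        update_rule='adam',
                        optim_config={'learning_rate': 5e-4},   # illustrative value
                        verbose=True, print_every=100)
        solver.train()
        solvers[dropout] = solver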

    dropout = 1 (no dropout):
    (Iteration 1 / 125) loss: 7.856643
    (Epoch 0 / 25) train acc: 0.260000; val_acc: 0.184000
    (Epoch 1 / 25) train acc: 0.404000; val_acc: 0.259000
    (Epoch 2 / 25) train acc: 0.468000; val_acc: 0.248000
    (Epoch 3 / 25) train acc: 0.526000; val_acc: 0.247000
    (Epoch 4 / 25) train acc: 0.646000; val_acc: 0.273000
    (Epoch 5 / 25) train acc: 0.686000; val_acc: 0.257000
    (Epoch 6 / 25) train acc: 0.690000; val_acc: 0.260000
    (Epoch 7 / 25) train acc: 0.758000; val_acc: 0.255000
    (Epoch 8 / 25) train acc: 0.832000; val_acc: 0.264000
    (Epoch 9 / 25) train acc: 0.856000; val_acc: 0.268000
    (Epoch 10 / 25) train acc: 0.914000; val_acc: 0.289000
    (Epoch 11 / 25) train acc: 0.922000; val_acc: 0.293000
    (Epoch 12 / 25) train acc: 0.948000; val_acc: 0.307000
    (Epoch 13 / 25) train acc: 0.960000; val_acc: 0.313000
    (Epoch 14 / 25) train acc: 0.972000; val_acc: 0.311000
    (Epoch 15 / 25) train acc: 0.964000; val_acc: 0.309000
    (Epoch 16 / 25) train acc: 0.966000; val_acc: 0.295000
    (Epoch 17 / 25) train acc: 0.984000; val_acc: 0.306000
    (Epoch 18 / 25) train acc: 0.988000; val_acc: 0.332000
    (Epoch 19 / 25) train acc: 0.996000; val_acc: 0.318000
    (Epoch 20 / 25) train acc: 0.992000; val_acc: 0.313000
    (Iteration 101 / 125) loss: 0.000961
    (Epoch 21 / 25) train acc: 0.996000; val_acc: 0.311000
    (Epoch 22 / 25) train acc: 0.994000; val_acc: 0.304000
    (Epoch 23 / 25) train acc: 0.998000; val_acc: 0.308000
    (Epoch 24 / 25) train acc: 1.000000; val_acc: 0.316000
    (Epoch 25 / 25) train acc: 0.998000; val_acc: 0.320000
    dropout = 0.25:
    (Iteration 1 / 125) loss: 11.299055
    (Epoch 0 / 25) train acc: 0.234000; val_acc: 0.187000
    (Epoch 1 / 25) train acc: 0.382000; val_acc: 0.228000
    (Epoch 2 / 25) train acc: 0.490000; val_acc: 0.247000
    (Epoch 3 / 25) train acc: 0.534000; val_acc: 0.228000
    (Epoch 4 / 25) train acc: 0.648000; val_acc: 0.298000
    (Epoch 5 / 25) train acc: 0.676000; val_acc: 0.316000
    (Epoch 6 / 25) train acc: 0.752000; val_acc: 0.285000
    (Epoch 7 / 25) train acc: 0.774000; val_acc: 0.252000
    (Epoch 8 / 25) train acc: 0.818000; val_acc: 0.288000
    (Epoch 9 / 25) train acc: 0.844000; val_acc: 0.326000
    (Epoch 10 / 25) train acc: 0.864000; val_acc: 0.311000
    (Epoch 11 / 25) train acc: 0.920000; val_acc: 0.293000
    (Epoch 12 / 25) train acc: 0.922000; val_acc: 0.282000
    (Epoch 13 / 25) train acc: 0.960000; val_acc: 0.303000
    (Epoch 14 / 25) train acc: 0.966000; val_acc: 0.290000
    (Epoch 15 / 25) train acc: 0.948000; val_acc: 0.277000
    (Epoch 16 / 25) train acc: 0.970000; val_acc: 0.324000
    (Epoch 17 / 25) train acc: 0.950000; val_acc: 0.295000
    (Epoch 18 / 25) train acc: 0.970000; val_acc: 0.316000
    (Epoch 19 / 25) train acc: 0.972000; val_acc: 0.296000
    (Epoch 20 / 25) train acc: 0.990000; val_acc: 0.293000
    (Iteration 101 / 125) loss: 0.556808
    (Epoch 21 / 25) train acc: 0.990000; val_acc: 0.303000
    (Epoch 22 / 25) train acc: 0.990000; val_acc: 0.306000
    (Epoch 23 / 25) train acc: 0.992000; val_acc: 0.301000
    (Epoch 24 / 25) train acc: 0.994000; val_acc: 0.303000
    (Epoch 25 / 25) train acc: 0.998000; val_acc: 0.289000

    Can this plot really tell us anything, though? The training accuracies are almost identical, and the validation accuracies are about the same as well.
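
    The plot the author refers to is not reproduced in this post. It is typically drawn from the two solvers' accuracy histories, roughly as follows (a sketch assuming the solvers dict from the training sketch above and matplotlib):

    import matplotlib.pyplot as plt

    plt.subplot(2, 1, 1)
    for dropout, solver in solvers.items():
        plt.plot(solver.train_acc_history, 'o-', label='dropout = %g' % dropout)
    plt.title('Train accuracy')
    plt.xlabel('Epoch')
    plt.legend(loc='lower right')

    plt.subplot(2, 1, 2)
    for dropout, solver in solvers.items():
        plt.plot(solver.val_acc_history, 'o-', label='dropout = %g' % dropout)
    plt.title('Validation accuracy')
    plt.xlabel('Epoch')
    plt.legend(loc='lower right')

    plt.gcf().set_size_inches(15, 10)
    plt.show()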

  • Original post: https://www.cnblogs.com/bernieloveslife/p/10190741.html