Neural Networks from Scratch in Python (Part 18): The Model Object


Model Object

  • Introduction
  • Full code up to this point

Introduction

We have built a model that can perform a forward pass, a backward pass, and ancillary tasks such as measuring accuracy. We got there by writing a fair amount of code and modifying some rather large blocks along the way. At this point, it makes more sense to turn the model itself into an object, especially since we will want to save and load this object to make predictions in the future. We can also use this object to cut down on some common lines of code, making it easier to work with the current code base and to build new models. To perform this model-object conversion, we'll use the model we worked on most recently: the regression model trained on sine data:

from nnfs.datasets import sine_data

X, y = sine_data()

With the data in hand, the first step for our model class is to be able to add the layers we want. We can start the Model class like this:

# Model class
class Model:
    def __init__(self):
        # Create a list of network objects
        self.layers = []
        
    # Add objects to the model
    def add(self, layer):
        self.layers.append(layer)

Now we can add layers with the model object's add method. That alone greatly improves readability. Let's add some layers:

# Instantiate the model
model = Model()

# Add layers
model.add(Layer_Dense(1, 64))
model.add(Activation_ReLU())
model.add(Layer_Dense(64, 64))
model.add(Activation_ReLU())
model.add(Layer_Dense(64, 1))
model.add(Activation_Linear())

We can also query this model now:

print(model.layers)
>>>
[<__main__.Layer_Dense object at 0x000001D1EB2A2900>, 
<__main__.Activation_ReLU object at 0x000001D1EB2A2180>, 
<__main__.Layer_Dense object at 0x000001D1EB2A3F20>, 
<__main__.Activation_ReLU object at 0x000001D1EB2B9220>, 
<__main__.Layer_Dense object at 0x000001D1EB2BB800>, 
<__main__.Activation_Linear object at 0x000001D1EB2BBA40>]

Besides adding layers, we also want to set a loss function and an optimizer for the model. For this, we'll create a method named set:

# Set loss and optimizer
def set(self, *, loss, optimizer):
    self.loss = loss
    self.optimizer = optimizer

The asterisk (*) in the parameter definition marks the parameters that follow it (loss and optimizer here) as keyword-only. Since they have no default values, they are required keyword arguments: they must be passed by name and value, which makes the code more readable.
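
As a quick illustration (a minimal, hypothetical sketch with placeholder string values, not part of the model code), the bare asterisk is what rejects positional arguments:

def set_demo(*, loss, optimizer):
    return loss, optimizer

set_demo(loss='mse', optimizer='adam')  # works - arguments passed by name
# set_demo('mse', 'adam')               # raises TypeError - positional arguments are not accepted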

Now we can add a call to this method to our newly created model object, passing in the loss and optimizer objects:

# Create dataset
X, y = sine_data()

# Instantiate the model
model = Model()

# Add layers
model.add(Layer_Dense(1, 64))
model.add(Activation_ReLU())
model.add(Layer_Dense(64, 64))
model.add(Activation_ReLU())
model.add(Layer_Dense(64, 1))
model.add(Activation_Linear())

# Set loss and optimizer objects
model.set(
    loss=Loss_MeanSquaredError(),
    optimizer=Optimizer_Adam(learning_rate=0.005, decay=1e-3),
    )

With the model's layers, loss function, and optimizer set, the next step is training, so we'll add a train method. For now it's a placeholder that we'll fill in shortly:

# Train the model
def train(self, X, y, *, epochs=1, print_every=1):
    # Main training loop
    for epoch in range(1, epochs+1):
        # Temporary
        pass

We can then add a call to the train method to our model setup. We'll pass in the training data, the number of epochs (10000, which we've been using so far), and how often to print a training summary. We don't need or want to print it every step, so we make that configurable:

# Create dataset
X, y = sine_data()

# Instantiate the model
model = Model()

# Add layers
model.add(Layer_Dense(1, 64))
model.add(Activation_ReLU())
model.add(Layer_Dense(64, 64))
model.add(Activation_ReLU())
model.add(Layer_Dense(64, 1))
model.add(Activation_Linear())

# Set loss and optimizer objects
model.set(
    loss=Loss_MeanSquaredError(),
    optimizer=Optimizer_Adam(learning_rate=0.005, decay=1e-3),
    )

model.train(X, y, epochs=10000, print_every=100)

To train, we need to perform a forward pass. Performing it inside an object is slightly more involved, because we have to do it in a loop over the layers and we need to know the previous layer's output to pass the data along correctly. One problem with querying the previous layer is that the first layer has no "previous" layer; the first layer we define is the first hidden layer. One option, then, is to create an "input layer". It is considered a layer in the neural network but has no weights or biases associated with it. The input layer only contains the training data, and we use it only as the "previous" layer of the first layer while iterating over the layers in the loop. We'll create a new class, called in a similar fashion to the Layer_Dense class, named Layer_Input:

# Input "layer"
class Layer_Input:
    # Forward pass
    def forward(self, inputs):
        self.output = inputs

The forward method sets the training samples as self.output. This attribute is shared with the other layers. There's no need to implement a backward method here, since we'll never use it. Creating this class might seem redundant right now, but hopefully it will soon become clear how we'll use it. Next, we want to set the previous- and next-layer attributes for every layer in the model. We'll create a method named finalize in the Model class:

	# Finalize the model
	def finalize(self):
	    # Create and set the input layer
	    self.input_layer = Layer_Input()
	    # Count all the objects
	    layer_count = len(self.layers)
	    # Iterate the objects
	    for i in range(layer_count):
	        # If it's the first layer,
	        # the previous layer object is the input layer
	        if i == 0:
	            self.layers[i].prev = self.input_layer
	            self.layers[i].next = self.layers[i+1]
	        # All layers except for the first and the last
	        elif i < layer_count - 1:
	            self.layers[i].prev = self.layers[i-1]
	            self.layers[i].next = self.layers[i+1]
	        # The last layer - the next object is the loss
	        else:
	            self.layers[i].prev = self.layers[i-1]
	            self.layers[i].next = self.loss

This code creates an input layer and sets next and prev references for each layer in the model object's self.layers list. We created the Layer_Input class so that we can set the prev attribute of the first hidden layer in this loop, because we're going to call all layers in a uniform way. For the last layer, its next layer will be the loss function that we've already created.
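
To make this chain concrete, here is a small, purely illustrative check (assuming the sine-data model built above and that model.finalize() has been called); it just walks the prev and next references that finalize sets:

# Illustrative only - inspect the chain built by finalize()
model.finalize()
print(type(model.layers[0].prev).__name__)   # Layer_Input
print(type(model.layers[0].next).__name__)   # Activation_ReLU
print(type(model.layers[-1].next).__name__)  # Loss_MeanSquaredError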

Now that the model object has the layer information it needs to perform a forward pass, let's add a forward method. We'll use this forward method both when training and later when we only want to predict, which is also called model inference. Continuing the Model class:

# Forward pass
class Model:
	...
    # Performs forward pass
    def forward(self, X):
        # Call forward method on the input layer
        # this will set the output property that
        # the first layer in "prev" object is expecting
        self.input_layer.forward(X)
        # Call forward method of every object in a chain
        # Pass output of the previous object as a parameter
        for layer in self.layers:
            layer.forward(layer.prev.output)
        # "layer" is now the last object from the list,
        # return its output
        return layer.output

Here, we pass in the input data X and simply run it through the input_layer in the Model object, which creates an output attribute on that object. From there, we iterate over the layers in self.layers, which begin with the first hidden layer. For each layer, we perform a forward pass on the previous layer's output data, layer.prev.output. For the first hidden layer, layer.prev is self.input_layer. Calling each layer's forward method creates that layer's output attribute, which is then passed as input to the next layer's forward call. Once we've iterated over all the layers, we return the output of the last one.

That's a full forward pass. Now let's add a call to this forward method to the Model class's train method:

# Forward pass
class Model:
	...
    # Train the model
    def train(self, X, y, *, epochs=1, print_every=1):
        # Main training loop
        for epoch in range(1, epochs+1):
            # Perform the forward pass
            output = self.forward(X)
            # Temporary
            print(output)
            sys.exit()

The full Model class so far:

# Model class
class Model:
    def __init__(self):
        # Create a list of network objects
        self.layers = []
        
    # Add objects to the model
    def add(self, layer):
        self.layers.append(layer)
    
    # Set loss and optimizer
    def set(self, *, loss, optimizer):
        self.loss = loss
        self.optimizer = optimizer
    
    # Train the model
    def train(self, X, y, *, epochs=1, print_every=1):
        # Main training loop
        for epoch in range(1, epochs+1):
            # Perform the forward pass
            output = self.forward(X)
            # Temporary
            print(output)
            sys.exit()

    # Finalize the model
    def finalize(self):
        # Create and set the input layer
        self.input_layer = Layer_Input()
        # Count all the objects
        layer_count = len(self.layers)
        # Iterate the objects
        for i in range(layer_count):
            # If it's the first layer,
            # the previous layer object is the input layer
            if i == 0:
                self.layers[i].prev = self.input_layer
                self.layers[i].next = self.layers[i+1]
            # All layers except for the first and the last
            elif i < layer_count - 1:
                self.layers[i].prev = self.layers[i-1]
                self.layers[i].next = self.layers[i+1]
            # The last layer - the next object is the loss
            else:
                self.layers[i].prev = self.layers[i-1]
                self.layers[i].next = self.loss

    # Performs forward pass
    def forward(self, X):
        # Call forward method on the input layer
        # this will set the output property that
        # the first layer in "prev" object is expecting
        self.input_layer.forward(X)
        # Call forward method of every object in a chain
        # Pass output of the previous object as a parameter
        for layer in self.layers:
            layer.forward(layer.prev.output)
        # "layer" is now the last object from the list,
        # return its output
        return layer.output

Finally, we can add the finalize method call to our main code (remember that, among other things, this method lets the model's layers know about their previous and next layers).

# Create dataset
X, y = sine_data()

# Instantiate the model
model = Model()

# Add layers
model.add(Layer_Dense(1, 64))
model.add(Activation_ReLU())
model.add(Layer_Dense(64, 64))
model.add(Activation_ReLU())
model.add(Layer_Dense(64, 1))
model.add(Activation_Linear())

# Set loss and optimizer objects
model.set(
    loss=Loss_MeanSquaredError(),
    optimizer=Optimizer_Adam(learning_rate=0.005, decay=1e-3),
    )

# Finalize the model
model.finalize()

model.train(X, y, epochs=10000, print_every=100)
>>>
[[ 0.00000000e+00]
[-1.13209149e-08]
[-2.26418297e-08]
...
[-1.12869511e-05]
[-1.12982725e-05]
[-1.13095930e-05]]

At this point, we've covered the model's forward pass in the Model class. We still need to calculate loss and accuracy and perform backpropagation. Before that, we need to know which layers are "trainable", meaning layers with weights and biases that we can tweak. To do this, we check whether a layer has a weights attribute, using the following code:

            # If layer contains an attribute called "weights",
            # it's a trainable layer -
            # add it to the list of trainable layers
            # We don't need to check for biases -
            # checking for weights is enough
            if hasattr(self.layers[i], 'weights'):
                self.trainable_layers.append(self.layers[i])

Here, i is the index of a layer in the list of layers. We'll add this code to the finalize method. Here's the full method so far:

    # Finalize the model
    def finalize(self):
        # Create and set the input layer
        self.input_layer = Layer_Input()
        # Count all the objects
        layer_count = len(self.layers)
        # Initialize a list containing trainable layers:
        self.trainable_layers = []
        # Iterate the objects
        for i in range(layer_count):
            # If it's the first layer,
            # the previous layer object is the input layer
            if i == 0:
                self.layers[i].prev = self.input_layer
                self.layers[i].next = self.layers[i+1]
            # All layers except for the first and the last
            elif i < layer_count - 1:
                self.layers[i].prev = self.layers[i-1]
                self.layers[i].next = self.layers[i+1]
            # The last layer - the next object is the loss
            # Also let's save aside the reference to the last object
            # whose output is the model's output
            else:
                self.layers[i].prev = self.layers[i-1]
                self.layers[i].next = self.loss
                self.output_layer_activation = self.layers[i]
            
            # If layer contains an attribute called "weights",
            # it's a trainable layer -
            # add it to the list of trainable layers
            # We don't need to check for biases -
            # checking for weights is enough
            if hasattr(self.layers[i], 'weights'):
                self.trainable_layers.append(self.layers[i])

Next, we'll modify the common Loss class to include the following:

# Common loss class
class Loss:
	...        
    # Calculates the data and regularization losses
    # given model output and ground truth values
    def calculate(self, output, y):
        # Calculate sample losses
        sample_losses = self.forward(output, y)
        # Calculate mean loss
        data_loss = np.mean(sample_losses)
        # Return the data and regularization losses
        return data_loss, self.regularization_loss()   
        
    # Set/remember trainable layers
    def remember_trainable_layers(self, trainable_layers):
        self.trainable_layers = trainable_layers

The remember_trainable_layers method in the common Loss class "informs" the loss object which layers of the Model object are trainable. The calculate method has been modified to also return self.regularization_loss() in a single call. The regularization_loss method currently requires a layer object, but with the self.trainable_layers attribute set in remember_trainable_layers, we can now iterate over all trainable layers and calculate the regularization loss for the whole model, instead of one layer at a time:

# Common loss class
class Loss:
	...
	# Regularization loss calculation
    def regularization_loss(self):        
        # 0 by default
        regularization_loss = 0
        # Calculate regularization loss
        # iterate all trainable layers
        for layer in self.trainable_layers:
            # L1 regularization - weights
            # calculate only when factor greater than 0
            if layer.weight_regularizer_l1 > 0:
                regularization_loss += layer.weight_regularizer_l1 * np.sum(np.abs(layer.weights))
            # L2 regularization - weights
            if layer.weight_regularizer_l2 > 0:
                regularization_loss += layer.weight_regularizer_l2 * np.sum(layer.weights * layer.weights)
            # L1 regularization - biases
            # calculate only when factor greater than 0
            if layer.bias_regularizer_l1 > 0:
                regularization_loss += layer.bias_regularizer_l1 * np.sum(np.abs(layer.biases))
            # L2 regularization - biases
            if layer.bias_regularizer_l2 > 0:
                regularization_loss += layer.bias_regularizer_l2 * np.sum(layer.biases * layer.biases)
        return regularization_loss

To calculate accuracy, we need predictions. Right now, predictions require different code depending on the type of model. For a softmax classifier, for example, we use np.argmax(), but for regression, since the output layer uses a linear activation function, the predictions are the outputs themselves. Ideally, we'd have a prediction method that chooses the appropriate way of predicting for our model. To achieve this, we'll add a predictions method to every activation function class:

# Softmax activation
class Activation_Softmax:
	...            
    # Calculate predictions for outputs
    def predictions(self, outputs):
        return np.argmax(outputs, axis=1)
# Sigmoid activation
class Activation_Sigmoid:
	...
    # Calculate predictions for outputs
    def predictions(self, outputs):
        return (outputs > 0.5) * 1
# Linear activation
class Activation_Linear:
    ...
    # Calculate predictions for outputs
    def predictions(self, outputs):
        return outputs

All of the calculations inside the predictions methods are the same ones we performed for the corresponding models in previous chapters. Although we don't plan to use the ReLU activation function as an output layer's activation function, we'll include it here for completeness:

# ReLU activation
class Activation_ReLU:  
	...
    # Calculate predictions for outputs
    def predictions(self, outputs):
        return outputs

We still need to set a reference to the final layer's activation function in the Model object, so that we can later call its predictions method, which will calculate and return predictions from the outputs. We'll set this reference in the Model class's finalize method:

# Model class
class Model:
	...
    # Finalize the model
    def finalize(self):
        ...
            # The last layer - the next object is the loss
            # Also let's save aside the reference to the last object
            # whose output is the model's output
            else:
                self.layers[i].prev = self.layers[i-1]
                self.layers[i].next = self.loss
                self.output_layer_activation = self.layers[i]

Just as with the different prediction methods, we also need to calculate accuracy in different ways. We'll implement this similarly to the specific loss class objects: by creating specific accuracy classes and objects and associating them with the model.

First, we'll write a common Accuracy class, which for now contains just a single method, calculate, that returns an accuracy computed from comparison results. We've already added a call to self.compare in this code, but that method doesn't exist yet; we'll create it in the classes that inherit from Accuracy. For now, it's enough to know that it will return a list of True and False values indicating whether each prediction matches the ground truth. We then calculate the mean of those values (True is treated as 1 and False as 0) and return it as the accuracy. The code:

# Common accuracy class
class Accuracy:
    # Calculates an accuracy
    # given predictions and ground truth values
    def calculate(self, predictions, y):
        # Get comparison results
        comparisons = self.compare(predictions, y)
        # Calculate an accuracy
        accuracy = np.mean(comparisons)
        # Return accuracy
        return accuracy 
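
As a quick sanity check (illustrative values only, not part of the library code), taking the mean of a boolean array gives the fraction of True values, which is exactly what we want for accuracy:

import numpy as np

# Hypothetical comparison results for 4 samples: 3 correct, 1 wrong
comparisons = np.array([True, True, False, True])
print(np.mean(comparisons))  # 0.75 - True counts as 1, False as 0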

Next, we can build on this common Accuracy class by inheriting from it and adding functionality specific to each type of model. In general, each of these subclasses will contain two methods: init (not to be confused with a Python class's __init__ method), for initialization from inside the model object, and compare, for performing the comparison calculation.

For the regression model, the init method will calculate the accuracy precision (the same calculation we previously wrote for the regression model and ran just before the training loop). The compare method will contain the comparison code we actually implemented in the training loop, using self.precision. Note that initialization will not recalculate the precision unless it is forced to by setting the reinit parameter to True. This allows for multiple use cases, including setting self.precision independently, calling init whenever needed (for example, from outside the model during its creation), and even calling init multiple times (which will soon prove useful):

# Accuracy calculation for regression model
class Accuracy_Regression(Accuracy):
    def __init__(self):
        # Create precision property
        self.precision = None
    # Calculates precision value
    # based on passed in ground truth
    def init(self, y, reinit=False):
        if self.precision is None or reinit:
            self.precision = np.std(y) / 250
    # Compares predictions to the ground truth values
    def compare(self, predictions, y):
        return np.absolute(predictions - y) < self.precision
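
For intuition (an illustrative sketch only, assuming the sine-data targets y from earlier), the regression accuracy simply asks whether each prediction falls within a tolerance band around its target:

from nnfs.datasets import sine_data

X, y = sine_data()
accuracy_fn = Accuracy_Regression()
accuracy_fn.init(y)                    # sets precision to np.std(y) / 250
fake_predictions = y + 0.0001          # pretend predictions, slightly off target
print(accuracy_fn.precision)           # half-width of the tolerance band
print(accuracy_fn.calculate(fake_predictions, y))  # expected 1.0 - 0.0001 is well within the band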

We can then set the accuracy object in the Model class's set method, in the same way we currently set the loss function and optimizer:

# Model class
class Model:
	...
	# Set loss, optimizer and accuracy
	def set(self, *, loss, optimizer, accuracy):
		self.loss = loss
		self.optimizer = optimizer
		self.accuracy = accuracy

Then, following the forward pass code, we can add the loss and accuracy calculations to the model. Note that we also initialize the accuracy at the start of the train method with self.accuracy.init(y), which can be called multiple times, as mentioned earlier. In the regression accuracy case, this performs the precision calculation once, on the first call. Here's the train method with loss and accuracy calculations implemented:

# Model class
class Model:
	...
    # Train the model
    def train(self, X, y, *, epochs=1, print_every=1):
        # Initialize accuracy object
        self.accuracy.init(y)
        # Main training loop
        for epoch in range(1, epochs+1):
            # Perform the forward pass
            output = self.forward(X)
            # Calculate loss
            data_loss, regularization_loss = self.loss.calculate(output, y)
            loss = data_loss + regularization_loss
            # Get predictions and calculate an accuracy
            predictions = self.output_layer_activation.predictions(output)
            accuracy = self.accuracy.calculate(predictions, y)

Finally, in the finalize method, we'll call the previously created remember_trainable_layers method on the Loss class's object, passing in the trainable layers (self.loss.remember_trainable_layers(self.trainable_layers)). Here is the full Model class so far:

# Model class
class Model:
    def __init__(self):
        # Create a list of network objects
        self.layers = []
        
    # Add objects to the model
    def add(self, layer):
        self.layers.append(layer)
    
    # Set loss, optimizer and accuracy
    def set(self, *, loss, optimizer, accuracy):
        self.loss = loss
        self.optimizer = optimizer
        self.accuracy = accuracy
            
    # Finalize the model
    def finalize(self):
        # Create and set the input layer
        self.input_layer = Layer_Input()
        # Count all the objects
        layer_count = len(self.layers)
        # Initialize a list containing trainable layers:
        self.trainable_layers = []
        # Iterate the objects
        for i in range(layer_count):
            # If it's the first layer,
            # the previous layer object is the input layer
            if i == 0:
                self.layers[i].prev = self.input_layer
                self.layers[i].next = self.layers[i+1]
            # All layers except for the first and the last
            elif i < layer_count - 1:
                self.layers[i].prev = self.layers[i-1]
                self.layers[i].next = self.layers[i+1]
            # The last layer - the next object is the loss
            # Also let's save aside the reference to the last object
            # whose output is the model's output
            else:
                self.layers[i].prev = self.layers[i-1]
                self.layers[i].next = self.loss
                self.output_layer_activation = self.layers[i]
            # If layer contains an attribute called "weights",
            # it's a trainable layer -
            # add it to the list of trainable layers
            # We don't need to check for biases -
            # checking for weights is enough
            if hasattr(self.layers[i], 'weights'):
                self.trainable_layers.append(self.layers[i])

        # Update loss object with trainable layers
        self.loss.remember_trainable_layers(self.trainable_layers)

    # Train the model
    def train(self, X, y, *, epochs=1, print_every=1):
        # Initialize accuracy object
        self.accuracy.init(y)
        # Main training loop
        for epoch in range(1, epochs+1):
            # Perform the forward pass
            output = self.forward(X)
            # Calculate loss
            data_loss, regularization_loss = self.loss.calculate(output, y)
            loss = data_loss + regularization_loss
            # Get predictions and calculate an accuracy
            predictions = self.output_layer_activation.predictions(output)
            accuracy = self.accuracy.calculate(predictions, y)
            
    # Performs forward pass
    def forward(self, X):
        # Call forward method on the input layer
        # this will set the output property that
        # the first layer in "prev" object is expecting
        self.input_layer.forward(X)
        # Call forward method of every object in a chain
        # Pass output of the previous object as a parameter
        for layer in self.layers:
            layer.forward(layer.prev.output)
        # "layer" is now the last object from the list,
        # return its output
        return layer.output

The full Loss class so far:

# Common loss class
class Loss:
    # Regularization loss calculation
    def regularization_loss(self):        
        # 0 by default
        regularization_loss = 0
        # Calculate regularization loss
        # iterate all trainable layers
        for layer in self.trainable_layers:
            # L1 regularization - weights
            # calculate only when factor greater than 0
            if layer.weight_regularizer_l1 > 0:
                regularization_loss += layer.weight_regularizer_l1 * np.sum(np.abs(layer.weights))
            # L2 regularization - weights
            if layer.weight_regularizer_l2 > 0:
                regularization_loss += layer.weight_regularizer_l2 * np.sum(layer.weights * layer.weights)
            # L1 regularization - biases
            # calculate only when factor greater than 0
            if layer.bias_regularizer_l1 > 0:
                regularization_loss += layer.bias_regularizer_l1 * np.sum(np.abs(layer.biases))
            # L2 regularization - biases
            if layer.bias_regularizer_l2 > 0:
                regularization_loss += layer.bias_regularizer_l2 * np.sum(layer.biases * layer.biases)
        return regularization_loss

    # Set/remember trainable layers
    def remember_trainable_layers(self, trainable_layers):
       self.trainable_layers = trainable_layers

    # Calculates the data and regularization losses
    # given model output and ground truth values
    def calculate(self, output, y):
        # Calculate sample losses
        sample_losses = self.forward(output, y)
        # Calculate mean loss
        data_loss = np.mean(sample_losses)
        # Return the data and regularization losses
        return data_loss, self.regularization_loss() 

Now that we have a full forward pass implemented and can calculate loss and accuracy, we can begin the backward pass. The backward method in the Model class is structurally similar to the forward method, just in reverse order and with different parameters. Following the backward pass from our earlier training approach, we need to call the loss object's backward method to create its dinputs attribute. Then we iterate over all the layers in reverse order, calling their backward methods with the dinputs attribute of the next layer (next in the normal order) as a parameter, effectively backpropagating the gradient returned by that layer. Remember that we set the loss object as the next layer of the last, output layer.

# Model class
class Model:
	...
    # Performs backward pass
    def backward(self, output, y):
        # First call backward method on the loss
        # this will set dinputs property that the last
        # layer will try to access shortly
        self.loss.backward(output, y)
        # Call backward method going through all the objects
        # in reversed order passing dinputs as a parameter
        for layer in reversed(self.layers):
            layer.backward(layer.next.dinputs)

Next, we'll call this backward method at the end of the train method:

			# Perform backward pass
			self.backward(output, y)

After the backward pass, the last action is optimization. Previously, we called the optimizer object's update_params method once for each trainable layer. We now need to make this code more general by iterating over the list of trainable layers and calling update_params() in a loop:

			# Optimize (update parameters)
            self.optimizer.pre_update_params()
            for layer in self.trainable_layers:
                self.optimizer.update_params(layer)
            self.optimizer.post_update_params()

Then we can output useful information; this is where the last parameter of the train method comes in handy:

			# Print a summary
            if not epoch % print_every:
                print(f'epoch: {epoch}, ' +
                      f'acc: {accuracy:.3f}, ' +
                      f'loss: {loss:.3f} (' +
                      f'data_loss: {data_loss:.3f}, ' +
                      f'reg_loss: {regularization_loss:.3f}), ' +
                      f'lr: {self.optimizer.current_learning_rate}')
The full train method of the Model class now looks like this:

# Model class
class Model:
	...
    # Train the model
    def train(self, X, y, *, epochs=1, print_every=1):
        # Initialize accuracy object
        self.accuracy.init(y)
        # Main training loop
        for epoch in range(1, epochs+1):
            # Perform the forward pass
            output = self.forward(X)
            # Calculate loss
            data_loss, regularization_loss = self.loss.calculate(output, y)
            loss = data_loss + regularization_loss
            # Get predictions and calculate an accuracy
            predictions = self.output_layer_activation.predictions(output)
            accuracy = self.accuracy.calculate(predictions, y)
            # Perform backward pass
            self.backward(output, y)
            # Optimize (update parameters)
            self.optimizer.pre_update_params()
            for layer in self.trainable_layers:
                self.optimizer.update_params(layer)
            self.optimizer.post_update_params()
            # Print a summary
            if not epoch % print_every:
                print(f'epoch: {epoch}, ' +
                      f'acc: {accuracy:.3f}, ' +
                      f'loss: {loss:.3f} (' +
                      f'data_loss: {data_loss:.3f}, ' +
                      f'reg_loss: {regularization_loss:.3f}), ' +
                      f'lr: {self.optimizer.current_learning_rate}')

Now we can pass an accuracy class object into the model (for this regression model, accuracy=Accuracy_Regression() in the set call) and test its performance:

>>>
epoch: 100, acc: 0.006, loss: 0.085 (data_loss: 0.085, reg_loss: 0.000), lr: 0.004549590536851684
epoch: 200, acc: 0.032, loss: 0.035 (data_loss: 0.035, reg_loss: 0.000), lr: 0.004170141784820684
...
epoch: 9900, acc: 0.934, loss: 0.000 (data_loss: 0.000, reg_loss: 0.000), lr: 0.00045875768419121016
epoch: 10000, acc: 0.970, loss: 0.000 (data_loss: 0.000, reg_loss: 0.000), lr: 0.00045458678061641964

Our new model performs well, and we can now create new models more easily with the Model class. We'll need to keep modifying these classes to support brand-new models, though. For example, we haven't yet handled binary logistic regression. For that, we need to add two things. First, we need to calculate categorical accuracy:

# Accuracy calculation for classification model
class Accuracy_Categorical(Accuracy):
    def __init__(self, *, binary=False):
        # Binary mode?
        self.binary = binary
    # No initialization is needed
    def init(self, y):
        pass
    # Compares predictions to the ground truth values
    def compare(self, predictions, y):
        if not self.binary and len(y.shape) == 2:
            y = np.argmax(y, axis=1)
        return predictions == y

This is the same accuracy calculation as for classification, just wrapped in a class, with a toggle parameter added. When this class is used with a binary cross-entropy model, the toggle disables the conversion of one-hot encoded labels to sparse labels, since that model always expects the ground truth to be a 2D array that is not one-hot encoded. Note that no initialization is performed here, but the method needs to exist because it will be called from the Model class's train method. The second thing we need to add is the ability to validate the model using validation data. Validation requires only a forward pass and a loss calculation (just the data loss). We'll modify the Loss class's calculate method so it can calculate the validation loss as well:

# Common loss class
class Loss:
	...
    # Calculates the data and regularization losses
    # given model output and ground truth values
    def calculate(self, output, y, *, include_regularization=False):
        # Calculate sample losses
        sample_losses = self.forward(output, y)
        # Calculate mean loss
        data_loss = np.mean(sample_losses)
        # If just data loss - return it
        if not include_regularization:
            return data_loss
        # Return the data and regularization losses
        return data_loss, self.regularization_loss() 

We've added a new parameter and a condition to return just the data loss, since regularization loss is not used in this case. To run it, we'll pass the predictions and targets the same way as with the training data. The regularization loss is not returned by default, which means we need to update the call to this method in the train method so that it includes the regularization loss during training:

			# Calculate loss
            data_loss, regularization_loss = self.loss.calculate(output, y, include_regularization=True)

Then we can add the validation code to the Model class's train method. We add a validation_data parameter to the function, which takes a tuple of validation data (samples and targets); an if statement checking whether validation data is present; and, if it is, code that performs a forward pass on that data, calculates loss and accuracy the same way as during training, and prints the results:

# Model class
class Model:
	...
	# Train the model
    def train(self, X, y, *, epochs=1, print_every=1, validation_data=None):
		...
        # If there is the validation data
        if validation_data is not None:
            # For better readability
            X_val, y_val = validation_data
            # Perform the forward pass
            output = self.forward(X_val)
            # Calculate the loss
            loss = self.loss.calculate(output, y_val)
            # Get predictions and calculate an accuracy
            predictions = self.output_layer_activation.predictions(output)
            accuracy = self.accuracy.calculate(predictions, y_val)
            # Print a summary
            print(f'validation, ' +
                  f'acc: {accuracy:.3f}, ' +
                  f'loss: {loss:.3f}')

Now we can create the training and test data and try out the binary logistic regression model with the following code:

# Create train and test dataset
X, y = spiral_data(samples=100, classes=2)
X_test, y_test = spiral_data(samples=100, classes=2)

# Reshape labels to be a list of lists
# Inner list contains one output (either 0 or 1)
# per each output neuron, 1 in this case
y = y.reshape(-1, 1)
y_test = y_test.reshape(-1, 1)

# Instantiate the model
model = Model()

# Add layers
model.add(Layer_Dense(2, 64, weight_regularizer_l2=5e-4, bias_regularizer_l2=5e-4))
model.add(Activation_ReLU())
model.add(Layer_Dense(64, 1))
model.add(Activation_Sigmoid())

# Set loss, optimizer and accuracy objects
model.set(
    loss=Loss_BinaryCrossentropy(),
    optimizer=Optimizer_Adam(decay=5e-7),
    accuracy=Accuracy_Categorical(binary=True)
    )

# Finalize the model
model.finalize()

# Train the model
model.train(X, y, validation_data=(X_test, y_test), epochs=10000, print_every=100)
>>>
epoch: 100, acc: 0.625, loss: 0.675 (data_loss: 0.674, reg_loss: 0.001), lr: 0.0009999505024501287
epoch: 200, acc: 0.630, loss: 0.669 (data_loss: 0.668, reg_loss: 0.001), lr: 0.0009999005098992651
...
epoch: 9900, acc: 0.905, loss: 0.312 (data_loss: 0.276, reg_loss: 0.037), lr: 0.0009950748768967994
epoch: 10000, acc: 0.905, loss: 0.312 (data_loss: 0.275, reg_loss: 0.036), lr: 0.0009950253706593885
validation, acc: 0.775, loss: 0.423

Now that we've streamlined the forward and backward pass code, including validation, this is a good time to bring back dropout. As a reminder, dropout is a method of regularizing the model and improving its generalization by disabling, or filtering out, certain neurons. If we use dropout in our model, we have to make sure it is not used during validation and inference (prediction). In our previous code, we did this by simply not calling dropout's forward method during validation. Here, we have a common method performing the forward pass for both training and validation, so we need a different way of switching dropout off: informing the layers whether we are training and letting them "decide" whether to include the calculation. The first thing we'll do is add a boolean training parameter to the forward methods of all the layer and activation function classes, because we need to call them in a uniform way:

	# Forward pass
	def forward(self, inputs, training):

When we're not in training mode, we can set the output directly to the input in the Layer_Dropout class and return from the method without altering the output:

		# If not in the training mode - return values
		if not training:
			self.output = inputs.copy()
			return

When we are training, we let dropout take part:

# Dropout
class Layer_Dropout:        
	...
    # Forward pass
    def forward(self, inputs, training):
        # Save input values
        self.inputs = inputs
        # If not in the training mode - return values
        if not training:
            self.output = inputs.copy()
            return
        # Generate and save scaled mask
        self.binary_mask = np.random.binomial(1, self.rate, size=inputs.shape) / self.rate
        # Apply mask to output values
        self.output = inputs * self.binary_mask

Next, we modify the Model class's forward method to add the training parameter and to pass its value along when calling each layer's forward method:

# Model class
class Model:
	...
    # Performs forward pass
    def forward(self, X, training):
        # Call forward method on the input layer
        # this will set the output property that
        # the first layer in "prev" object is expecting
        self.input_layer.forward(X, training)
        # Call forward method of every object in a chain
        # Pass output of the previous object as a parameter
        for layer in self.layers:
            layer.forward(layer.prev.output, training)
        # "layer" is now the last object from the list,
        # return its output
        return layer.output

We also need to update the train method in the Model class, since the training parameter needs to be set to True when calling the forward method there:

			# Perform the forward pass
			output = self.forward(X, training=True)

Then it is set to False during validation:

			# Perform the forward pass
			output = self.forward(X_val, training=False)
The full train method, with the training flag and validation in place:

# Model class
class Model:
	...
    # Train the model
    def train(self, X, y, *, epochs=1, print_every=1, validation_data=None):
        # Initialize accuracy object
        self.accuracy.init(y)
        # Main training loop
        for epoch in range(1, epochs+1):
            # Perform the forward pass
            output = self.forward(X, training=True)
            # Calculate loss
            data_loss, regularization_loss = self.loss.calculate(output, y, include_regularization=True)
            loss = data_loss + regularization_loss
            # Get predictions and calculate an accuracy
            predictions = self.output_layer_activation.predictions(output)
            accuracy = self.accuracy.calculate(predictions, y)
            # Perform backward pass
            self.backward(output, y)
            # Optimize (update parameters)
            self.optimizer.pre_update_params()
            for layer in self.trainable_layers:
                self.optimizer.update_params(layer)
            self.optimizer.post_update_params()
            # Print a summary
            if not epoch % print_every:
                print(f'epoch: {epoch}, ' +
                      f'acc: {accuracy:.3f}, ' +
                      f'loss: {loss:.3f} (' +
                      f'data_loss: {data_loss:.3f}, ' +
                      f'reg_loss: {regularization_loss:.3f}), ' +
                      f'lr: {self.optimizer.current_learning_rate}')
        # If there is the validation data
        if validation_data is not None:
            # For better readability
            X_val, y_val = validation_data
            # Perform the forward pass
            output = self.forward(X_val, training=False)
            # Calculate the loss
            loss = self.loss.calculate(output, y_val)
            # Get predictions and calculate an accuracy
            predictions = self.output_layer_activation.predictions(output)
            accuracy = self.accuracy.calculate(predictions, y_val)
            # Print a summary
            print(f'validation, ' +
                  f'acc: {accuracy:.3f}, ' +
                  f'loss: {loss:.3f}')

Finally, we need to handle the combined Softmax activation and cross-entropy loss class in the Model class. The challenge here is that, previously, we hand-defined the forward and backward passes for each model separately. Now, however, we have loops over the layers in both directions of the calculation, a uniform way of computing outputs and gradients, and other improvements. We can't simply remove the Softmax activation and the Categorical Cross-Entropy loss and replace them with a single object combining both; with the current code, that wouldn't work, because we handle the output activation function and the loss function in a specific way.

Since the combined object only speeds up the backward pass, we've decided to leave the forward pass unchanged, still using separate Softmax activation and Categorical Cross-Entropy loss objects, and to handle only the backward part.

First, we need to automatically determine whether the current model is a classifier that uses the Softmax activation together with the Categorical Cross-Entropy loss. We can do this by checking the class of the last layer object (which is an activation function object) and the class of the loss function object. We'll add this check to the end of the finalize method:

        # If output activation is Softmax and
        # loss function is Categorical Cross-Entropy
        # create an object of combined activation
        # and loss function containing
        # faster gradient calculation
        if isinstance(self.layers[-1], Activation_Softmax) and isinstance(self.loss, Loss_CategoricalCrossentropy):
            # Create an object of combined activation
            # and loss functions
            self.softmax_classifier_output = Activation_Softmax_Loss_CategoricalCrossentropy()

To perform this check, we use Python's isinstance function, which returns True if the given object is an instance of the specified class. If both checks return True, we set a new attribute containing an object of the Activation_Softmax_Loss_CategoricalCrossentropy class.
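
As a small, self-contained illustration of isinstance (using built-in types rather than our network classes):

print(isinstance(3, int))      # True - 3 is an instance of int
print(isinstance(3.0, int))    # False - a float is not an int
print(isinstance(True, int))   # True - bool is a subclass of int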

We also need to initialize this attribute with a value of None in the Model class's constructor:

        # Softmax classifier's output object
        self.softmax_classifier_output = None

The last step is to check, during the backward pass, whether this object is set and, if so, use it. To do that, we need to slightly modify the current backward pass code to handle this case separately.

First, we call the backward method of the combined object. Then, since we won't be calling the backward method of the activation function object (the last object in our layer list), we set that object's dinputs attribute to the gradient calculated within the activation/loss object. Finally, we can iterate over all the layers except the last one and perform their backward passes:

        # If softmax classifier
        if self.softmax_classifier_output is not None:
            # First call backward method
            # on the combined activation/loss
            # this will set dinputs property
            self.softmax_classifier_output.backward(output, y)
            # Since we'll not call backward method of the last layer
            # which is Softmax activation
            # as we used combined activation/loss
            # object, let's set dinputs in this object
            self.layers[-1].dinputs = self.softmax_classifier_output.dinputs
            # Call backward method going through
            # all the objects but last
            # in reversed order passing dinputs as a parameter
            for layer in reversed(self.layers[:-1]):
                layer.backward(layer.next.dinputs)
            return

The full Model class code so far:

# Model class
class Model:
    def __init__(self):
        # Create a list of network objects
        self.layers = []
        # Softmax classifier's output object
        self.softmax_classifier_output = None
        
    # Add objects to the model
    def add(self, layer):
        self.layers.append(layer)
    
    # Set loss, optimizer and accuracy
    def set(self, *, loss, optimizer, accuracy):
        self.loss = loss
        self.optimizer = optimizer
        self.accuracy = accuracy
            
    # Finalize the model
    def finalize(self):
        # Create and set the input layer
        self.input_layer = Layer_Input()
        # Count all the objects
        layer_count = len(self.layers)
        # Initialize a list containing trainable layers:
        self.trainable_layers = []
        # Iterate the objects
        for i in range(layer_count):
            # If it's the first layer,
            # the previous layer object is the input layer
            if i == 0:
                self.layers[i].prev = self.input_layer
                self.layers[i].next = self.layers[i+1]
            # All layers except for the first and the last
            elif i < layer_count - 1:
                self.layers[i].prev = self.layers[i-1]
                self.layers[i].next = self.layers[i+1]
            # The last layer - the next object is the loss
            # Also let's save aside the reference to the last object
            # whose output is the model's output
            else:
                self.layers[i].prev = self.layers[i-1]
                self.layers[i].next = self.loss
                self.output_layer_activation = self.layers[i]
            # If layer contains an attribute called "weights",
            # it's a trainable layer -
            # add it to the list of trainable layers
            # We don't need to check for biases -
            # checking for weights is enough 
            if hasattr(self.layers[i], 'weights'):
                self.trainable_layers.append(self.layers[i])

        # Update loss object with trainable layers
        self.loss.remember_trainable_layers(self.trainable_layers)
        # If output activation is Softmax and
        # loss function is Categorical Cross-Entropy
        # create an object of combined activation
        # and loss function containing
        # faster gradient calculation
        if isinstance(self.layers[-1], Activation_Softmax) and isinstance(self.loss, Loss_CategoricalCrossentropy):
            # Create an object of combined activation
            # and loss functions
            self.softmax_classifier_output = Activation_Softmax_Loss_CategoricalCrossentropy()

    # Train the model
    def train(self, X, y, *, epochs=1, print_every=1, validation_data=None):
        # Initialize accuracy object
        self.accuracy.init(y)
        # Main training loop
        for epoch in range(1, epochs+1):
            # Perform the forward pass
            output = self.forward(X, training=True)
            # Calculate loss
            data_loss, regularization_loss = self.loss.calculate(output, y, include_regularization=True)
            loss = data_loss + regularization_loss
            # Get predictions and calculate an accuracy
            predictions = self.output_layer_activation.predictions(output)
            accuracy = self.accuracy.calculate(predictions, y)
            # Perform backward pass
            self.backward(output, y)
            # Optimize (update parameters)
            self.optimizer.pre_update_params()
            for layer in self.trainable_layers:
                self.optimizer.update_params(layer)
            self.optimizer.post_update_params()
            # Print a summary
            if not epoch % print_every:
                print(f'epoch: {epoch}, ' +
                      f'acc: {accuracy:.3f}, ' +
                      f'loss: {loss:.3f} (' +
                      f'data_loss: {data_loss:.3f}, ' +
                      f'reg_loss: {regularization_loss:.3f}), ' +
                      f'lr: {self.optimizer.current_learning_rate}')
        # If there is the validation data
        if validation_data is not None:
            # For better readability
            X_val, y_val = validation_data
            # Perform the forward pass
            output = self.forward(X_val, training=False)
            # Calculate the loss
            loss = self.loss.calculate(output, y_val)
            # Get predictions and calculate an accuracy
            predictions = self.output_layer_activation.predictions(output)
            accuracy = self.accuracy.calculate(predictions, y_val)
            # Print a summary
            print(f'validation, ' +
                  f'acc: {accuracy:.3f}, ' +
                  f'loss: {loss:.3f}')

    # Performs forward pass
    def forward(self, X, training):
        # Call forward method on the input layer
        # this will set the output property that
        # the first layer in "prev" object is expecting
        self.input_layer.forward(X, training)
        # Call forward method of every object in a chain
        # Pass output of the previous object as a parameter
        for layer in self.layers:
            layer.forward(layer.prev.output, training)
        # "layer" is now the last object from the list,
        # return its output
        return layer.output

    # Performs backward pass
    def backward(self, output, y):
        # If softmax classifier
        if self.softmax_classifier_output is not None:
            # First call backward method
            # on the combined activation/loss
            # this will set dinputs property
            self.softmax_classifier_output.backward(output, y)
            # Since we'll not call backward method of the last layer
            # which is Softmax activation
            # as we used combined activation/loss
            # object, let's set dinputs in this object
            self.layers[-1].dinputs = self.softmax_classifier_output.dinputs
            # Call backward method going through
            # all the objects but last
            # in reversed order passing dinputs as a parameter
            for layer in reversed(self.layers[:-1]):
                layer.backward(layer.next.dinputs)
            return
        # First call backward method on the loss
        # this will set dinputs property that the last
        # layer will try to access shortly
        self.loss.backward(output, y)
        # Call backward method going through all the objects
        # in reversed order passing dinputs as a parameter
        for layer in reversed(self.layers):
            layer.backward(layer.next.dinputs)

In addition, we will no longer need the Activation_Softmax_Loss_CategoricalCrossentropy class's initializer and forward method, so we can remove them and keep only the backward method:

# Softmax classifier - combined Softmax activation
# and cross-entropy loss for faster backward step
class Activation_Softmax_Loss_CategoricalCrossentropy():  
    # Backward pass
    def backward(self, dvalues, y_true):
        # Number of samples
        samples = len(dvalues)
        # If labels are one-hot encoded,
        # turn them into discrete values
        if len(y_true.shape) == 2:
            y_true = np.argmax(y_true, axis=1)
        # Copy so we can safely modify
        self.dinputs = dvalues.copy()
        # Calculate gradient
        self.dinputs[range(samples), y_true] -= 1
        # Normalize gradient
        self.dinputs = self.dinputs / samples

Now we can test the updated Model object, this time using dropout:

# Create dataset
X, y = spiral_data(samples=1000, classes=3)
X_test, y_test = spiral_data(samples=100, classes=3)
# Instantiate the model
model = Model()
# Add layers
model.add(Layer_Dense(2, 512, weight_regularizer_l2=5e-4, bias_regularizer_l2=5e-4))
model.add(Activation_ReLU())
model.add(Layer_Dropout(0.1))
model.add(Layer_Dense(512, 3))
model.add(Activation_Softmax())
# Set loss, optimizer and accuracy objects
model.set(
    loss=Loss_CategoricalCrossentropy(),
    optimizer=Optimizer_Adam(learning_rate=0.05, decay=5e-5),
    accuracy=Accuracy_Categorical()
    )
# Finalize the model
model.finalize()
# Train the model
model.train(X, y, validation_data=(X_test, y_test), epochs=10000, print_every=100)
>>>
epoch: 100, acc: 0.716, loss: 0.726 (data_loss: 0.666, reg_loss: 0.060), lr: 0.04975371909050202
epoch: 200, acc: 0.787, loss: 0.615 (data_loss: 0.538, reg_loss: 0.077), lr: 0.049507401356502806
...
epoch: 9900, acc: 0.861, loss: 0.436 (data_loss: 0.389, reg_loss: 0.046), lr: 0.0334459346466437
epoch: 10000, acc: 0.880, loss: 0.394 (data_loss: 0.347, reg_loss: 0.047), lr: 0.03333444448148271
validation, acc: 0.867, loss: 0.379

Everything appears to work as intended. With this Model class in place, we can define new models without writing large amounts of repeated code. Repeating code is not only tedious, it also makes it easier to introduce small, hard-to-spot errors.


Full code up to this point:

import numpy as np
import nnfs
from nnfs.datasets import sine_data, spiral_data
import sys

nnfs.init()

# Dense layer
class Layer_Dense:
    # Layer initialization
    def __init__(self, n_inputs, n_neurons,
                 weight_regularizer_l1=0, weight_regularizer_l2=0,
                 bias_regularizer_l1=0, bias_regularizer_l2=0):
        # Initialize weights and biases
        # self.weights = 0.01 * np.random.randn(n_inputs, n_neurons)
        self.weights = 0.1 * np.random.randn(n_inputs, n_neurons)
        self.biases = np.zeros((1, n_neurons))
        # Set regularization strength
        self.weight_regularizer_l1 = weight_regularizer_l1
        self.weight_regularizer_l2 = weight_regularizer_l2
        self.bias_regularizer_l1 = bias_regularizer_l1
        self.bias_regularizer_l2 = bias_regularizer_l2
    
    # Forward pass
    def forward(self, inputs, training):
        # Remember input values
        self.inputs = inputs
        # Calculate output values from inputs, weights and biases
        self.output = np.dot(inputs, self.weights) + self.biases
        
    # Backward pass
    def backward(self, dvalues):
        # Gradients on parameters
        self.dweights = np.dot(self.inputs.T, dvalues)
        self.dbiases = np.sum(dvalues, axis=0, keepdims=True)
        # Gradients on regularization
        # L1 on weights
        if self.weight_regularizer_l1 > 0:
            dL1 = np.ones_like(self.weights)
            dL1[self.weights < 0] = -1
            self.dweights += self.weight_regularizer_l1 * dL1
        # L2 on weights
        if self.weight_regularizer_l2 > 0:
            self.dweights += 2 * self.weight_regularizer_l2 * self.weights
        # L1 on biases
        if self.bias_regularizer_l1 > 0:
            dL1 = np.ones_like(self.biases)
            dL1[self.biases < 0] = -1
            self.dbiases += self.bias_regularizer_l1 * dL1
        # L2 on biases
        if self.bias_regularizer_l2 > 0:
            self.dbiases += 2 * self.bias_regularizer_l2 * self.biases
        # Gradient on values
        self.dinputs = np.dot(dvalues, self.weights.T)
        
        
# Dropout
class Layer_Dropout:        
    # Init
    def __init__(self, rate):
        # Store rate, we invert it as for example for dropout
        # of 0.1 we need success rate of 0.9
        self.rate = 1 - rate
        
    # Forward pass
    def forward(self, inputs, training):
        # Save input values
        self.inputs = inputs
        # If not in the training mode - return values
        if not training:
            self.output = inputs.copy()
            return
        # Generate and save scaled mask
        self.binary_mask = np.random.binomial(1, self.rate, size=inputs.shape) / self.rate
        # Apply mask to output values
        self.output = inputs * self.binary_mask
        
    # Backward pass
    def backward(self, dvalues):
        # Gradient on values
        self.dinputs = dvalues * self.binary_mask
        

# Input "layer"
class Layer_Input:
    # Forward pass
    def forward(self, inputs, training):
        self.output = inputs

        
# ReLU activation
class Activation_ReLU:  
    # Forward pass
    def forward(self, inputs, training):
        # Remember input values
        self.inputs = inputs
        # Calculate output values from inputs
        self.output = np.maximum(0, inputs)
        
    # Backward pass
    def backward(self, dvalues):
        # Since we need to modify original variable,
        # let's make a copy of values first
        self.dinputs = dvalues.copy()
        # Zero gradient where input values were negative
        self.dinputs[self.inputs <= 0] = 0
        
    # Calculate predictions for outputs
    def predictions(self, outputs):
        return outputs
        
        
# Softmax activation
class Activation_Softmax:
    # Forward pass
    def forward(self, inputs, training):
        # Remember input values
        self.inputs = inputs
        # Get unnormalized probabilities
        exp_values = np.exp(inputs - np.max(inputs, axis=1, keepdims=True))
        # Normalize them for each sample
        probabilities = exp_values / np.sum(exp_values, axis=1, keepdims=True)
        self.output = probabilities
        
    # Backward pass
    def backward(self, dvalues):
        # Create uninitialized array
        self.dinputs = np.empty_like(dvalues)
        # Enumerate outputs and gradients
        for index, (single_output, single_dvalues) in enumerate(zip(self.output, dvalues)):
            # Flatten output array
            single_output = single_output.reshape(-1, 1)
            # Calculate Jacobian matrix of the output and
            jacobian_matrix = np.diagflat(single_output) - np.dot(single_output, single_output.T)
            # Calculate sample-wise gradient
            # and add it to the array of sample gradients
            self.dinputs[index] = np.dot(jacobian_matrix, single_dvalues)
            
    # Calculate predictions for outputs
    def predictions(self, outputs):
        return np.argmax(outputs, axis=1)
        
      
# Sigmoid activation
class Activation_Sigmoid:
    # Forward pass
    def forward(self, inputs, training):
        # Save input and calculate/save output
        # of the sigmoid function
        self.inputs = inputs
        self.output = 1 / (1 + np.exp(-inputs))
        
    # Backward pass
    def backward(self, dvalues):
        # Derivative - calculates from output of the sigmoid function
        self.dinputs = dvalues * (1 - self.output) * self.output
    
    # Calculate predictions for outputs
    def predictions(self, outputs):
        return (outputs > 0.5) * 1
        
# Linear activation
class Activation_Linear:
    # Forward pass
    def forward(self, inputs, training):
        # Just remember values
        self.inputs = inputs
        self.output = inputs
        
    # Backward pass
    def backward(self, dvalues):
        # derivative is 1, 1 * dvalues = dvalues - the chain rule
        self.dinputs = dvalues.copy()
    
    # Calculate predictions for outputs
    def predictions(self, outputs):
        return outputs
        
        
# SGD optimizer
class Optimizer_SGD:
    # Initialize optimizer - set settings,
    # learning rate of 1. is default for this optimizer
    def __init__(self, learning_rate=1., decay=0., momentum=0.):
        self.learning_rate = learning_rate
        self.current_learning_rate = learning_rate
        self.decay = decay
        self.iterations = 0
        self.momentum = momentum
        
    # Call once before any parameter updates
    def pre_update_params(self):
        if self.decay:
            self.current_learning_rate = self.learning_rate * (1. / (1. + self.decay * self.iterations))
    
    # Update parameters
    def update_params(self, layer):
        # If we use momentum
        if self.momentum:
            # If layer does not contain momentum arrays, create them
            # filled with zeros
            if not hasattr(layer, 'weight_momentums'):
                layer.weight_momentums = np.zeros_like(layer.weights)
                # If there is no momentum array for weights
                # The array doesn't exist for biases yet either.
                layer.bias_momentums = np.zeros_like(layer.biases)
            # Build weight updates with momentum - take previous
            # updates multiplied by retain factor and update with
            # current gradients
            weight_updates = self.momentum * layer.weight_momentums - self.current_learning_rate * layer.dweights
            layer.weight_momentums = weight_updates
            
            # Build bias updates
            bias_updates = self.momentum * layer.bias_momentums - self.current_learning_rate * layer.dbiases
            layer.bias_momentums = bias_updates
        # Vanilla SGD updates (as before momentum update)
        else:
            weight_updates = -self.current_learning_rate * layer.dweights
            bias_updates = -self.current_learning_rate * layer.dbiases
        # Update weights and biases using either
        # vanilla or momentum updates
        layer.weights += weight_updates
        layer.biases += bias_updates
                
    # Call once after any parameter updates
    def post_update_params(self):
        self.iterations += 1        


# Adagrad optimizer
class Optimizer_Adagrad:
    # Initialize optimizer - set settings
    def __init__(self, learning_rate=1., decay=0., epsilon=1e-7):
        self.learning_rate = learning_rate
        self.current_learning_rate = learning_rate
        self.decay = decay
        self.iterations = 0
        self.epsilon = epsilon
        
    # Call once before any parameter updates
    def pre_update_params(self):
        if self.decay:
            self.current_learning_rate = self.learning_rate * (1. / (1. + self.decay * self.iterations))
    
    # Update parameters
    def update_params(self, layer):
        # If layer does not contain cache arrays,
        # create them filled with zeros
        if not hasattr(layer, 'weight_cache'):
            layer.weight_cache = np.zeros_like(layer.weights)
            layer.bias_cache = np.zeros_like(layer.biases)
        # Update cache with squared current gradients
        layer.weight_cache += layer.dweights**2
        layer.bias_cache += layer.dbiases**2
        # Vanilla SGD parameter update + normalization
        # with square rooted cache
        layer.weights += -self.current_learning_rate * layer.dweights / (np.sqrt(layer.weight_cache) + self.epsilon)
        layer.biases += -self.current_learning_rate * layer.dbiases / (np.sqrt(layer.bias_cache) + self.epsilon)
    
    # Call once after any parameter updates
    def post_update_params(self):
        self.iterations += 1
            
            
# RMSprop optimizer
class Optimizer_RMSprop:            
    # Initialize optimizer - set settings
    def __init__(self, learning_rate=0.001, decay=0., epsilon=1e-7, rho=0.9):
        self.learning_rate = learning_rate
        self.current_learning_rate = learning_rate
        self.decay = decay
        self.iterations = 0
        self.epsilon = epsilon
        self.rho = rho
    
    # Call once before any parameter updates
    def pre_update_params(self):
        if self.decay:
            self.current_learning_rate = self.learning_rate * (1. / (1. + self.decay * self.iterations))
    
    # Update parameters
    def update_params(self, layer):
        # If layer does not contain cache arrays,
        # create them filled with zeros
        if not hasattr(layer, 'weight_cache'):
            layer.weight_cache = np.zeros_like(layer.weights)
            layer.bias_cache = np.zeros_like(layer.biases)
        # Update cache with squared current gradients
        layer.weight_cache = self.rho * layer.weight_cache + (1 - self.rho) * layer.dweights**2
        layer.bias_cache = self.rho * layer.bias_cache + (1 - self.rho) * layer.dbiases**2
        
        # Vanilla SGD parameter update + normalization
        # with square rooted cache
        layer.weights += -self.current_learning_rate * layer.dweights / (np.sqrt(layer.weight_cache) + self.epsilon)
        layer.biases += -self.current_learning_rate * layer.dbiases / (np.sqrt(layer.bias_cache) + self.epsilon)
    
    # Call once after any parameter updates
    def post_update_params(self):
        self.iterations += 1
            

# Adam optimizer
class Optimizer_Adam:
    # Initialize optimizer - set settings
    def __init__(self, learning_rate=0.001, decay=0., epsilon=1e-7, beta_1=0.9, beta_2=0.999):
        self.learning_rate = learning_rate
        self.current_learning_rate = learning_rate
        self.decay = decay
        self.iterations = 0
        self.epsilon = epsilon
        self.beta_1 = beta_1
        self.beta_2 = beta_2
    
    # Call once before any parameter updates
    def pre_update_params(self):
        if self.decay:
            self.current_learning_rate = self.learning_rate * (1. / (1. + self.decay * self.iterations))        

    # Update parameters
    def update_params(self, layer):
        # If layer does not contain cache arrays,
        # create them filled with zeros
        if not hasattr(layer, 'weight_cache'):
            layer.weight_momentums = np.zeros_like(layer.weights)
            layer.weight_cache = np.zeros_like(layer.weights)
            layer.bias_momentums = np.zeros_like(layer.biases)
            layer.bias_cache = np.zeros_like(layer.biases)
        # Update momentum with current gradients
        layer.weight_momentums = self.beta_1 * layer.weight_momentums + (1 - self.beta_1) * layer.dweights
        layer.bias_momentums = self.beta_1 * layer.bias_momentums + (1 - self.beta_1) * layer.dbiases
        # Get corrected momentum
        # self.iteration is 0 at first pass
        # and we need to start with 1 here
        weight_momentums_corrected = layer.weight_momentums / (1 - self.beta_1 ** (self.iterations + 1))
        bias_momentums_corrected = layer.bias_momentums / (1 - self.beta_1 ** (self.iterations + 1))
        # Update cache with squared current gradients
        layer.weight_cache = self.beta_2 * layer.weight_cache + (1 - self.beta_2) * layer.dweights**2
        layer.bias_cache = self.beta_2 * layer.bias_cache + (1 - self.beta_2) * layer.dbiases**2
        # Get corrected cache
        weight_cache_corrected = layer.weight_cache / (1 - self.beta_2 ** (self.iterations + 1))
        bias_cache_corrected = layer.bias_cache / (1 - self.beta_2 ** (self.iterations + 1))
        # Vanilla SGD parameter update + normalization
        # with square rooted cache
        layer.weights += -self.current_learning_rate * weight_momentums_corrected / (np.sqrt(weight_cache_corrected) + self.epsilon)
        layer.biases += -self.current_learning_rate * bias_momentums_corrected / (np.sqrt(bias_cache_corrected) + self.epsilon)
                    
    # Call once after any parameter updates
    def post_update_params(self):
        self.iterations += 1
            
        
# Common loss class
class Loss:
    # Regularization loss calculation
    def regularization_loss(self):        
        # 0 by default
        regularization_loss = 0
        # Calculate regularization loss
        # iterate all trainable layers
        for layer in self.trainable_layers:
            # L1 regularization - weights
            # calculate only when factor greater than 0
            if layer.weight_regularizer_l1 > 0:
                regularization_loss += layer.weight_regularizer_l1 * np.sum(np.abs(layer.weights))
            # L2 regularization - weights
            if layer.weight_regularizer_l2 > 0:
                regularization_loss += layer.weight_regularizer_l2 * np.sum(layer.weights * layer.weights)
            # L1 regularization - biases
            # calculate only when factor greater than 0
            if layer.bias_regularizer_l1 > 0:
                regularization_loss += layer.bias_regularizer_l1 * np.sum(np.abs(layer.biases))
            # L2 regularization - biases
            if layer.bias_regularizer_l2 > 0:
                regularization_loss += layer.bias_regularizer_l2 * np.sum(layer.biases * layer.biases)
        return regularization_loss

    # Set/remember trainable layers
    def remember_trainable_layers(self, trainable_layers):
        self.trainable_layers = trainable_layers

    # Calculates the data and regularization losses
    # given model output and ground truth values
    def calculate(self, output, y, *, include_regularization=False):
        # Calculate sample losses
        sample_losses = self.forward(output, y)
        # Calculate mean loss
        data_loss = np.mean(sample_losses)
        # If just data loss - return it
        if not include_regularization:
            return data_loss
        # Return the data and regularization losses
        return data_loss, self.regularization_loss()   
    

# Cross-entropy loss
class Loss_CategoricalCrossentropy(Loss):
    # Forward pass
    def forward(self, y_pred, y_true):
        # Number of samples in a batch
        samples = len(y_pred)
        # Clip data to prevent division by 0
        # Clip both sides to not drag mean towards any value
        y_pred_clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)
        # Probabilities for target values -
        # only if categorical labels
        if len(y_true.shape) == 1:
            correct_confidences = y_pred_clipped[
                range(samples),
                y_true
            ]
        # Mask values - only for one-hot encoded labels
        elif len(y_true.shape) == 2:
            correct_confidences = np.sum(y_pred_clipped * y_true, axis=1)
        # Losses
        negative_log_likelihoods = -np.log(correct_confidences)
        return negative_log_likelihoods
    
    # Backward pass
    def backward(self, dvalues, y_true):
        # Number of samples
        samples = len(dvalues)
        # Number of labels in every sample
        # We'll use the first sample to count them
        labels = len(dvalues[0])
        # If labels are sparse, turn them into one-hot vector
        if len(y_true.shape) == 1:
            y_true = np.eye(labels)[y_true]
        # Calculate gradient
        self.dinputs = -y_true / dvalues
        # Normalize gradient
        self.dinputs = self.dinputs / samples

        
# Softmax classifier - combined Softmax activation
# and cross-entropy loss for faster backward step
class Activation_Softmax_Loss_CategoricalCrossentropy():  
    # # Creates activation and loss function objects
    # def __init__(self):
    #     self.activation = Activation_Softmax()
    #     self.loss = Loss_CategoricalCrossentropy()
    # # Forward pass
    # def forward(self, inputs, y_true):
    #     # Output layer's activation function
    #     self.activation.forward(inputs)
    #     # Set the output
    #     self.output = self.activation.output
    #     # Calculate and return loss value
    #     return self.loss.calculate(self.output, y_true)
    # Backward pass
    def backward(self, dvalues, y_true):
        # Number of samples
        samples = len(dvalues)     
        # If labels are one-hot encoded,
        # turn them into discrete values
        if len(y_true.shape) == 2:
            y_true = np.argmax(y_true, axis=1)
        # Copy so we can safely modify
        self.dinputs = dvalues.copy()
        # Calculate gradient
        self.dinputs[range(samples), y_true] -= 1
        # Normalize gradient
        self.dinputs = self.dinputs / samples
        

# Binary cross-entropy loss
class Loss_BinaryCrossentropy(Loss): 
    # Forward pass
    def forward(self, y_pred, y_true):
        # Clip data to prevent division by 0
        # Clip both sides to not drag mean towards any value
        y_pred_clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)
        # Calculate sample-wise loss
        sample_losses = -(y_true * np.log(y_pred_clipped) + (1 - y_true) * np.log(1 - y_pred_clipped))
        sample_losses = np.mean(sample_losses, axis=-1)
        # Return losses
        return sample_losses       
    
    # Backward pass
    def backward(self, dvalues, y_true):
        # Number of samples
        samples = len(dvalues)
        # Number of outputs in every sample
        # We'll use the first sample to count them
        outputs = len(dvalues[0])
        # Clip data to prevent division by 0
        # Clip both sides to not drag mean towards any value
        clipped_dvalues = np.clip(dvalues, 1e-7, 1 - 1e-7)
        # Calculate gradient
        self.dinputs = -(y_true / clipped_dvalues - (1 - y_true) / (1 - clipped_dvalues)) / outputs
        # Normalize gradient
        self.dinputs = self.dinputs / samples
        
        
# Mean Squared Error loss
class Loss_MeanSquaredError(Loss): # L2 loss
    # Forward pass
    def forward(self, y_pred, y_true):
        # Calculate loss
        sample_losses = np.mean((y_true - y_pred)**2, axis=-1)
        # Return losses
        return sample_losses
    
    # Backward pass
    def backward(self, dvalues, y_true):
        # Number of samples
        samples = len(dvalues)
        # Number of outputs in every sample
        # We'll use the first sample to count them
        outputs = len(dvalues[0])
        # Gradient on values
        self.dinputs = -2 * (y_true - dvalues) / outputs
        # Normalize gradient
        self.dinputs = self.dinputs / samples
        

# Mean Absolute Error loss
class Loss_MeanAbsoluteError(Loss): # L1 loss
    def forward(self, y_pred, y_true):
        # Calculate loss
        sample_losses = np.mean(np.abs(y_true - y_pred), axis=-1)
        # Return losses
        return sample_losses
    
    # Backward pass
    def backward(self, dvalues, y_true):
        # Number of samples
        samples = len(dvalues)
        # Number of outputs in every sample
        # We'll use the first sample to count them
        outputs = len(dvalues[0])
        # Calculate gradient
        self.dinputs = np.sign(y_true - dvalues) / outputs
        # Normalize gradient
        self.dinputs = self.dinputs / samples
     

# Common accuracy class
class Accuracy:
    # Calculates an accuracy
    # given predictions and ground truth values
    def calculate(self, predictions, y):
        # Get comparison results
        comparisons = self.compare(predictions, y)
        # Calculate an accuracy
        accuracy = np.mean(comparisons)
        # Return accuracy
        return accuracy     


# Accuracy calculation for classification model
class Accuracy_Categorical(Accuracy):
    # No initialization is needed
    def init(self, y):
        pass
    # Compares predictions to the ground truth values
    def compare(self, predictions, y):
        if len(y.shape) == 2:
            y = np.argmax(y, axis=1)
        return predictions == y
         
    
# Accuracy calculation for regression model
class Accuracy_Regression(Accuracy):
    def __init__(self):
        # Create precision property
        self.precision = None
    # Calculates precision value
    # based on passed in ground truth
    def init(self, y, reinit=False):
        if self.precision is None or reinit:
            self.precision = np.std(y) / 250
    # Compares predictions to the ground truth values
    def compare(self, predictions, y):
        return np.absolute(predictions - y) < self.precision

        
# Model class
class Model:
    def __init__(self):
        # Create a list of network objects
        self.layers = []
        # Softmax classifier's output object
        self.softmax_classifier_output = None
        
    # Add objects to the model
    def add(self, layer):
        self.layers.append(layer)
    
    # Set loss, optimizer and accuracy
    def set(self, *, loss, optimizer, accuracy):
        self.loss = loss
        self.optimizer = optimizer
        self.accuracy = accuracy
            
    # Finalize the model
    def finalize(self):
        # Create and set the input layer
        self.input_layer = Layer_Input()
        # Count all the objects
        layer_count = len(self.layers)
        # Initialize a list containing trainable layers:
        self.trainable_layers = []
        # Iterate the objects
        for i in range(layer_count):
            # If it's the first layer,
            # the previous layer object is the input layer
            if i == 0:
                self.layers[i].prev = self.input_layer
                self.layers[i].next = self.layers[i+1]
            # All layers except for the first and the last
            elif i < layer_count - 1:
                self.layers[i].prev = self.layers[i-1]
                self.layers[i].next = self.layers[i+1]
            # The last layer - the next object is the loss
            # Also let's save aside the reference to the last object
            # whose output is the model's output
            else:
                self.layers[i].prev = self.layers[i-1]
                self.layers[i].next = self.loss
                self.output_layer_activation = self.layers[i]
            # If layer contains an attribute called "weights",
            # it's a trainable layer -
            # add it to the list of trainable layers
            # We don't need to check for biases -
            # checking for weights is enough 
            if hasattr(self.layers[i], 'weights'):
                self.trainable_layers.append(self.layers[i])
        # Update loss object with trainable layers
        self.loss.remember_trainable_layers(self.trainable_layers)
        # If output activation is Softmax and
        # loss function is Categorical Cross-Entropy
        # create an object of combined activation
        # and loss function containing
        # faster gradient calculation
        if isinstance(self.layers[-1], Activation_Softmax) and isinstance(self.loss, Loss_CategoricalCrossentropy):
            # Create an object of combined activation
            # and loss functions
            self.softmax_classifier_output = Activation_Softmax_Loss_CategoricalCrossentropy()

    # Train the model
    def train(self, X, y, *, epochs=1, print_every=1, validation_data=None):
        # Initialize accuracy object
        self.accuracy.init(y)
        # Main training loop
        for epoch in range(1, epochs+1):
            # Perform the forward pass
            output = self.forward(X, training=True)
            # Calculate loss
            data_loss, regularization_loss = self.loss.calculate(output, y, include_regularization=True)
            loss = data_loss + regularization_loss
            # Get predictions and calculate an accuracy
            predictions = self.output_layer_activation.predictions(output)
            accuracy = self.accuracy.calculate(predictions, y)
            # Perform backward pass
            self.backward(output, y)
            # Optimize (update parameters)
            self.optimizer.pre_update_params()
            for layer in self.trainable_layers:
                self.optimizer.update_params(layer)
            self.optimizer.post_update_params()
            # Print a summary
            if not epoch % print_every:
                print(f'epoch: {epoch}, ' +
                      f'acc: {accuracy:.3f}, ' +
                      f'loss: {loss:.3f} (' +
                      f'data_loss: {data_loss:.3f}, ' +
                      f'reg_loss: {regularization_loss:.3f}), ' +
                      f'lr: {self.optimizer.current_learning_rate}')
        # If there is the validation data
        if validation_data is not None:
            # For better readability
            X_val, y_val = validation_data
            # Perform the forward pass
            output = self.forward(X_val, training=False)
            # Calculate the loss
            loss = self.loss.calculate(output, y_val)
            # Get predictions and calculate an accuracy
            predictions = self.output_layer_activation.predictions(output)
            accuracy = self.accuracy.calculate(predictions, y_val)
            # Print a summary
            print(f'validation, ' +
                  f'acc: {accuracy:.3f}, ' +
                  f'loss: {loss:.3f}')

    # Performs forward pass
    def forward(self, X, training):
        # Call forward method on the input layer
        # this will set the output property that
        # the first layer in "prev" object is expecting
        self.input_layer.forward(X, training)
        # Call forward method of every object in a chain
        # Pass output of the previous object as a parameter
        for layer in self.layers:
            layer.forward(layer.prev.output, training)
        # "layer" is now the last object from the list,
        # return its output
        return layer.output

    # Performs backward pass
    def backward(self, output, y):
        # If softmax classifier
        if self.softmax_classifier_output is not None:
            # First call backward method
            # on the combined activation/loss
            # this will set dinputs property
            self.softmax_classifier_output.backward(output, y)
            # Since we'll not call backward method of the last layer
            # which is Softmax activation
            # as we used combined activation/loss
            # object, let's set dinputs in this object
            self.layers[-1].dinputs = self.softmax_classifier_output.dinputs
            # Call backward method going through
            # all the objects but last
            # in reversed order passing dinputs as a parameter
            for layer in reversed(self.layers[:-1]):
                layer.backward(layer.next.dinputs)
            return
        # First call backward method on the loss
        # this will set dinputs property that the last
        # layer will try to access shortly
        self.loss.backward(output, y)
        # Call backward method going through all the objects
        # in reversed order passing dinputs as a parameter
        for layer in reversed(self.layers):
            layer.backward(layer.next.dinputs) 
        


# # Create dataset
# X, y = sine_data()

# # Instantiate the model
# model = Model()

# # Add layers
# model.add(Layer_Dense(1, 64))
# model.add(Activation_ReLU())
# model.add(Layer_Dense(64, 64))
# model.add(Activation_ReLU())
# model.add(Layer_Dense(64, 1))
# model.add(Activation_Linear())

# # Set loss and optimizer objects
# model.set(
#     loss=Loss_MeanSquaredError(),
#     optimizer=Optimizer_Adam(learning_rate=0.005, decay=1e-3),
#     accuracy=Accuracy_Regression()
#     )

# # Finalize the model
# model.finalize()

# model.train(X, y, epochs=10000, print_every=100)
#########################################################################################
# # Create train and test dataset
# X, y = spiral_data(samples=100, classes=2)
# X_test, y_test = spiral_data(samples=100, classes=2)

# # Reshape labels to be a list of lists
# # Inner list contains one output (either 0 or 1)
# # per each output neuron, 1 in this case
# y = y.reshape(-1, 1)
# y_test = y_test.reshape(-1, 1)

# # Instantiate the model
# model = Model()

# # Add layers
# model.add(Layer_Dense(2, 64, weight_regularizer_l2=5e-4, bias_regularizer_l2=5e-4))
# model.add(Activation_ReLU())
# model.add(Layer_Dense(64, 1))
# model.add(Activation_Sigmoid())

# # Set loss, optimizer and accuracy objects
# model.set(
#     loss=Loss_BinaryCrossentropy(),
#     optimizer=Optimizer_Adam(decay=5e-7),
#     accuracy=Accuracy_Categorical()
#     )

# # Finalize the model
# model.finalize()

# # Train the model
# model.train(X, y, validation_data=(X_test, y_test), epochs=10000, print_every=100)
#########################################################################################
# Create dataset
X, y = spiral_data(samples=1000, classes=3)
X_test, y_test = spiral_data(samples=100, classes=3)
# Instantiate the model
model = Model()
# Add layers
model.add(Layer_Dense(2, 512, weight_regularizer_l2=5e-4, bias_regularizer_l2=5e-4))
model.add(Activation_ReLU())
model.add(Layer_Dropout(0.1))
model.add(Layer_Dense(512, 3))
model.add(Activation_Softmax())
# Set loss, optimizer and accuracy objects
model.set(
    loss=Loss_CategoricalCrossentropy(),
    optimizer=Optimizer_Adam(learning_rate=0.05, decay=5e-5),
    accuracy=Accuracy_Categorical()
    )
# Finalize the model
model.finalize()
# Train the model
model.train(X, y, validation_data=(X_test, y_test), epochs=10000, print_every=100)
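
# Optional follow-up sketch (not part of the book's listing): once training has
# finished, the same Model object can be reused for inference in this session.
# The variable names below are illustrative; the only assumption is that the
# code above has already run. We call forward() with training=False, just like
# the validation pass inside train(), so dropout behaves in inference mode,
# then map the softmax confidences to class indices with the output
# activation's predictions() helper, exactly as train() does internally.
confidences = model.forward(X_test, training=False)
predictions = model.output_layer_activation.predictions(confidences)
print('first 5 predicted classes:', predictions[:5])
print('first 5 true classes:     ', y_test[:5])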



Chapter code, additional resources, and errata for this chapter: https://nnfs.io/ch18
