Deep Learning with Python (2): Data Operations within the Neural Network

The core of deep learning involves manipulating tensors through various mathematical operations to transform input data into meaningful outputs, so it is important to understand the tensor operations behind a neural network.

Tensors are multi-dimensional arrays used to represent data in deep learning models, such as images, videos, and text. You may already be familiar with matrices, which are rank-2 tensors: tensors are a generalization of matrices to an arbitrary number of dimensions. As the book ‘Deep Learning with Python’ shows, neural networks fundamentally involve tensor operations. Therefore, understanding what tensors are and how to operate on them is essential.

Scalars (Rank-0 Tensors)

A tensor that contains only one number is called a scalar (or scalar tensor, or rank-0 tensor, or 0D tensor).

import numpy as np
x = np.array(12)

x 
Out[]: array(12)

x.ndim 
Out[]: 0

Vectors (Rank-1 Tensors)

An array of numbers is called a vector (or rank-1 tensor, or 1D tensor). A rank-1 tensor has exactly one axis. The following is a NumPy vector:

x = np.array([12, 3, 6, 14, 7])

x
Out[]: array([12, 3, 6, 14, 7])

x.ndim 
Out[]: 1

This vector has five entries and so is called a 5-dimensional vector. Don’t confuse a 5D vector with a 5D tensor! A 5D vector has only one axis and has five dimensions along its axis, whereas a 5D tensor has five axes (and may have any number of dimensions along each axis).

💡
A dimension of a tensor is often called an axis.
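
To see the difference concretely, compare the number of axes (ndim) of a 5D vector and a rank-5 tensor (a quick NumPy sketch):

np.array([12, 3, 6, 14, 7]).ndim
Out[]: 1

np.zeros((2, 2, 2, 2, 2)).ndim
Out[]: 5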

Matrices (Rank-2 Tensors)

An array of vectors is a matrix (or rank-2 tensor, or 2D tensor). A matrix has two axes (often referred to as rows and columns). You can visually interpret a matrix as a rectangular grid of numbers. This is a NumPy matrix:

x = np.array([[5, 78, 2, 34, 0],
              [6, 79, 3, 35, 1], 
              [7, 80, 4, 36, 2]])

x.ndim
Out[]: 2
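
You can also check the tensor’s shape, which lists the number of entries along each axis; for the matrix above, there are 3 rows and 5 columns:

x.shape
Out[]: (3, 5)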

Rank-3 and Higher-Rank Tensors

If you pack such matrices in a new array, you obtain a rank-3 tensor (or 3D tensor).

x = np.array([[[5, 78, 2, 34, 0],
               [6, 79, 3, 35, 1],
               [7, 80, 4, 36, 2]],
              [[5, 78, 2, 34, 0],
               [6, 79, 3, 35, 1],
               [7, 80, 4, 36, 2]],
              [[5, 78, 2, 34, 0],
               [6, 79, 3, 35, 1],
               [7, 80, 4, 36, 2]]])

x.ndim 
Out[]: 3
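
Its shape confirms the three axes: 3 matrices, each with 3 rows and 5 columns:

x.shape
Out[]: (3, 3, 5)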

By packing rank-3 tensors in an array, you can create a rank-4 tensor, and so on. In deep learning, you’ll generally manipulate tensors with ranks 0 to 4, although you may go up to 5 if you process video data. The data you’ll manipulate will almost always fall into one of the following categories (illustrated with example shapes after the list):

  • Vector data—Rank-2 tensors of shape (samples, features), where each sample is a vector of numerical attributes (features).
  • Timeseries data or sequence data—Rank-3 tensors of shape (samples, timesteps, features), where each sample is a sequence (of length timesteps) of feature vectors.
  • Images—Rank-4 tensors of shape (samples, height, width, channels), where each sample is a 2D grid of pixels, and each pixel is represented by a vector of values (“channels”).
  • Video—Rank-5 tensors of shape (samples, frames, height, width, channels), where each sample is a sequence (of length frames) of images.
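
As a rough sketch with arbitrary example sizes (all the numbers below are made up for illustration), these categories map to NumPy arrays like this:

import numpy as np

vector_data = np.zeros((100, 20))            # (samples, features)
timeseries  = np.zeros((100, 60, 20))        # (samples, timesteps, features)
images      = np.zeros((100, 28, 28, 3))     # (samples, height, width, channels)
video       = np.zeros((10, 30, 64, 64, 3))  # (samples, frames, height, width, channels)

(vector_data.ndim, timeseries.ndim, images.ndim, video.ndim)
Out[]: (2, 3, 4, 5)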

Tensor Operations in Neural Networks

In our example neural network, we built the model by stacking Dense layers on top of each other. A Keras layer instance looks like this:

keras.layers.Dense(512, activation="relu")

This layer can be interpreted as a function that takes a matrix/vector as input and returns another matrix/vector—a new representation of the input tensor. Here, 512 is the number of neurons (or units) in the Dense layer. Specifically, the function is as follows:

output = relu(dot(input, W) + b)

We have three tensor operations here (a NumPy sketch follows the list):

  1. dot(input, W) is a dot product (dot) between the input tensor and a weight tensor named W.
  2. An addition (+) between the resulting matrix and a bias vector b.
  3. A relu operation: relu(x) is max(x, 0), applied element-wise.
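
As a minimal sketch of these three steps in NumPy (with toy sizes rather than the 512-unit layer above):

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4))   # input: a batch of 2 samples with 4 features each
W = rng.normal(size=(4, 3))   # weight matrix: 4 input features -> 3 units
b = np.zeros(3)               # bias vector: one entry per unit

output = np.maximum(np.dot(x, W) + b, 0)  # relu(dot(x, W) + b)
output.shape
Out[]: (2, 3)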

We can also write the Dense (fully connected) layer in mathematical notation; the output is calculated as:

$$\text{output} = \text{activation}(\mathbf{XW} + \mathbf{b})$$

  • $\mathbf{X}$: Input matrix of shape (batch_size, input_dim).
  • $\mathbf{W}$: Weight matrix of shape (input_dim, units).
  • $\mathbf{b}$: Bias vector of shape (units,).
  • units: Specifies the number of columns in $\mathbf{W}$ and elements in $\mathbf{b}$.
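
You can verify these shapes directly in Keras; a minimal sketch, assuming the TensorFlow-bundled Keras and a hypothetical input dimension of 784 (outputs shown as TensorFlow 2 prints them):

from tensorflow import keras

layer = keras.layers.Dense(512, activation="relu")
layer.build(input_shape=(None, 784))  # 784 is a hypothetical input_dim

layer.kernel.shape   # this is W
Out[]: TensorShape([784, 512])

layer.bias.shape     # this is b
Out[]: TensorShape([512])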
💡
If matrix concepts and operations are unfamiliar to you, consult a linear algebra textbook.