Deep Learning with Python (2): Data Operations within the Neural Network
The core of deep learning involves manipulating tensors through various mathematical operations to transform input data into meaningful outputs, so it is important to understand the tensor operations behind a neural network.
Tensors are multi-dimensional arrays used to represent data in deep learning models, such as images, videos, and text. You may already be familiar with matrices, which are rank-2 tensors: tensors are a generalization of matrices to an arbitrary number of dimensions.
Scalars (Rank-0 Tensors)
A tensor that contains only one number is called a scalar (or scalar tensor, or rank-0 tensor, or 0D tensor).
import numpy as np
x = np.array(12)
x
Out[]: array(12)
x.ndim
Out[]: 0
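A rank-0 tensor also has an empty shape tuple, because there are no axes whose sizes could be listed. The short sketch below uses only plain NumPy to check `ndim` and `shape` for a scalar, and shows that picking a single entry out of a vector gives back a value with zero axes.

```python
import numpy as np

# A scalar tensor: one number, zero axes
x = np.array(12)
print(x.ndim)    # → 0
print(x.shape)   # → ()  (empty tuple: no axes, so no sizes to list)

# Indexing one entry of a vector also yields a rank-0 value
v = np.array([12, 3, 6, 14, 7])
entry = v[0]
print(np.ndim(entry))   # → 0
```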
Vectors (Rank-1 Tensors)
An array of numbers is called a vector (or rank-1 tensor, or 1D tensor). A rank-1 tensor is said to have exactly one axis. Here is a NumPy vector:
x = np.array([12, 3, 6, 14, 7])
x
Out[]: array([12, 3, 6, 14, 7])
x.ndim
Out[]: 1
This vector has five entries and so is called a 5-dimensional vector. Don’t confuse a 5D vector with a 5D tensor! A 5D vector has only one axis and has five dimensions along its axis, whereas a 5D tensor has five axes (and may have any number of dimensions along each axis).
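The distinction is easy to check with `ndim` and `shape`. In the sketch below, the sizes chosen for the rank-5 tensor are arbitrary and only serve to give it five axes.

```python
import numpy as np

# A 5-dimensional vector: one axis, five entries along it
vec5 = np.array([12, 3, 6, 14, 7])
print(vec5.ndim, vec5.shape)       # → 1 (5,)

# A rank-5 (5D) tensor: five axes; the size of each axis is arbitrary here
tensor5 = np.zeros((2, 1, 3, 1, 4))
print(tensor5.ndim, tensor5.shape) # → 5 (2, 1, 3, 1, 4)
```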
Matrices (Rank-2 Tensors)
An array of vectors is a matrix (or rank-2 tensor, or 2D tensor). A matrix has two axes (often referred to as rows and columns). You can visually interpret a matrix as a rectangular grid of numbers. This is a NumPy matrix:
x = np.array([[5, 78, 2, 34, 0],
[6, 79, 3, 35, 1],
[7, 80, 4, 36, 2]])
x.ndim
Out[]: 2
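To connect the two axes with rows and columns, the sketch below reuses the same matrix and reads off its shape, one row (fixing an index on axis 0), and one column (fixing an index on axis 1). `.tolist()` is used only to make the printed values easy to read.

```python
import numpy as np

x = np.array([[5, 78, 2, 34, 0],
              [6, 79, 3, 35, 1],
              [7, 80, 4, 36, 2]])

print(x.shape)               # → (3, 5): 3 rows along axis 0, 5 columns along axis 1
first_row = x[0]             # fix axis 0 at index 0, keep all of axis 1
first_col = x[:, 0]          # keep all of axis 0, fix axis 1 at index 0
print(first_row.tolist())    # → [5, 78, 2, 34, 0]
print(first_col.tolist())    # → [5, 6, 7]
```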
Rank-3 and Higher-Rank Tensors
If you pack such matrices in a new array, you obtain a rank-3 tensor (or 3D tensor).
x = np.array([[[5, 78, 2, 34, 0],
[6, 79, 3, 35, 1],
[7, 80, 4, 36, 2]],
[[5, 78, 2, 34, 0],
[6, 79, 3, 35, 1],
[7, 80, 4, 36, 2]],
[[5, 78, 2, 34, 0],
[6, 79, 3, 35, 1],
[7, 80, 4, 36, 2]]])
x.ndim
Out[]: 3
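"Packing" matrices into a new array can also be done with `np.stack`, which joins arrays of the same shape along a new leading axis. The sketch below rebuilds the rank-3 tensor above from three copies of the matrix, then applies the same step once more to get a rank-4 tensor.

```python
import numpy as np

m = np.array([[5, 78, 2, 34, 0],
              [6, 79, 3, 35, 1],
              [7, 80, 4, 36, 2]])

# Stack three copies of the matrix along a new axis 0 -> rank-3 tensor
x3 = np.stack([m, m, m])
print(x3.ndim, x3.shape)   # → 3 (3, 3, 5)

# Stack rank-3 tensors the same way -> rank-4 tensor
x4 = np.stack([x3, x3])
print(x4.ndim, x4.shape)   # → 4 (2, 3, 3, 5)
```

Each stacking step adds exactly one axis in front, which is why the rank grows by one each time.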
By packing rank-3 tensors in an array, you can create a rank-4 tensor, and so on. In deep learning, you’ll generally manipulate tensors with ranks 0 to 4, although you may go up to 5 if you process video data. The data you’ll manipulate will almost always fall into one of the following categories:
- Vector data: rank-2 tensors of shape (samples, features), where each sample is a vector of numerical attributes (“features”)
- Timeseries data or sequence data: rank-3 tensors of shape (samples, timesteps, features), where each sample is a sequence (of length timesteps) of feature vectors
- Images: rank-4 tensors of shape (samples, height, width, channels), where each sample is a 2D grid of pixels, and each pixel is represented by a vector of values (“channels”)
- Video: rank-5 tensors of shape (samples, frames, height, width, channels), where each sample is a sequence (of length frames) of images
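As a rough illustration of these layouts, the sketch below creates zero-filled placeholder tensors for each category. All sizes (4 samples, 10 features, 50 timesteps, 28×28 RGB images, 16 frames) are hypothetical values picked only to make the shapes concrete; real datasets will differ.

```python
import numpy as np

# Hypothetical sizes, chosen only to illustrate each layout
samples = 4

vector_data = np.zeros((samples, 10))          # (samples, features)
timeseries = np.zeros((samples, 50, 10))       # (samples, timesteps, features)
images = np.zeros((samples, 28, 28, 3))        # (samples, height, width, channels)
video = np.zeros((samples, 16, 28, 28, 3))     # (samples, frames, height, width, channels)

for name, t in [("vector", vector_data), ("timeseries", timeseries),
                ("images", images), ("video", video)]:
    print(f"{name}: rank {t.ndim}, shape {t.shape}")

# Axis 0 is the samples axis in every case; one video sample is a sequence of frames
one_clip = video[0]
print(one_clip.shape)   # → (16, 28, 28, 3)
```

Indexing along axis 0 always gives back one sample, which is one rank lower than the full batch.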