What is a convolutional neural network?

We have learned about artificial neural networks in this article. If our input is an image of 30*30*3 pixels, the input layer of the neural network will be 30*30*3 neurons. This number of neurons is still temporarily acceptable for the training process because the computational cost is not very large. But if our input is a picture 1000*1000*3, then the number of neurons is extremely large and the computation process will be much slower, and the complex network will easily lead to overfitting. On the other hand, With the input object is an image, then the pixels that are close together will bring useful information, but the neural network does not exploit this feature. Therefore, there is a new neural network, created by researchers: it is a convolutional neural network (CNN). So how is CNN different from ANN? In a nutshell, CNN does not read individual pixels like ANN, but it will read a series of pixels close to each other to include in the calculation. But to better understand how CNN works, we must go into understanding each component that makes up a new CNN network. Here we will learn new concepts like Convolutional Layer, Pooling Layer.

You are viewing: What is a Convolutional neural network

Convolutional Layers:

Convolutional Layer is an extremely important layer in CNN, it takes care of most of the network’s computing functions.


To explain about Convolutional Layer, I will go directly to the example to explain visually. The first is the concept of filter map: instead of connecting each pixel of the input image like ANN, CNN has panels. filter is used to apply to areas of the image. These filter maps are actually a 3-dimensional matrix of numbers. And what you need to be extremely careful about is that these numbers are the parameters that need to be learned. The filter map has the length and width of the hyperparameter (what is a hyperparameter, I will talk about it in this article), the height (I will call it depth) of the filter map will be equal to the depth of the previous layer.For example, looking at the example image above, we have 2 pink filter maps, W0 and W1. Each filter map is 3*3*3 in size. The length * width ( 3 * 3 ) is the hyperparameter, and the depth (Depth) will be equal to the depth of the input volumn, ie equal to 3.Stride: as mentioned above, filter map is used to slide on the input image, so what does slide mean here? It’s very simple, just translate the filter map in pixels from left to right line by line. Each such translation will be based on a value called stride.For example, stride = 1, then each time the filter map is shifted 1 pixel to the right, when the right edge is exhausted, it will go down 1 line and move on. And if stride = 2, each shift will be 2 pixels right, when the edge is exhausted, 2 lines down.

See also: Tai Game Ngoc Rong Online, Download Ngoc Rong Online

Padding: people will add 0 values ​​around the input class. For example, in the image above, our input layer is initially 5 * 5 * 3, but because the padding value = 1, it should be covered with a layer of 0 outside. From there, the size of the input layer = 7*7*3.Feature map: feature map is actually the result after the input layer is scanned by the filter map. For each time the filter map is applied to the input, a computation occurs. So what is this calculation process? In fact, it is the process of multiplying 2 matrices. For example, when the top layer of the filter map W0 is applied to the top layer of the input volume, the filtermap matrix w0 3*3 will be multiplied by the matrix 3*3 of input then add bias=1, and output=2 in feature map. How is the feature map size calculated? Each time a filter map slides all the way up the input volume, it will yield a layer of the feature map. So, the depth of the feature map will be equal to the number of filter maps. For example, in the image above, the depth of the feature map = the number of filter maps = 2. The length and width of the feature map will be calculated by the formula: (W + 2P – F)/S +1 . Where, W is the input size, P is padding, F is the filter map size, S is stride. For example, with the figure, W = 5, P = 1, F = 3, S = 1 , we can calculate the size. of the feature map is 3. We can see, if with normal ANN, a 7*7*3 image as above will need 7*7*3 = 147 parameters (if the second layer has only 1 neuron), but with CNN, less parameters are needed from the filter map and this number of parameters can be quantified.

Pooling layer

Usually between Convolutional Layers, people will insert a Pooling layer to reduce the number of parameters if the input is too large. There are many types of pooling layers, but I only recommend max pooling.


In the example above, the max pooling layer has size 2*2, stride = 2. For each time applied to the input, it will take the maximum value in that filter interval and put it in the output.

See also: What is Oppa – How to Call /oppa/ in Korean

ReLU layer

The concept of ReLU I have talked about in this article, come back to see it. Usually, after the feature map is computed, a ReLU layer is then placed, which applies the ReLU function to all the values ​​of the feature map.

Fully Connected Layer

To make the prediction results, the Neural network layer will be added after a CNN bearer. This neural layer is just a normal ANN, nothing more. So a regular CNN network will have the structure of layers like? Here is an example:

Category: FAQ

About Troubleinthepeace

Troubleinthepeace specializing in synthesizing information about daily life activities

View all posts by Troubleinthepeace →

Trả lời

Email của bạn sẽ không được hiển thị công khai.