
A gentle introduction to Convolutions (Visually explained)

Posted on Sep 26

Have you ever asked yourself how it is possible to identify all the horizontal lines in an image with just some dot products between matrices? No? Me neither, but you can use convolutions to do that, and they can do other magic things… obviously!

The ability of computers to recognize faces, identify objects, and drive cars autonomously is based on a mathematical operation called convolution. This operation was first introduced in the 19th century by Siméon Denis Poisson, a French mathematician and physicist. But it wasn't until the 1980s that convolution found its way into the field of computer vision, thanks to the pioneering work of researchers such as Yann LeCun, Geoff Hinton, and Yoshua Bengio. Since then, convolution has become a foundation of modern machine learning, enabling computers to process images, videos, and other forms of visual data with remarkable accuracy and efficiency.

In this article, we'll explore the world of convolutions by even "convoluting" a duck. Stick with me to the end to see the power of this beautiful tool.

Convolution is a simple mathematical operation: take a small matrix, called a kernel (or filter), slide it over an input image, perform the dot product at each point where the filter overlaps the image, and repeat this process for all pixels. The kernel is designed to highlight certain features of the input image, such as edges, corners, or textures, by detecting patterns of pixels that match certain criteria. You can perform convolution in 1D, 2D, and even 3D.

You can see from the GIF above that we perform the dot product between matrices for every "jump" of the kernel and add the result as a new pixel in the output. You can use convolution both for upsampling (increasing resolution) and for downsampling (decreasing resolution); later we'll see a variant of convolution that is great for upsampling and a method for downsampling.

Remember: the goal of using convolutions in deep learning is not to predict an outcome directly, but to extract features that will then be used by feed-forward layers to make the prediction.

For this explanation, we just need two libraries: PyTorch and Matplotlib. Isn't this kernel beautiful? Now it is time to talk about the part that you have been waiting for: the implementation of convolution, sketched in the first snippet below. Don't you think it is better to use real images, though? We will use a vertical kernel to identify all the vertical lines and a horizontal kernel to identify all the horizontal lines, and the image will be this beautiful duck that is ready to be "convoluted". Time to implement the convolution on this beautiful duck, in the second snippet below.
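First, a minimal sketch of the setup and a toy convolution, before we get to the duck. The specific kernel values and the tiny 6x6 input are illustrative assumptions on my part, not the exact code from the original post:

```python
import torch
import torch.nn.functional as F

# A simple 3x3 kernel that responds to horizontal edges
# (bright-to-dark transitions along the vertical axis).
horizontal_kernel = torch.tensor([[ 1.,  1.,  1.],
                                  [ 0.,  0.,  0.],
                                  [-1., -1., -1.]])

# A tiny synthetic image: a bright horizontal band in the middle.
image = torch.zeros(6, 6)
image[2:4, :] = 1.0

# F.conv2d expects (batch, channels, height, width) tensors,
# so we add two leading dimensions to both image and kernel.
output = F.conv2d(image[None, None], horizontal_kernel[None, None])

print(output[0, 0])
# The rows where the band begins and ends produce large positive and
# negative values: the horizontal edges have been detected.
```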
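And now the duck. Here is a sketch of the experiment described above; the file name duck.jpg and the exact kernel values are assumptions, but the overall flow (load the photo, convert it to a grayscale tensor, convolve it with a vertical and a horizontal kernel, plot the results) follows the description:

```python
import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt

# Load the duck photo (the path is illustrative) and average the RGB
# channels to obtain a single-channel grayscale image.
duck = plt.imread("duck.jpg")                                  # (H, W, 3) array
gray = torch.tensor(duck, dtype=torch.float32).mean(dim=-1)    # (H, W)

# Edge-detecting kernels: one for vertical lines, one for horizontal lines.
vertical_kernel = torch.tensor([[1., 0., -1.],
                                [1., 0., -1.],
                                [1., 0., -1.]])
horizontal_kernel = vertical_kernel.T

def convolve(img, kernel):
    # Add batch and channel dimensions, convolve, then drop them again.
    return F.conv2d(img[None, None], kernel[None, None])[0, 0]

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
panels = [("original", gray),
          ("vertical lines", convolve(gray, vertical_kernel)),
          ("horizontal lines", convolve(gray, horizontal_kernel))]
for ax, (title, img) in zip(axes, panels):
    ax.imshow(img, cmap="gray")
    ax.set_title(title)
    ax.axis("off")
plt.show()
```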
Ready for the result? Pretty stunning, what do you think? Keep in mind that here we are using kernels invented by humans; in deep learning models the kernels are learned by the network. Also notice this: PyTorch and other deep learning frameworks actually implement a thing called "cross-correlation", not convolution, but stick with me until the end to know more.

The most important parameters are stride and padding, and both are covered in this article.

Padding: have you noticed that in the GIF there are some sort of zeros on the borders? This is called padding; the convolution in this case has a padding of 1. The padding value can be any number, but the best value to pick is usually zero.

In the image above, made with my beautiful handwriting skills, you can see that we are skipping some numbers (I am using a kernel of size one for simplicity). This is because we have a stride > 1: the stride is just the number of "jumps" that the kernel makes in a direction. The image above has a stride of 2. If you want to downsample, you can simply increase the stride.

Suppose we apply a stride of 3 while using a 3x3 kernel and a 5x5 input: what would happen on the second jump? Exactly, we can't perform the operation on that part. PyTorch will omit those pixels when the kernel goes outside the image; the only solution is to add padding.

You can probably work out the size of the output just by looking at the parameters, but this becomes harder as the parameters grow, so here is a formula to calculate the exact size of the output:

output_size = floor((input_size + 2 * padding - kernel_size) / stride) + 1

Transposed convolution, also known as deconvolution, is a sort of convolution that is great for upsampling: with this type of convolution we start with a small image and receive a bigger image as output. To do that, we perform a scalar multiplication between the kernel and every pixel of the input; as in normal convolution, we slide the kernel over the image, and in the result we sum the overlapping parts. Similar to the previous section, there is a formula to calculate the output size of a transposed convolution:

output_size = (input_size - 1) * stride - 2 * padding + kernel_size

Actually, deep learning models implement another operation that is not convolution but is similar: cross-correlation. The only difference between the two is that convolution uses an "inverted" kernel, rotated by 180°. PyTorch's F.conv2d() implements cross-correlation; if you want a real convolution you can easily use the SciPy library or write the code on your own (just remember to rotate the kernel by 180°).

The main goal of pooling is to stabilize the results and create a more robust network, because pooling increases the receptive field (stay with me, I'll explain later what that means and why it is useful). With a pooling layer you want a single pixel to summarize the image as well as it can. This is done by reducing a block of pixels to one: for example, a 4x4 block of pixels is reduced to a 1x1 block, either by averaging the values or by taking the maximum or minimum.

The pooling operation takes the same parameters as convolution, with one small difference: here we can choose what to do when the kernel goes "outside" the image (which can be caused by a too-large stride).

There are different types of pooling operations (max, average, and min pooling). As with convolution, you slide a sort of kernel (formally called the "spatial extent") over the image, and in the highlighted part you take the average of all the values, or the maximum, or the minimum, to represent a single pixel in the output. Usually, when you use pooling, you set the stride equal to the spatial extent, as in the GIF above where both are equal to two.
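As a quick sanity check of the output-size formula above, here is a small sketch; the input size and the stride/padding values are arbitrary choices:

```python
import math
import torch
import torch.nn.functional as F

def conv_output_size(input_size, kernel_size, stride, padding):
    # The formula from the article, for one spatial dimension.
    return math.floor((input_size + 2 * padding - kernel_size) / stride) + 1

image = torch.randn(1, 1, 5, 5)    # 5x5 input
kernel = torch.randn(1, 1, 3, 3)   # 3x3 kernel

for stride, padding in [(1, 0), (2, 0), (2, 1), (3, 1)]:
    out = F.conv2d(image, kernel, stride=stride, padding=padding)
    # The size PyTorch produces and the formula's prediction always match.
    print(stride, padding, out.shape[-1], conv_output_size(5, 3, stride, padding))
```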
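Here is a sketch of the transposed convolution described above, used for upsampling; the 2x2 input and the 3x3 kernel of ones are arbitrary:

```python
import torch
import torch.nn.functional as F

small = torch.tensor([[[[1., 2.],
                        [3., 4.]]]])   # 1x1x2x2 input
kernel = torch.ones(1, 1, 3, 3)        # 3x3 kernel

up = F.conv_transpose2d(small, kernel, stride=2)
print(up.shape)   # torch.Size([1, 1, 5, 5])
# Matches the formula: (input - 1) * stride - 2 * padding + kernel = (2 - 1) * 2 - 0 + 3 = 5
print(up[0, 0])
```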
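To see the convolution versus cross-correlation difference in practice, here is a sketch comparing F.conv2d against SciPy; the deliberately asymmetric kernel values are arbitrary:

```python
import numpy as np
import torch
import torch.nn.functional as F
from scipy.signal import convolve2d, correlate2d

image = np.arange(25, dtype=np.float32).reshape(5, 5)
kernel = np.array([[1., 2., 0.],
                   [0., 1., 0.],
                   [0., 0., 3.]], dtype=np.float32)

# SciPy implements both operations explicitly.
true_conv = convolve2d(image, kernel, mode="valid")
cross_corr = correlate2d(image, kernel, mode="valid")

# F.conv2d matches SciPy's cross-correlation...
torch_out = F.conv2d(torch.from_numpy(image)[None, None],
                     torch.from_numpy(kernel)[None, None])[0, 0].numpy()
print(np.allclose(torch_out, cross_corr))   # True

# ...and it only matches a true convolution after rotating the kernel by 180°.
rotated = torch.flip(torch.from_numpy(kernel), dims=[0, 1])
torch_rot = F.conv2d(torch.from_numpy(image)[None, None],
                     rotated[None, None])[0, 0].numpy()
print(np.allclose(torch_rot, true_conv))    # True
```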
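And a sketch of max and average pooling with the stride set equal to the spatial extent, as suggested above; the 4x4 input is arbitrary:

```python
import torch
import torch.nn.functional as F

image = torch.tensor([[[[ 1.,  2.,  3.,  4.],
                        [ 5.,  6.,  7.,  8.],
                        [ 9., 10., 11., 12.],
                        [13., 14., 15., 16.]]]])   # 1x1x4x4 input

# Spatial extent of 2 and stride of 2: every 2x2 block becomes one pixel.
print(F.max_pool2d(image, kernel_size=2, stride=2)[0, 0])
# tensor([[ 6.,  8.],
#         [14., 16.]])

print(F.avg_pool2d(image, kernel_size=2, stride=2)[0, 0])
# tensor([[ 3.5,  5.5],
#         [11.5, 13.5]])
```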
Receptive fields are a very important concept in psychology, in signal processing, and in deep learning too. A receptive field is the amount of input data in the field of view of a unit; the receptive field of a feed-forward unit is one pixel. Having a receptive field of one makes the network non-robust to translation, resizing, rotations, and so on. For example, say you take several photos of yourself in a park from the same position: the photos are similar to each other, but they have small differences, and these can cause the network to fail to recognize you in all of them. So you want a pixel in the output to contain more information than just a single square of the input.

In the image above there are some small changes between the two photos; a network without a pooling layer can struggle to identify you in both (because its units have a receptive field of one).

Do you remember what happens when the stride is too large and the kernel goes outside the image? In a convolution you would increase the padding. With pooling, PyTorch gives you a parameter called ceil_mode: if it is set to False, the pixels that would require the kernel to go outside the image are dropped; if it is set to True, the pooling operation is performed only on the part covered by the kernel, but that pixel is still added to the result. A small sketch after this section shows the difference.

(Figure: example of a CNN architecture; a code sketch of one follows below.)

There is an alternative to pooling: increasing the stride in the convolution operation. Both increase the receptive field of a neuron, though they behave somewhat differently. There is no correct answer to whether you should pick one over the other; pooling layers are more historical, so you will find them in more architectures, but both are valid choices.
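Here is the ceil_mode difference mentioned above as a quick sketch; the 5x5 input and the pooling parameters are arbitrary:

```python
import torch
import torch.nn.functional as F

image = torch.arange(25, dtype=torch.float32).reshape(1, 1, 5, 5)

# With a 2x2 window and stride 2 on a 5x5 input, the last row and column
# do not fit under a full window.
floor_out = F.max_pool2d(image, kernel_size=2, stride=2, ceil_mode=False)
ceil_out = F.max_pool2d(image, kernel_size=2, stride=2, ceil_mode=True)

print(floor_out.shape)  # torch.Size([1, 1, 2, 2]) - the leftover pixels are dropped
print(ceil_out.shape)   # torch.Size([1, 1, 3, 3]) - a partial window still produces a pixel
```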
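Finally, here is a sketch of the kind of CNN architecture the figure above refers to: convolution and pooling stages that extract features, followed by the feed-forward layers that make the actual prediction. The layer sizes, the 28x28 input, and the 10 output classes are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Convolutions and pooling extract features; the Linear layers at the end
# use those features to predict a class.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),    # 1x28x28 -> 8x28x28
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),        # 8x28x28 -> 8x14x14
    nn.Conv2d(8, 16, kernel_size=3, padding=1),   # 8x14x14 -> 16x14x14
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),        # 16x14x14 -> 16x7x7
    nn.Flatten(),
    nn.Linear(16 * 7 * 7, 64),
    nn.ReLU(),
    nn.Linear(64, 10),                            # e.g. 10 output classes
)

print(model(torch.randn(1, 1, 28, 28)).shape)     # torch.Size([1, 10])
```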
Now you know what convolutions and their variants are and how to implement them in PyTorch, how convolutions are used in deep learning models, and how to use pooling to your advantage. You have only scratched the surface of convolutions in deep learning; now your job is to continue down this path and start learning the beautiful things that CNNs can gift you. If you want to go more in-depth on this topic, it is well worth seeking out further resources on CNNs.


