
Convolutional Neural Networks from the Ground-Up, for the ML-Adjacent

Mike Cvet · Better Programming

In an earlier post, I walked through the Word2Vec algorithm and a reasonably bare-bones neural network. In this post, I'm going to walk through the implementation of another kind of neural network: CNNs, which are effective for image recognition and classification tasks, among other things. Again, the Rust code referenced below does not leverage any ML libraries and implements this network from scratch. The code referenced in this post can be found here on GitHub.

Convolutional Neural Networks are named after the mathematical concept of convolution, which, given two functions f and g, produces a third function describing how the shape of one is modified by the other, using the following integral:

$(f * g)(t) = \int_{-\infty}^{\infty} f(\tau) \, g(t - \tau) \, d\tau$

In the context of a CNN, the convolution operation refers to sliding small kernels of weights across an image's pixels, multiplying the two together and storing the result in an output feature map.

Typical CNNs are variants of feed-forward neural nets, so most of the rest of the algorithm is what you might expect (I walk through this with examples here). I found a great resource for walking through the model architecture at a high level in this Towards Data Science post, and this one (and, in more depth, this paper).

The network I'll describe below was built to learn from the MNIST database, which contains training and test data comprising handwritten digits collected from United States Census Bureau employees and high-school students. These digits are packed into 70,000 28x28-pixel grayscale images. After around five training epochs, this network can achieve 98%+ predictive accuracy on handwritten digits from the MNIST dataset, though I initially couldn't get better than 80% accuracy on my own handwriting. More on that later!

Generally speaking, there are three kinds of layers in a Convolutional Neural Network, though there may be multiple layers of each type, or additional layer types, depending on the application: a convolutional layer, a pooling layer, and a fully-connected layer.

The convolutional layer extracts patches from the images it's processing. A patch captures spatial relationships (meaning, it preserves the two-dimensional layout) between a subset of pixels within the image, and the model uses this pixel subset as a feature. The convolutional layer also defines kernels: learned weight matrices multiplied against patches to capture features from the underlying pixels.

Imagine a 2D window (the patch) sliding over the input image. At each position, the patch is multiplied element-wise against its corresponding kernel, the resulting intermediate matrix is summed, and that value is stored at the corresponding coordinates of the output feature matrix. For example, a kernel (whose weights were probably learned during backpropagation) multiplied against the patch at [0, 0] produces a single value stored at the top-left of the feature matrix. This feature map is the output of this layer.
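To make the sliding-window arithmetic concrete, here is a minimal sketch of that operation in Rust: a valid (no padding, stride 1) 2D convolution. It's an illustration rather than the repository's actual code; the convolve2d name and the Vec<Vec<f32>> representation are my own assumptions.

```rust
// A minimal sketch of a "valid" (no padding, stride 1) 2D convolution.
// Illustrative only, not the repository's implementation; images and
// kernels are represented as row-major Vec<Vec<f32>> for clarity.
fn convolve2d(image: &[Vec<f32>], kernel: &[Vec<f32>]) -> Vec<Vec<f32>> {
    let (ih, iw) = (image.len(), image[0].len());
    let (kh, kw) = (kernel.len(), kernel[0].len());
    let (oh, ow) = (ih - kh + 1, iw - kw + 1);

    let mut output = vec![vec![0.0; ow]; oh];
    for y in 0..oh {
        for x in 0..ow {
            // Multiply the kernel element-wise against the patch
            // anchored at (y, x) and sum into a single output value.
            let mut sum = 0.0;
            for ky in 0..kh {
                for kx in 0..kw {
                    sum += image[y + ky][x + kx] * kernel[ky][kx];
                }
            }
            output[y][x] = sum;
        }
    }
    output
}
```

For a 28x28 MNIST image and a 3x3 kernel, this produces a 26x26 feature map.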
The pooling layer, and max pooling in particular, is a dimensionality and noise reduction step: it extracts the dominant features within the input representation and ignores the rest. Similarly to the earlier convolutional layer, this pooling step extracts patches over its own input, which is the output feature map of the convolutional layer. Max pooling is implemented by simply taking the maximum value (in our case, convolved pixel magnitude) within each of these patches, further increasing the density of the input image's representation. The pooled output can then be flattened into a column vector for either training or inference within the feed-forward neural network captured by the next, fully-connected layer.
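Continuing with the same assumed representation as the convolution sketch above, here's a minimal illustration of 2x2 max pooling with stride 2. Again, this is not the repository's actual code, and it assumes even input dimensions for brevity.

```rust
// A minimal sketch of 2x2 max pooling with stride 2; illustrative only,
// not the repository's implementation. Assumes even input dimensions.
fn max_pool2x2(input: &[Vec<f32>]) -> Vec<Vec<f32>> {
    let (oh, ow) = (input.len() / 2, input[0].len() / 2);
    let mut output = vec![vec![f32::NEG_INFINITY; ow]; oh];
    for y in 0..oh {
        for x in 0..ow {
            // Keep only the dominant (maximum) value in each 2x2 patch.
            for py in 0..2 {
                for px in 0..2 {
                    let v = input[y * 2 + py][x * 2 + px];
                    if v > output[y][x] {
                        output[y][x] = v;
                    }
                }
            }
        }
    }
    output
}
```

Applied to the 26x26 feature map from the earlier example, this yields a 13x13 map: a 4x reduction in values that keeps only each patch's dominant activation.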
The fully-connected layer takes the densified input features from the earlier layers and maps them into the output layer using a softmax activation function. This translates those inputs into a probability distribution over the classes represented by the N dimensions of the output layer: in our case, ten neurons representing the digits 0-9.

This layer represents the feed-forward neural network, or multi-layer perceptron, stage of this CNN. It's called the fully-connected layer because, unlike the earlier layers, every neuron in this layer is connected to every neuron in the previous layer. This is unlike the convolutional and pooling layers described above, which focus on manipulating spatial segments of their input matrices.
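The softmax step itself is compact enough to sketch directly. The version below is illustrative rather than the repository's code; it subtracts the maximum logit before exponentiating, a standard trick to avoid floating-point overflow that doesn't change the resulting distribution.

```rust
// A minimal softmax sketch: converts the output layer's raw scores
// (logits) into a probability distribution. Subtracting the maximum
// logit first avoids overflow in exp() without changing the result.
fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&v| (v - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}
```

Fed the ten raw scores from the output layer, this returns ten probabilities summing to 1, and the largest one is the model's predicted digit.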
The MNIST dataset comes with sixty thousand training images and labels, and ten thousand test images and labels. Of course, you could use all seventy thousand pairs for training and test on whatever you like; this is just how the files are partitioned. With a learning rate of 0.01, the model accuracy improves pretty quickly and converges after a few epochs.

I wanted to test this network on a classification task over my own handwritten digits. I took a photo of numbers I wrote by hand, inverted the colors, and scaled them to the same 28x28 format as the images in the MNIST dataset. The model's accuracy is pretty good against the dataset, so I assumed it would do pretty well classifying my own, famously bad handwriting: just 7/10.

In the bigger picture, that's pretty good for a handwritten-image classifier, but I thought it would do better. My written 4s and 9s are often confused by humans (especially former math teachers). I'd previously learned to write my 7s with a crossbar to distinguish them from my 1s; that style is uncommon in the US, and it makes sense that it would confuse the model, given its US-sourced training data. Getting my 6 confused with a 5 is odd, though.

Since I have this sample of my own handwriting, I can fine-tune the model on this small sample set, running backpropagation against these ten images and their filename-derived labels. This is kind of like transfer learning. I configured the code to run backprop just three times for each digit, as sketched below, which seemed insignificant compared to the three hundred thousand training examples (sixty thousand images over five epochs) used to train this model so far.
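To be clear about what that fine-tuning pass looks like, the loop below is a hypothetical sketch: the Network type, its train method, and the load_image helper are stand-ins for the repository's actual API. Only the shape of the loop (ten labeled samples, three backprop passes each) comes from the description above.

```rust
// Hypothetical stand-ins for the repository's real types and helpers;
// these names are illustrative only.
struct Network;

impl Network {
    // One backprop step against a single labeled image.
    fn train(&mut self, _pixels: &[f32], _label: u8) { /* ... */ }
}

// Hypothetical loader: returns a 28x28 grayscale image, color-inverted
// to match MNIST, as a flat pixel buffer.
fn load_image(_path: &str) -> Vec<f32> {
    vec![0.0; 28 * 28]
}

// Replay the ten handwriting samples through backprop three times each,
// mirroring the fine-tuning pass described above.
fn fine_tune(network: &mut Network, samples: &[(String, u8)]) {
    const PASSES: usize = 3;
    for _ in 0..PASSES {
        for (path, label) in samples {
            let pixels = load_image(path);
            network.train(&pixels, *label);
        }
    }
}
```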
I enabled this behavior behind the --cheat flag.

10/10 😎