Image Classification using Convolutional Neural Networks

Image classification is the task of assigning an image to one or more predefined classes. Although categorizing an image is instinctive for humans, it is much more challenging for an automated system.

### The Success of Neural Networks

Among deep neural networks (DNNs), the convolutional neural network (CNN, or ConvNet) has demonstrated excellent results in computer vision tasks, especially image classification. A CNN is a special type of multi-layer neural network inspired by the way the human visual system processes images. In 2012, a large deep convolutional neural network called AlexNet showed excellent performance on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). This marked the start of the broad use and development of CNN models such as VGGNet, GoogLeNet, ResNet, DenseNet, and many more.

### Convolutional Neural Network (CNN)

A CNN is a machine learning model that learns directly from data. Given labeled training images, it learns which visual features matter with little manual intervention, and only light pre-processing of the input is needed. In particular, a CNN learns and adapts its own image filters, which in most traditional approaches have to be carefully designed by hand. A CNN is built from a sequence of layers, each of which performs a particular function.

#### CNN Architecture and Layers

The basic unit of a CNN is the **neuron**. The concept is loosely based on biological neurons, which pass signals across synapses on **neuron activation.** An artificial neuron is a mathematical function that computes a weighted sum of its inputs (usually plus a bias term) and applies an activation function to the result. A layer is a group of neurons, and each layer has a particular function.
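To make the neuron description concrete, here is a minimal sketch in plain Python. The function names (`relu`, `neuron`, `layer`) and all weights and biases are hypothetical values chosen for illustration; in a real network the weights are learned during training rather than written by hand.

```python
def relu(x):
    """Rectified linear unit: keeps positive values, maps negatives to 0."""
    return max(0.0, x)


def neuron(inputs, weights, bias=0.0, activation=relu):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)


def layer(inputs, weight_rows, biases, activation=relu):
    """A layer is a group of neurons that all receive the same inputs."""
    return [neuron(inputs, w, b, activation) for w, b in zip(weight_rows, biases)]


# Illustrative, hand-picked values (not learned).
inputs = [1.0, 2.0, 3.0]
weight_rows = [[0.5, -1.0, 0.5], [1.0, 1.0, -1.0]]
biases = [1.0, -2.0]

outputs = layer(inputs, weight_rows, biases)
print(outputs)  # [1.0, 0.0]
```

The second neuron's weighted sum is negative, so the ReLU activation clips it to zero; the first neuron's sum is positive and passes through unchanged.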
![[Pasted image 20240317214746.png]]

A CNN may have anywhere from 3 to 150 or even more layers; the “deep” in deep neural networks refers to the number of layers. One layer’s output acts as the next layer’s input. Examples of deep multi-layer networks include ResNet50 (50 layers) and ResNet101 (101 layers).

![[Pasted image 20240317214810.png]]

CNN layers are of four main types: convolution layer, ReLU layer, pooling layer, and fully-connected layer.

1. **Convolution Layer:** A convolution slides a filter across an input and produces an activation at each position; together these activations form a feature map. The convolution layer has a set of trainable filters, each with a small receptive field that extends through the full depth of the input. Convolution layers are the major building blocks of convolutional neural networks.
2. **ReLU Layer:** A rectified linear unit (ReLU) layer applies the activation function max(0, x) to each value, which introduces non-linearity into the network. Networks that use ReLU are generally easier to train than those using saturating activations such as sigmoid or tanh, and often produce more accurate results.
3. **Pooling Layer:** This layer summarizes small neighborhoods of the preceding layer's output, for example by taking the maximum or the average of each neighborhood. Its primary task is to reduce the spatial size of the data, and with it the number of values later layers must process, giving a streamlined output.
4. **Fully-Connected Layer:** This layer typically sits at the end of a CNN. The output of the preceding layers is flattened into a vector, and every value in that vector is connected to every output neuron, which produces the final result such as a score per class.
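The four layer types above can be sketched end to end in plain Python. This is an illustrative toy under stated assumptions, not how a production framework implements them: it uses a single-channel image, one filter, no padding, stride 1, and hand-picked (not learned) weights. All function names and values are hypothetical.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    Like most deep learning libraries, this computes cross-correlation:
    the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Apply max(0, x) to every value."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; trailing rows/columns that do not fit are dropped."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + a][j * size + b]
                for a in range(size) for b in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def flatten(feature_map):
    """Turn a 2D feature map into a 1D vector."""
    return [v for row in feature_map for v in row]


def fully_connected(vector, weights, biases):
    """Every input value contributes to every output score."""
    return [sum(x * w for x, w in zip(vector, row)) + b
            for row, b in zip(weights, biases)]


# 5x5 image with a bright vertical stripe in columns 1-2.
image = [[0, 1, 1, 0, 0] for _ in range(5)]
# 2x2 filter that responds to dark-to-bright transitions from left to right.
kernel = [[-1, 1], [-1, 1]]

features = conv2d(image, kernel)   # 4x4, each row is [2, 0, -2, 0]
activated = relu(features)         # negatives clipped: [2, 0, 0, 0]
pooled = max_pool(activated, 2)    # 2x2: [[2, 0], [2, 0]]
vector = flatten(pooled)           # [2, 0, 2, 0]

# Hand-picked weights for two hypothetical classes.
scores = fully_connected(vector, [[1, 0, 1, 0], [0, 1, 0, 1]], [0, 1])
print(scores)  # [4, 1]
```

The filter responds positively at the left edge of the stripe and negatively at the right edge; ReLU discards the negative response, pooling halves each spatial dimension, and the fully-connected step turns the four remaining values into two class scores.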