Image recognition with machine learning uses algorithms that learn patterns from a dataset of labeled samples, for example images marked as good or bad (supervised learning). The most popular machine learning technique today is **deep learning**, in which a model stacks many hidden layers between its input and its output.
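
To make "many hidden layers" concrete, here is a minimal sketch of a forward pass through a small stacked network in plain Python. The layer sizes, the ReLU activation, and the random untrained weights are assumptions chosen for illustration; a real image model would be far larger and would learn its weights from labeled data.

```python
import random

def relu(values):
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def make_layer(n_in, n_out, rng):
    """Random, untrained weights (illustrative only)."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(pixels, layers):
    """Pass the input through every hidden layer, then a final linear layer."""
    activations = pixels
    for weights, biases in layers[:-1]:
        activations = relu(dense(activations, weights, biases))
    weights, biases = layers[-1]
    return dense(activations, weights, biases)

rng = random.Random(0)
layer_sizes = [4, 8, 8, 8, 2]  # 4 input pixels, three hidden layers, 2 class scores
layers = [make_layer(a, b, rng) for a, b in zip(layer_sizes, layer_sizes[1:])]
scores = forward([0.0, 0.5, 0.5, 1.0], layers)
print(len(layers) - 1, "hidden layers ->", len(scores), "class scores")
```

Each hidden layer transforms the output of the previous one, which is what lets deeper models build up from simple patterns to more abstract ones.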
### Recent Advances in Image Classification
With the advent of deep learning, in combination with robust AI hardware and GPUs, outstanding performance can be achieved on image classification tasks. Deep learning has brought major successes across the entire field of image recognition: face recognition and image classification algorithms now achieve above human-level performance, and object detection runs in real time. Additionally, inference speed has improved dramatically over the last few years.
- For example, in 2017, the Mask R-CNN algorithm was the fastest real-time object detector on the MS COCO benchmark, with an inference time of 330 ms per frame.
- In comparison, the YOLOR algorithm released in 2021 achieves inference times of 12 ms on the same benchmark, thereby overtaking the popular YOLOv3 and YOLOv4 deep learning algorithms.
- The releases of YOLOv7 (2022) and YOLOv8 (2023) marked a new state of the art, surpassing previously known real-time object detectors, including YOLOR, in both speed and accuracy.
- With the Segment Anything Model (SAM), Meta AI released a new top performer for image instance segmentation. SAM produces high-quality object masks from input prompts.
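
To put the inference times quoted above in perspective, per-frame latency can be converted into frames per second. The short sketch below uses only the two figures from the list (330 ms and 12 ms); the function name is chosen here for illustration.

```python
def frames_per_second(latency_ms: float) -> float:
    """Convert a per-frame inference time in milliseconds to frames per second."""
    return 1000.0 / latency_ms

mask_rcnn_ms = 330.0  # figure quoted above for Mask R-CNN (2017)
yolor_ms = 12.0       # figure quoted above for YOLOR (2021)

speedup = mask_rcnn_ms / yolor_ms

print(f"Mask R-CNN: {frames_per_second(mask_rcnn_ms):.1f} FPS")  # Mask R-CNN: 3.0 FPS
print(f"YOLOR:      {frames_per_second(yolor_ms):.1f} FPS")      # YOLOR:      83.3 FPS
print(f"Speedup:    {speedup:.1f}x")                             # Speedup:    27.5x
```

At roughly 3 frames per second a detector cannot keep up with a typical video stream, while about 83 frames per second leaves headroom beyond the 25 to 30 FPS common in video.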
### Advantages of Deep Learning vs. Traditional Image Processing
Compared with the conventional computer vision approach of early image processing around two decades ago, deep learning mainly requires engineering knowledge of a machine learning tool. It does not require expertise in specific machine vision areas to create handcrafted features, because the model learns its features from the training data.
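
As an illustration of what a handcrafted feature looks like, the sketch below applies a Sobel kernel, a classic hand-designed edge filter, to a tiny synthetic image in plain Python. The image and the helper name `filter2d` are made up for this example; in deep learning, comparable filters are learned from data instead of being specified by an expert.

```python
# Sobel kernel that responds to horizontal intensity changes (vertical edges).
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

def filter2d(image, kernel):
    """Slide a 3x3 kernel over a 2D list (valid mode, no padding)."""
    height, width = len(image), len(image[0])
    output = []
    for r in range(height - 2):
        row = []
        for c in range(width - 2):
            row.append(sum(kernel[i][j] * image[r + i][c + j]
                           for i in range(3) for j in range(3)))
        output.append(row)
    return output

# Synthetic 4x6 image: dark left half, bright right half.
image = [[0, 0, 0, 9, 9, 9] for _ in range(4)]

edges = filter2d(image, SOBEL_X)
print(edges)  # [[0, 36, 36, 0], [0, 36, 36, 0]]
```

The filter responds only where the brightness changes, so the nonzero values mark the boundary between the dark and bright halves.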
However, deep learning still requires manual data labeling to mark good and bad samples, which is known as **image annotation**. The process of learning from data labeled by humans is called **supervised learning**. Creating such labeled data to train AI models requires tedious human work, for instance to annotate regular traffic situations for autonomous driving. Nowadays, however, large public datasets such as ImageNet, LabelMe, Google OID, and MS COCO provide millions of high-resolution labeled images across up to thousands of categories.
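
The relationship between annotation and supervised learning can be sketched in a few lines. The example below trains a nearest-centroid classifier, a much simpler stand-in for a deep network, on a handful of hypothetical human-labeled samples. The pixel values and the labels `bright` and `dark` are invented for illustration and do not come from any real dataset.

```python
from collections import defaultdict

# Hypothetical annotated samples: (flattened 2x2 grayscale image, human label).
labeled_samples = [
    ([0.9, 0.8, 0.9, 1.0], "bright"),
    ([1.0, 0.9, 0.8, 0.9], "bright"),
    ([0.1, 0.0, 0.2, 0.1], "dark"),
    ([0.0, 0.2, 0.1, 0.1], "dark"),
]

def fit_centroids(samples):
    """Training step: average the pixel vectors that share a label."""
    grouped = defaultdict(list)
    for pixels, label in samples:
        grouped[label].append(pixels)
    return {label: [sum(column) / len(column) for column in zip(*vectors)]
            for label, vectors in grouped.items()}

def predict(centroids, pixels):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(centroids[label], pixels))
    return min(centroids, key=distance)

centroids = fit_centroids(labeled_samples)
print(predict(centroids, [0.7, 0.9, 0.8, 0.8]))  # bright
print(predict(centroids, [0.2, 0.1, 0.0, 0.3]))  # dark
```

The classifier never sees a rule for what "bright" means; it infers one from the labels, which is why the quality and quantity of annotated data matter so much.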