Image Recognition in Self-Driving Cars Using CNN
Shreya Muppidi *
Department of Mechanical Engineering, VNR Vignana Jyothi Institute of Engineering and Technology, Hyderabad, India.
Publication history: Received on 11 June 2023; revised on 24 July 2023; accepted on 27 July 2023
Abstract
The concept of neural networks has existed for decades but was never acknowledged as widely as it is today. The main
reason is “data.” Analyzing a problem with neural networks requires large amounts of data in various forms, which is
why the approach was not widely adopted earlier. Now, with today’s technology and the help of huge datasets, neural
networks have begun to take over numerous machine learning applications. This paper discusses one deep learning
approach, the convolutional neural network (CNN), which plays a major role in classifying and recognizing objects,
i.e., obstacles on the road. Earlier, conventional computer-based algorithms were used for image processing in
vehicles, and these were applicable only to a certain extent. With deep learning approaches, simpler yet faster
networks can now be implemented for a safe drive. Automated vehicles such as those from Tesla, which are described as
“fully self-driving,” nevertheless need a driver to watch the road at certain points. This shows that no fully
self-driving car yet exists that can drive itself without supervision. This problem can be addressed by means of image
detection mechanisms using neural networks, along with a programming language that makes it easy to deploy machine
learning models. The main objective is to develop a simple and accurate algorithm that makes image recognition more
precise for a better self-driving car.
Keywords: Image recognition; Self-driving cars; Deep learning; Neural networks; CNN
1. Introduction
Arthur Lee Samuel defined machine learning as the field of study that gives computers the ability to learn and perform
tasks without being explicitly programmed. Software can of course be programmed explicitly to achieve a function, but
machine learning, and more recently deep learning, is radically changing how we create software to solve such
problems. There are many different approaches to machine learning, but all of them use statistical algorithms and
data. Deep learning is a type of machine learning inspired by the structure of the human brain. Deep learning
algorithms attempt to draw conclusions similar to those a human would, by analysing data with a given logical
structure. To achieve this, deep learning uses a multi-layered structure of algorithms called neural networks.
For a system to acquire computer vision, a computer must be programmed and trained by breaking the task down into
smaller and easier parts. The task sounds daunting, but it becomes simpler when broken down into a hierarchy. This is
where machine learning and artificial intelligence come into use. Generally, image recognition is the ability of a
system to identify people, places, objects, and actions in images. It uses machine vision technologies and artificial
intelligence algorithms to recognize and distinguish images captured through a camera system. A computer sees an image
or video as a sequence of 0s and 1s. It stores the image as a combination of pixels, the pixel being the smallest unit
of an image. Each pixel contains one or more channels. A grayscale image has only one channel, whereas a coloured
image contains three channels, namely red, green, and blue. In a typical 8-bit digital colour image, each channel of
each pixel has a value between 0 and 255. Each of these values, when represented in binary, can be understood by the
computer. However, merely being able to read the image is of no use if the computer cannot understand what it means.
This is where machine learning comes into the picture.
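The pixel and channel representation described above can be illustrated with a minimal sketch. The 2x2 images below are made-up values written as plain Python lists, and the helper names `to_binary` and `channels_per_pixel` are hypothetical, introduced only for this illustration.

```python
# A tiny 2x2 image written out by hand (values are made up for illustration).
# Grayscale: one channel per pixel, each value in the range 0..255.
gray_image = [
    [0, 128],
    [200, 255],
]

# Colour: three channels (red, green, blue) per pixel.
rgb_image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]


def to_binary(value):
    """Return the 8-bit binary string for a channel value in 0..255."""
    if not 0 <= value <= 255:
        raise ValueError("channel value must be between 0 and 255")
    return format(value, "08b")


def channels_per_pixel(image):
    """Count the channels stored in each pixel of a nested-list image."""
    first_pixel = image[0][0]
    return len(first_pixel) if isinstance(first_pixel, tuple) else 1


print(channels_per_pixel(gray_image))  # 1
print(channels_per_pixel(rgb_image))   # 3
print(to_binary(200))                  # 11001000
```

The binary strings are what the computer actually stores. The sketch shows only how an image is read, not how it is understood, which is the gap machine learning is meant to fill.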
Social networks also use image data, which can be analyzed and visualized to understand customer preferences; this
data can then be used for customized marketing. A powerful commercial use can also be seen in stock photography and
video, where stock websites provide platforms on which photographers and video makers can sell their content. Google
Photos, which holds a huge visual dataset, uses image recognition together with deep learning to sort millions of
images and classify them more accurately. Pinterest uses algorithms to identify patterns in pictures that have been
pinned so that similar images are displayed when you search for them, which works as an image recommender. Similarly,
many other websites and companies use this technology to develop and improve their services and marketing.
Convolutional neural networks are composed of multiple layers of artificial neurons. The first layer usually extracts
basic features such as horizontal or diagonal edges. Its output is passed on to the next layer, which detects more
complex features such as corners or combinations of edges. Deeper in the network, the layers can identify even more
complex features such as objects and faces. Between the input and the output, a CNN contains hidden layers, which
typically consist of a series of convolutional layers. ReLU is the typical activation function, and it is followed by
additional operations such as pooling layers, fully connected layers, and normalization layers. Backpropagation is the
method used to propagate the error back through the network and adjust the weights. The effectiveness of convolutional
neural networks generally improves with more layers, larger networks, and more data, i.e., the larger the dataset, the
better a CNN tends to perform.
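The first stages of the layer pipeline described above (convolution, ReLU, pooling) can be sketched in plain Python. This is an illustrative forward pass only, under stated assumptions: the 6x6 image is synthetic, the vertical-edge kernel is written by hand rather than learned through backpropagation, and the function names are hypothetical. A real model would use a framework such as PyTorch.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the
    element-wise products. As in most CNN libraries, the kernel is not
    flipped, so this is technically cross-correlation."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Replace every negative activation with zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 window."""
    return [
        [
            max(feature_map[i][j], feature_map[i][j + 1],
                feature_map[i + 1][j], feature_map[i + 1][j + 1])
            for j in range(0, len(feature_map[0]) - 1, 2)
        ]
        for i in range(0, len(feature_map) - 1, 2)
    ]


# Synthetic 6x6 grayscale image: dark left half, bright right half,
# so there is a single vertical edge down the middle.
image = [[0, 0, 0, 255, 255, 255] for _ in range(6)]

# Hand-written kernel that responds to dark-to-bright transitions
# from left to right (an assumption; a trained CNN learns its kernels).
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

features = relu(convolve2d(image, vertical_edge))
pooled = max_pool2x2(features)

print(features[0])  # [0, 765, 765, 0]
print(pooled)       # [[765, 765], [765, 765]]
```

The feature map is non-zero only where the kernel straddles the edge, which is the sense in which an early layer "extracts" edges. Pooling then shrinks the 4x4 map to 2x2 while keeping the strongest responses.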
International Journal of Science and Research Archive, 2023, 09(02), 342–348
OpenCV is a large open-source, cross-platform library for computer vision, machine learning, and image
processing. It plays a major role in real-time operation, which is very important in today’s technology.
With this library, one can process images and videos to identify objects, faces, or even human handwriting.
Image patterns and their features are identified by representing them in vector space and performing
mathematical operations on them. Using OpenCV, we can also analyse videos to estimate motion, subtract the
background, and track objects.
PyTorch is an open-source framework that makes it easier to develop machine learning models and deploy
them to production. PyTorch provides dynamic computation graphs and libraries for distributed training,
which are tuned for high performance on AWS. It is one of the most popular deep learning libraries,
alongside Keras and TensorFlow. Using PyTorch, one can process images and videos to develop a highly
accurate and precise computer vision model.
YOLO is an acronym for "you only look once". It is a real-time object detection system. Instead of repurposing
classifiers to perform detection, YOLO frames object detection as a regression problem to spatially
separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes
and class probabilities directly from the full image. The biggest advantage of YOLO is its speed: the original
model was reported to process 45 frames per second. It has been used in various applications to detect traffic
signals, people, parking meters, animals, and other objects.
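To make the idea of detection as regression more concrete, the sketch below decodes one grid-cell prediction into a pixel-space bounding box and a class score. It follows the formulation of the original YOLO paper as commonly described (box centre relative to the grid cell, width and height relative to the whole image, score equal to box confidence times class probability); later YOLO versions differ. The function name, the dictionary layout, the class list, and all numbers are hypothetical, chosen only for illustration.

```python
def decode_cell(pred, row, col, grid_size, img_w, img_h, class_names):
    """Turn one grid-cell prediction into a pixel-space box and a label.

    pred["box"] is (x, y, w, h, confidence): x and y are offsets inside
    the cell in 0..1, w and h are fractions of the whole image.
    pred["class_probs"] holds one conditional probability per class.
    """
    x, y, w, h, confidence = pred["box"]

    # Box centre in pixels: cell index plus offset, scaled to the image.
    centre_x = (col + x) / grid_size * img_w
    centre_y = (row + y) / grid_size * img_h
    box_w = w * img_w
    box_h = h * img_h

    # Most likely class and its class-specific confidence score.
    best = max(range(len(class_names)), key=lambda k: pred["class_probs"][k])
    score = confidence * pred["class_probs"][best]

    return {
        "label": class_names[best],
        "score": score,
        "left": centre_x - box_w / 2,
        "top": centre_y - box_h / 2,
        "right": centre_x + box_w / 2,
        "bottom": centre_y + box_h / 2,
    }


# Hypothetical network output for the centre cell of a 7x7 grid
# on a 448x448 image.
class_names = ["traffic light", "person", "parking meter"]
prediction = {"box": (0.5, 0.5, 0.25, 0.5, 0.9), "class_probs": [0.1, 0.8, 0.1]}

detection = decode_cell(prediction, row=3, col=3, grid_size=7,
                        img_w=448, img_h=448, class_names=class_names)
print(detection["label"])                    # person
print(detection["left"], detection["top"])   # 168.0 112.0
```

Because every cell's box and class values come out of a single forward pass, no separate classifier has to be run over candidate regions, which is where the speed advantage comes from.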
To build an image recognition model using CNN, we follow the listed steps:
4. Conclusion
This method of detecting objects in images through computer vision in self-driving cars has proven to be more accurate
than conventional machine learning approaches. A deep learning algorithm classifies images by extracting the features
itself, whereas in conventional approaches a programmer must explicitly intervene for the model to reach a conclusion.
Although convolutional neural network models show great performance, they have limitations of their own to address.
For example, when an object is viewed from an angle slightly different from those the model was trained on, the model
may make false predictions. This may lead the vehicle to take unnecessary actions during the drive. Besides the
position of the object in the image, lighting and colour contrast can also cause the model to make errors. However,
this can be mitigated by adding different variations of the images during training, a technique known as data
augmentation. Nonetheless, this all comes back to collecting and analysing proper data before feeding it to the
training set.
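Data augmentation as described above can be sketched with two simple transformations, a horizontal flip and a brightness change, applied to a grayscale image stored as nested lists. The function names and the brightness factors 0.5 and 1.5 are assumptions made for this example. In practice a library such as torchvision is typically used, usually with randomly chosen parameters.

```python
def horizontal_flip(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def adjust_brightness(image, factor):
    """Scale every pixel by factor and clip the result to 0..255."""
    return [[min(255, max(0, round(v * factor))) for v in row] for row in image]


def augment(image):
    """Return the original image plus three simple variations of it."""
    return [
        image,
        horizontal_flip(image),
        adjust_brightness(image, 0.5),  # darker, e.g. dusk or shadow
        adjust_brightness(image, 1.5),  # brighter, e.g. direct sunlight
    ]


# Made-up 2x3 grayscale image.
image = [
    [10, 100, 200],
    [0, 50, 250],
]

variants = augment(image)
print(len(variants))  # 4
print(variants[1])    # [[200, 100, 10], [250, 50, 0]]
print(variants[3])    # [[15, 150, 255], [0, 75, 255]]
```

Each training image yields several variants with the same label, so the model sees the object under more viewing and lighting conditions than were originally collected.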
Although today’s self-driving cars have plenty of features, they have not yet been developed to their full potential.
Soon, driverless cars will be developed in which people can take a short nap on the way to their destination without
worrying about the road.