Video Processing
Introduction
• Video content analysis deals with the extraction of metadata from raw
video to be used as components for further processing in applications
such as search, summarization, classification, or event detection.
• The main goal of video analytics is to automatically recognize
temporal and spatial events in videos.
• This technical capability is used in a wide range of domains, including
entertainment, video retrieval and video browsing, healthcare, retail,
automotive, transport, home automation, flame and smoke detection,
safety, and security.
How does video analytics work?
• Video content analysis can be done in two different ways:
i. In real time, by configuring the system to trigger alerts for
specific events and incidents that unfold in the moment.
ii. In post-processing, by performing advanced searches to
facilitate forensic analysis tasks.
• Feeding the system: The data being analyzed can come from
various streaming video sources. The most common
are CCTV cameras, traffic cameras and online video feeds.
• A key goal is coverage: we need a clear view of the entire
area, and from various angles.
Central processing vs edge processing
• Video analysis software can be run centrally on servers that
are generally located in the monitoring station, which is
known as central processing.
• Or, it can be embedded in the cameras themselves, a
strategy known as edge processing.
• With a hybrid approach, the processing performed by the
cameras reduces the data that must be processed by the central
servers.
Classification of Video Analysis
Tools and Techniques for Video Analysis
Facial Recognition in Video Analysis
• Facial recognition systems that can identify or verify a person from a
digital image or video find application in a variety of contexts.
• Facial recognition works in two parts: face detection and face
identification.
i. In the first stage, the system detects faces in the input data using
methods like background subtraction.
ii. Next, it measures facial features to define facial landmarks and
tries to match them against a known dataset. Based on the match
confidence, a face is either recognized or classified as
unknown.
• Dlib’s face landmark predictor can be used to detect a face and extract
features such as the eyes, mouth, brows, nose, and jawline.
• The image was standardized by cropping to include just these features
and aligning it based on the location of the eyes and the bottom lip.
• The preprocessed image was then mapped to a numerical vector
representation. An algorithmic comparison of these vectors made
facial recognition possible, as sketched below.
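
Below is a minimal Python sketch of this two-stage pipeline, assuming
dlib’s standard pretrained models (the 68-point landmark predictor and
the ResNet face encoder) are available locally; the model file names
and the 0.6 distance cutoff are common defaults for those models, not
values taken from this text.

import dlib
import numpy as np

# Stage 1: face detector. Stage 2: landmark predictor plus face
# encoder (assumed to be dlib's standard pretrained model files,
# present in the working directory).
detector = dlib.get_frontal_face_detector()
shape_predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")
face_encoder = dlib.face_recognition_model_v1("dlib_face_recognition_resnet_model_v1.dat")

def encode_faces(image):
    """Detect each face, locate its landmarks, and map it to a 128-d vector."""
    encodings = []
    for face_rect in detector(image, 1):               # stage 1: detection
        landmarks = shape_predictor(image, face_rect)  # eyes, nose, jawline, ...
        # compute_face_descriptor crops and aligns internally using the landmarks
        descriptor = face_encoder.compute_face_descriptor(image, landmarks)
        encodings.append(np.array(descriptor))
    return encodings

def identify(unknown, known_vectors, names, cutoff=0.6):
    """Stage 2: compare a face vector against a known dataset by distance."""
    distances = [np.linalg.norm(unknown - known) for known in known_vectors]
    best = int(np.argmin(distances))
    return names[best] if distances[best] < cutoff else "unknown"

# Example use: encodings = encode_faces(dlib.load_rgb_image("person.jpg"))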
Detecting Motion
• We compare each frame of a video stream to the previous one
and detect all spots that have changed.
• We convert each frame to grey and smooth it slightly by blurring
the image. Converting to grey maps every RGB pixel to a single
value between 0 and 255, where 0 is black and 255 is white.
• We’ll compare the previous frame with the current one by
examining the pixel values. Remember that, since we’ve
converted the image to grey, all pixels are represented by a
single value between 0 and 255.
• We use the threshold function cv2.threshold to convert each
pixel to either 0 (black) or 255 (white), depending on whether its
change exceeds the threshold value of 20.
• Finding areas and contouring: we want to find the regions that have
changed since the last frame, not individual pixels. To do so, we
group the changed pixels into connected areas.
• cv2.findContours retrieves the contours, or outer limits, of each
white spot in the thresholded image; see the sketch below.
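
Putting these steps together, here is a minimal motion-detection loop
in Python with OpenCV (assuming the two-value cv2.findContours
signature of OpenCV 4). The webcam source, the (21, 21) blur kernel,
and the 500-pixel minimum contour area are illustrative assumptions;
the threshold value of 20 comes from the steps above.

import cv2

cap = cv2.VideoCapture(0)   # assumed source: the default webcam
prev_gray = None

while True:
    ok, frame = cap.read()
    if not ok:
        break

    # Convert to grey (one 0-255 value per pixel) and blur to reduce noise.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (21, 21), 0)

    if prev_gray is None:
        prev_gray = gray
        continue

    # Pixel-wise difference between the previous frame and the current one.
    delta = cv2.absdiff(prev_gray, gray)

    # Pixels that changed by more than 20 become white (255), the rest black (0).
    _, thresh = cv2.threshold(delta, 20, 255, cv2.THRESH_BINARY)
    thresh = cv2.dilate(thresh, None, iterations=2)

    # Retrieve the outer limits of each white spot.
    contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        if cv2.contourArea(c) < 500:   # ignore tiny changes (assumed cutoff)
            continue
        x, y, w, h = cv2.boundingRect(c)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)

    cv2.imshow("motion", frame)
    prev_gray = gray
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()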
Application of Video Analysis