02 Python Alt Data NLP Class Example
In the course notes, we showed that a time series of car counts for European retailers can help us understand
earnings per share for these firms. In practice, how do we go about converting images of cars in a parking lot on
different days into such a time series? In other words, how do we turn unstructured image data into structured
data?
In this class example, we show how to convert images of the same car park, taken at different times, into a time
series of car counts that can be processed more easily.
Import OpenCV (cv2). We also import the higher-level cvlib library, which has convenient methods to detect
common objects (detect_common_objects) and to draw bounding boxes for detected objects on images (draw_bbox).
We write a function car_count_from_image, which takes an image filename and outputs the number of car or
truck objects in the image, alongside the image data read from disk, the coordinates of each object's bounding
box, the labels of those objects and, finally, the confidence of each classification.
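A minimal sketch of car_count_from_image follows, assuming cvlib's default model for detect_common_objects, whose labels use COCO class names such as "car" and "truck". The helper count_vehicles and the VEHICLE_LABELS set are hypothetical names introduced for illustration; cv2 and cvlib are imported inside the function so that the counting logic can be used and tested without them installed.

```python
# Classes treated as vehicles (assumption: cvlib returns COCO-style labels)
VEHICLE_LABELS = {"car", "truck"}


def count_vehicles(labels):
    """Count how many detected labels are cars or trucks (hypothetical helper)."""
    return sum(1 for label in labels if label in VEHICLE_LABELS)


def car_count_from_image(filename):
    """Return (count, image, bbox, labels, conf) for one car park image."""
    # Imported here so count_vehicles works without OpenCV/cvlib installed
    import cv2
    import cvlib as cv

    # Read the image from disk as a NumPy array (OpenCV uses BGR channel order)
    image = cv2.imread(filename)
    if image is None:
        raise FileNotFoundError(f"Could not read image: {filename}")

    # Detect common objects: bounding boxes, class labels and confidences
    bbox, labels, conf = cv.detect_common_objects(image)

    return count_vehicles(labels), image, bbox, labels, conf
```

Keeping the label filter in its own function makes it easy to change which classes count as vehicles (for example, adding "bus") without touching the detection code.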
# Read the image from disk (Hint: use imread in cv2)
Is the detection of cars any good? Let's write a new function, write_image_object_bounded_box, which takes as
input the filename of the car park image we would like to run the car count algorithm on. The function draws
bounding boxes around the detected objects and then writes the annotated image back to disk.
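One way to sketch write_image_object_bounded_box is shown below. It assumes the annotated copy is written next to the original with a hypothetical "_bbox" suffix (the example does not specify an output name), re-runs detection, draws the boxes with cvlib's draw_bbox and saves the result with cv2.imwrite. The bounded_box_filename helper is hypothetical.

```python
import os


def bounded_box_filename(filename, suffix="_bbox"):
    """Hypothetical naming scheme: car.jpg -> car_bbox.jpg."""
    root, ext = os.path.splitext(filename)
    return f"{root}{suffix}{ext}"


def write_image_object_bounded_box(filename):
    """Draw bounding boxes on detected objects and write the image to disk."""
    # Imported here so the filename helper works without OpenCV/cvlib
    import cv2
    import cvlib as cv
    from cvlib.object_detection import draw_bbox

    image = cv2.imread(filename)
    if image is None:
        raise FileNotFoundError(f"Could not read image: {filename}")

    bbox, labels, conf = cv.detect_common_objects(image)

    # draw_bbox draws boxes and labels on the image and returns it
    output_image = draw_bbox(image, bbox, labels, conf)

    output_filename = bounded_box_filename(filename)
    cv2.imwrite(output_filename, output_image)
    return output_filename
```

Returning the output path lets the caller load and display the annotated image afterwards.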
Let's try out our function for drawing the detected objects on our original image, and also display the modified
image to the user. The output is pretty good: it has detected most of the cars. Admittedly, it has trouble with
cars that are obscured by trees. From other camera angles these cars would likely not be obscured, but for
simplicity we are only looking at one angle.
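OpenCV loads images in BGR channel order, whereas matplotlib's imshow expects RGB, so showing a cv2 image directly tends to swap reds and blues. A minimal display helper, assuming the notebook uses matplotlib; bgr_to_rgb and show_image are hypothetical names.

```python
import numpy as np


def bgr_to_rgb(image: np.ndarray) -> np.ndarray:
    """Reverse the channel axis: OpenCV's BGR becomes matplotlib's RGB."""
    return image[..., ::-1]


def show_image(image: np.ndarray, title=None):
    """Display a cv2 image inline, e.g. in a Jupyter notebook."""
    import matplotlib.pyplot as plt

    plt.figure(figsize=(12, 8))
    plt.imshow(bgr_to_rgb(image))
    plt.axis("off")  # pixel coordinates add little for a photo
    if title:
        plt.title(title)
    plt.show()
```

To display the annotated car park, pass show_image the array returned by cv2.imread on the file written to disk.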
We've now written functions that use cvlib and cv2 to detect cars (and trucks) in a particular image. The
next step is to create a list of all the car park images in our folder and sort that list.
# Walk through the root folder, the directories underneath it and then the files
# in a double nested for loop
# Hint: use os.walk
# Append each file name to the list
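The hints above translate into a short os.walk loop. This sketch assumes the images are .jpg files, matching the filename format used in this example, and sorts by file name rather than full path so that images in different subfolders are still ordered by their date stamp; list_car_park_images is a hypothetical name.

```python
import os


def list_car_park_images(root_folder, extension=".jpg"):
    """Return a sorted list of image paths found under root_folder."""
    image_files = []
    # os.walk yields (root, directories, files) for every folder in the tree
    for root, _dirs, files in os.walk(root_folder):
        for file_name in files:
            if file_name.lower().endswith(extension):
                image_files.append(os.path.join(root, file_name))
    # Names start with a YYYY-MM-DD_HHMM stamp, so name order is time order
    return sorted(image_files, key=os.path.basename)
```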
Each filename has the form 2015-11-16_0710.jpg, so we can identify the date and time at which each car count
image was taken.
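Because the date and time are encoded in the filename, datetime.strptime can recover them. This sketch assumes the four digits after the underscore are a 24-hour HHMM time (so 0710 means 07:10); parse_image_timestamp is a hypothetical helper name.

```python
import os
from datetime import datetime


def parse_image_timestamp(path):
    """Parse a name like '2015-11-16_0710.jpg' into a datetime."""
    # Drop the folder and the extension, leaving e.g. '2015-11-16_0710'
    stem = os.path.splitext(os.path.basename(path))[0]
    # %Y-%m-%d is the date; %H%M is the 24-hour time with no separator
    return datetime.strptime(stem, "%Y-%m-%d_%H%M")
```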
Convert the car count dates/times and car count numbers into a pandas DataFrame.
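A sketch of the conversion, assuming the capture times and the counts returned by car_count_from_image have already been collected into two parallel lists; build_car_count_frame is a hypothetical name. Indexing by capture time and sorting makes later grouping by hour straightforward.

```python
import pandas as pd


def build_car_count_frame(timestamps, counts):
    """Combine capture times and car counts into a time-indexed DataFrame."""
    df = pd.DataFrame({
        "date_time": pd.to_datetime(timestamps),  # accepts strings or datetimes
        "car_count": counts,
    })
    # A sorted DatetimeIndex supports time-based selection and grouping
    return df.set_index("date_time").sort_index()
```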
We now have a time series of car counts which we can analyse. Let's calculate the average car count for each
hour of the day.
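With a DatetimeIndex, the hour of each observation is available as df.index.hour, so the hourly average is a single groupby. A sketch, assuming the DataFrame has a car_count column and is indexed by capture time; average_count_by_hour and the sample data are hypothetical.

```python
import pandas as pd


def average_count_by_hour(df):
    """Mean car_count for each hour of the day (0-23), across all dates."""
    return df.groupby(df.index.hour)["car_count"].mean().rename_axis("hour")


# Illustrative data: two morning captures on different days, one evening capture
idx = pd.to_datetime(["2015-11-16 07:10", "2015-11-17 07:40", "2015-11-16 18:05"])
example = pd.DataFrame({"car_count": [4, 6, 10]}, index=idx)
hourly = average_count_by_hour(example)
```

Calling hourly.plot() on the resulting Series then gives the car count by hour chart.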
Plot the car count by hour DataFrame. We get a very intuitive result: more cars are present at the start of the
workday, and the count drops rapidly at the end of the working day.
Extending this analysis to create a car count for retailer car parks
We've created a time series for one car park over a few weeks. In practice, if we wanted to track the car count for
a retailer, we would need to replicate this analysis for a large number of its car parks. Typically, such images
would be sourced from satellites rather than from on-the-ground CCTV, which we've used here, since we would
likely not have access to such CCTV.
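If each car park produces its own series, one simple way to combine them into a retailer-level signal is sketched below. It assumes a hypothetical long-format table with car_park, date_time and car_count columns, averages each car park per day so that car parks photographed more often do not dominate, and then sums across car parks; retailer_daily_car_count is a hypothetical name.

```python
import pandas as pd


def retailer_daily_car_count(counts):
    """Aggregate per-car-park observations into one daily retailer series.

    counts: DataFrame with columns car_park, date_time, car_count
    (hypothetical layout).
    """
    daily_per_park = (
        # Truncate each timestamp to midnight to get the calendar date
        counts.assign(date=pd.to_datetime(counts["date_time"]).dt.normalize())
        # Average each car park's observations within a day
        .groupby(["car_park", "date"])["car_count"]
        .mean()
    )
    # Sum the daily averages across all car parks
    return daily_per_park.groupby(level="date").sum()
```

Summing assumes the same set of car parks is observed every day; if coverage varies (for example, when satellite images are obscured by cloud), an average per observed car park may be more robust.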