
Computer Science, Machine Learning

Shallow and Complex Reasoning in Computer Vision.

Manoj Kesani – Sep 20, 2020

In our previous blog post, we began our journey into the history of computer vision. The field has evolved from shallow, manual processing techniques to more complex reasoning methods. Check out our previous blog post to learn more about the shallow processing methods of the 1970s–2010s. LINK TO PREVIOUS BLOG

Since 2010, the computer vision field has been constantly evolving and innovating image and video processing techniques that have been seamlessly integrated into our regular lives. Can you spot a computer vision application that you use in your daily life?

The advanced concepts and algorithms of the current age are collectively known as ‘complex reasoning’ in computer vision. Complex reasoning can be categorized into four core tasks:

  • Image Classification – detecting diseases and making diagnoses from medical images (a minimal classification sketch follows this list)

  • Image Detection – autonomous ‘self-driving’ cars detecting obstacles to navigate around

  • Image Segmentation – segmenting 3D medical scans of patients for diagnosis and analysis, a task that previously required a high level of expertise

  • Image Retrieval – searching a database for images similar to a query image, as used in large search engines such as Google
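To make the classification task concrete, here is a minimal sketch that labels a single image with a ResNet-18 pretrained on ImageNet, using PyTorch and torchvision. The file name house.jpg and the choice of model are illustrative assumptions; any pretrained classifier would serve:

    # Minimal image-classification sketch. Assumes torch, torchvision and
    # Pillow are installed, and that "house.jpg" exists (illustrative name).
    import torch
    from torchvision import models, transforms
    from PIL import Image

    model = models.resnet18(pretrained=True)   # ImageNet-pretrained classifier
    model.eval()

    # Standard ImageNet preprocessing: resize, crop, scale to [0, 1], normalize.
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    image = Image.open("house.jpg").convert("RGB")
    batch = preprocess(image).unsqueeze(0)     # add a batch dimension

    with torch.no_grad():
        logits = model(batch)
    print("predicted ImageNet class index:", logits.argmax(dim=1).item())

The same pattern – preprocess, forward pass, argmax – underlies the medical-imaging classifiers mentioned above; only the training data changes.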

In recent times, complex reasoning concepts have adopted a deep learning-based structure. From a deep-learning perspective, a computer vision task can be divided into three levels:

  • Low-level processing

  • Intermediate-level processing

  • High-level processing

Additionally, all these tasks need a database to store their information. Any computer vision architecture consists of a database of information and task-solvers built around it.

Low-Level Processing

Low-level processing of a computer vision task involves image/video acquisition and pre-processing for the higher levels of analysis. There are several modes of image capture, each catering to different needs:

  • RGB cameras – capture light in the red, green and blue ranges

  • RGBD cameras – capture depth along with RGB information

  • LIDAR sensors – project lasers and measure only distance

  • X-ray machines – measure the response of tissues to X-ray waves

  • Sonograms – form an image from acoustic information

  • Binocular cameras – give us two horizontally separated images

  • Telescopes – image very distant objects at different wavelengths

  • Microscopes – image very small objects

We acquire data from any of the above instruments and store it in a format that our code can understand. In Python, this format is generally a NumPy array.

Consider this black and white image of a house.

We acquire this image from our mode of capture and store the value at each pixel. Each value ranges from 0 to 1, where 0 represents black and 1 represents white.

A coloured RGB image is divided into three layers – one for each colour: red, green and blue. Each layer holds a value for every pixel in the image, in the range 0 to 255.
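As a quick sketch of what this looks like in code (the file name house.png is an assumption; Pillow loads the image and NumPy holds the pixel values):

    import numpy as np
    from PIL import Image

    # A black and white image is a 2-D array; dividing by 255 scales the
    # stored 8-bit intensities into the [0, 1] range described above.
    gray = np.asarray(Image.open("house.png").convert("L"), dtype=np.float32) / 255.0
    print(gray.shape)   # (height, width); 0.0 = black, 1.0 = white

    # A coloured image is a 3-D array with one layer per channel.
    rgb = np.asarray(Image.open("house.png").convert("RGB"))
    print(rgb.shape)    # (height, width, 3); each channel holds values 0-255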


Intermediate-Level Processing

Intermediate-level processing of a computer vision task involves extracting features from the data. There are many techniques for feature extraction, and many of these concepts have existed since the age of shallow reasoning in computer vision. However, with the advancement of deep learning, methods of higher accuracy have been developed and put to good use.

From the black and white image of the house in the previous section, we can apply a feature extractor for edge detection. The output presents the edges between items in the image. Edge detection was one of the first image processing techniques to be developed and is considered a shallow reasoning tool.
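A minimal sketch of such a shallow edge detector, using OpenCV's Canny implementation (the file name and the two hysteresis thresholds are illustrative assumptions):

    import cv2

    # Read the image directly as a single-channel grayscale array.
    gray = cv2.imread("house.png", cv2.IMREAD_GRAYSCALE)

    # Canny edge detection: gradients above 200 are strong edges, and weak
    # gradients between 100 and 200 are kept only if connected to strong ones.
    edges = cv2.Canny(gray, 100, 200)

    cv2.imwrite("house_edges.png", edges)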

We can use image segmentation tools to label different aspects of the image. The output identifies and labels segments of the image, such as the roof, grass, sky and tree. This process requires a more advanced form of processing and is considered a complex reasoning technique.
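For the deep-learning side, here is a sketch of semantic segmentation with a pretrained FCN-ResNet50 from torchvision; the model choice and file name are illustrative assumptions, not what the original example used:

    import torch
    from torchvision import models, transforms
    from PIL import Image

    # FCN with a ResNet-50 backbone, pretrained for semantic segmentation.
    model = models.segmentation.fcn_resnet50(pretrained=True)
    model.eval()

    preprocess = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    image = Image.open("house.png").convert("RGB")
    batch = preprocess(image).unsqueeze(0)

    with torch.no_grad():
        out = model(batch)["out"]            # (1, num_classes, H, W)
    mask = out.argmax(dim=1).squeeze(0)      # per-pixel class labels
    print(mask.shape, mask.unique())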



High-Level Processing

High-level processing of a computer vision task is the highest level of abstraction. This level of processing allows us to understand the image and extrapolate information. Looking at our example of the black and white house, we have already identified the edges and segments of the image. With high-level processing, we can deduce and reason about the aspects that make up a house, and so forth.

Application of Computer Vision

Now that we have understood the basic structure of a complex reasoning computer vision task, let’s put it into action.

The task is to detect and track all of the people appearing in a surveillance camera video feed. The program should be able to perform:

  • Object Classification – identify and classify a person

  • Object Localization – locate the person in space

  • Object Re-Identification – re-identify a previously classified person

  • Object Tracking – track the movement of a classified person

We can divide our program across the three processing levels of complex reasoning.

Low-Level Processing – acquiring the surveillance footage and storing the data as NumPy arrays on disk

Intermediate-Level Processing – object classification and object localization

High-Level Processing – object re-identification and tracking
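Putting the three levels together, the skeleton of such a program might look like the sketch below. The detector and tracker calls are hypothetical placeholders for the tools listed in the next section:

    import cv2

    # Low-level: acquire frames from the surveillance video.
    cap = cv2.VideoCapture("./data/video/test.mp4")

    count = 0
    while True:
        ok, frame = cap.read()   # frame is a NumPy array of shape (H, W, 3)
        if not ok:
            break
        # Intermediate-level (hypothetical): classify and localize people,
        # e.g. boxes = detector(frame)
        # High-level (hypothetical): re-identify and track across frames,
        # e.g. tracks = tracker.update(boxes)
        count += 1

    cap.release()
    print("processed", count, "frames")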

Tools and algorithms to use:

  • Object Detection Algorithm – YOLOv3

  • Object Re-Identification Algorithm – Deep SORT

  • Tracking Algorithm – Kalman filter (a minimal sketch follows this list)
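Deep SORT's motion model is a Kalman filter: it predicts where each person will be in the next frame and corrects that prediction with the new detection. Below is a minimal one-dimensional constant-velocity sketch of the predict/update cycle; Deep SORT's actual filter runs on an 8-dimensional bounding-box state, and all numbers here are illustrative:

    import numpy as np

    # State: [position, velocity]; we only ever measure position.
    x = np.array([0.0, 0.0])             # initial state estimate
    P = np.eye(2)                        # state covariance
    F = np.array([[1.0, 1.0],            # constant-velocity transition (dt = 1 frame)
                  [0.0, 1.0]])
    H = np.array([[1.0, 0.0]])           # measurement model: observe position only
    Q = np.eye(2) * 0.01                 # process noise
    R = np.array([[1.0]])                # measurement noise

    def predict(x, P):
        return F @ x, F @ P @ F.T + Q

    def update(x, P, z):
        y = z - H @ x                    # innovation: measurement minus prediction
        S = H @ P @ H.T + R              # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)   # Kalman gain
        return x + K @ y, (np.eye(2) - K @ H) @ P

    # Feed in noisy per-frame position measurements of a moving person.
    for z in [1.1, 2.0, 2.9, 4.2]:
        x, P = predict(x, P)
        x, P = update(x, P, np.array([z]))

    print("estimated position and velocity:", x)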

Using the following command, we can run object tracking on the video:

python object_tracker.py --video ./data/video/test.mp4 --output ./data/video/results.avi --weights ./weights/yolov3-tiny.tf --tiny