Computer Vision Flashcards

Question 1

Q

Define computer vision. List any three real-life applications of CV.

Answer

A

Computer Vision (CV) is a domain of Artificial Intelligence that enables machines to see through images or other visual data, and to process and analyze that data with algorithms in order to understand real-world phenomena.
Real-life applications include:
- Facial recognition
- Face filters
- Google search by image
- Self-driving cars
- Medical Imaging
- Computer Vision in retail
- Google Translate App

Question 2

Q

How does a computer see an image?

Answer

A

A computer sees an image as a grid of numerical values (pixel values) rather than as a visual representation.
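
As a rough illustration, a tiny grayscale image can be written out as a grid of numbers. The 3 x 3 values below are made up for this sketch, with 0 taken as black and 255 as white.

```python
# A made-up 3x3 grayscale image: each number is one pixel's brightness
# (0 = black, 255 = white).
image = [
    [0, 128, 255],
    [128, 255, 128],
    [255, 128, 0],
]

height = len(image)      # number of rows
width = len(image[0])    # number of columns

# To the computer, the "picture" is only these rows of numbers.
for row in image:
    print(row)
print("size:", width, "x", height)
```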

Question 3

Q

Explain the tasks used in CV applications for single objects.

Answer

A

Classification: Image classification is the task of assigning an input image one label from a fixed set of categories. It is one of the core problems in CV and, despite its simplicity, has a large number of practical applications.
Classification + Localization: This task identifies both what object is present in the image and where in the image it is located. It is used only for single objects.
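
The difference between the two outputs can be sketched in a few lines. This is only a toy illustration: the class scores and the object mask are hypothetical inputs invented for the example, and a real system would predict both from the image itself.

```python
# Hypothetical class scores, as a classifier might produce for one image.
scores = {"cat": 0.10, "dog": 0.85, "bird": 0.05}

def classify(scores):
    """Classification: pick the single label with the highest score."""
    return max(scores, key=scores.get)

# Hypothetical binary mask: 1 marks pixels that belong to the object.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

def locate(mask):
    """Localization: bounding box (top, left, bottom, right) of the object."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, v in enumerate(row) if v]
    return min(rows), min(cols), max(rows), max(cols)

label = classify(scores)   # what the object is
box = locate(mask)         # where the object is
print(label, box)          # dog (1, 1, 2, 3)
```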

Question 4

Q

Explain the tasks used in CV Applications for Multiple objects.

Answer

A

Object Detection: Object detection is the process of finding instances of real-world objects, such as faces, bicycles and buildings, in images or videos. Object detection algorithms typically use extracted features and learning algorithms to recognize instances of an object category. It is commonly used in applications such as image retrieval and automated vehicle parking systems.
Instance Segmentation: Instance segmentation is the process of detecting instances of objects, assigning each a category, and then labeling every pixel according to the instance it belongs to. A segmentation algorithm takes an image as input and outputs a collection of regions (or segments).
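
A minimal sketch of what such an output might look like, assuming the segmentation result is already available as a per-pixel instance map. The map, the category names, and the helper functions are hypothetical and exist only for this example.

```python
# Hypothetical instance map for a 4x6 image: 0 = background,
# 1 and 2 = two separate object instances.
instance_map = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 2, 2],
    [0, 0, 0, 0, 2, 2],
]
# Hypothetical category assigned to each instance.
categories = {1: "bicycle", 2: "face"}

def regions(instance_map):
    """Group pixel positions (row, col) by instance id, skipping background."""
    found = {}
    for r, row in enumerate(instance_map):
        for c, inst in enumerate(row):
            if inst != 0:
                found.setdefault(inst, []).append((r, c))
    return found

def bounding_box(pixels):
    """Box (top, left, bottom, right) around a list of pixel positions."""
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    return min(rows), min(cols), max(rows), max(cols)

segments = regions(instance_map)
for inst, pixels in segments.items():
    print(categories[inst], "pixels:", len(pixels), "box:", bounding_box(pixels))
```

The bounding boxes correspond to what an object detector would report, while the pixel lists are the extra detail that segmentation adds.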

Question 5

Q

What is a pixel?

Answer

A

The word ‘pixel’ means picture element. Every digital photograph is made up of pixels. They are the smallest units of information that make up a picture. Usually round or square, they are typically arranged in a 2-dimensional grid. The pixels approximate the actual image: the more pixels there are, the more closely the image resembles the original. Two important features of a pixel in a digital image are:
1. Color (or Intensity Value): Each pixel stores a color value, in RGB format for color images or as a grayscale intensity for black-and-white images.
2. Spatial location (position in the image)
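
The two features can be seen together in a small sketch. The 2 x 2 image and the helper `describe_pixels` are made up for illustration. Each pixel is reached by its position (row, column) and holds a color value.

```python
# A made-up 2x2 color image; each pixel is an (R, G, B) tuple.
image = [
    [(200, 30, 30), (30, 200, 30)],
    [(30, 30, 200), (240, 240, 240)],
]

def describe_pixels(image):
    """Yield (row, column, color) for every pixel in the grid."""
    for r, row in enumerate(image):
        for c, color in enumerate(row):
            yield r, c, color

for r, c, color in describe_pixels(image):
    print(f"pixel at row {r}, column {c} has color {color}")
```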

Question 6

Q

Explain the term resolution with an example.

Answer

A

Resolution is the number of pixels in an image. It can be expressed in two ways:
- As the number of pixels along the width by the number of pixels along the height of the image, e.g. 1280 × 1024
- As a single number giving the total pixel count, e.g. 5 megapixels
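
The two forms are related by a simple multiplication, sketched here with the 1280 × 1024 example. The function names are illustrative, and one megapixel is taken as one million pixels.

```python
def total_pixels(width, height):
    """Total number of pixels in a width x height image."""
    return width * height

def megapixels(width, height):
    """The same count expressed in millions of pixels."""
    return total_pixels(width, height) / 1_000_000

w, h = 1280, 1024
print(total_pixels(w, h))           # 1310720
print(round(megapixels(w, h), 1))   # 1.3
```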

Question 7

Q

What is pixel value?

Answer

A

Each pixel of an image stored in a computer has a pixel value, which describes how bright that pixel is and/or what color it should be. The most common pixel format is the byte image, in which each value is stored as an 8-bit integer with a range of 0 to 255.
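
The 0 to 255 range follows from storing each value in one byte (8 bits). A short sketch of that arithmetic follows. The `clamp` helper is a hypothetical convenience, not part of any particular library.

```python
BITS_PER_PIXEL = 8
levels = 2 ** BITS_PER_PIXEL            # 256 distinct values
min_value, max_value = 0, levels - 1    # 0 and 255

def clamp(value):
    """Keep a value inside the valid byte range."""
    return max(min_value, min(max_value, value))

# Python's bytes type accepts only values in this same 0-255 range.
pixels = bytes([0, 127, 255])

print(levels, min_value, max_value)     # 256 0 255
print(clamp(300), clamp(-5))            # 255 0
```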

Question 8

Q

What are grayscale images?

Answer

A

Grayscale images are images that have a range of shades of gray without apparent color. The darkest possible shade is black, with a pixel value of 0. The lightest possible shade is white, the total presence of color, with a pixel value of 255. Intermediate shades of gray are represented by equal brightness levels of the primary colors. In a grayscale image each pixel takes one byte, and the image is a single plane: one 2D array of pixels. The size of a grayscale image is defined as the height × width of that image. Each pixel has a value in the range 0-255.
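
A minimal sketch of the size rule, assuming one byte per pixel and a made-up 2 × 3 image:

```python
# A made-up grayscale image with 2 rows (height) and 3 columns (width).
image = [
    [0, 64, 128],
    [192, 255, 32],
]

height, width = len(image), len(image[0])

# One plane, one byte per pixel: flatten the rows into raw bytes.
raw = bytes(value for row in image for value in row)

darkest = min(raw)     # 0 would be pure black
lightest = max(raw)    # 255 would be pure white

print(height, "x", width, "=", len(raw), "bytes")   # 2 x 3 = 6 bytes
print(darkest, lightest)                            # 0 255
```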

Question 9

Q

What are RGB images?

Answer

A

These images are made up of the three primary colors: Red, Green and Blue. All other colors can be made by combining different intensities of red, green and blue. Every RGB image is stored in the form of three planes, called the R channel, the G channel and the B channel. Each plane separately has a number of pixels, with each pixel value varying from 0 to 255. All three planes combined give a color image. This means that in an RGB image each pixel has a set of three values, [0-255, 0-255, 0-255], which together give that pixel its color. For example, [0,0,0] is black and [255,255,255] is white; [255,0,0] is red, [0,255,0] is green and [0,0,255] is blue.
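
The relationship between the per-pixel triples and the three planes can be sketched as follows. The 2 × 2 image and the helper names `split_channels` and `merge_channels` are made up for this illustration.

```python
# A made-up 2x2 RGB image: red, green, blue and white pixels.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]

def split_channels(image):
    """Separate an RGB image into its R, G and B planes (2D lists)."""
    red = [[p[0] for p in row] for row in image]
    green = [[p[1] for p in row] for row in image]
    blue = [[p[2] for p in row] for row in image]
    return red, green, blue

def merge_channels(red, green, blue):
    """Combine three planes back into one image of (R, G, B) tuples."""
    return [
        list(zip(r_row, g_row, b_row))
        for r_row, g_row, b_row in zip(red, green, blue)
    ]

red, green, blue = split_channels(image)
print(red)     # [[255, 0], [0, 255]]
print(green)   # [[0, 255], [0, 255]]
print(blue)    # [[0, 0], [255, 255]]
print(merge_channels(red, green, blue) == image)   # True
```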

Question 10

Q

Vimal bought a Galaxy S23 Ultra 5G with a screen resolution of 3088 x 1440. What do the numbers signify?

Answer

A

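The numbers 3088 × 1440 represent the screen resolution of the Galaxy S23 Ultra 5G in pixels.
3088 pixels → Number of pixels along the longer side of the screen (the width when the phone is held in landscape orientation).
1440 pixels → Number of pixels along the shorter side of the screen (the height in landscape orientation).
In total the screen has 3088 × 1440 = 4,446,720 pixels, or about 4.4 megapixels.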

Question 11

Q

Write three differences between CV and human vision.

Answer

A

1. Understanding vs. Processing
- Human Vision: The brain interprets images with context, prior knowledge, and reasoning.
- Computer Vision: CV processes images mathematically and relies on algorithms without inherent understanding.
2. Adaptability
- Human Vision: Easily adapts to different lighting, angles, and occlusions.
- Computer Vision: Struggles with variations unless trained with extensive data.
3. Speed and Precision
- Human Vision: Slower but highly intuitive, recognizing emotions, objects, and contexts effortlessly.
- Computer Vision: Faster in processing large amounts of data but may misinterpret complex scenes.