Machine Learning | Amazon Rekognition Flashcards
What is Amazon Rekognition?
General
Amazon Rekognition | Machine Learning
Amazon Rekognition is a service that makes it easy to add powerful visual analysis to your applications. Rekognition Image lets you easily build powerful applications to search, verify, and organize millions of images. Rekognition Video lets you extract motion-based context from stored or live stream videos and helps you analyze them.
Rekognition Image is an image recognition service that detects objects, scenes, and faces; extracts text; recognizes celebrities; and identifies inappropriate content in images. It also allows you to search and compare faces. Rekognition Image is based on the same proven, highly scalable, deep learning technology developed by Amazon’s computer vision scientists to analyze billions of images daily for Prime Photos.
Rekognition Image uses deep neural network models to detect and label thousands of objects and scenes in your images, and we are continually adding new labels and facial recognition features to the service. With Rekognition Image, you only pay for the images you analyze and the face metadata you store.
Rekognition Video is a video recognition service that tracks people; detects activities; and recognizes objects, celebrities, and inappropriate content in videos stored in Amazon S3 and in live video streams from Amazon Kinesis Video Streams. Rekognition Video detects persons and tracks them through the video even when their faces are not visible or when the person moves in and out of the scene. This makes investigation and real-time monitoring of individuals, such as persons of interest, easy and accurate. For example, this could be used in an application that sends a real-time notification when someone delivers a package to your door. Rekognition Video also lets you index metadata such as objects, activities, scenes, celebrities, and faces, which makes video search easy.
What is deep learning?
General
Amazon Rekognition | Machine Learning
Deep learning is a sub-field of Machine Learning and a significant branch of Artificial Intelligence. It aims to infer high-level abstractions from raw data by using a deep graph with multiple processing layers composed of multiple linear and non-linear transformations. Deep learning is loosely based on models of information processing and communication in the brain. Deep learning replaces handcrafted features with ones learned from very large amounts of annotated data. Learning occurs by iteratively estimating hundreds of thousands of parameters in the deep graph with efficient algorithms.
Several deep learning architectures, such as deep convolutional neural networks (CNNs) and recurrent neural networks, have been applied to computer vision, speech recognition, natural language processing, and audio recognition to produce state-of-the-art results on various tasks.
Amazon Rekognition is a part of the Amazon AI family of services. Amazon AI services use deep learning to understand images, turn text into lifelike speech, and build intuitive conversational text and speech interfaces.
Do I need any deep learning expertise to use Amazon Rekognition?
General
Amazon Rekognition | Machine Learning
No. With Amazon Rekognition, you don’t have to build, maintain or upgrade deep learning pipelines.
To achieve accurate results on complex computer vision tasks such as object and scene detection, face analysis, and face recognition, deep learning systems need to be tuned properly and trained with massive amounts of labeled ground truth data. Sourcing, cleaning, and labeling data accurately is a time-consuming and expensive task. Moreover, training a deep neural network is computationally expensive and often requires custom hardware built using Graphics Processing Units (GPUs).
Amazon Rekognition is fully managed and comes pre-trained for image and video recognition tasks, so that you don’t have to invest your time and resources in creating a deep learning pipeline. Amazon Rekognition continues to improve the accuracy of its models by building upon the latest research and sourcing new training data. This allows you to focus on high-value application design and development.
What are the most common use cases for Amazon Rekognition?
General
Amazon Rekognition | Machine Learning
The most common use-cases for Rekognition Image include:
Searchable Image Library
Face-Based User Verification
Sentiment Analysis
Facial Recognition
Image Moderation
License Plate Recognition
The most common use-cases for Rekognition Video include:
Immediate response for public safety and security
Investigative analysis of events for public safety
Search Index for video archives
Easy filtering of video for explicit and suggestive content
How do I get started with Amazon Rekognition?
General
Amazon Rekognition | Machine Learning
If you are not already signed up for Amazon Rekognition, you can click the “Try Amazon Rekognition” button on the Amazon Rekognition page and complete the sign-up process. You must have an Amazon Web Services account; if you do not already have one, you will be prompted to create one during the sign-up process. Once you are signed up, try out Amazon Rekognition with your own images and videos using the Amazon Rekognition Management Console or download the Amazon Rekognition SDKs to start creating your own applications. Please refer to our step-by-step Getting Started Guide for more information.
What APIs does Amazon Rekognition offer?
General
Amazon Rekognition | Machine Learning
Amazon Rekognition Image offers APIs to detect objects and scenes, detect and analyze faces, recognize celebrities, detect inappropriate content, and search for similar faces in a collection of faces, along with APIs to manage resources. Rekognition Image also offers APIs to compare faces and extract text, while Rekognition Video also offers APIs to track persons and manage live stream video from Amazon Kinesis Video Streams. For details, please refer to the Amazon Rekognition API Reference.
What image and video formats does Amazon Rekognition support?
General
Amazon Rekognition | Machine Learning
Amazon Rekognition Image currently supports the JPEG and PNG image formats. You can submit images either as an S3 object or as a byte array.
Amazon Rekognition Video operations can analyze videos stored in Amazon S3 buckets. The video must be encoded using the H.264 codec. The supported file formats are MPEG-4 and MOV. A codec is software or hardware that compresses data for faster delivery and decompresses received data into its original form. The H.264 codec is commonly used for the recording, compression, and distribution of video content. A video file format may contain one or more codecs. If your MOV or MPEG-4 format video file does not work with Rekognition Video, check that the codec used to encode the video is H.264.
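Since only JPEG and PNG images are accepted, it can help to check the format before submitting a byte array. The following is a minimal sketch that inspects the standard file signatures of the two formats; the function name `sniff_image_format` is hypothetical and not part of any Rekognition SDK.

```python
from typing import Optional

# Standard file signatures ("magic bytes") for the two supported formats.
JPEG_SIGNATURE = b"\xff\xd8\xff"
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def sniff_image_format(data: bytes) -> Optional[str]:
    """Return "JPEG" or "PNG" based on the leading bytes, else None."""
    if data.startswith(JPEG_SIGNATURE):
        return "JPEG"
    if data.startswith(PNG_SIGNATURE):
        return "PNG"
    return None
```

A caller could reject or convert any image for which this returns `None` before sending it to the service.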
What file sizes can I use with Amazon Rekognition?
General
Amazon Rekognition | Machine Learning
Amazon Rekognition Image supports image file sizes up to 15 MB when passed as an S3 object, and up to 5 MB when submitted as an image byte array. Amazon Rekognition Video supports files up to 8 GB and videos up to 2 hours long when passed as an S3 object.
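The image limits above can be checked on the client side before a request is made. This is a rough sketch only: the names are hypothetical, and it assumes 1 MB means 1024 x 1024 bytes, which the text above does not specify.

```python
# Assumption: 1 MB = 1024 * 1024 bytes (not stated in the FAQ text).
MB = 1024 * 1024
MAX_S3_IMAGE_BYTES = 15 * MB      # image passed as an S3 object
MAX_INLINE_IMAGE_BYTES = 5 * MB   # image passed as a byte array


def image_size_ok(size_bytes: int, via_s3: bool) -> bool:
    """Return True if an image of this size fits the limit for the chosen path."""
    limit = MAX_S3_IMAGE_BYTES if via_s3 else MAX_INLINE_IMAGE_BYTES
    return 0 < size_bytes <= limit
```

For example, a 6 MB image would pass this check only when referenced from S3, which suggests uploading larger images to a bucket rather than sending them inline.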
How does image resolution affect the quality of Rekognition Image API results?
General
Amazon Rekognition | Machine Learning
Amazon Rekognition works across a wide range of image resolutions. For best results we recommend using VGA (640x480) resolution or higher. Going below QVGA (320x240) may increase the chances of missing faces, objects, or inappropriate content, although Amazon Rekognition accepts images that are at least 80 pixels in both dimensions.
How small can an object be for Amazon Rekognition Image to detect and analyze it?
General
Amazon Rekognition | Machine Learning
As a rule of thumb, please ensure that the smallest object or face present in the image is at least 5% of the size (in pixels) of the shorter image dimension. For example, if you are working with a 1600x900 image, the smallest face or object should be at least 45 pixels in either dimension.
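The rule of thumb above is simple arithmetic on the shorter image dimension. A minimal sketch, with a hypothetical helper name:

```python
def min_detectable_size(width: int, height: int, percent: int = 5) -> float:
    """Smallest recommended object or face size in pixels.

    Applies the 5% rule of thumb to the shorter image dimension.
    """
    return min(width, height) * percent / 100


# The 1600x900 example from the text: 5% of 900 pixels.
print(min_detectable_size(1600, 900))  # 45.0
```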
How does video resolution affect the quality of Rekognition Video API results?
General
Amazon Rekognition | Machine Learning
The system is trained to recognize faces larger than 32 pixels (on the shortest dimension). The minimum size for a face to be recognized therefore varies from approximately 1/7 of the screen’s smaller dimension at QVGA resolution to 1/30 at HD 1080p resolution. For example, at VGA resolution, users should expect lower performance for faces smaller than 1/10 of the screen’s smaller dimension.
What else can affect the quality of the Rekognition Video APIs?
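The QVGA and 1080p fractions quoted above follow from dividing the smaller screen dimension by the 32-pixel minimum, and the quoted values are rounded. A small sketch of that arithmetic, using a hypothetical function name:

```python
MIN_FACE_PIXELS = 32  # minimum face size on its shortest dimension, per the text


def min_face_fraction_denominator(smaller_dimension: int) -> float:
    """Return d such that a 32-pixel face is 1/d of the smaller screen dimension."""
    return smaller_dimension / MIN_FACE_PIXELS


print(min_face_fraction_denominator(240))   # QVGA (320x240): 7.5, roughly 1/7
print(min_face_fraction_denominator(1080))  # HD 1080p: 33.75, roughly 1/30
```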
General
Amazon Rekognition | Machine Learning
Besides video resolution, heavy blur, fast-moving persons, lighting conditions, and pose may affect the quality of the APIs.
What is the preferred user video content that is suitable for Rekognition Video APIs?
General
Amazon Rekognition | Machine Learning
This API works best with consumer and professional videos taken with a frontal field of view in normal color and lighting conditions. It has not been tested on black-and-white, infrared (IR), or extreme lighting conditions. Applications that are sensitive to false alarms are advised to discard outputs with a confidence score below a selected, application-specific threshold.
In which AWS regions is Amazon Rekognition available?
General
Amazon Rekognition | Machine Learning
Amazon Rekognition Image is currently available in the US East (Northern Virginia), US West (Oregon), US East (Ohio), EU (Ireland), Asia Pacific (Tokyo), Asia Pacific (Sydney), and AWS GovCloud (US) regions. Amazon Rekognition Video is available in the US East (Northern Virginia), US West (Oregon), US East (Ohio), EU (Ireland), Asia Pacific (Tokyo), and Asia Pacific (Sydney) regions. Amazon Rekognition Video real-time streaming is only available in the US East (Northern Virginia), US West (Oregon), EU (Ireland), and Asia Pacific (Tokyo) regions.
What is a label?
Object and Scene Detection
Amazon Rekognition | Machine Learning
A label is an object, scene, or concept found in an image based on its contents. For example, a photo of people on a tropical beach may contain labels such as ‘Person’, ‘Water’, ‘Sand’, ‘Palm Tree’, and ‘Swimwear’ (objects), ‘Beach’ (scene), and ‘Outdoors’ (concept).
What is a confidence score and how do I use it?
Object and Scene Detection
Amazon Rekognition | Machine Learning
A confidence score is a number between 0 and 100 that indicates the probability that a given prediction is correct. In the tropical beach example, if the object and scene detection process returns a confidence score of 99 for the label ‘Water’ and 35 for the label ‘Palm Tree’, then it is more likely that the image contains water but not a palm tree.
Applications that are very sensitive to detection errors (false positives) should discard results associated with confidence scores below a certain threshold. The optimum threshold depends on the application. In many cases, you will get the best user experience by setting minimum confidence values higher than the default value.
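Discarding low-confidence results can be done with a simple filter over the response. The sketch below assumes a response shaped like the DetectLabels output (a `Labels` list whose entries carry `Name` and `Confidence`); the sample data mirrors the tropical beach example and is illustrative only.

```python
def filter_labels(response: dict, min_confidence: float) -> list:
    """Keep only the names of labels at or above the chosen confidence threshold."""
    return [
        label["Name"]
        for label in response.get("Labels", [])
        if label["Confidence"] >= min_confidence
    ]


# Illustrative response for the beach photo described above.
sample_response = {
    "Labels": [
        {"Name": "Water", "Confidence": 99.0},
        {"Name": "Palm Tree", "Confidence": 35.0},
    ]
}

print(filter_labels(sample_response, 50))  # ['Water']
```

The right threshold is application-specific, so it is passed in as a parameter here rather than hard-coded.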
What is Object and Scene Detection?
Object and Scene Detection
Amazon Rekognition | Machine Learning
Object and Scene Detection refers to the process of analyzing an image or video to assign labels based on its visual content. Amazon Rekognition Image does this through the DetectLabels API. This API lets you automatically identify thousands of objects, scenes, and concepts and returns a confidence score for each label. DetectLabels uses a default confidence threshold of 50. Object and Scene detection is ideal for customers who want to search and organize large image libraries, including consumer and lifestyle applications that depend on user-generated content and ad tech companies looking to improve their targeting algorithms.
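As a rough illustration of calling DetectLabels, the sketch below uses the `detect_labels` operation of the AWS SDK for Python (boto3). The wrapper name, bucket, and key are hypothetical. The client is passed in, so in practice you would create it with `boto3.client("rekognition")` and have AWS credentials configured.

```python
def detect_labels_in_s3(client, bucket: str, key: str,
                        min_confidence: float, max_labels: int = 10) -> list:
    """Call DetectLabels on an image stored in S3.

    `client` is expected to be a Rekognition client, for example
    boto3.client("rekognition"). Returns (name, confidence) pairs.
    """
    response = client.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=max_labels,
        MinConfidence=min_confidence,
    )
    return [(label["Name"], label["Confidence"]) for label in response["Labels"]]


# Hypothetical usage (requires boto3 and valid AWS credentials):
#   import boto3
#   client = boto3.client("rekognition")
#   detect_labels_in_s3(client, "my-bucket", "photos/beach.jpg", min_confidence=70.0)
```

Passing the client as an argument keeps the function easy to exercise with a stub object in tests.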
What types of labels does Amazon Rekognition support?
Object and Scene Detection
Amazon Rekognition | Machine Learning
Rekognition supports thousands of labels belonging to common categories including, but not limited to:
People and Events: ‘Wedding’, ‘Bride’, ‘Baby’, ‘Birthday Cake’, ‘Guitarist’, etc.
Food and Drink: ‘Apple’, ‘Sandwich’, ‘Wine’, ‘Cake’, ‘Pizza’, etc.
Nature and Outdoors: ‘Beach’, ‘Mountains’, ‘Lake’, ‘Sunset’, ‘Rainbow’, etc.
Animals and Pets: ‘Dog’, ‘Cat’, ‘Horse’, ‘Tiger’, ‘Turtle’, etc.
Home and Garden: ‘Bed’, ‘Table’, ‘Backyard’, ‘Chandelier’, ‘Bedroom’, etc.
Sports and Leisure: ‘Golf’, ‘Basketball’, ‘Hockey’, ‘Tennis’, ‘Hiking’, etc.
Plants and Flowers: ‘Rose’, ‘Tulip’, ‘Palm Tree’, ‘Forest’, ‘Bamboo’, etc.
Art and Entertainment: ‘Sculpture’, ‘Painting’, ‘Guitar’, ‘Ballet’, ‘Mosaic’, etc.
Transportation and Vehicles: ‘Airplane’, ‘Car’, ‘Bicycle’, ‘Motorcycle’, ‘Truck’, etc.
Electronics: ‘Computer’, ‘Mobile Phone’, ‘Video Camera’, ‘TV’, ‘Headphones’, etc.
How is Object and Scene Detection different for video analysis?
Object and Scene Detection
Amazon Rekognition | Machine Learning
Rekognition Video enables you to automatically identify thousands of objects - such as vehicles or pets - and activities - such as celebrating or dancing - and provides you with timestamps and a confidence score for each label. It also relies on motion and time context in the video to accurately identify complex activities, such as “blowing a candle” or “extinguishing fire”.
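Because video labels come with timestamps, a common first step is to group them by label name. The sketch below assumes a response shaped like the GetLabelDetection output, where each entry has a `Timestamp` (milliseconds from the start of the video) and a nested `Label` with `Name` and `Confidence`. The sample values are made up for illustration.

```python
from collections import defaultdict


def timestamps_by_label(response: dict, min_confidence: float = 0.0) -> dict:
    """Map each label name to the list of timestamps (ms) where it was detected."""
    grouped = defaultdict(list)
    for item in response.get("Labels", []):
        label = item["Label"]
        if label["Confidence"] >= min_confidence:
            grouped[label["Name"]].append(item["Timestamp"])
    return dict(grouped)


# Illustrative data only.
sample_video_response = {
    "Labels": [
        {"Timestamp": 0, "Label": {"Name": "Dancing", "Confidence": 91.0}},
        {"Timestamp": 500, "Label": {"Name": "Pet", "Confidence": 62.0}},
        {"Timestamp": 1000, "Label": {"Name": "Dancing", "Confidence": 88.0}},
    ]
}

print(timestamps_by_label(sample_video_response, min_confidence=80.0))
```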
I can’t find the label I need. How do I request a new label?
Object and Scene Detection
Amazon Rekognition | Machine Learning
Please send us your requests through AWS Customer Support. Amazon Rekognition continuously expands its catalog of labels based on customer feedback.
What is Unsafe Content Detection?
Unsafe Content Detection
Amazon Rekognition | Machine Learning
Amazon Rekognition’s Unsafe Content Detection is an easy-to-use, deep learning-based API for detecting explicit and suggestive adult content in images. Developers can use this additional metadata to filter inappropriate content based on their business needs. Beyond flagging an image based on the presence of adult content, Image Moderation also returns a hierarchical list of labels with confidence scores. These labels indicate specific categories of adult content, giving developers more granular control to filter and manage large volumes of user-generated content (UGC). This API can be used in moderation workflows for applications such as social and dating sites, photo sharing platforms, blogs and forums, apps for children, e-commerce sites, and entertainment and online advertising services.
What types of explicit and suggestive adult content does Amazon Rekognition detect?
Unsafe Content Detection
Amazon Rekognition | Machine Learning
Amazon Rekognition detects the following types of explicit and suggestive adult content in images:
Explicit Nudity
Nudity
Graphic Male Nudity
Graphic Female Nudity
Sexual Activity
Partial Nudity
Suggestive
Female Swimwear or Underwear
Male Swimwear or Underwear
Revealing Clothes
Amazon Rekognition’s Unsafe Image Detection API returns a hierarchy of labels, as well as a confidence score for each detected label. For instance, given an inappropriate image, Rekognition may return “Explicit Nudity” with a confidence score as a top-level label. Developers could use this alone to flag content. In the same response, Rekognition also returns a second level of granularity by providing additional context such as “Graphic Male Nudity” with its own confidence score. Developers could use this information to build more complex filtering logic.
Please note that the Unsafe Image Detection API is not an authority on, or in any way purports to be an exhaustive filter of, explicit and suggestive adult content. Furthermore, this API does not detect whether an image includes illegal content (such as child pornography) or unnatural adult content.
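The two-level label hierarchy described above can be regrouped on the client side. The sketch below assumes a response shaped like the DetectModerationLabels output, where each entry in `ModerationLabels` has `Name`, `Confidence`, and a `ParentName` that is empty for top-level labels. The function name and sample scores are illustrative.

```python
def group_moderation_labels(response: dict) -> dict:
    """Group second-level moderation labels under their top-level parent."""
    grouped = {}
    for label in response.get("ModerationLabels", []):
        parent = label.get("ParentName") or ""
        if not parent:
            # Top-level label: make sure it has an entry even with no children.
            grouped.setdefault(label["Name"], [])
        else:
            grouped.setdefault(parent, []).append(
                (label["Name"], label["Confidence"])
            )
    return grouped


# Illustrative response only.
sample_moderation_response = {
    "ModerationLabels": [
        {"Name": "Explicit Nudity", "ParentName": "", "Confidence": 97.0},
        {"Name": "Graphic Male Nudity", "ParentName": "Explicit Nudity", "Confidence": 97.0},
        {"Name": "Suggestive", "ParentName": "", "Confidence": 60.0},
    ]
}

print(group_moderation_labels(sample_moderation_response))
```

Simple flagging can look only at the keys of the result, while finer filtering can inspect the second-level entries and their scores.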
Can Amazon Rekognition’s Unsafe Content Detection API detect other inappropriate content besides explicit and suggestive adult content?
Unsafe Content Detection
Amazon Rekognition | Machine Learning
Currently, Rekognition only supports the labels we have outlined above. We will work to continuously add and improve labels based on feedback from our customers.
If you require other types of inappropriate content to be detected in images, please reach out to us using the feedback process outlined later in this section.
How is Unsafe Content Detection different for video analysis?
Unsafe Content Detection
Amazon Rekognition | Machine Learning
Rekognition Video enables you to automatically identify explicit or suggestive adult content and also provides you with timestamps and a confidence score for each content type label.
How can I ensure that Rekognition meets my adult image and video detection use case?
Unsafe Content Detection
Amazon Rekognition | Machine Learning
Rekognition’s Unsafe Content Detection models have been tuned and tested extensively, but we recommend that you measure the accuracy on your own data sets to gauge performance.
You can use the ‘MinConfidence’ parameter in your API requests to balance detection of content (recall) against the accuracy of detection (precision). If you reduce ‘MinConfidence’, you are likely to detect most of the inappropriate content, but you are also likely to pick up content that is not actually explicit or suggestive. If you increase ‘MinConfidence’, you are likely to ensure that all your detected content is actually explicit or suggestive, but some inappropriate content may not be tagged. For examples of how to use ‘MinConfidence’ for images, please refer to the documentation.
In case Rekognition fails to detect adult content in images or videos, please reach out to us using the feedback process outlined below.
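To measure the precision and recall trade-off on your own data, you can score a hand-labeled sample at several thresholds. The sketch below is not a Rekognition API; it only shows the arithmetic. The function name and the five scored items are hypothetical.

```python
def precision_recall(scored_items: list, threshold: float) -> tuple:
    """Compute (precision, recall) for items flagged at or above `threshold`.

    `scored_items` holds (confidence, is_actually_unsafe) pairs from a
    hand-labeled evaluation set.
    """
    flagged = [is_unsafe for score, is_unsafe in scored_items if score >= threshold]
    true_positives = sum(flagged)
    total_unsafe = sum(1 for _, is_unsafe in scored_items if is_unsafe)
    precision = true_positives / len(flagged) if flagged else 1.0
    recall = true_positives / total_unsafe if total_unsafe else 1.0
    return precision, recall


# Hypothetical evaluation set: (confidence score, ground-truth unsafe?).
evaluation_set = [(95, True), (80, True), (60, False), (55, True), (30, False)]

for threshold in (50, 90):
    print(threshold, precision_recall(evaluation_set, threshold))
```

On this made-up sample, the lower threshold catches every unsafe item but also flags a safe one, while the higher threshold flags only unsafe items and misses two of them.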
How can I give feedback to Rekognition to improve its Unsafe Content Detection?
Unsafe Content Detection
Amazon Rekognition | Machine Learning
Please send us your requests through AWS Customer Support. Amazon Rekognition continuously expands the types of inappropriate content detected based on customer feedback. It usually takes 6-8 weeks to add new types of explicit or suggestive adult content. Please note that illegal content (such as child pornography) will not be accepted through this process.
What is Facial Analysis?
Facial Analysis
Amazon Rekognition | Machine Learning
Facial analysis is the process of detecting a face within an image and extracting relevant face attributes from it. Amazon Rekognition Image returns the bounding box for each face detected in an image along with attributes such as gender, presence of sunglasses, and face landmark points. Rekognition Video returns the faces detected in a video with timestamps and, for each detected face, the position and a bounding box along with face landmark points.
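In the API responses, a face bounding box is expressed as ratios of the overall image width and height (`Left`, `Top`, `Width`, `Height`), so it usually needs converting to pixels before drawing or cropping. A minimal sketch with a hypothetical helper name:

```python
def box_to_pixels(box: dict, image_width: int, image_height: int) -> tuple:
    """Convert a ratio-based bounding box to (left, top, width, height) in pixels."""
    left = round(box["Left"] * image_width)
    top = round(box["Top"] * image_height)
    width = round(box["Width"] * image_width)
    height = round(box["Height"] * image_height)
    return left, top, width, height


# Illustrative box covering the lower-middle part of a 640x480 image.
sample_box = {"Left": 0.25, "Top": 0.5, "Width": 0.5, "Height": 0.25}
print(box_to_pixels(sample_box, 640, 480))  # (160, 240, 320, 120)
```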
What face attributes can I get from Amazon Rekognition?
Facial Analysis
Amazon Rekognition | Machine Learning
Amazon Rekognition returns the following facial attributes for each face detected, along with a bounding box and confidence score for each attribute:
Gender
Smile
Emotions
Eyeglasses
Sunglasses
Eyes open
Mouth open
Mustache
Beard
Pose
Quality
Face landmarks
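The attributes listed above can be reduced to a compact summary per face. The sketch below assumes a face entry shaped like the `FaceDetails` items returned by DetectFaces when all attributes are requested, where most attributes carry a `Value` and a `Confidence` and `Emotions` is a list of `Type` and `Confidence` pairs. The function name and sample values are illustrative.

```python
def summarize_face(face_detail: dict) -> dict:
    """Pick out a few attributes and the highest-confidence emotion for one face."""
    emotions = face_detail.get("Emotions", [])
    top_emotion = (
        max(emotions, key=lambda emotion: emotion["Confidence"])["Type"]
        if emotions
        else None
    )
    return {
        "gender": face_detail.get("Gender", {}).get("Value"),
        "smiling": face_detail.get("Smile", {}).get("Value"),
        "sunglasses": face_detail.get("Sunglasses", {}).get("Value"),
        "top_emotion": top_emotion,
    }


# Illustrative face entry only.
sample_face = {
    "Gender": {"Value": "Female", "Confidence": 99.1},
    "Smile": {"Value": True, "Confidence": 95.0},
    "Sunglasses": {"Value": False, "Confidence": 99.0},
    "Emotions": [
        {"Type": "HAPPY", "Confidence": 92.0},
        {"Type": "CALM", "Confidence": 5.0},
    ],
}

print(summarize_face(sample_face))
```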
What is face pose?
Facial Analysis
Amazon Rekognition | Machine Learning
Face pose refers to the rotation of a detected face on the pitch, roll, and yaw axes. Each of these parameters is returned as an angle between -180 and +180 degrees. Face pose can be used to find the orientation of the face bounding polygon (as opposed to a rectangular bounding box), to measure deformation, to track faces accurately, and more.
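One simple use of the pose angles is to keep only faces that look roughly toward the camera. The sketch below assumes a `Pose` entry with `Pitch`, `Roll`, and `Yaw` in degrees. The 20-degree cutoff is an arbitrary illustrative value, not a recommendation from the service.

```python
def is_roughly_frontal(pose: dict, max_abs_degrees: float = 20.0) -> bool:
    """Return True if pitch, roll, and yaw are all within the given limit."""
    return all(
        abs(pose[axis]) <= max_abs_degrees for axis in ("Pitch", "Roll", "Yaw")
    )


# Illustrative poses only.
print(is_roughly_frontal({"Pitch": 3.0, "Roll": -5.0, "Yaw": 10.0}))  # True
print(is_roughly_frontal({"Pitch": 3.0, "Roll": -5.0, "Yaw": 65.0}))  # False
```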