AWS Rekognition and Lex

AI and Deep Learning Solutions by AWS to Streamline Operations and Enhance Customer Experience

AWS AI services bring deep learning technologies within reach of every developer. Whether you are getting started with AI or you are an expert in deep learning, this blog post will provide meaningful insight into the AI services of AWS and demonstrate their impressive functionality.

Amazon Rekognition

Amazon Rekognition, the computer vision service from AWS, helps embed visual analysis into applications. It can search, verify, and organize tens of thousands of images and analyze video content. The service is built on highly scalable deep learning technology that detects objects, faces, and scenes, reads text, and flags inappropriate content. It can also compare faces.

Rekognition Image detects objects and scenes in still images, whereas Rekognition Video tracks the movement of objects across frames. It can detect human activity even when the face is not visible, which enables use cases such as receiving a notification when a delivery person approaches your entry gate. Other widely implemented use cases for Rekognition Image include face recognition and sentiment analysis, while Rekognition Video is commonly used for search indexing and explicit content filtering in online media.

The best part about this service is that you do not need expertise in building, maintaining, or upgrading deep learning pipelines. Achieving accuracy in a computer vision task normally requires a considerable amount of labeled ground truth data and Graphics Processing Units (GPUs) to handle the heavy computation of training. Rekognition takes care of all this automatically: it is pre-trained for recognition tasks and delivered as a fully managed deep learning pipeline. This keeps your focus on the design and development of the core application.

The following figure shows the functions supported by the Rekognition service.

  • Object and Scene Detection: AWS Rekognition is pre-trained to recognize thousands of objects and scenes.
  • Custom Labeling: With custom labeling, you can identify the objects and scenes in images that are specific to your business needs.
  • Content Moderation: It detects explicit content, including violence, weapons, and other disturbing or offensive material. It is primarily used in children's apps, dating sites, photo sharing apps, etc.
  • Facial Analysis: AWS Rekognition detects faces in images and videos and analyzes the facial attributes of each detected face.
  • Face Comparison: It compares two faces and returns a similarity score.
  • Facial Recognition: It identifies an individual by matching the captured frame against a collection of stored faces.
  • Celebrity Recognition: This function helps in recognizing famous personalities in the field of sports, politics, entertainment, business etc.
  • Text Detection: It helps recognize the text in any image/video, such as addresses, captions, and subtitles. Visual search on social media and vehicle plate number identification by traffic cameras are its significant use cases.
  • PPE Detection: This feature detects personal protective equipment (PPE) worn by the people in an image, which helps improve workplace safety practices.
  • People Pathing: AWS Rekognition can track the path people take in videos, providing their location and facial landmarks. It helps provide customer insights into movement patterns in a shopping mall or a grocery store.
  • Black Frames Detection: Usually, any video with advertisements will have black/empty frames used as cues for ads to begin. Using AWS Rekognition, this media analysis is automated.
  • End Credits Detection: Rekognition helps identify the point where the end credits begin in a video so that prompts like “next movie” can be shown at the right time.
  • Shot Detection: A shot is a series of interrelated consecutive images. AWS Rekognition detects the start, end, and duration of every shot, which helps set up thumbnails for video transitions and identify spots for ads to begin.
  • Color Bars Detection: AWS Rekognition helps detect sections in a video comprising SMPTE bars that ensure the color is accurately calibrated. It is usually helpful in the detection of broadcast signal loss.
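As a rough illustration of how two of the functions above (object and scene detection, and face comparison) are typically called from Python, here is a minimal sketch. It assumes the `client` argument behaves like `boto3.client("rekognition")`; the helper names `detect_labels` and `faces_match` and the default thresholds are illustrative choices, not part of the original solution.

```python
from typing import Any, List, Tuple


def detect_labels(client: Any, image_bytes: bytes,
                  min_confidence: float = 80.0,
                  max_labels: int = 10) -> List[Tuple[str, float]]:
    """Return (label, confidence) pairs for one image, highest confidence first.

    `client` is assumed to expose the Rekognition DetectLabels operation,
    for example boto3.client("rekognition").
    """
    response = client.detect_labels(
        Image={"Bytes": image_bytes},
        MaxLabels=max_labels,
        MinConfidence=min_confidence,
    )
    labels = [(item["Name"], item["Confidence"]) for item in response["Labels"]]
    return sorted(labels, key=lambda pair: pair[1], reverse=True)


def faces_match(client: Any, source_bytes: bytes, target_bytes: bytes,
                threshold: float = 90.0) -> bool:
    """Return True if the source face appears in the target image.

    CompareFaces only returns matches whose similarity is at or above
    SimilarityThreshold, so a non-empty FaceMatches list means a match.
    """
    response = client.compare_faces(
        SourceImage={"Bytes": source_bytes},
        TargetImage={"Bytes": target_bytes},
        SimilarityThreshold=threshold,
    )
    return len(response["FaceMatches"]) > 0


# Possible usage (requires AWS credentials and the boto3 package):
#   import boto3
#   rekognition = boto3.client("rekognition")
#   with open("photo.jpg", "rb") as f:   # hypothetical file name
#       print(detect_labels(rekognition, f.read()))
```

Passing the client in as a parameter keeps the helpers easy to exercise with a stub, without touching a live AWS account.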

Fig. Comparison of top computer vision APIs by public cloud providers

Amazon Lex

The chatbot market is growing rapidly, and almost every kind of business is benefiting from it. Chatbots broadly fall into three types: rule-based bots, AI bots, and hybrid bots.

The rule-based/linguistic chatbots offer fine-tuned control and are highly flexible. They use if-else logical conditions to direct conversation flow using a linguistic model; hence, their interaction capabilities are very structured. You can easily come across these chatbots on e-commerce platforms and social networking sites.
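To make the if-else style of a rule-based bot concrete, here is a minimal sketch in Python. The patterns, replies, and the `reply` function are invented for illustration and do not come from any particular product; a real linguistic bot would use a much richer rule set.

```python
import re
from typing import List, Tuple

# Each rule pairs a pattern with a canned reply; rules are checked in order.
# All patterns and replies below are illustrative assumptions.
RULES: List[Tuple[re.Pattern, str]] = [
    (re.compile(r"\b(hi|hello|hey)\b", re.IGNORECASE),
     "Hello! How can I help you today?"),
    (re.compile(r"\b(track|where)\b.*\border\b", re.IGNORECASE),
     "Please share your order number."),
    (re.compile(r"\b(refund|return)\b", re.IGNORECASE),
     "I can help with returns. Which item is it?"),
]

FALLBACK = "Sorry, I did not understand that. Could you rephrase?"


def reply(message: str) -> str:
    """Return the reply of the first rule that matches, else a fallback."""
    for pattern, response in RULES:
        if pattern.search(message):
            return response
    return FALLBACK
```

Because the flow is fully determined by the rule list, the bot's behavior is easy to control and audit, but it cannot handle phrasing that no rule anticipates.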

On the other hand, AI bots, though more complex, offer more lifelike conversations. Thanks to their machine learning capabilities, they learn from previous conversations and become contextually better over time.

Lastly, hybrid bots combine the best of both rule-based and AI approaches. They use ML integrations that go beyond linguistic rules.

Amazon Lex is an AWS offering for making conversational bots capable of interacting through voice and text and is backed by ASR (Automatic Speech Recognition). Using Lex, you can publish chatbots on various chat services and mobile devices.
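A text exchange with a deployed bot might look like the sketch below. It assumes a Lex V2 bot and a `client` that behaves like `boto3.client("lexv2-runtime")`; the helper name `send_text` is illustrative, and the bot ID, alias ID, and session ID are values you would take from your own bot.

```python
from typing import Any, List


def send_text(client: Any, bot_id: str, bot_alias_id: str,
              session_id: str, text: str,
              locale_id: str = "en_US") -> List[str]:
    """Send one user utterance to a Lex V2 bot and return its text replies.

    `client` is assumed to expose the Lex V2 runtime RecognizeText
    operation, for example boto3.client("lexv2-runtime").
    """
    response = client.recognize_text(
        botId=bot_id,
        botAliasId=bot_alias_id,
        localeId=locale_id,
        sessionId=session_id,
        text=text,
    )
    # Some message types (such as response cards) carry no "content" field.
    return [m["content"] for m in response.get("messages", []) if "content" in m]


# Possible usage (requires AWS credentials and the boto3 package;
# the IDs below are placeholders, not real values):
#   import boto3
#   lex = boto3.client("lexv2-runtime")
#   print(send_text(lex, "BOT_ID", "ALIAS_ID", "user-123", "Book a flight"))
```

Reusing the same session ID across calls lets Lex keep the conversation context between turns.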

The best part about using Lex is that developers need not have machine learning expertise. The language model is built automatically from the prompts given. There are no bandwidth constraints, and Lex auto-scales as per your needs. It uses deep learning to get smarter with time. Lex use cases include information bots, device control bots, order placing or travel bots, and self-service bots.

Lex has iOS and Android SDKs for mobile development, and you need not certify the bot before deployment. Currently, the maximum speech input time is 15 seconds for slot filling and the languages supported are US Spanish, Canadian French, British English, Australian English, German, US English, Latin American Spanish and French.

Lex V2 has an additional streaming API where the bot will listen continuously and respond proactively. Messages like “Take your time to respond” can be further added to make the conversation human-like.

According to a 2019 Gartner report, the top 16 market vendors for conversational chatbots include:

  • Avaamo
  • Amazon Web Services
  • Artificial Solutions
  • Eudata
  • Google
  • IBM
  • IPsoft
  • Microsoft
  • Openstream
  • Oracle
  • Rasa
  • Rulai
  • SmartBotHub
  • SoundHound

Beyond these vendors, NLP libraries such as Gensim, TextBlob, PyNLPl, CoreNLP, and spaCy are also widely used to build AI chatbots.

Fig. Comparison of major chatbot building frameworks

Use Cases
  • Automated Covid Assessment Bot  

With Amazon Lex, Motherson Technology Services helped develop a COVID-19 assessment tool in which a genie acts as a chatbot for voice and text-based interactions. It leverages multiple AI capabilities, such as voice, Natural Language Processing, and decision making, to provide a solution that can scale to serve many people. The solution asks users a set of related questions and, based on the answers, suggests whether they may be at risk of COVID-19. The bot is conversational and responds to the user’s voice and text replies in real time. It is accessible on the following link:

Fig. Architecture diagram

Fig. Deployed COVID Chatbot

  • 3D, Augmented Reality (AR) and Virtual Reality (VR) Application

Using Amazon Lex, Motherson Technology Services helped develop a virtual environment with an interactive chatbot for voice and text-based interactions. Users converse with the bot in real time and can see captions of its responses, which further enhances the user experience. The scene includes a wall television that plays a video and adds to the aesthetics of the environment.

The scene is built in Amazon Sumerian, and its host is kept in sync with Amazon Lex, which powers the bot. Lex helps create the conversational interface and build chatbots without any heavy lifting. Amazon Polly then turns the text response from Lex into speech. A Lambda function written in Python acts as the backend of the whole task, initializing and validating the user input.
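The validation role of such a Lambda function can be sketched as follows. This is not the function used in the project: it assumes the Lex V2 event and response format, and the slot name `Age` and its valid range are hypothetical examples chosen only to show the pattern.

```python
from typing import Any, Dict, Optional

SLOT_NAME = "Age"  # hypothetical slot, used only for illustration


def _slot_value(intent: Dict[str, Any], name: str) -> Optional[str]:
    """Read the interpreted value of a slot from a Lex V2 intent, if filled."""
    slot = (intent.get("slots") or {}).get(name)
    if not slot or not slot.get("value"):
        return None
    return slot["value"].get("interpretedValue")


def _response(intent: Dict[str, Any], dialog_action: Dict[str, str],
              message: Optional[str] = None) -> Dict[str, Any]:
    """Build a Lex V2 code hook response."""
    body: Dict[str, Any] = {
        "sessionState": {"dialogAction": dialog_action, "intent": intent}
    }
    if message:
        body["messages"] = [{"contentType": "PlainText", "content": message}]
    return body


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Validate the slot; re-prompt on bad input, otherwise let Lex continue."""
    intent = event["sessionState"]["intent"]
    raw = _slot_value(intent, SLOT_NAME)
    if raw is None:
        # Slot not filled yet: hand control back to Lex to ask for it.
        return _response(intent, {"type": "Delegate"})
    try:
        valid = 0 < int(raw) <= 120
    except ValueError:
        valid = False
    if not valid:
        return _response(
            intent,
            {"type": "ElicitSlot", "slotToElicit": SLOT_NAME},
            "Please enter a valid age between 1 and 120.",
        )
    return _response(intent, {"type": "Delegate"})
```

Returning `Delegate` leaves the next step to the bot's own configuration, so the function only intervenes when the input fails validation.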

Fig. A close view of the VR scene

  • Enterprise Level Attendance Tracking System

Here, the client required an attendance tracking system for its employees. For this, Motherson Technology Services used the facial recognition capabilities of AWS Rekognition along with DynamoDB and Lambda. A collection of image libraries was created in Rekognition, comprising each employee’s training images.

A frame captured from the video is then matched against this set of pictures to mark attendance. Amazon S3 is used to store the image collection, and the workflow comprises two sections: Indexing and Analysis.

Indexing populates the collection in Rekognition using the IndexFaces API, whereas Analysis comprises queries that run on this collection using the SearchFacesByImage API. DynamoDB stores key-value pairs such as “emp_id-present” for later reference and UI display. As in the previous use case, a Lambda function written in Python acts as the backend for the service interactions.
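The Indexing and Analysis steps could be sketched roughly as below. This is an illustration rather than the deployed code: it assumes `rekognition` behaves like `boto3.client("rekognition")` and `table` like a `boto3.resource("dynamodb").Table(...)` object, and the collection name, item attributes, and helper names are all hypothetical.

```python
from typing import Any, Optional

COLLECTION_ID = "employees"  # hypothetical collection name


def index_employee(rekognition: Any, bucket: str, key: str, emp_id: str) -> int:
    """Indexing: add one employee image from S3 to the face collection.

    The employee ID is stored as ExternalImageId so that later matches can
    be mapped back to a person. Returns the number of faces indexed.
    """
    response = rekognition.index_faces(
        CollectionId=COLLECTION_ID,
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        ExternalImageId=emp_id,
        MaxFaces=1,
    )
    return len(response["FaceRecords"])


def identify_employee(rekognition: Any, frame_bytes: bytes,
                      threshold: float = 90.0) -> Optional[str]:
    """Analysis: return the employee ID of the best match for a video frame."""
    response = rekognition.search_faces_by_image(
        CollectionId=COLLECTION_ID,
        Image={"Bytes": frame_bytes},
        FaceMatchThreshold=threshold,
        MaxFaces=1,
    )
    matches = response["FaceMatches"]
    return matches[0]["Face"]["ExternalImageId"] if matches else None


def mark_present(table: Any, emp_id: str, date: str) -> None:
    """Record attendance in DynamoDB (attribute names are assumptions)."""
    table.put_item(Item={"emp_id": emp_id, "date": date, "status": "present"})


def process_frame(rekognition: Any, table: Any, frame_bytes: bytes,
                  date: str) -> Optional[str]:
    """Match one frame and mark the employee present if someone is found."""
    emp_id = identify_employee(rekognition, frame_bytes)
    if emp_id is not None:
        mark_present(table, emp_id, date)
    return emp_id
```

Note that SearchFacesByImage raises an error when the frame contains no detectable face, so a production handler would also need to catch that case.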

Fig. Architecture diagram

If you want to learn more about Amazon AI solutions, request a demo or contact us here.

About the Authors:

Prachi Gulihar – ML Engineer, AI / ML Practice

Rajat Dwivedi – ML Engineer, AI / ML Practice

Nishu Malik – ML Engineer, AI / ML Practice

