Computer vision with Azure Cognitive Services

Study notes

1. Analyze images

Computer Vision
  • An area of artificial intelligence (AI) in which software interprets visual input: images or video feeds.
  • Designed to help you extract information from images:
    • Description and tag generation
      Determining an appropriate caption for an image, and identifying relevant "tags".
    • Object detection
      Detecting the presence and location of specific objects within the image.
    • Face detection
      Detecting the presence, location, and features of human faces in the image.
    • Image metadata, color, and type analysis
      Determining the format and size of an image, its dominant color palette, and whether it contains clip art.
    • Category identification
      Identifying an appropriate categorization for the image, and whether it contains any known landmarks.
    • Brand detection
      Detecting the presence of any known brands or logos.
    • Moderation rating
      Determining whether the image includes any adult or violent content.
    • Optical character recognition
      Reading text in the image.
    • Smart thumbnail generation
      Identifying the main region of interest in the image to create a smaller "thumbnail".
  • Provisioning options:
    • A single-service Computer Vision resource.
    • The Computer Vision API in a multi-service Cognitive Services resource.
Analyze an image
Use the Analyze Image REST method or the equivalent method in an SDK (Python, C#, etc.).
You can use scoped functions to retrieve specific subsets of the image features, such as the image description, tags, and objects in the image.
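A minimal sketch of how a call to the Analyze Image REST method could be assembled, using only the Python standard library. The endpoint, key, and image URL are placeholders, and the `v3.2` path segment and the `visualFeatures` values are assumptions to check against the API reference for your resource. The request is built but not sent.

```python
import json
import urllib.parse
import urllib.request


def build_analyze_request(endpoint, key, image_url, features):
    """Build (but do not send) an Analyze Image request for the given features."""
    query = urllib.parse.urlencode({"visualFeatures": ",".join(features)})
    # Assumption: API version v3.2; adjust to the version your resource supports.
    url = f"{endpoint.rstrip('/')}/vision/v3.2/analyze?{query}"
    body = json.dumps({"url": image_url}).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_analyze_request(
    "https://example-resource.cognitiveservices.azure.com",  # placeholder endpoint
    "<your-key>",                                            # placeholder key
    "https://example.com/mountain.jpg",                      # placeholder image
    ["Categories", "Description", "Tags"],
)
print(request.get_method(), request.full_url)

# Sending needs a real resource, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     analysis = json.load(response)
```

Requesting only the features you need keeps the response small.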
The call returns a JSON document containing the requested information, for example:
  {
    "categories": [
      {"name": "_outdoor_mountain", "confidence": "0.9"}
    ],
    "adult": {"isAdultContent": "false", …},
    …
  }

Generate a smart-cropped thumbnail
Creates a thumbnail with different dimensions (and aspect ratio) from the source image. Optionally, image analysis determines the region of interest in the image (its main subject) and makes that the focus of the thumbnail.
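To make the idea of cropping around a region of interest concrete, here is an illustrative sketch in plain Python. It is not the service's actual algorithm. The function name and the `(left, top, width, height)` pixel format for the region are assumptions made for this example. It picks the largest crop with the thumbnail's aspect ratio and centers it on the region as far as the image bounds allow.

```python
def crop_around_region(image_w, image_h, roi, thumb_w, thumb_h):
    """Return (left, top, width, height) of a crop with the thumbnail's aspect
    ratio, centered on the region of interest where the image bounds permit."""
    target_ratio = thumb_w / thumb_h
    # Largest rectangle with the target aspect ratio that fits in the image.
    crop_w = min(image_w, image_h * target_ratio)
    crop_h = crop_w / target_ratio
    # Center of the region of interest.
    roi_left, roi_top, roi_w, roi_h = roi
    center_x = roi_left + roi_w / 2
    center_y = roi_top + roi_h / 2
    # Center the crop on the region, then clamp it inside the image.
    left = min(max(center_x - crop_w / 2, 0), image_w - crop_w)
    top = min(max(center_y - crop_h / 2, 0), image_h - crop_h)
    return round(left), round(top), round(crop_w), round(crop_h)


# A 1000x600 image whose main subject sits to the right, square thumbnail.
print(crop_around_region(1000, 600, (700, 100, 200, 200), 100, 100))
# (400, 0, 600, 600)
```

The crop would then be scaled down to the requested thumbnail size.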

2. Analyze video
Information you can extract from video:
  • Facial recognition
  • OCR
  • Speech transcription
  • Topics - key topics discussed in the video.
  • Sentiment analysis
  • Labels - label tags that identify key objects or themes throughout the video.
  • Content moderation
  • Scene segmentation
You can create custom models and train them for:
  • People.
    Add images of the faces of people you want to recognize in videos, and train a model. This capability may require Limited Access approval, in line with Microsoft's Responsible AI standard.
  • Language.
    Train a model to detect and transcribe specific terminology that may not be in common usage.
  • Brands.
    Train a model to recognize specific names as brands relevant to your business.
  • Animated characters.
    Detect the presence of individual animated characters in a video.
You can incorporate the service into custom applications:
  • Video Analyzer for Media widgets
    Share insights from specific videos with others without giving them full access to your account in the Video Analyzer for Media portal.
  • Video Analyzer for Media API
    A REST API that you subscribe to in order to get a subscription key. Use it to automate video indexing tasks such as uploading and indexing videos, retrieving insights, and determining endpoints for Video Analyzer widgets.
    Results are returned as JSON.
3. Classify images
Image classification
Computer vision technique in which a model is trained to predict a class label for an image based on its contents.
  • multiclass classification - multiple classes, each image can belong to only one class.
  • multilabel classification - an image might be associated with multiple labels.
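The difference between the two modes shows up in how per-class scores are turned into labels. The sketch below uses made-up scores and hypothetical helper names, and the 0.5 threshold is an arbitrary choice for illustration. Multiclass keeps only the single best class, while multilabel keeps every class whose score clears a threshold.

```python
def multiclass_label(scores):
    """Multiclass: exactly one label, the class with the highest score."""
    return max(scores, key=scores.get)


def multilabel_labels(scores, threshold=0.5):
    """Multilabel: every label whose score reaches the threshold."""
    return sorted(label for label, score in scores.items() if score >= threshold)


# Made-up scores for illustration only.
multiclass_scores = {"apple": 0.81, "banana": 0.07, "orange": 0.12}
multilabel_scores = {"apple": 0.81, "banana": 0.07, "orange": 0.62}

print(multiclass_label(multiclass_scores))   # apple
print(multilabel_labels(multilabel_scores))  # ['apple', 'orange']
```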
Classic flow for modeling/prediction:
  • Use existing (labeled) images to train a Custom Vision model.
  • Create a client application that allows others to submit new images, for which the model generates predictions.
4. Detect objects in images
Object detection
Computer vision technique in which a model is trained to detect the presence and location of one or more classes of object in an image.
  • Class label of each object detected in the image.
  • Location of each object within the image, indicated as coordinates of a bounding box that encloses the object.
Bounding boxes are defined by four values that represent the left (X) and top (Y) coordinates of the top-left corner of the bounding box, and the width and height of the bounding box. These values are expressed as proportional values relative to the source image size.
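Because the values are proportional, a client usually has to scale them by the image size before drawing a box. A minimal sketch, assuming the box arrives as a `(left, top, width, height)` tuple of fractions between 0 and 1 (the function name is hypothetical):

```python
def to_pixels(box, image_w, image_h):
    """Convert a proportional (left, top, width, height) box to pixel values."""
    left, top, width, height = box
    return (
        round(left * image_w),
        round(top * image_h),
        round(width * image_w),
        round(height * image_h),
    )


# A box covering the middle half of the width in an 800x600 image.
print(to_pixels((0.25, 0.5, 0.5, 0.25), 800, 600))  # (200, 300, 400, 150)
```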

The hardest part is training the model:
  • Label every object in each image using the interactive UI in the Custom Vision portal.
    Train the model as soon as you have some relevant images labeled, then use smart labeling: the system prefills suggested labels and you just confirm or change them.
  • Use a labeling tool, e.g. the one provided in Azure Machine Learning Studio or the Microsoft Visual Object Tagging Tool (VoTT), which is useful when a team shares the labeling work.
    In this case, you may need to adjust the output to match the measurement units expected by the Custom Vision API.
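As an example of such a unit adjustment, suppose a labeling tool exports boxes as pixel corner coordinates `(x_min, y_min, x_max, y_max)`. That export format is an assumption here, so check what your tool actually produces. Converting to proportional left, top, width, and height values could look like this:

```python
def to_proportional(x_min, y_min, x_max, y_max, image_w, image_h):
    """Convert pixel corner coordinates to a proportional
    (left, top, width, height) box relative to the image size."""
    return (
        x_min / image_w,
        y_min / image_h,
        (x_max - x_min) / image_w,
        (y_max - y_min) / image_h,
    )


# Hypothetical exported box on an 800x600 image.
print(to_proportional(200, 300, 600, 450, 800, 600))  # (0.25, 0.5, 0.5, 0.25)
```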
5. Detect, analyze, and recognize faces
Working with faces covers several distinct tasks:
  • Face detection
  • Face analysis
  • Face recognition
You can use:
  • Computer Vision service
    Detects human faces and returns the bounding box of each face and its location (as in object detection).
  • The Face service
    Does what Computer Vision does (bounding box and location), plus:
    • Comprehensive facial feature analysis
      • Head pose
      • Glasses
      • Blur
      • Exposure
      • Noise
      • Occlusion
    • Facial landmark location
    • Face comparison and verification.
    • Facial recognition.
When using this service, consider:
  • Data privacy and security
  • Transparency
  • Fairness and inclusiveness
The system can compare faces anonymously, that is, confirm that the same person is present on two occasions without needing to know the person's actual identity.
When you need to positively identify individuals, you can train a facial recognition model using face images.

Training process:
  • Create a Person Group that defines the set of individuals you want to identify.
  • Add a Person to the Person Group for each individual you want to identify.
  • Add detected faces from multiple images to each person, preferably in various poses.
    These faces are persisted: unlike the IDs of faces that are only detected, their IDs do not expire after 24 hours.
  • Train the model.
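The training steps above map onto a short sequence of REST calls. The sketch below only lists them and sends nothing. The URL paths follow the Face REST API v1.0 layout as an assumption to verify against the API reference. The endpoint, group ID, and names are placeholders, and `{personId}` stands for the ID the service returns when a person is created.

```python
def training_calls(endpoint, group_id, people):
    """List (method, URL, purpose) for the training flow.

    `people` maps a person's name to the number of face images to add.
    """
    base = f"{endpoint.rstrip('/')}/face/v1.0/persongroups/{group_id}"
    calls = [("PUT", base, "create the Person Group")]
    for name, image_count in people.items():
        calls.append(("POST", f"{base}/persons", f"add Person '{name}'"))
        for index in range(image_count):
            # {personId} is a placeholder for the ID returned by the previous call.
            calls.append((
                "POST",
                f"{base}/persons/{{personId}}/persistedFaces",
                f"add face image {index + 1} for '{name}'",
            ))
    calls.append(("POST", f"{base}/train", "train the model"))
    return calls


calls = training_calls(
    "https://example-resource.cognitiveservices.azure.com",  # placeholder endpoint
    "staff",                                                 # placeholder group ID
    {"Ana": 2, "Ben": 1},                                    # placeholder people
)
for method, url, purpose in calls:
    print(f"{method:4} {url}  # {purpose}")
```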
The trained model is stored in your Face (or Cognitive Services) resource.
It can be used to:
  • Identify individuals in images.
  • Verify the identity of a detected face.
  • Analyze new images to find faces that are similar to a known, persisted face.
