Bulk analysis - Amazon Rekognition

Bulk analysis

Amazon Rekognition Bulk Analysis lets you process a large collection of images asynchronously by using a manifest file with the StartMediaAnalysisJob operation. The output for each individual image matches the output returned by the operation that you use for analysis.

Currently, Rekognition supports analysis with the DetectModerationLabels operation.

You are charged for the number of images that the job processes successfully. The results of a finished job are written to the Amazon S3 bucket that you specify.

Note that Bulk Analysis does not support the Amazon A2I integration.

The API can detect animated or illustrated content types, and information about the detected content type is returned as part of the response.

Processing images in bulk

You can start a new bulk analysis job by submitting a manifest file and calling the StartMediaAnalysisJob operation. The input manifest file contains references to images in an Amazon S3 bucket and it is formatted as follows:

{"source-ref": "s3://foo/bar/1.jpg"}
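As the sample entry above suggests, the manifest holds one JSON object per line, each with a single source-ref key. The following is a minimal sketch of generating such a file from a list of object keys. The helper name build_manifest and the bucket and key names are illustrative assumptions, not part of any AWS SDK.

```python
import json


def build_manifest(bucket, keys):
    """Return manifest text with one {"source-ref": ...} object per line."""
    lines = [json.dumps({"source-ref": f"s3://{bucket}/{key}"}) for key in keys]
    return "\n".join(lines) + "\n"


# Hypothetical bucket and object keys, used only for illustration.
manifest_text = build_manifest("foo", ["bar/1.jpg", "bar/2.jpg"])
print(manifest_text, end="")
```

You would then save the text to a file and upload it to the S3 location that you pass as the job input.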

To create a bulk analysis job (CLI)

  1. If you haven't already:

    1. Create or update a user with AmazonRekognitionFullAccess and AmazonS3ReadOnlyAccess permissions. For more information, see Step 1: Set up an AWS account and create a User.

    2. Install and configure the AWS CLI and the AWS SDKs. For more information, see Step 2: Set up the AWS CLI and AWS SDKs.

  2. Upload images to your S3 bucket.

    For instructions, see Uploading Objects into Amazon S3 in the Amazon Simple Storage Service User Guide.

  3. Use the following commands to create and retrieve bulk analysis jobs.

CLI

Use the following command to call the StartMediaAnalysisJob operation for analysis with the DetectModerationLabels operation:

# Requests
# Starting DetectModerationLabels job with default settings
aws rekognition start-media-analysis-job \
    --operations-config "DetectModerationLabels={MinConfidence='1'}" \
    --input "S3Object={Bucket=my-bucket,Name=my-input.jsonl}" \
    --output-config "S3Bucket=my-output-bucket,S3KeyPrefix=my-results"
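The same request can be sketched with the AWS SDK for Python. This example assumes a boto3 Rekognition client created elsewhere, for instance with boto3.client("rekognition"), and passes it in as an argument so that the function can be exercised without AWS credentials. The helper name start_moderation_job is hypothetical, and the bucket names mirror the CLI example above.

```python
def start_moderation_job(client, input_bucket, manifest_key,
                         output_bucket, output_prefix, min_confidence=1.0):
    """Start a bulk DetectModerationLabels job and return its job ID.

    `client` is assumed to be a boto3 Rekognition client.
    """
    response = client.start_media_analysis_job(
        OperationsConfig={
            "DetectModerationLabels": {"MinConfidence": float(min_confidence)}
        },
        Input={"S3Object": {"Bucket": input_bucket, "Name": manifest_key}},
        OutputConfig={"S3Bucket": output_bucket, "S3KeyPrefix": output_prefix},
    )
    return response["JobId"]


# Example call (requires boto3 and AWS credentials):
#   import boto3
#   job_id = start_moderation_job(boto3.client("rekognition"),
#                                 "my-bucket", "my-input.jsonl",
#                                 "my-output-bucket", "my-results")
```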

You can get information about a given job, such as the Amazon S3 path of the bucket where results and summary files are stored, by using the GetMediaAnalysisJob operation. You provide it with a job ID returned by StartMediaAnalysisJob or ListMediaAnalysisJobs. Details about individual jobs are only retained for one year.

# Request
aws rekognition get-media-analysis-job \
    --job-id customer-job-id
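Because the job runs asynchronously, a caller typically polls GetMediaAnalysisJob until the job finishes. The sketch below assumes a boto3 Rekognition client and assumes that the response carries a Status field whose terminal values are SUCCEEDED and FAILED. The helper name wait_for_job, the polling interval, and the poll limit are illustrative choices.

```python
import time

# Assumed terminal values of the job's Status field.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED"}


def wait_for_job(client, job_id, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Poll GetMediaAnalysisJob until the job reaches a terminal status.

    Returns the final job description, or raises TimeoutError if the job
    is still running after `max_polls` checks.
    """
    for _ in range(max_polls):
        job = client.get_media_analysis_job(JobId=job_id)
        if job["Status"] in TERMINAL_STATUSES:
            return job
        sleep(poll_seconds)
    raise TimeoutError(f"Job {job_id} did not finish after {max_polls} polls")
```

The sleep function is a parameter so that the loop can be tested without waiting. In normal use you would leave it at its default.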

You can list all of your bulk analysis jobs by using the ListMediaAnalysisJobs operation, which returns pages of jobs. With the max-results argument, you can specify the maximum number of jobs to return per page, up to a limit of 100. Details about individual jobs are only retained for one year.

# Request
# Specify number of jobs to return per page, limited to max-results.
aws rekognition list-media-analysis-jobs --max-results 1
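Since the listing is paginated, retrieving every job means following the pagination token from one page to the next. The following sketch assumes a boto3 Rekognition client and assumes that each response holds its jobs under MediaAnalysisJobs and, when more pages remain, a NextToken value. The helper name iter_media_analysis_jobs is hypothetical.

```python
def iter_media_analysis_jobs(client, page_size=100):
    """Yield every job description, following NextToken across pages."""
    kwargs = {"MaxResults": page_size}
    while True:
        page = client.list_media_analysis_jobs(**kwargs)
        yield from page.get("MediaAnalysisJobs", [])
        token = page.get("NextToken")
        if not token:
            break
        # Ask for the page that follows the one just received.
        kwargs["NextToken"] = token
```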

StartMediaAnalysisJob output manifests

The bulk analysis job generates an output manifest file that contains the job results, as well as a manifest summary which contains statistics and details on any errors when processing the input manifest entries.

If the input manifest contains duplicate entries, the job doesn't filter them out. It processes every entry provided.
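If you want each image processed only once, you can remove duplicates from the manifest before submitting it. The following is a minimal sketch that keeps the first occurrence of each source-ref value. The helper name dedupe_manifest and the sample entries are illustrative assumptions.

```python
import json


def dedupe_manifest(manifest_text):
    """Return manifest text with repeated source-ref entries removed.

    The first occurrence of each image reference is kept, in input order.
    """
    seen = set()
    kept = []
    for line in manifest_text.splitlines():
        if not line.strip():
            continue  # Skip blank lines.
        ref = json.loads(line)["source-ref"]
        if ref not in seen:
            seen.add(ref)
            kept.append(json.dumps({"source-ref": ref}))
    return ("\n".join(kept) + "\n") if kept else ""


# Hypothetical manifest containing one repeated entry.
sample = (
    '{"source-ref": "s3://foo/bar/1.jpg"}\n'
    '{"source-ref": "s3://foo/bar/2.jpg"}\n'
    '{"source-ref": "s3://foo/bar/1.jpg"}\n'
)
deduped = dedupe_manifest(sample)
print(deduped, end="")
```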

The output manifest file is formatted as follows:

// Output manifest for content moderation
{"source-ref":"s3://foo/bar/1.jpg", "detect-moderation-labels": {"ModerationLabels":[],"ModerationModelVersion":"7.0","ContentTypes":[{"Confidence":72.7257,"Name":"Animated"}]}}
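Once the output manifest has been downloaded, each line can be parsed independently. The sketch below assumes the record layout shown in the sample above and assumes that each moderation label carries Name and Confidence fields, as in the DetectModerationLabels response. The helper name flagged_images, the threshold, and the second sample record are hypothetical.

```python
import json


def flagged_images(manifest_text, min_confidence=50.0):
    """Map each source-ref to its label names at or above min_confidence."""
    flagged = {}
    for line in manifest_text.splitlines():
        line = line.strip()
        if not line:
            continue  # Skip blank lines.
        record = json.loads(line)
        result = record.get("detect-moderation-labels", {})
        names = [
            label["Name"]
            for label in result.get("ModerationLabels", [])
            if label.get("Confidence", 0.0) >= min_confidence
        ]
        if names:
            flagged[record["source-ref"]] = names
    return flagged


# First record is the sample from this page; the second is hypothetical.
sample = '''
{"source-ref":"s3://foo/bar/1.jpg", "detect-moderation-labels": {"ModerationLabels":[],"ModerationModelVersion":"7.0","ContentTypes":[{"Confidence":72.7257,"Name":"Animated"}]}}
{"source-ref":"s3://foo/bar/2.jpg", "detect-moderation-labels": {"ModerationLabels":[{"Name":"Violence","Confidence":88.1}],"ModerationModelVersion":"7.0","ContentTypes":[]}}
'''
flagged = flagged_images(sample)
print(flagged)
```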

The output manifest summary is formatted as follows:

{
    "version": "1.0", # Schema version, 1.0 for GA.
    "statistics": {
        "total-json-lines": Number, # Total number of JSON Lines (images) in the input manifest.
        "valid-json-lines": Number, # Total number of JSON Lines (images) that contain references to valid images.
        "invalid-json-lines": Number # Total number of invalid JSON Lines. These lines were not handled.
    },
    "errors": [
        {
            "line-numer": Number, # The number of the line in the manifest where the error occurred.
            "source-ref": "String", # Optional. Name of the file if it was parsed.
            "code": "String", # Error code.
            "message": "String" # Description of the error.
        }
    ]
}
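The manifest summary can be used to check how much of the input was processed. The sketch below assumes that the downloaded summary file is plain JSON with the statistics and errors keys shown above. The helper name summarize, the sample values, and the error code are hypothetical, and the sample error entry is abbreviated.

```python
import json
from collections import Counter


def summarize(summary_text):
    """Return line counts, the invalid-line ratio, and error counts by code."""
    summary = json.loads(summary_text)
    stats = summary["statistics"]
    total = stats["total-json-lines"]
    invalid = stats["invalid-json-lines"]
    error_codes = Counter(err["code"] for err in summary.get("errors", []))
    return {
        "total": total,
        "valid": stats["valid-json-lines"],
        "invalid": invalid,
        "invalid_ratio": invalid / total if total else 0.0,
        "error_codes": dict(error_codes),
    }


# Hypothetical, abbreviated summary used only for illustration.
sample = json.dumps({
    "version": "1.0",
    "statistics": {
        "total-json-lines": 3,
        "valid-json-lines": 2,
        "invalid-json-lines": 1,
    },
    "errors": [
        {"source-ref": "s3://foo/bar/3.jpg",
         "code": "EXAMPLE_ERROR",
         "message": "Example description."}
    ],
})
report = summarize(sample)
print(report)
```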

Content type

Information on the type of media content analyzed by the StartMediaAnalysisJob operation is returned by the GetMediaAnalysisJob operation. ContentType can be one of two different categories:

  • Animated content, which includes video games and animation (e.g., cartoon, comics, manga, anime).

  • Illustrated content, which includes drawing, painting, and sketches.
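In the sample output manifest earlier on this page, detected content types appear under the ContentTypes key of each record. The following sketch tallies them across a manifest, assuming that layout. The helper name count_content_types, the confidence threshold, and the second and third sample records are hypothetical.

```python
import json
from collections import Counter


def count_content_types(manifest_text, min_confidence=50.0):
    """Count images per content type at or above min_confidence."""
    counts = Counter()
    for line in manifest_text.splitlines():
        if not line.strip():
            continue  # Skip blank lines.
        result = json.loads(line).get("detect-moderation-labels", {})
        for content_type in result.get("ContentTypes", []):
            if content_type.get("Confidence", 0.0) >= min_confidence:
                counts[content_type["Name"]] += 1
    return counts


# First record follows the sample on this page; the others are hypothetical.
sample = '''
{"source-ref":"s3://foo/bar/1.jpg", "detect-moderation-labels": {"ModerationLabels":[],"ContentTypes":[{"Confidence":72.7257,"Name":"Animated"}]}}
{"source-ref":"s3://foo/bar/2.jpg", "detect-moderation-labels": {"ModerationLabels":[],"ContentTypes":[{"Confidence":91.0,"Name":"Illustrated"}]}}
{"source-ref":"s3://foo/bar/3.jpg", "detect-moderation-labels": {"ModerationLabels":[],"ContentTypes":[{"Confidence":30.0,"Name":"Animated"}]}}
'''
content_type_counts = count_content_types(sample)
print(dict(content_type_counts))
```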

Prediction verification and adapter training

Bulk Analysis can also be leveraged through the Rekognition console to get predictions for a batch of images, verify these predictions, and then create an adapter using the verified predictions. Adapters allow you to enhance the accuracy of any supported Rekognition operation.

Currently, you can create adapters for use with the Rekognition Custom Moderation feature. By creating an adapter and providing it to the DetectModerationLabels operation, you can achieve better accuracy for the content moderation tasks related to your specific use case.

For more information about Custom Moderation, see Enhancing accuracy with Custom Moderation. See Bulk analysis and verification for an explanation of how to verify predictions made with Bulk analysis. For a tutorial covering how to use the Rekognition console to verify predictions and create an adapter, see Custom Moderation adapter tutorial.