How to pass S3 object and Key to HumanLoopInput of start_human_loop

Question

Asked by Ritesh Manglani on 14 March 2024 at 18:56

I am able to extract text from my multi-page PDF using Amazon Textract. Now I want to start a human loop review. I have already created a workflow and specified the condition there that triggers the human loop. Below is my code:

import os
import json
import time
import uuid
from urllib.parse import unquote_plus
import boto3

def lambda_handler(event, context):
    textract = boto3.client("textract")
    a2i = boto3.client("sagemaker-a2i-runtime")
    FLOW_ARN = os.environ["FLOW_ARN"]
    if event:
        file_obj = event["Records"][0]
        bucketname = str(file_obj["s3"]["bucket"]["name"])
        filename = unquote_plus(str(file_obj["s3"]["object"]["key"]))
        
        # Start document analysis for the whole document
        response = textract.start_document_analysis(
            DocumentLocation={
                "S3Object": {
                    "Bucket": bucketname,
                    "Name": filename,
                }
            },
            FeatureTypes=["FORMS"],  # Specify the feature types to analyze
            ClientRequestToken=str(uuid.uuid4()),  # Generate a unique client request token
        )
        
        # Retrieve the job ID from the response
        job_id = response["JobId"]
        
        # Poll for the completion of the job
        while True:
            job_status = textract.get_document_analysis(JobId=job_id)['JobStatus']
            if job_status in ['SUCCEEDED', 'FAILED']:
                break
            time.sleep(5)  # Wait for 5 seconds before checking again
        
        # Get the results of the analysis
        response = textract.get_document_analysis(JobId=job_id)
        
        # Process the results
        print(json.dumps(response))
        
        a2i.start_human_loop(
            HumanLoopName=uuid.uuid4().hex,
            FlowDefinitionArn=FLOW_ARN,
            HumanLoopInput={
                'InputContent': json.dumps({
                    "InitialValue": {
                        "Bucket": bucketname,
                        "DocumentPath": filename,
                    }
                })
            },
            DataAttributes={
                'ContentClassifiers': [
                    'FreeOfAdultContent',
                ]
            }
        )

        return {
            "statusCode": 200,
            "body": json.dumps("Document processed successfully!"),
        }

    return {"statusCode": 500, "body": json.dumps("Issue processing file!")}

I was expecting it to start the human loop review, but it returns the following error:

[ERROR] ValidationException: An error occurred (ValidationException) when calling the StartHumanLoop operation: Provided InputContent is not valid. Please use valid InputContent JSON and try your request again.

Could someone please point out what I am doing wrong? I need to pass my PDF, which is stored in an S3 bucket, to HumanLoopInput.

--------------------EDIT------------------------------

I am using the default worker template. Here it is:

<script src="https://assets.crowd.aws/crowd-html-elements.js"></script>
{% capture s3_uri %}s3://{{ task.input.aiServiceRequest.document.s3Object.bucket }}/{{ task.input.aiServiceRequest.document.s3Object.name }}{% endcapture %}

<crowd-form>
  <crowd-textract-analyze-document src="{{ s3_uri | grant_read_access }}" initial-value="{{ task.input.selectedAiServiceResponse.blocks }}" header="Review the key-value pairs listed on the right and correct them if they don't match the following document." no-key-edit="" no-geometry-edit="" keys="{{ task.input.humanLoopContext.importantFormKeys }}" block-types="['KEY_VALUE_SET']">
<short-instructions header="Instructions"><p>Click on a key-value block to highlight the corresponding key-value pair in the document.</p><p><br></p><p>If it is a valid key-value pair, review the content for the value. If the content is incorrect, correct it.</p><p><br></p><p>If the text of the value is incorrect, correct it.</p><p><img src="https://assets.crowd.aws/images/a2i-console/correct-value-text.png" width="100%"></p><p><br></p><p>If a wrong value is identified, correct it.</p><p><img src="https://assets.crowd.aws/images/a2i-console/correct-value.png" width="100%"></p><p><br></p><p>If it is not a valid key-value relationship, choose <strong>No</strong>.</p><p><img src="https://assets.crowd.aws/images/a2i-console/not-a-key-value-pair.png" width="100%"></p><p><br></p><p>If you can’t find the key in the document, choose <strong>Key not found</strong>.</p><p><img src="https://assets.crowd.aws/images/a2i-console/key-is-not-found.png" width="100%"></p><p><br></p><p>If the content of a field is empty, choose <strong>Value is blank</strong>.</p><p><img src="https://assets.crowd.aws/images/a2i-console/value-is-blank.png" width="100%"></p><p><br></p><p><strong>Examples</strong></p><p>The key and value are often displayed next or below to each other.</p><p><br></p><p>For example, key and value displayed in one line.</p><p><img src="https://assets.crowd.aws/images/a2i-console/sample-key-value-pair-1.png" width="100%"></p><p><br></p><p>For example, key and value displayed in two lines.</p><p><img src="https://assets.crowd.aws/images/a2i-console/sample-key-value-pair-2.png" width="100%"></p><p><br></p><p>If the content of the value has multiple lines, enter all the text without a line break. Include all value text, even if it extends beyond the highlighted box.</p><p><img src="https://assets.crowd.aws/images/a2i-console/multiple-lines.png" width="100%"></p></short-instructions>

<full-instructions header="Instructions"></full-instructions>
  </crowd-textract-analyze-document>
</crowd-form>

I can see the following keys in this template:

    task.input.aiServiceRequest.document.s3Object.bucket
    task.input.aiServiceRequest.document.s3Object.name
    task.input.selectedAiServiceResponse.blocks
    task.input.humanLoopContext.importantFormKeys

However, because it is a default template, it appears to rely on internal AWS libraries.

Answer

**dali** · Answer 1 · 2024-03-15T19:28:34+00:00

The keys in your InputContent need to be aligned with the human task UI template, which you have not shared.

Check out this example: https://github.com/aws-samples/amazon-textract-a2i-dynamodb-handwritten-tabular/blob/main/textract-hand-written-a2i-forms.ipynb

Notice the keys in InputContent are also found in the human task UI template. Are your keys aligned with your UI template?
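
To illustrate what "aligned" could mean for the default template quoted in the question, here is a minimal sketch that builds an InputContent string whose keys mirror the four `task.input.*` paths the template reads. The helper names `build_input_content` and `start_review` are hypothetical, and the exact payload shape the `crowd-textract-analyze-document` widget accepts is an assumption inferred from the template, not something confirmed in this thread.

```python
import json
import uuid


def build_input_content(bucket, key, blocks, important_form_keys=None):
    """Return an InputContent JSON string mirroring the template's task.input.* paths.

    Hypothetical helper: the structure is inferred from the Liquid variables
    in the default template, not taken from AWS documentation.
    """
    payload = {
        # Read as task.input.aiServiceRequest.document.s3Object.bucket / .name
        "aiServiceRequest": {
            "document": {"s3Object": {"bucket": bucket, "name": key}}
        },
        # Read as task.input.selectedAiServiceResponse.blocks
        "selectedAiServiceResponse": {"blocks": blocks},
        # Read as task.input.humanLoopContext.importantFormKeys
        "humanLoopContext": {"importantFormKeys": important_form_keys or []},
    }
    return json.dumps(payload)


def start_review(a2i_client, flow_arn, bucket, key, blocks, important_form_keys=None):
    """Start a human loop; a2i_client is a boto3 "sagemaker-a2i-runtime" client."""
    return a2i_client.start_human_loop(
        HumanLoopName=uuid.uuid4().hex,
        FlowDefinitionArn=flow_arn,
        HumanLoopInput={
            "InputContent": build_input_content(
                bucket, key, blocks, important_form_keys
            )
        },
        DataAttributes={"ContentClassifiers": ["FreeOfAdultContent"]},
    )
```

In the Lambda from the question, this might be called as `start_review(a2i, FLOW_ARN, bucketname, filename, response["Blocks"])`. Two caveats: the sketch passes the Textract blocks through unchanged, and whether the widget expects that field casing is not established here; also, InputContent has a maximum length, so the full result of a large multi-page document may need to be reduced before it is sent.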
