AWS Textract start_document_analysis does not accept AdaptersConfig in my Lambda function

I am using Textract queries to extract specific information from uploaded PDF documents. I have a Lambda function called textract_async_job_creation which is triggered every time a document is uploaded to an S3 bucket. This function calls the Textract start_document_analysis method, and the response is stored in another S3 bucket. SNS then sends a notification to a second Lambda function called textract-response-process, which stores the query results as JSON in a third bucket.

My issue is that the first Lambda function, textract_async_job_creation, will not let me use the adapter that I trained. It throws an error saying AdaptersConfig is not recognized (I do not remember the exact error). All the documentation I have read shows AdaptersConfig being used with start_document_analysis. Can someone tell me what I've done wrong here?

import os
import json
import boto3
from botocore.config import Config
from urllib.parse import unquote_plus 



my_config = Config(
    region_name='us-east-2',
    retries={
        'max_attempts': 10,
        'mode': 'adaptive'
    }
)
textract = boto3.client('textract', config=my_config)

OUTPUT_BUCKET_NAME = os.environ["OUTPUT_BUCKET_NAME"]
OUTPUT_S3_PREFIX = os.environ["OUTPUT_S3_PREFIX"]
SNS_TOPIC_ARN = os.environ["SNS_TOPIC_ARN"]
SNS_ROLE_ARN = os.environ["SNS_ROLE_ARN"]

def lambda_handler(event, context):
    responses = []

    for record in event["Records"]:
        file_obj = record["s3"]
        bucketname = str(file_obj["bucket"]["name"])
        filename = unquote_plus(str(file_obj["object"]["key"])) 
        print(f"Bucket: {bucketname} ::: Key: {filename}")

        response = textract.start_document_analysis(
            DocumentLocation={'S3Object': {'Bucket': bucketname, 'Name': filename}},
            FeatureTypes=['QUERIES'],
            OutputConfig={'S3Bucket': OUTPUT_BUCKET_NAME, 'S3Prefix': OUTPUT_S3_PREFIX},
            NotificationChannel={'SNSTopicArn': SNS_TOPIC_ARN, 'RoleArn': SNS_ROLE_ARN},
            QueriesConfig={
                'Queries': [
                    {'Text': 'What is the name of the claimant?', 'Pages': ['1']},
                    {'Text': 'What is the date on the document?', 'Pages': ['1']},
                    {'Text': 'What is the phone number?', 'Pages': ['1']},
                    {'Text': 'What is the address of the office?', 'Pages': ['1']}
                ]
            },
            # Uncommenting the block below raises the "not recognized" error
            # AdaptersConfig={
            #     'Adapters': [
            #         {'AdapterId': 'xxxxxxxxxxx', 'Version': '1'}
            #     ]
            # }
        )
        responses.append(response)

    successful_responses = [resp for resp in responses if resp["ResponseMetadata"]["HTTPStatusCode"] == 200]
    failed_responses = [resp for resp in responses if resp["ResponseMetadata"]["HTTPStatusCode"] != 200]

    if successful_responses:
        return {"statusCode": 200, "body": json.dumps(f"Job(s) created successfully for {len(successful_responses)} file(s)!")}
    else:
        return {"statusCode": 500, "body": json.dumps(f"Job creation failed for {len(failed_responses)} file(s)!")}
        

I want start_document_analysis to use the adapter I trained in Textract so that the queries return the correct answers for the documents I upload to the S3 bucket. However, I get an error whenever I include AdaptersConfig in the call, and I do not understand why, given that the documentation gives examples using AdaptersConfig.
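For reference, here is a minimal sketch of how the request arguments could be assembled so that AdaptersConfig is only included when an adapter is configured. The helper name build_analysis_kwargs and its parameters are hypothetical (not part of boto3), and the bucket, key, and adapter ID are placeholders.

```python
def build_analysis_kwargs(bucket, key, queries, adapter_id=None, adapter_version=None):
    """Assemble keyword arguments for textract.start_document_analysis.

    Hypothetical helper: AdaptersConfig is a top-level argument, next to
    QueriesConfig, and is added only when an adapter ID and version are given.
    """
    kwargs = {
        "DocumentLocation": {"S3Object": {"Bucket": bucket, "Name": key}},
        "FeatureTypes": ["QUERIES"],
        "QueriesConfig": {
            "Queries": [{"Text": text, "Pages": ["1"]} for text in queries]
        },
    }
    if adapter_id and adapter_version:
        kwargs["AdaptersConfig"] = {
            "Adapters": [{"AdapterId": adapter_id, "Version": adapter_version}]
        }
    return kwargs


# Placeholder bucket, key, and adapter ID; substitute real values.
example = build_analysis_kwargs(
    "my-input-bucket",
    "claims/form.pdf",
    ["What is the name of the claimant?"],
    adapter_id="xxxxxxxxxxx",
    adapter_version="1",
)
print(sorted(example))
```

The resulting dictionary would be unpacked into the call, for example textract.start_document_analysis(**kwargs), with OutputConfig and NotificationChannel added the same way as in the handler above.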

There is 1 answer

user21067592

I was using the Python 3.9 runtime, which ships with an older version of boto3 that does not recognize AdaptersConfig as a parameter. I switched the function to the Python 3.12 runtime and that fixed the issue.
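To confirm this kind of mismatch before changing runtimes, one option is to ask the client's bundled service model whether the operation lists the parameter. This is a rough sketch: operation_accepts and report_adapter_support are hypothetical helper names, and it assumes the usual botocore attributes (client.meta.service_model, operation_model, input_shape.members).

```python
def operation_accepts(client, operation_name, parameter):
    """Return True if the client's bundled API model lists `parameter`
    as an input member of `operation_name` (e.g. "StartDocumentAnalysis")."""
    operation = client.meta.service_model.operation_model(operation_name)
    input_shape = operation.input_shape
    return input_shape is not None and parameter in input_shape.members


def report_adapter_support():
    """Print the boto3 version and whether AdaptersConfig is known to it."""
    # Imported lazily so operation_accepts can be exercised without boto3.
    import boto3

    client = boto3.client("textract", region_name="us-east-2")
    print("boto3 version:", boto3.__version__)
    print(
        "AdaptersConfig accepted:",
        operation_accepts(client, "StartDocumentAnalysis", "AdaptersConfig"),
    )
```

Calling report_adapter_support() at the top of the handler would log both values to CloudWatch. If the check comes back False, a newer runtime (as I did) or a newer boto3 bundled with the deployment package or a layer should both get you a client that knows the parameter.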