Limiting Azure Cosmos DB Query Response to 2MB using SDK

Hello Stack Overflow Community,

I am working on an Azure Cosmos DB project using the .NET SDK, and I need to limit the query response size to a maximum of 2MB. The cap is needed because I pass the results to the Container.CreateTransactionalBatch method afterwards, which has a limit of 2MB per batch.

Here is the current snippet of code where I am constructing the query:

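// Parameterized so the value is not spliced into the query text
var query = new QueryDefinition("SELECT * FROM c WHERE c.Discriminator = @discriminator")
    .WithParameter("@discriminator", _partitionKey);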
var queryIterator = _sourceContainer.GetItemQueryIterator<T>(queryDefinition: query, continuationToken: _continuationToken);

While I know how to limit the number of items retrieved using QueryRequestOptions, I am unsure how to specifically limit the data size of the response to not exceed 2MB.

// An example of setting a max item count, which does not guarantee a 2MB size limit
var queryRequestOptions = new QueryRequestOptions { MaxItemCount = 100 }; 

I am looking for suggestions or approaches on how to achieve this.

Has anyone encountered a similar scenario, or does anyone have insights on how to achieve this efficiently, without trial and error to find a suitable MaxItemCount?

Thank you!

1 Answer

Answered by Balaji

"I am unsure how to specifically limit the data size of the response to not exceed 2MB."

Below are the steps I followed to limit the data size of the response:

  • Set maxBatchSizeBytes, the maximum total size allowed for one batch of documents.

  • Run a SQL query that retrieves all documents from the container and store them in documents.

  • Initialize the variables that track batch processing (currentBatch, currentBatchSizeBytes, and batchIndex).

  • Loop over the retrieved documents and call GetDocumentSize to get the size of each one.

  • If adding the document keeps the batch at or below maxBatchSizeBytes, add it to the current batch; otherwise, print the current batch and break out of the loop.

  • GetDocumentSize estimates the size of a MyDocument as the length of its Name property in UTF-16 bytes (Name.Length * sizeof(char)). It does not measure the full serialized document.
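
The steps above stop after the first batch. As a minimal sketch of a variant that keeps going and yields every batch, the helper below splits any sequence by a size limit. SizeBatcher, SplitBySize, and the sizeOf delegate are hypothetical names that are not part of the Cosmos DB SDK, and the size estimator is whatever the caller supplies.

```csharp
using System;
using System.Collections.Generic;

public static class SizeBatcher
{
    // Splits items into consecutive batches whose summed size stays at or
    // below maxBatchSizeBytes. sizeOf is a caller-supplied size estimator
    // (hypothetical; not an SDK API).
    public static IEnumerable<List<T>> SplitBySize<T>(
        IEnumerable<T> items, Func<T, long> sizeOf, long maxBatchSizeBytes)
    {
        var currentBatch = new List<T>();
        long currentBatchSizeBytes = 0;

        foreach (var item in items)
        {
            long itemSizeBytes = sizeOf(item);

            // An item larger than the limit can never fit in any batch.
            if (itemSizeBytes > maxBatchSizeBytes)
            {
                throw new InvalidOperationException(
                    $"A single item of {itemSizeBytes} bytes exceeds the {maxBatchSizeBytes}-byte limit.");
            }

            // Close the current batch before this item would overflow it.
            if (currentBatchSizeBytes + itemSizeBytes > maxBatchSizeBytes)
            {
                yield return currentBatch;
                currentBatch = new List<T>();
                currentBatchSizeBytes = 0;
            }

            currentBatch.Add(item);
            currentBatchSizeBytes += itemSizeBytes;
        }

        // Emit the final, partially filled batch.
        if (currentBatch.Count > 0)
        {
            yield return currentBatch;
        }
    }
}
```

Called with the ten names listed in this answer, a size function of name.Length * sizeof(char), and a limit of 97, this sketch would yield two batches of five names each (96 and 76 bytes).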

Below is all the data in the container, grouped the way it splits into batches:

Batch 1, Total Size: 96 bytes:
ID: 1, Name: John Doe
ID: 2, Name: Jane Smith
ID: 3, Name: Alice Johnson
ID: 4, Name: Pavan
ID: 5, Name: Pavan Balaji

Batch 2, Total Size: 76 bytes:
ID: 6, Name: Balaji
ID: 7, Name: Sai
ID: 8, Name: Venkatesh
ID: 9, Name: Sai Venkatesh
ID: 10, Name: Likitha

The following code limits the data size:

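using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public class MyDocument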
{
    public string Id { get; set; }
    public string Name { get; set; }
}

class Program
{
    private static readonly string EndpointUri = "******";
    private static readonly string PrimaryKey = "******";
    private static readonly string DatabaseId = "testDb";
    private static readonly string ContainerId = "testCont";

    static async Task Main(string[] args)
    {
        var cosmosClient = new CosmosClient(EndpointUri, PrimaryKey);
        var database = cosmosClient.GetDatabase(DatabaseId);
        var container = database.GetContainer(ContainerId);

        try
        {
            // Small demo limit so the sample data splits after the fifth document.
            var maxBatchSizeBytes = 97;

            var query = new QueryDefinition("SELECT * FROM c");
            var iterator = container.GetItemQueryIterator<MyDocument>(query);
            var documents = new List<MyDocument>();

            // Drain every page of the query into memory before batching.
            while (iterator.HasMoreResults)
            {
                var response = await iterator.ReadNextAsync();
                documents.AddRange(response);
            }

            var currentBatch = new List<MyDocument>();
            long currentBatchSizeBytes = 0;
            var batchIndex = 0;

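            foreach (var document in documents)
            {
                long documentSizeBytes = GetDocumentSize(document);

                if (currentBatchSizeBytes + documentSizeBytes <= maxBatchSizeBytes)
                {
                    // The document still fits: add it and update the running total.
                    currentBatch.Add(document);
                    currentBatchSizeBytes += documentSizeBytes;
                }
                else
                {
                    // The next document would exceed the limit: print the batch and stop.
                    // Note: if every document fits, this branch never runs and nothing is printed.
                    batchIndex++;
                    Console.WriteLine($"Batch {batchIndex}, Total Size: {currentBatchSizeBytes} bytes:");
                    PrintBatch(currentBatch);
                    break;
                }
            }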
        }
        catch (CosmosException ex)
        {
            Console.WriteLine($"Cosmos DB Exception: {ex}");
        }
        catch (Exception ex)
        {
            Console.WriteLine($"An error occurred: {ex}");
        }
    }

    // Rough estimate only: counts the UTF-16 bytes of Name, not the serialized document.
    static long GetDocumentSize(MyDocument document)
    {
        return (document.Name?.Length ?? 0) * sizeof(char);
    }

    static void PrintBatch(IEnumerable<MyDocument> batch)
    {
        foreach (var document in batch)
        {
            Console.WriteLine($"ID: {document.Id}, Name: {document.Name}");
        }
    }
}

Output:

Batch 1, Total Size: 96 bytes:
ID: 1, Name: John Doe
ID: 2, Name: Jane Smith
ID: 3, Name: Alice Johnson
ID: 4, Name: Pavan
ID: 5, Name: Pavan Balaji

Only the first batch is printed: 96 of the 172 bytes of name data. The loop stops as soon as the next document would push the batch past maxBatchSizeBytes.
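
GetDocumentSize above counts only the characters of Name. Because the 2MB limit on a transactional batch applies to the request payload, a closer (still approximate) estimate is the byte length of the document's JSON serialization. The following is a minimal sketch, assuming System.Text.Json is acceptable for the estimate. DocumentSizeEstimator and GetSerializedSizeBytes are hypothetical names, not SDK APIs. The Cosmos DB SDK's own serializer and the system properties on stored items can make the real payload differ, so it is probably wise to leave some headroom below the limit.

```csharp
using System;
using System.Text.Json;

public static class DocumentSizeEstimator
{
    // Returns the number of bytes in the UTF-8 JSON serialization of the document.
    // This is an estimate (assumption): the payload the SDK actually sends may differ.
    public static long GetSerializedSizeBytes<T>(T document)
    {
        return JsonSerializer.SerializeToUtf8Bytes(document).Length;
    }
}
```

This could stand in for GetDocumentSize in the code above. For a document with Id "1" and Name "John Doe", System.Text.Json produces {"Id":"1","Name":"John Doe"}, which is 28 bytes, compared with the 16 bytes counted from the name alone.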