I'm trying to do some actions on Json's files and for that run a test on simple Json. I'm getting error and cant understand what is the problem.
// A. CREATE ASSEMBLY: Register assemblies (if they do not already exist).
CREATE ASSEMBLY IF NOT EXISTS [Newtonsoft.Json] FROM @"adl://chstorage.azuredatalakestore.net/Newtonsoft.Json.dll";
CREATE ASSEMBLY IF NOT EXISTS [Microsoft.Analytics.Samples.Formats] FROM @"adl://chstorage.azuredatalakestore.net/Microsoft.Analytics.Samples.Formats.dll";
// B. REFERENCE ASSEMBLY: Load assemblies for compile time and execution.
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
// C. USING: Specify namespace to shorten function names (e.g. Microsoft.Analytics.Samples.Formats.Json.JsonExtractor)
USING Microsoft.Analytics.Samples.Formats.Json;
// 1. Initialise variables for Input (e.g. JSON) and Output (e.g. CSV).
DECLARE @InputFile string = @"adl://chstorage.azuredatalakestore.net/exercise01.json";
DECLARE @OutputFile string = @"adl://chstorage.azuredatalakestore.net/exercise01.csv";
// 2. Extract string content from JSON document (i.e. schema on read).
@json =
EXTRACT
title string,
rating string,
genre string,
year string
FROM
@InputFile
USING new JsonExtractor();
// 3. Write values to CSV
OUTPUT @json
TO @OutputFile
USING Outputters.Csv(outputHeader:true,quoting:true);
The error is:
{
"errors": [
{
"errorId": "E_CSC_USER_ROSLYNBINDERRETRY",
"severity": "Error",
"component": "CSC",
"source": "USER",
"message": "An exception has occurred when retry binding C# expression(s) after an earlier failed attempt. Before retrying, the error was: Index was outside the bounds of the array. After retrying, the error was: Index was outside the bounds of the array.",
"details": "at token 'USING', line 10\r\nnear the ###:\r\n**************\r\n [Microsoft.Analytics.Samples.Formats];\n\n// C. USING: Specify namespace to shorten function names (e.g. Microsoft.Analytics.Samples.Formats.Json.JsonExtractor)\n ### USING Microsoft.Analytics.Samples.Formats.Json;\n\n// 1. Initialise variables for Input (e.g. JSON) and Output (e.g. CSV).\nDECLARE @InputFile string = @\"adl://chs",
"description": "An exception has occurred when retry binding C# expression(s) after an earlier failed attempt.",
"resolution": "Make sure that this expression is a valid C# expression.",
"helpLink": "",
"filePath": "",
"lineNumber": 10,
"startOffset": 639,
"endOffset": 644
}
]
}
I already register the assemblies to my account
And this is the simple content of the Json file
{
"title": "The Godfather",
"rating": "R",
"genre": "Crime, Drama",
"year": 1972
}
As per my note, do not use U-SQL. Here is how you would do this in Azure Synapse Analytics, serverless SQL pool, which can read JSON files natively:
Here's how you might query it with a Synapse Notebook using Pyspark, assuming your file is multi-line json as per your example. A similar script would work in Databricks:
My results: