U- SQL problem with loading the assemblies

101 views Asked by At

I'm trying to do some actions on Json's files and for that run a test on simple Json. I'm getting error and cant understand what is the problem.

// A. CREATE ASSEMBLY: Register assemblies (if they do not already exist).
CREATE ASSEMBLY IF NOT EXISTS [Newtonsoft.Json] FROM @"adl://chstorage.azuredatalakestore.net/Newtonsoft.Json.dll";
CREATE ASSEMBLY IF NOT EXISTS [Microsoft.Analytics.Samples.Formats] FROM @"adl://chstorage.azuredatalakestore.net/Microsoft.Analytics.Samples.Formats.dll";
 
// B. REFERENCE ASSEMBLY: Load assemblies for compile time and execution.
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

// C. USING: Specify namespace to shorten function names (e.g. Microsoft.Analytics.Samples.Formats.Json.JsonExtractor)
USING Microsoft.Analytics.Samples.Formats.Json;

// 1. Initialise variables for Input (e.g. JSON) and Output (e.g. CSV).
DECLARE @InputFile string = @"adl://chstorage.azuredatalakestore.net/exercise01.json";
DECLARE @OutputFile string = @"adl://chstorage.azuredatalakestore.net/exercise01.csv";

// 2. Extract string content from JSON document (i.e. schema on read).
@json =
EXTRACT
    title string,
    rating string,
    genre string,
    year string
FROM
    @InputFile
USING new JsonExtractor();

// 3. Write values to CSV
OUTPUT @json
TO @OutputFile
USING Outputters.Csv(outputHeader:true,quoting:true);


The error is:
{
    "errors": [
        {
            "errorId": "E_CSC_USER_ROSLYNBINDERRETRY",
            "severity": "Error",
            "component": "CSC",
            "source": "USER",
            "message": "An exception has occurred when retry binding C# expression(s) after an earlier failed attempt. Before retrying, the error was: Index was outside the bounds of the array. After retrying, the error was: Index was outside the bounds of the array.",
            "details": "at token 'USING', line 10\r\nnear the ###:\r\n**************\r\n [Microsoft.Analytics.Samples.Formats];\n\n// C. USING: Specify namespace to shorten function names (e.g. Microsoft.Analytics.Samples.Formats.Json.JsonExtractor)\n ### USING Microsoft.Analytics.Samples.Formats.Json;\n\n// 1. Initialise variables for Input (e.g. JSON) and Output (e.g. CSV).\nDECLARE @InputFile string = @\"adl://chs",
            "description": "An exception has occurred when retry binding C# expression(s) after an earlier failed attempt.",
            "resolution": "Make sure that this expression is a valid C# expression.",
            "helpLink": "",
            "filePath": "",
            "lineNumber": 10,
            "startOffset": 639,
            "endOffset": 644
        }
    ]
}

I already register the assemblies to my account

And this is the simple content of the Json file

{
    "title": "The Godfather",
    "rating": "R",
    "genre": "Crime, Drama",
    "year": 1972
}
1

There are 1 answers

0
wBob On

As per my note, do not use U-SQL. Here is how you would do this in Azure Synapse Analytics, serverless SQL pool, which can read JSON files natively:

SELECT
    JSON_VALUE (jsonContent, '$.title') AS title,
    JSON_VALUE (jsonContent, '$.rating') AS rating,
    JSON_VALUE (jsonContent, '$.genre') AS genre,
    JSON_VALUE (jsonContent, '$.year') AS year
FROM
    OPENROWSET(
        BULK 'https://somedatalakeaccount.dfs.core.windows.net/datalake/raw/film.json',
        FORMAT = 'CSV',
        FIELDQUOTE = '0x0b',
        FIELDTERMINATOR ='0x0b',
        ROWTERMINATOR = '0x0b'
    )
    WITH (
        jsonContent VARCHAR(MAX)
    ) AS [result]

Here's how you might query it with a Synapse Notebook using Pyspark, assuming your file is multi-line json as per your example. A similar script would work in Databricks:

%%pyspark
df = spark.read.option("multiline",True).json(json_path)

display(df)

My results:

Results