I could not find proper solution to make use Newtonsoft JsonExtractor to parse Input file with new line delimiter.
From the Newtonsoft JsonExtractor I can read the first line successfully when exploded with "$.d.results[*]" but it's not moving to the next line,
As the data is >4MB for each row, can't extract as text. So parsing need to be performed using custom extractor to proceed.
Sample Input:
{"d":{"results":[{"data":{"Field_1":"1","Field_2":"2"},"Field_3":"3","Field_4":"4"}]}}
{"d":{"results":[{"data":{"Field_1":"11","Field_2":"21"},"Field_3":"31","Field_4":"41"}]}}
{"d":{"results":[{"data":{"Field_1":"12","Field_2":"22"},"Field_3":"32","Field_4":"42"}]}}
Expected Output:
Field_1|Field_2|Field_3|Field_4
1 |2 |3 |4
11 |21 |31 |41
12 |22 |32 |42
USQL Code:
CREATE ASSEMBLY IF NOT EXISTS [Microsoft.Analytics.Samples.Formats] FROM @"Microsoft.Analytics.Samples.Formats.dll";
CREATE ASSEMBLY IF NOT EXISTS [Newtonsoft.Json] FROM @"Newtonsoft.Json.dll";
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
REFERENCE ASSEMBLY [Newtonsoft.Json];
USING Microsoft.Analytics.Samples.Formats.Json;
DECLARE @DATA_SOURCE string = "Input.data" ;
@SOURCE =
EXTRACT Field_1 string,
Field_2 string,
Field_3 string,
Field_4 string
FROM @DATA_SOURCE
USING new JsonExtractor("$.d.results[*]");
Considering your source input as below, i.e. more than one JSON document in a file.
The JSON assembly includes a MultiLevelJsonExtractor which allows us to extract data from multiple JSON paths at differing levels within a single pass. See the underlying code and inline documentation over at Github.
Use it by supplying multiple levels of Json Paths. They will be assigned to the schema by index.
The code snippet shows the MultiLevelJsonExtractor in action.
in your case....