I need to flatten a JSON with different levels of nested JSON arrays in Python
Part of my JSON looks like:
{
"data": {
"workbooks": [
{
"projectName": "TestProject",
"name": "wkb1",
"site": {
"name": "site1"
},
"description": "",
"createdAt": "2020-12-13T15:38:58Z",
"updatedAt": "2020-12-13T15:38:59Z",
"owner": {
"name": "user1",
"username": "John"
},
"embeddedDatasources": [
{
"name": "DS1",
"hasExtracts": false,
"upstreamDatasources": [
{
"projectName": "Data Sources",
"name": "DS1",
"hasExtracts": false,
"owner": {
"username": "user2"
}
}
],
"upstreamTables": [
{
"name": "table_1",
"schema": "schema_1",
"database": {
"name": "testdb",
"connectionType": "redshift"
}
},
{
"name": "table_2",
"schema": "schema_2",
"database": {
"name": "testdb",
"connectionType": "redshift"
}
},
{
"name": "table_3",
"schema": "schema_3",
"database": {
"name": "testdb",
"connectionType": "redshift"
}
}
]
},
{
"name": "DS2",
"hasExtracts": false,
"upstreamDatasources": [
{
"projectName": "Data Sources",
"name": "DS2",
"hasExtracts": false,
"owner": {
"username": "user3"
}
}
],
"upstreamTables": [
{
"name": "table_4",
"schema": "schema_1",
"database": {
"name": "testdb",
"connectionType": "redshift"
}
}
]
}
]
}
]
}
}
The output should like this
Tried using json_normalize but couldn't make it work. Currently parsing it by reading the nested arrays using loops and reading values using keys. Looking for a better way of normalizing the JSON
Here's a partial solution:
First save your data in the same directory as the script as a
JSON filecalleddata.json.Output:
What next?
I think if you re-structure the data a bit in advance (for example flattening
'database': {'name': 'testdb', 'connectionType': 'redshift'}) you will be able to add morefieldsto themetaparameter.As you see in the documentation of json_normalize, the four parameters that are used here are:
data:
dict or list of dicts:record_path:
str or list of str: default Nonemeta:
list of paths (str or list of str): default Nonerecord_prefix:
str: default None