I have a DataFrame that stores lists in one column. I'm saving the DataFrame, with all its columns, to JSON using this catalog entry:
```yaml
config_new:
  type: json.JSONDataSet
  filepath: data/01_raw/new_config.json
  save_args:
    indent: 6
```
All columns are saved correctly except the list column, which is written out as a string:
```json
"T": [
  {
    "Col1": "9",
    "Col2": "[\"7\",\"9\",\"0\",\"5\"]"
  }
]
```
As you can see, the `Col2` list comes out as a string instead of a JSON array.
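A likely explanation, assuming the column actually holds the *text* of a list (common after reading the data back from CSV) rather than Python lists, is that `json` simply escapes that text like any other string. The stdlib-only snippet below reproduces the symptom with a made-up row; `row_as_string` and `row_as_list` are hypothetical names.

```python
import json

# Hypothetical row mirroring the question: Col2 holds the text of a list.
row_as_string = {"Col1": "9", "Col2": '["7","9","0","5"]'}
# The same row, but Col2 holds an actual Python list.
row_as_list = {"Col1": "9", "Col2": ["7", "9", "0", "5"]}

# The string is escaped and quoted, giving the "[\"7\",...]" seen above.
print(json.dumps({"T": [row_as_string]}))
# A real list is written as a JSON array.
print(json.dumps({"T": [row_as_list]}))
```

If the first line of output matches what ends up in `new_config.json`, the problem is in the data, not in the dataset's save settings.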
To fix this, I wrote a custom JSON encoder class in a Python script saved under `src`:
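```python
import json


class CustomEncoder(json.JSONEncoder):
    def default(self, obj):
        # default() only receives objects json can't serialize natively, and
        # lists have no to_json() method, so convert array-like objects
        # (e.g. numpy arrays, pandas Series) via their tolist() method instead
        if hasattr(obj, "tolist"):
            return obj.tolist()
        return super().default(obj)
```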
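One reason an encoder can't fix this on its own: `json.JSONEncoder.default()` is only called for objects the encoder can't already serialize. Lists and strings never reach it, so a list is written as an array anyway and a string stays a string. The sketch below uses a hypothetical `RecordingEncoder` that logs which types reach `default()`; a `set` is used only as an example of a non-serializable type.

```python
import json


class RecordingEncoder(json.JSONEncoder):
    """Hypothetical encoder that records which objects reach default()."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.seen = []

    def default(self, obj):
        self.seen.append(type(obj).__name__)
        if isinstance(obj, set):
            return sorted(obj)  # make sets serializable as sorted arrays
        return super().default(obj)


encoder = RecordingEncoder()
# Lists are serialized natively, so default() is never called for them.
print(encoder.encode({"Col2": ["7", "9", "0", "5"]}), encoder.seen)
# A set is not JSON-serializable, so default() is invoked for it.
print(encoder.encode({"Col2": {"7", "9"}}), encoder.seen)
```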
Then I updated my catalog entry to:
```yaml
config_new:
  type: json.JSONDataSet
  filepath: data/01_raw/new_config.json
  save_args:
    indent: 6
    cls: custom_encoder.CustomEncoder
```
However, `CustomEncoder` isn't recognized, and I get an error saying a `str` can't be called.
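The error is consistent with the YAML value reaching `json` as a plain string: `json.dump`/`json.dumps` call `cls(...)` to build the encoder, and calling a `str` raises `TypeError`. The stdlib sketch below reproduces this and shows one way to turn a dotted path into the real class with `importlib`. `resolve_class` is a hypothetical helper, and `json.JSONEncoder` stands in for `custom_encoder.CustomEncoder` so the snippet runs anywhere.

```python
import importlib
import json


def resolve_class(dotted_path):
    """Import a class from a dotted path such as "json.JSONEncoder"."""
    module_name, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Roughly what the catalog hands over: cls is just text from the YAML file.
save_args = {"indent": 6, "cls": "json.JSONEncoder"}

try:
    json.dumps({"a": 1}, **save_args)
except TypeError as err:
    # json tries to instantiate cls, but here it is only a string
    print("TypeError:", err)

# Replacing the string with the imported class makes json.dumps work.
save_args["cls"] = resolve_class(save_args["cls"])
print(json.dumps({"a": 1}, **save_args))
```

A custom dataset could perform this kind of resolution in its constructor, though that is only one option.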
I'm not sure how to make the class importable from the catalog.
You're on the right track, but you need to subclass `json.JSONDataSet`, make your changes there, and then reference that custom dataset's class path as the `type` in your YAML catalog.
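In such a subclass, the save step (typically the `_save` method in Kedro datasets) would convert the string column back into real lists before delegating to the parent class. Because the base-class import path differs between Kedro versions, the snippet below leaves Kedro out and sketches only that conversion, assuming the list cells contain JSON-formatted text; `parse_list_strings` is a hypothetical name.

```python
import json


def parse_list_strings(records, columns):
    """Return copies of records with the given columns parsed from JSON text.

    Hypothetical helper: a custom dataset's save step could call this before
    writing, assuming the list columns hold JSON-formatted strings.
    """
    fixed = []
    for record in records:
        record = dict(record)  # don't mutate the caller's data
        for column in columns:
            value = record.get(column)
            if isinstance(value, str):
                record[column] = json.loads(value)
        fixed.append(record)
    return fixed


data = {"T": [{"Col1": "9", "Col2": '["7","9","0","5"]'}]}
data["T"] = parse_list_strings(data["T"], columns=["Col2"])
print(json.dumps(data, indent=6))  # Col2 is now written as a JSON array
```

If the cells instead hold Python-style text with single quotes (for example `"['7', '9']"`), `json.loads` will reject it and `ast.literal_eval` is the usual substitute.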
https://docs.kedro.org/en/stable/data/how_to_create_a_custom_dataset.html