s3://trd-data-lake-landing-zone/fetched_projects/project_65e34c4352faff00017fc8a2/locations/location_65e34c4352faff00017fc835/design_65e34c4352faff00017fc832/analysis_65e34c4352faff00017fc8a3/
Look at this file structure. Under fetched_projects I see a project_<id> folder that has some files and a folder called locations. Inside locations there is a location_<id> folder, which contains design_<id> folders, each holding analysis_<id> folders with some JSON files in them. If you notice, none of these IDs are the same.
s3://trd-data-lake-landing-zone/
└── fetched_projects
└── project_<id>
├── files...
└── locations
└── location_<id>
└── design_<id>
└── analysis_<id>
└── json files...
At the end of each analysis path there are some JSON files named result_<id>.json. I want to use only those to run another transformation pipeline, which flattens the JSON structure. How do I tackle this problem dynamically? I am trying this on my local machine.
I tried getting the list of all the IDs and then building one formatted key like f'{projects_path}project_{project_id}/result_{project_id}.json', but that did not work.
Could you try to be a bit more specific? If you want to dynamically obtain a file given some key, or match your keys against the keys present in the bucket, you can list the objects in the bucket and then filter the resulting list of strings with a regex. This is the most naive approach; better solutions can be implemented for your specific task.
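A minimal sketch of that naive filtering. The keys below are made up to mirror the layout in your question; in practice you would collect them from a boto3 `list_objects_v2` paginator over the bucket:

```python
import re

# Hypothetical keys shaped like the bucket layout in the question;
# in real use, gather these from boto3's list_objects_v2 paginator.
keys = [
    "fetched_projects/project_a1/summary.json",
    "fetched_projects/project_a1/locations/location_b2/design_c3/analysis_d4/result_d4.json",
    "fetched_projects/project_a1/locations/location_b2/design_c3/analysis_d4/meta.json",
]

# Keep only keys whose final path segment looks like result_<id>.json.
result_re = re.compile(r"(^|/)result_[^/]+\.json$")
result_keys = [k for k in keys if result_re.search(k)]
print(result_keys)
# → only the .../analysis_d4/result_d4.json key survives
```

The regex anchors on the end of the key, so files like meta.json or summary.json elsewhere in the tree are ignored regardless of depth.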
You can split the key of each file at some character (for example /) to recover the individual IDs. This assumes you have boto3 set up in the environment where you execute your Python code. If you have already defined the exact IDs that go inside the curly brackets of your format string, then downloading a JSON file from S3 is a single boto3 download_file call.
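A sketch under those assumptions. The helper names, destination folder, and key layout are illustrative; `download_file` is the actual boto3 S3 client method, and it requires AWS credentials to be configured:

```python
import os

def local_path_for(key, dest_dir="results"):
    """Derive a flat local filename from an S3 key by splitting on '/'.

    Prefixes the analysis id (second-to-last path segment) so result
    files from different analyses don't collide in one folder.
    """
    parts = key.split("/")
    analysis_id = parts[-2].removeprefix("analysis_")
    return os.path.join(dest_dir, f"{analysis_id}_{parts[-1]}")

def download_result(bucket, key, dest_dir="results"):
    """Download one result_<id>.json locally (needs boto3 + credentials)."""
    import boto3  # imported here so the path helper works without boto3
    os.makedirs(dest_dir, exist_ok=True)
    path = local_path_for(key, dest_dir)
    boto3.client("s3").download_file(bucket, key, path)
    return path
```

You would loop `download_result` over the filtered list of result keys, then feed the downloaded files into your flattening pipeline.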
But again, the intent of your question is unclear. Could you provide a minimal (even theoretical) working example of what you would like to accomplish?