How to traverse a nested JSON object and delete or modify its elements in Python after parsing an XML file?

Given XML, I need to convert it to JSON and modify the JSON object.

<?xml version="1.0" standalone="yes"?>
<!--COUNTRIES is the root element-->
<WORLD>
    <country name="A">
        <event day="323" name="$abcd"> </event>
        <event day="23" name="$aklm"> </event>

        <neighbor name="B" direction="W" friend="T"></neighbor>
        <neighbor name="B" direction="W"></neighbor>
        <neighbor name="B" direction="W"></neighbor>
    </country>
    <country name="C">
        <event day="825" name="$nmre"> </event>
        <event day="329" name="$lpok"> </event>
        <event day="145" name="$dswq"> </event>
        <event day="256" name="$tyul"> </event>

        <neighbor name="D" direction="N"/>
        <neighbor name="B" direction="W" friend="T"/>
    </country>
</WORLD>

I want to remove the "event" elements from the final JSON output, along with the "friend" attribute found under "WORLD" -> "country" -> "neighbor". I am using the "xmltodict" library in Python and can convert the XML to JSON successfully, but I have not been able to remove these elements and attributes from the JSON.

Python Code:

import xmltodict, json
class XMLParser:
    def __init__(self, xml_file_path):
        self.xml_file_path = xml_file_path
        if not self.xml_file_path:
            raise ValueError("XML file path is not found.")
        with open(self.xml_file_path, 'r') as f:
            self.xml_file = f.read()

    def parse_xml_to_json(self):
        xml_file = self.xml_file
        json_data = xmltodict.parse(xml_file, attr_prefix='')
        if 'event' in json_data['WORLD']['country']:
            del json_data['WORLD']['country']['event']
        return json.dumps(json_data, indent=4)
  
xml_file_path = "file_path"
xml_parser = XMLParser(xml_file_path)
json_object = xml_parser.parse_xml_to_json()
print(json_object)

Please suggest.

Answer by James (best answer)

You can use a recursive function to remove the unwanted keys from the dictionary. The function below checks each dictionary for the key and removes it if found, then walks the values of each dict and the items of each list and applies itself again.

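def remove_key(d, key: str):
    """Recursively remove `key` from every dict nested inside d, in place."""
    if isinstance(d, list):
        for item in d:
            remove_key(item, key)
    elif isinstance(d, dict):
        d.pop(key, None)  # no-op when the key is absent
        for val in d.values():
            remove_key(val, key)  # strings and None fall through untouched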

First, parse the input XML:

import xmltodict
import json

xmltext = """<?xml version="1.0" standalone="yes"?>
<!--COUNTRIES is the root element-->
<WORLD>
    <country name="A">
        <event day="323" name="$abcd"> </event>
        <event day="23" name="$aklm"> </event>

        <neighbor name="B" direction="W" friend="T"></neighbor>
        <neighbor name="B" direction="W"></neighbor>
        <neighbor name="B" direction="W"></neighbor>
    </country>
    <country name="C">
        <event day="825" name="$nmre"> </event>
        <event day="329" name="$lpok"> </event>
        <event day="145" name="$dswq"> </event>
        <event day="256" name="$tyul"> </event>

        <neighbor name="D" direction="N"/>
        <neighbor name="B" direction="W" friend="T"/>
    </country>
</WORLD>"""

d = xmltodict.parse(xmltext)

The value of d is the following:

d
# d has this value:
{'WORLD': {'country': [{'@name': 'A',
    'event': [{'@day': '323', '@name': '$abcd'},
     {'@day': '23', '@name': '$aklm'}],
    'neighbor': [{'@name': 'B', '@direction': 'W', '@friend': 'T'},
     {'@name': 'B', '@direction': 'W'},
     {'@name': 'B', '@direction': 'W'}]},
   {'@name': 'C',
    'event': [{'@day': '825', '@name': '$nmre'},
     {'@day': '329', '@name': '$lpok'},
     {'@day': '145', '@name': '$dswq'},
     {'@day': '256', '@name': '$tyul'}],
    'neighbor': [{'@name': 'D', '@direction': 'N'},
     {'@name': 'B', '@direction': 'W', '@friend': 'T'}]}]}}
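
The dump also shows why the check in the question never fires: 'country' holds a list of dicts, so `'event' in json_data['WORLD']['country']` compares the string against each dict instead of looking for a key. A minimal sketch of the difference, using a small hand-written list (illustrative data, not the parsed document) with unprefixed keys as produced by the question's attr_prefix='' call:

```python
# Hand-written stand-in for json_data['WORLD']['country'] (illustrative data).
countries = [
    {"name": "A", "event": [{"day": "323"}], "neighbor": {"name": "B"}},
    {"name": "C", "event": [{"day": "825"}], "neighbor": {"name": "D"}},
]

# Membership on a list compares against its elements, which are dicts here.
found_on_list = "event" in countries
# Membership on a dict looks at its keys.
found_on_dict = "event" in countries[0]
print(found_on_list, found_on_dict)

# Removing the key therefore means visiting each country dict in turn.
for country in countries:
    country.pop("event", None)
print(countries)
```

The recursive function does this visiting for you at every nesting level.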

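Applying the function to d removes the unwanted keys. Attribute keys carry xmltodict's default '@' prefix here; with attr_prefix='' as in the question, the key to remove would be 'friend' instead of '@friend':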

remove_key(d, 'event')
remove_key(d, '@friend')

d
# d now has this value:
{'WORLD': {'country': [{'@name': 'A',
    'neighbor': [{'@name': 'B', '@direction': 'W'},
     {'@name': 'B', '@direction': 'W'},
     {'@name': 'B', '@direction': 'W'}]},
   {'@name': 'C',
    'neighbor': [{'@name': 'D', '@direction': 'N'},
     {'@name': 'B', '@direction': 'W'}]}]}}
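
Note that remove_key drops the key wherever it appears. The question only asks for "friend" under WORLD -> country -> neighbor, so if the same attribute name could occur elsewhere and must be kept there, a path-scoped variant may be safer. The following is only a sketch under that assumption; `remove_at_path` is a hypothetical helper, not part of xmltodict, and the sample data is hand-written:

```python
def remove_at_path(node, path, key):
    """Remove `key` only from dicts reached by following `path` from node.

    Lists met along the way are fanned out, so the same path covers both a
    single element (dict) and repeated elements (list of dicts).
    """
    if isinstance(node, list):
        for item in node:
            remove_at_path(item, path, key)
    elif isinstance(node, dict):
        if not path:
            node.pop(key, None)  # end of the path: drop the key if present
        elif path[0] in node:
            remove_at_path(node[path[0]], path[1:], key)


# Hand-written sample: '@friend' also sits on a country, where it should stay.
sample = {"WORLD": {"country": [
    {"@name": "A", "@friend": "keep",
     "neighbor": [{"@name": "B", "@friend": "T"}, {"@name": "B"}]},
    {"@name": "C", "neighbor": {"@name": "D", "@friend": "T"}},
]}}

remove_at_path(sample, ["WORLD", "country", "neighbor"], "@friend")
print(sample)
```

Only the neighbor entries lose '@friend'; the one on the first country is left alone.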

Now you can export to JSON.

with open('output.json', 'w') as fp:
    json.dump(d, fp, indent=4)
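
If you would rather get a JSON string back, as parse_xml_to_json in the question does, and leave the parsed dict untouched, one option is a non-mutating variant that builds a cleaned copy and takes several keys at once. This is a rough sketch, not the only way to do it; `without_keys` is a hypothetical name and the sample is a hand-written fragment shaped like d:

```python
import json


def without_keys(node, keys):
    """Return a copy of node with every dict key listed in `keys` dropped."""
    if isinstance(node, dict):
        return {k: without_keys(v, keys) for k, v in node.items() if k not in keys}
    if isinstance(node, list):
        return [without_keys(item, keys) for item in node]
    return node  # strings, numbers, None: returned unchanged


# Hand-written sample shaped like a fragment of d (illustrative data).
sample = {"WORLD": {"country": [
    {"@name": "A",
     "event": [{"@day": "323", "@name": "$abcd"}],
     "neighbor": [{"@name": "B", "@direction": "W", "@friend": "T"}]},
]}}

cleaned = without_keys(sample, {"event", "@friend"})
json_text = json.dumps(cleaned, indent=4)  # a str, like the question's return value
print(json_text)
```

The original `sample` still contains its 'event' and '@friend' entries afterwards, which helps if the unfiltered data is needed later.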