I am trying to generate xml file from pandas dataframe using pd.to_xml() method.
I have data in SQL-Server and trying to read from it.
| id | first_name | last_name | gender | ip_address | |
|---|---|---|---|---|---|
| 1 | Erika | Pickless | [email protected] | female | 133.146.172.220 |
| 2 | Victor | Stodd | [email protected] | male | 111.125.132.0 |
| 3 | Becky | Berrisford | [email protected] | female | 200.62.247.237 |
RawData = pd.read_sql_query('''select * from RFP.dbo.MOCK_DATA;''', conn)
RawData.to_xml('output.xml', attr_cols=['id', 'first_name', 'last_name', 'email',
'gender', 'ip_address'],namespaces={"soap": "http://example1.com", "wsse":
"http://docs.example2.org/wss/"}, prefix="soap")
I am getteing the following output
<?xml version='1.0' encoding='utf-8'?>
<soap:data xmlns:soap="http://example1.com" xmlns:wsse="http://docs.example2.org/wss/">
<soap:row soap:index="0" soap:id="1" soap:first_name="Erika" soap:last_name="Pickless" soap:email="[email protected]" soap:gender="Female" soap:ip_address="133.146.172.220"/>
<soap:row soap:index="1" soap:id="2" soap:first_name="Victoir" soap:last_name="Stodd" soap:email="[email protected]" soap:gender="Male" soap:ip_address="111.125.132.0"/>
<soap:row soap:index="2" soap:id="3" soap:first_name="Becky" soap:last_name="Berrisford" soap:email="[email protected]" soap:gender="Female" soap:ip_address="200.62.247.237"/>
</soap:data>
The above output is almost correct but, I want little different output: Issue: The prefix "soap" is attached to every attribute in the above output.
But I want following output:
<?xml version='1.0' encoding='utf-8'?>
<soap:data xmlns:soap="http://example1.com" xmlns:wsse="http://docs.example2.org/wss/">
<soap:row index="0" id="1" first_name="Erika" last_name="Pickless" email="[email protected]" gender="Female" ip_address="133.146.172.220"/>
<soap:row index="1" id="2" first_name="Victoir" last_name="Stodd" email="[email protected]" gender="Male" ip_address="111.125.132.0"/>
<soap:row index="2" id="3" first_name="Becky" last_name="Berrisford" email="[email protected]" gender="Female" ip_address="200.62.247.237"/>
</soap:data>
Since XML is an open-ended design standard and Pandas cannot support all possible output specifications with default arguments, you need a customized solution.
Therefore consider running XSLT, the special-purpose language designed to transform XML files, which is supported in
DataFrame.to_xml()using the defaultlxmlparser andstylesheetargument. Below XSLT will run after your raw output to remove namespace prefixes from all attributes:XSLT (save as .xsl script, a special .xml file)
Online Demo
Python