How to remove prefix from every attribute in xml using python

103 views Asked by At

I am trying to generate xml file from pandas dataframe using pd.to_xml() method.

I have data in SQL-Server and trying to read from it.

id first_name last_name email gender ip_address
1 Erika Pickless [email protected] female 133.146.172.220
2 Victor Stodd [email protected] male 111.125.132.0
3 Becky Berrisford [email protected] female 200.62.247.237
RawData = pd.read_sql_query('''select * from RFP.dbo.MOCK_DATA;''', conn)


RawData.to_xml('output.xml', attr_cols=['id', 'first_name', 'last_name', 'email', 
'gender', 'ip_address'],namespaces={"soap": "http://example1.com", "wsse": 
"http://docs.example2.org/wss/"}, prefix="soap")

I am getteing the following output

<?xml version='1.0' encoding='utf-8'?>
<soap:data xmlns:soap="http://example1.com" xmlns:wsse="http://docs.example2.org/wss/">
  <soap:row soap:index="0" soap:id="1" soap:first_name="Erika" soap:last_name="Pickless" soap:email="[email protected]" soap:gender="Female" soap:ip_address="133.146.172.220"/>
  <soap:row soap:index="1" soap:id="2" soap:first_name="Victoir" soap:last_name="Stodd" soap:email="[email protected]" soap:gender="Male" soap:ip_address="111.125.132.0"/>
  <soap:row soap:index="2" soap:id="3" soap:first_name="Becky" soap:last_name="Berrisford" soap:email="[email protected]" soap:gender="Female" soap:ip_address="200.62.247.237"/>
</soap:data>

The above output is almost correct but, I want little different output: Issue: The prefix "soap" is attached to every attribute in the above output.

But I want following output:

<?xml version='1.0' encoding='utf-8'?>
<soap:data xmlns:soap="http://example1.com" xmlns:wsse="http://docs.example2.org/wss/">
  <soap:row index="0" id="1" first_name="Erika" last_name="Pickless" email="[email protected]" gender="Female" ip_address="133.146.172.220"/>
  <soap:row index="1" id="2" first_name="Victoir" last_name="Stodd" email="[email protected]" gender="Male" ip_address="111.125.132.0"/>
  <soap:row index="2" id="3" first_name="Becky" last_name="Berrisford" email="[email protected]" gender="Female" ip_address="200.62.247.237"/>    
</soap:data>
1

There are 1 answers

2
Parfait On BEST ANSWER

Since XML is an open-ended design standard and Pandas cannot support all possible output specifications with default arguments, you need a customized solution.

Therefore consider running XSLT, the special-purpose language designed to transform XML files, which is supported in DataFrame.to_xml() using the default lxml parser and stylesheet argument. Below XSLT will run after your raw output to remove namespace prefixes from all attributes:

XSLT (save as .xsl script, a special .xml file)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- IDENTITY TRANSFORM -->
    <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
    </xsl:template>

    <!-- REMOVE NAMESPACE PREFIXES FROM ALL ATTRIBUTES -->
    <xsl:template match="@*">
      <xsl:attribute name="{local-name()}">
        <xsl:value-of select="."/>
      </xsl:attribute>
    </xsl:template>
</xsl:stylesheet>

Online Demo

Python

RawData.to_xml(
    "output.xml", 
    attr_cols=["id", "first_name", "last_name", "email", "gender", "ip_address"],
    namespaces={"soap": "http://example1.com", "wsse": "http://docs.example2.org/wss/"}, 
    prefix="soap",
    stylesheet="style.xsl"
)