How to read XML files with missing initial tags in R


I have several XML files that are missing the initial (root) tag. For example, this is a properly formatted file:

<?xml version="1.0"?>
<UDI>
<Test_Equipment_Number>3300061-01</Test_Equipment_Number>
<Test_SW_Number>3300062</Test_SW_Number>
<Test_SW_Version>2.1</Test_SW_Version>
<GTIN>(01)00884838088597</GTIN>
<LOT></LOT>
<Date_of_Mfg>(11)20190322</Date_of_Mfg>
<Device_SN>(21)1160001242</Device_SN>
<Material_Number>(96)300001287651</Material_Number>
<PCBA_WO_and_SN>00190311-0001242</PCBA_WO_and_SN>
<FW_Version>06</FW_Version>
<Model>324PHB</Model>
</UDI>

And this is a file with the initial tag missing:

<Test_Equipment_Number>3300011-01</Test_Equipment_Number>
<Test_SW_Number>3300012</Test_SW_Number>
<Test_SW_Version>5.1</Test_SW_Version>
<GTIN>(01)00884838085497</GTIN>
<LOT></LOT>
<Date_of_Mfg>(11)20190411</Date_of_Mfg>
<Device_SN>(21)1120104548</Device_SN>
<Material_Number>(96)300000267981</Material_Number>
<PCBA_WO_and_SN>000143-00000793</PCBA_WO_and_SN>
<FW_Version>V01.0001</FW_Version>
<Model>7000PHW</Model>

How can I read the file with the missing initial tag in R?

There is 1 answer

npjc (best answer)

One option would be to parse the xml fragment by specifying a top node to be added:

# install.packages('XML')
library(XML)

fragment <- 
'<Test_Equipment_Number>3300011-01</Test_Equipment_Number>
<Test_SW_Number>3300012</Test_SW_Number>
<Test_SW_Version>5.1</Test_SW_Version>
<GTIN>(01)00884838085497</GTIN>
<LOT></LOT>
<Date_of_Mfg>(11)20190411</Date_of_Mfg>
<Device_SN>(21)1120104548</Device_SN>
<Material_Number>(96)300000267981</Material_Number>
<PCBA_WO_and_SN>000143-00000793</PCBA_WO_and_SN>
<FW_Version>V01.0001</FW_Version>
<Model>7000PHW</Model>'

XML::parseXMLAndAdd(fragment, top = 'content')
#> <content>
#>   <Test_Equipment_Number>3300011-01</Test_Equipment_Number>
#>   <Test_SW_Number>3300012</Test_SW_Number>
#>   <Test_SW_Version>5.1</Test_SW_Version>
#>   <GTIN>(01)00884838085497</GTIN>
#>   <LOT/>
#>   <Date_of_Mfg>(11)20190411</Date_of_Mfg>
#>   <Device_SN>(21)1120104548</Device_SN>
#>   <Material_Number>(96)300000267981</Material_Number>
#>   <PCBA_WO_and_SN>000143-00000793</PCBA_WO_and_SN>
#>   <FW_Version>V01.0001</FW_Version>
#>   <Model>7000PHW</Model>
#> </content>
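
Since the question is about files rather than strings, the same idea can be sketched for a file on disk: read the text, wrap it in a root element only when the root is missing, then parse it. This is a minimal sketch, not a definitive implementation. The helper name `read_udi` and the check for a literal `<UDI>` opening tag are assumptions made for illustration; a root tag carrying attributes, or a rootless file that still starts with an XML declaration, would need extra handling.

```r
library(XML)

# Hypothetical helper: parse a UDI file whether or not it has its root element.
read_udi <- function(path, top = "UDI") {
  txt <- paste(readLines(path, warn = FALSE), collapse = "\n")

  # Assumption: a complete file contains the literal opening tag, e.g. <UDI>.
  # If it is absent, wrap the fragment so the text becomes well-formed XML.
  if (!grepl(paste0("<", top, ">"), txt, fixed = TRUE)) {
    txt <- paste0("<", top, ">", txt, "</", top, ">")
  }

  doc <- xmlParse(txt, asText = TRUE)

  # Select only the element children of the root and collect name/value pairs.
  nodes <- getNodeSet(doc, paste0("/", top, "/*"))
  values <- vapply(nodes, function(n) paste(xmlValue(n), collapse = ""), character(1))
  names(values) <- vapply(nodes, xmlName, character(1))
  values
}

# Demo with a temporary file that has no root tag
rootless <- tempfile(fileext = ".xml")
writeLines(c("<FW_Version>V01.0001</FW_Version>", "<Model>7000PHW</Model>"), rootless)
fields <- read_udi(rootless)
fields[["Model"]]
```

With several files, something like `lapply(files, read_udi)` (where `files` is your own vector of paths) should return one named character vector per file, and the same helper can be used for the well-formed and the rootless files alike.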