How to create a DataFrame from a file in XML format?


I have a file called robust.txt. It has a .txt extension, but the data inside are organised in XML format.
The content of the file looks like this:

<top>

<num> Number: 301 
<title> International Organized Crime 

<desc> Description: 
Identify organizations that participate in international criminal
activity, the activity, and, if possible, collaborating organizations
and the countries involved.

<narr> Narrative: 
A relevant document must as a minimum identify the organization and the
type of illegal activity (e.g., Columbian cartel exporting cocaine).
Vague references to international drug trade without identification of
the organization(s) involved would not be relevant.

</top>


<top>

<num> Number: 302
<title> Poliomyelitis and Post-Polio 

<desc> Description: 
Is the disease of Poliomyelitis (polio) under control in the
world?
 
<narr> Narrative: 
Relevant documents should contain data or outbreaks of the 
polio disease (large or small scale), medical protection 
against the disease, reports on what has been labeled as 
"post-polio" problems.  Of interest would be location of 
the cases, how severe, as well as what is being done in 
the "post-polio" area.
 
</top>


<top>

<num> Number: 303
<title> ....
....

So I have 2 problems:
1. How do I read robust.txt as XML when the file has a .txt extension but XML-formatted content?
2. How do I convert the XML file into a DataFrame?

I want a data frame with fields: num, title, desc and narr.

Dataframe:
num| title| desc| narr|
301| .....| ....| ....|
302| .....| ....| ....|
...| .....| ....| ....|

There is 1 answer

Answered by vignesh venkat

Try the pandas library for Python. I have read similar files (CSV, JSON, TSV) with it using read_csv; for this file you may need something similar, such as read_xml. Read the pandas read_xml documentation before using it.
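
One caveat: the sample in the question is not well-formed XML, because the inner tags (<num>, <title>, <desc>, <narr>) are never closed, so an XML parser such as read_xml will most likely reject the file as-is. The .txt extension itself is not a problem, since the file is read as plain text either way. A minimal sketch of an alternative, assuming every topic is wrapped in <top> ... </top> and uses only those four tags as in the sample, is to pull the fields out with regular expressions and build the DataFrame from the result. The function names parse_topics and topics_to_dataframe are made up for illustration.

```python
import re

# One <top> ... </top> block per topic.
TOPIC_RE = re.compile(r"<top>(.*?)</top>", re.DOTALL)

# Inner tags are never closed, so a field runs until the next
# known tag or the end of the block.
FIELD_RE = re.compile(
    r"<(num|title|desc|narr)>(.*?)(?=<(?:num|title|desc|narr)>|\Z)",
    re.DOTALL,
)

# Labels that the sample repeats right after the tag, e.g. "Number:".
LABEL_RE = re.compile(r"^\s*(?:Number|Description|Narrative):\s*")


def parse_topics(text):
    """Return one dict per topic with the keys found in that topic."""
    rows = []
    for block in TOPIC_RE.findall(text):
        row = {}
        for tag, value in FIELD_RE.findall(block):
            value = LABEL_RE.sub("", value)
            # Collapse line breaks and repeated spaces into single spaces.
            row[tag] = " ".join(value.split())
        rows.append(row)
    return rows


def topics_to_dataframe(text):
    """Build a DataFrame with the columns num, title, desc and narr."""
    # Imported here so the parser above needs only the standard library.
    import pandas as pd

    return pd.DataFrame(parse_topics(text), columns=["num", "title", "desc", "narr"])
```

To use it, read the file as text and pass the string in, for example `df = topics_to_dataframe(open("robust.txt", encoding="utf-8").read())`. The UTF-8 encoding is an assumption; change it if the file uses another one. The num column comes out as strings, so convert it with `df["num"].astype(int)` if numbers are needed.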