I have the code below. I am trying to extract the table data from this website into pandas.
from pyquery import PyQuery as pq
import requests
import pandas as pd
url = "https://www.tsa.gov/travel/passenger-volumes"
content = requests.get(url).content
doc = pq(content)
Passengers = doc(".views-align-center").text()
Method 1:
df = pd.DataFrame([x.split(' ') for x in Passengers.split(' ')])
print(df)
Method 2:
Passengers = Passengers.replace(' ',';')
Passengers
For Method 1, is it possible to unstack the pandas DataFrame to get a proper table structure?
Or is Method 2 the better approach? If so, how do I split the string at regular intervals and load it into pandas?
You can do this directly in Pandas:
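A minimal sketch of one way to do this, assuming the passenger figures sit in the first `<table>` element on the page and that `pandas.read_html` is available (it needs `lxml`, or `beautifulsoup4` with `html5lib`, installed). The helper name `parse_first_table` is hypothetical and used only for illustration:

```python
from io import StringIO

import pandas as pd


def parse_first_table(html: str) -> pd.DataFrame:
    """Return the first <table> found in an HTML string as a DataFrame."""
    # read_html returns a list with one DataFrame per <table> element.
    # Wrapping the string in StringIO avoids the deprecation warning that
    # recent pandas versions emit for literal HTML strings.
    tables = pd.read_html(StringIO(html))
    # Assumption: the table of interest is the first one on the page.
    return tables[0]
```

With the variables from the question, this could be called as `df = parse_first_table(requests.get(url).text)`. By default `read_html` treats `,` as a thousands separator, so figures such as `1,234` should come back as numbers rather than strings.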
which gives the DataFrame:
The NaN values in the 2023 column force the float dtype, but you can then clean the data as required. For example:
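One possible cleanup, sketched on a small stand-in DataFrame: the values are illustrative placeholders rather than real TSA figures, and the column names `Date`, `2023` and `2022` and the month/day/year date format are assumptions about the scraped table. It converts the float column to pandas' nullable integer dtype and parses the dates:

```python
import numpy as np
import pandas as pd

# Illustrative placeholder values only, not real TSA figures.
df = pd.DataFrame(
    {
        "Date": ["1/2/2023", "1/1/2023", "12/31/2022"],
        "2023": [1500000.0, 1400000.0, np.nan],  # the NaN forces float64
        "2022": [1200000, 1100000, 1000000],
    }
)

# Parse the date strings (month/day/year assumed) into datetimes.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# Convert the float column to the nullable integer dtype, which
# stores missing entries as <NA> instead of falling back to float.
df["2023"] = df["2023"].astype("Int64")

print(df.dtypes)
```

The `Int64` dtype (capital I) keeps the counts as integers while still marking the missing days; `fillna(0).astype(int)` is an alternative when a zero is an acceptable stand-in for missing data.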