Thank you in advance for anyone who takes the time to help with this!
I'm trying to download all data from https://municipaldata.treasury.gov.za/docs#general and save each cube as an individual dataframes. I'm using R for this.
I've been trying to use lappy and for loops to download all cubes and pages without success.
So there are three aspects that I'm trying to address:
- Download all 21 cubes (see the end for the structure of the URL)
- Save all items from each cube as a separate dataframe
- Download all items from all pages (there's pagination and a limit of 10 000 items per a page)
For the cubes (1), I don't know how to don't know how to use lapply or for loops to save multiple dataframes as outputs (2).
I've figured out the pagination part (3) using a for loop (any help applying it to lapply would be appreciated), but the solution requires me to run the code separately for each cube.
Example, for the cube 'repmaint', to get all the data using a for loop:
# set df
df_repmaint <- data.frame()
# get the number of items to determine the number of pages
url_repmaint <- fromJSON("https://municipaldata.treasury.gov.za/api/cubes/repmaint/facts")
# as there are max 10 000 items per a page, calculate the number of pages
perpage <- ceiling(url_repmaint$total_fact_count / 10000)
# set up loop
for(i in 1:perpage){
# get cube data for each page
repmaint_page <- fromJSON(paste0("https://municipaldata.treasury.gov.za/api/cubes/repmaint/facts?page=",i))
# just to show it's working
message("Retrieving page ", i)
# save output into a single dataframe
df_repmaint <- rbind(df_repmaint,repmaint_page$data)
}
And then, I would want to save the output as separate dataframes.
As you can see, definitely out of my depth but trying to learn!
On the structure of the url:
The url to access data from cubes is structured as follows: https://municipaldata.treasury.gov.za/api/cubes/*'cube name'/facts?page='page number'*
for which I can easily create a list of cube names using the following formula
# extract JSON from URL
JSON_datasets<- fromJSON("https://municipaldata.treasury.gov.za/api/cubes")
# turn into dataframe
df_datasets <- as.data.frame(JSON_datasets[[2]])
# save as a list of each individual url
urls <-
sprintf("https://municipaldata.treasury.gov.za/api/cubes/%s/facts",
df_datasets$name)
# check list
urls