I am looking into ambient air pollution within regions of NSW and conducting a daily time series decomposition analysis using Rbeast to investigate if there is a change point signature around the time of Covid-19 lockdowns.
I have written a loop to analyse the data for each pollutant within each region. However, the BEAST x-axis ("Date", e.g. 01-01-2021, which would ideally plot as years, 2012-2022) is plotting strangely (i.e. Time = 16000, 17000, 18000, etc.).
Does anyone know how to fix this?
library(Rbeast)
beast_output = list()
target_pollutants = c("PM10", "OZONE", "NO", "NO2")
target_sites = c("WOLLONGONG", "MUSWELLBROOK", "SINGLETON", "CAMBERWELL", "WAGGAWAGGANORTH", "RICHMOND", "CAMDEN", "CHULLORA", "EARLWOOD", "WALLSEND", "BERESFIELD", "BARGO", "BRINGELLY", "PROSPECT", "STMARYS", "OAKDALE", "RANDWICK", "ROZELLE", "NEWCASTLE", "KEMBLAGRANGE", "ALBIONPARKSOUTH")
for (poll in target_pollutants) {
  beast_output[[poll]] = list()
  df = time_by_poll[[poll]] # grab the target df
  sites = setdiff(colnames(df), "Date") # site columns only; drop the Date column
  for (site in sites) {
    ts = ts(df[[site]], start=min(df$Date), end=max(df$Date))
    beast_results = beast(ts)
    # plot(beast_results)
    beast_output[[poll]][[site]] = beast_results
  }
}
plot(beast_output[["OZONE"]][["RANDWICK"]])

Thanks for asking, and sorry about the issue. Indeed, the Rbeast API is somewhat confusing because it was originally written to handle satellite time series.
Regardless, the BEAST model in the package was formulated only for regular time series. (By regular, I mean equally spaced time series with the same number of data points per period.) Because leap years have 366 days but other years have 365 days, daily time series are treated in BEAST as irregular time series if the periodicity is one year. However, if the periodic variation is weekly (7 days), daily time series are considered regular. To handle irregular time series, I implemented the beast.irreg function, which accepts irregular inputs and aggregates them into regular time series before doing the decomposition and changepoint detection.
To illustrate, I got a sample PM10 dataset for several regions (e.g., "WOLLONGONG" and "MUSWELLBROOK") from this site https://www.dpie.nsw.gov.au/air-quality/air-quality-data-services/data-download-facility, and I posted the CSV file (as well as another dataset on ozone) under https://github.com/zhaokg/Rbeast/tree/master/R/SampleData. You can read the files directly into R.
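As a rough sketch of the aggregation approach described above (not the original example, which used the sample CSV files): the snippet below builds a synthetic daily series, shows why Date values turn into axis numbers such as 16000, and passes decimal-year times to beast.irreg. The synthetic data, the variable names, and the choice of monthly aggregation are all assumptions for illustration. The argument name period follows recent Rbeast releases; older releases may use freq instead, so check ?beast.irreg for your installed version.

```r
# Synthetic daily series standing in for one site's PM10 column (not real data).
set.seed(1)
dates <- seq(as.Date("2012-01-01"), as.Date("2021-12-31"), by = "day")
yr    <- as.numeric(format(dates, "%Y"))
doy   <- as.numeric(format(dates, "%j"))
pm10  <- 20 + 5 * sin(2 * pi * doy / 365.25) + rnorm(length(dates), sd = 3)

# A Date is stored as the number of days since 1970-01-01, which is why
# ts(start = min(df$Date)) yields axis values in the 15000-19000 range.
as.numeric(dates[1])   # 15340

# Convert each date to a decimal year so the time axis is in calendar years.
leap     <- (yr %% 4 == 0 & yr %% 100 != 0) | yr %% 400 == 0
dec_year <- yr + (doy - 1) / ifelse(leap, 366, 365)

# Run the decomposition only if Rbeast is installed.
if (requireNamespace("Rbeast", quietly = TRUE)) {
  # Assumed settings: aggregate to monthly steps (deltat = 1/12 year)
  # with an annual cycle (period = 1 year).
  fit <- Rbeast::beast.irreg(pm10, time = dec_year, deltat = 1/12, period = 1)
  plot(fit)
}
```

Inside the original loop, the same idea would replace the ts() call: pass df[[site]] as the data and a decimal-year version of df$Date as time.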
By default, a time series is decomposed as Y = season + trend + error, but for your dataset in the original scale (e.g., not log-transformed), there could be some spikes. One way to model this is to add an extra spike/outlier component: Y = season + trend + outlier + error. For example, one time series can be analyzed at a weekly interval (the exact results vary depending on the choices of tseg.min or sseg.min).
More importantly, another issue I noticed from your figure is that your data seem to have lots of missing values, which should be assigned NA but appear to have been assigned zeros instead. If that is the case, the analysis results will certainly be wrong. BEAST can handle missing data, and these missing values should be given as NA or NaN (e.g., Y[Y==0]=NA).
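A minimal sketch of that recoding applied to every site column at once, assuming a data frame laid out like the one in the question (a Date column plus one column per site). The values below are made up, and the recoding is only appropriate if a zero really does mean "no measurement" rather than a genuine zero reading.

```r
# Toy data frame with the same layout as the question's per-pollutant tables
# (hypothetical values; zeros stand in for missing readings).
df <- data.frame(
  Date     = as.Date("2021-01-01") + 0:5,
  RANDWICK = c(12.1, 0, 15.3, 0, 9.8, 11.0),
  ROZELLE  = c(0, 8.4, 7.9, 10.2, 0, 9.1)
)

# Recode zeros as NA in every site column, leaving Date untouched.
# which() skips values that are already NA.
site_cols <- setdiff(colnames(df), "Date")
df[site_cols] <- lapply(df[site_cols], function(y) {
  y[which(y == 0)] <- NA
  y
})

# Count the missing values per site after recoding.
colSums(is.na(df[site_cols]))
```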