Rbeast Time Series - Weird Time Axis In Plot

Question

Rbeast Time Series - Weird Time Axis In Plot

494 views Asked by Hannah0575 At 14 July 2022 at 07:37

I am looking into ambient air pollution within regions of NSW and conducting a daily time series decomposition analysis using Rbeast to investigate if there is a change point signature around the time of Covid-19 lockdowns.

I have created a looping code to analyse the data for each pollutant within each region - however the Beast X axis ("Date" - i.e. 01-01-2021 - ideally would plot years (2012-2022) is plotting strangely ( I.e. Time = 16000, 17000, 18000 etc.?).

Anyone know how to fix this?

beast_output = list()

target_pollutants = c("PM10", "OZONE", "NO", "NO2")
target_sites = c("WOLLONGONG", "MUSWELLBROOK", "SINGLETON", "CAMBERWELL", "WAGGAWAGGANORTH", "RICHMOND", "CAMDEN", "CHULLORA", "EARLWOOD",  "WALLSEND", "BERESFIELD", "BARGO", "BRINGELLY", "PROSPECT", "STMARYS", "OAKDALE", "RANDWICK", "ROZELLE", "NEWCASTLE", "KEMBLAGRANGE", "ALBIONPARKSOUTH")
for (poll in target_pollutants) {
  beast_output[[poll]] = list()
  df = time_by_poll[[poll]]  #  grab the target df
  sites = colnames(df)
  sites$Date = NULL  #  clear date from the list
  for (site in sites) {
    ts = ts(df[[site]], start=min(df$Date), end=max(df$Date))
    beast_results = beast(ts)
    # plot(beastie_resulty)
    beast_output[[poll]][[site]] = beast_results
  }
}


plot (beast_results[["OZONE"]][["RANDWICK"]])

Original Q&A

There are 1 answers

**zhaokg** · Answer 1 · 2022-07-18T04:51:28+00:00

Thanks for asking and sorry about the issue. Indeed, the API interface in Rbeast is kinda confusing because it was originally coded to handle satellite time series.

Regardless, the BEAST model in the package was formulated only for regular time series.(By regular, I mean equally-spaced time series with the same number of data points per period.) Because leap years have 366 days but others have 356 days, daily time series are treated in BEAST as irregular time series if the periodicity is one year. However, if the periodic variation is weekly/7 days, daily time series are considered as regular. In order to handle irregular time series, I implemented the beast.irreg function which accepts irregular inputs and aggregate them into regular time series before doing the decomposition and changepoint detection.

To illustrate, I got a sample PM10 dataset for several regions (e.g., WOLLONGONG", and "MUSWELLBROOK") from this site https://www.dpie.nsw.gov.au/air-quality/air-quality-data-services/data-download-facility, and I posted the CSV file (as well as another dataset on ozone) under https://github.com/zhaokg/Rbeast/tree/master/R/SampleData. You can directly read the files from R as shown below:

df    = read.csv('https://github.com/zhaokg/Rbeast/raw/master/R/SampleData/pm10_1658112168.csv',header=FALSE,skip=3)

dates = as.Date(df[,1], "%d/%m/%Y")  # the 1st col is dates
Y     = df[,2:ncol(df)]              # the rest are PM10 data for the several sample sites
                                     # e.g., Y[,1] for the first region (WOLLONGONG)


library(Rbeast)

o = beast.irreg(log(Y[,1]),time=dates,deltat=1/12, freq=12, tseg.min=3, sseg.min=6)

# log(Y[,1]) :  Log-transformation may help if data is skewed bcz the BEAST model 
                assumes Gaussian errors;
# time=dates :  Use the 'time' arg to supply the times of individual data points.
               Alternatively, the `beast123' function also handles date strings of different formats
# deltat=1/12:  Aggregate the daily time series into a regular one at the interval 
                of 1 month=1/12 year
# freq=12 :    The period is 1 year, which is 12 data points (1.0/deltat=12)
# tseg.min:     The minimum trend segment length allowed in the changepoint detection is 3 data points (3 months) 
                -- the results MAY be sensitive to this parameter
# sseg.min:     The minimum seasonal segment length allowed in the changepoint detection is 6 data points (6 months) 
                -- the results MAY be sensitive to this parameter

plot(o)




# For sure, the daily time series can be re-sampled/aggregated to a different time interval
# Below, it is aggregated into a half monthly time series (dT=1/24 year), and the number
# of data points per period is freq=1.0 year/dT=24
o = beast.irreg( log(Y[,1]),time=dates,  deltat = 1/24, freq=24, tseg.min=12, sseg.min=12)
plot(o)

# Aggregated to a weekly time series (e.g., dT=7 / 365  year: the unit again is year), 
# and freq=1 year/ dT = 365/7.
# tcp.minmax=c(0,10) :  the min and max numbers of changepoints allowed in the trend component
o = beast.irreg( log(Y[,1]),time=dates,deltat=7/365,freq=365/7, tcp.minmax=c(0,15),tseg.min=5, sseg.min=20,ocp=10)
plot(o)


# Finally if you want to run on the daily interval. Specify the dT=deltat=1/365 year, and 
# freq = period/dT= 1.0 year/(1/365)year =365. Bcz the raw data is daily, 
# the majority of the raw data is kept intact during the aggregation except when 
# there is a leap year (the last two days of the leap year are merged into a single day)

o = beast.irreg( log(Y[,1]),time=dates,  deltat = 1/365, freq=365/1,  tseg.min=30, sseg.min=180)
plot(o)

By default, a time series is decomposed as Y= season + trend + error, but for your dataset in the original scale (e.g., not log-tranformed), there could be some spikes. One way to model this is to add an extra spike/outlier component: Y=season+trend+outlier/spike-like+error

# Use 'ocp' to specify the maximum number of spikes (either upward or downward) to be allowed in the outlier/spike component

o = beast.irreg(Y[,1],time=dates,  deltat = 1/365, freq=365/1,  tseg.min=30, sseg.min=180, ocp=10)
plot(o)

Below is an example for one time series analyzed at the weekly interval (Again, the exact results vary, depending on the choices of tseg.min or sseg.min).

More important, another issue I noticed from your figure is that your data seem to have lots of missing values, which should be assigned NA but instead assigned zeros in your figure. If that is the case, the analysis result for certain would be wrong. BEAST can handle missing data and these missing values should be given NA or NAN (e.g., Y[Y==0]=NA).

TechQA.

Rbeast Time Series - Weird Time Axis In Plot

There are 1 answers

Related Questions in R

Related Questions in TIME-SERIES

Related Questions in BEAST

Related Questions in MULTIVARIATE-TIME-SERIES

Popular Questions

Trending Questions