I would like to use linear interpolation to replace NA values in a data frame. The columns represent a time series of daily data, so the data frame is in wide format: days are in columns and values of a variable (temperature) are in rows. The data frame looks something like this (greatly simplified):
Tempdf <- data.frame(Day1 = c(20, 22, 19, 28, NA),
                     Day2 = c(NA, 24, NA, NA, 28),
                     Day3 = c(23, 26, NA, NA, 29),
                     Day4 = c(25, 24, NA, 29, 30),
                     Day5 = c(24, NA, 22, 28, 29))
I've got a long time series (more than 1000 days, some of which have NA values), so I would like to interpolate those NA values from the rest of the series. I am not sure whether I need to specify a range: for example, for each NA, look at the values in the same row up to 3 columns before and after it, and interpolate from those. I need something like this because, if the interpolation were based on all columns, it would draw on daily values across several years, and the result could be very different from the temperatures on the days closest to the missing record.
So if the temperatures for Day1:Day5 were as in my example (20, NA, 23, 25, 24), I would expect that NA to be around 23.
I have tried the zoo package but haven't had much luck:
Temp2 <- na.approx(Temp1)
Thanks!
You could use approx inside apply, calling it with rule = 2. Note that for missing values in the first or last column, this will simply copy over the adjacent value rather than trying to extrapolate a trend. For missing values in the inner columns, the value will be interpolated between adjacent non-missing columns.
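The approach described above could look something like the sketch below. It is a minimal example rather than a definitive implementation: the helper name interp_row and the result name Tempdf_filled are made up for illustration, the column position is used as the time axis (so days are assumed to be evenly spaced), and every row is assumed to have at least two non-missing values, since approx needs two points to interpolate.

```r
# Example data from the question
Tempdf <- data.frame(Day1 = c(20, 22, 19, 28, NA),
                     Day2 = c(NA, 24, NA, NA, 28),
                     Day3 = c(23, 26, NA, NA, 29),
                     Day4 = c(25, 24, NA, 29, 30),
                     Day5 = c(24, NA, 22, 28, 29))

# Hypothetical helper: linearly interpolate the NAs in one row.
# approx() drops the NA points and interpolates at every column position;
# rule = 2 fills leading/trailing NAs with the nearest observed value.
interp_row <- function(x) {
  idx <- seq_along(x)
  approx(idx, x, xout = idx, rule = 2)$y
}

# apply() over rows returns one column per row, so transpose back
filled <- t(apply(Tempdf, 1, interp_row))
colnames(filled) <- names(Tempdf)
Tempdf_filled <- as.data.frame(filled)

Tempdf_filled
```

Because linear interpolation only uses the nearest non-missing value on each side of a gap, values from distant days do not influence the result, so there should be no need to restrict it to a window of columns. For the first row (20, NA, 23, 25, 24) the missing Day2 value is filled with the midpoint of its neighbours, 21.5.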