Extracting the date from a filename

I have a folder with over 400 satellite images with names such as:

filenames <- c("mosaic_NDVI_2000_10_15.tif", "mosaic_NDVI_2000_10_31.tif", "mosaic_NDVI_2000_11_16.tif", "mosaic_NDVI_2000_12_18.tif", "mosaic_NDVI_2000_12_2.tif",  "mosaic_NDVI_2000_7_27.tif")

I also have a dataset of points, and for each point I need to extract the NDVI value at its coordinates from the image with the closest date. To do this, I need to match each row's date to the closest date found in the filenames.

# example data

df <- data.frame(date = c("2000-7-26", "2000-10-20", "2000-12-10"), lon = c(23.05, 23.12, 23.08), lat = c(-32.56, -32.61, -32.53))
df$date <- lubridate::as_date(df$date)

filenames <- c("mosaic_NDVI_2000_10_15.tif", "mosaic_NDVI_2000_10_31.tif", "mosaic_NDVI_2000_11_16.tif", "mosaic_NDVI_2000_12_18.tif", "mosaic_NDVI_2000_12_2.tif",  "mosaic_NDVI_2000_7_27.tif")

### NEED SOME CODE HERE TO EXTRACT THE DATES AND CREATE THE FOLLOWING VECTOR ###
filename_dates <- c("2000-10-15", "2000-10-31", "2000-11-16", "2000-12-18", "2000-12-2",  "2000-7-27")

filename_dates <- lubridate::as_date(filename_dates)

library(terra)

df$NDVI <- NA_real_
for(i in seq_len(nrow(df))){
  date <- df$date[i]
  # which.min() returns a single index (the first one if two dates are equally close)
  index <- which.min(abs(filename_dates - date))
  extraction_filename <- filenames[index]
  # build the point from row i, not row 1
  point_vec <- vect(df[i,], geom=c("lon", "lat"), crs="+proj=longlat +datum=WGS84", keepgeom=FALSE)
  NDVI <- rast(extraction_filename)
  crs(NDVI) <- crs(point_vec)
  value <- terra::extract(x=NDVI, y=point_vec)
  # column 1 is the point ID, column 2 is the raster value
  df$NDVI[i] <- value[1, 2]
}

There are 3 answers

G. Grothendieck On

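Convert filenames to a data frame and convert the dates in both to Date class. trimws with whitespace = "\\D" strips the non-digit characters from both ends of each filename, leaving a string that ymd can parse. Then use which.min to find the nearest date: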

library(lubridate)

filenamesDF <- data.frame(date = ymd(trimws(filenames, whitespace = "\\D")))
df <- transform(df, date = ymd(date))

DF <- cbind(filenamesDF, 
  df[sapply(filenamesDF$date, \(x) which.min(abs(x - df$date))), ])
names(DF)[2] <- "date2"
DF
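
As a small illustration of the trimws step above, here is a sketch in base R on two of the filenames from the question. The as.Date call with an explicit format is a swapped-in stand-in for ymd, not part of the answer itself, and the variable name stripped is only for illustration.

```r
filenames <- c("mosaic_NDVI_2000_10_15.tif", "mosaic_NDVI_2000_12_2.tif")

# whitespace = "\\D" makes trimws() strip leading and trailing non-digit characters
stripped <- trimws(filenames, whitespace = "\\D")
stripped
# [1] "2000_10_15" "2000_12_2"

# Base R alternative to ymd(): parse with an explicit format
as.Date(stripped, format = "%Y_%m_%d")
# [1] "2000-10-15" "2000-12-02"
```

This relies on the date being the only run of digits at either end of the name. A prefix containing digits, or an extension such as .jp2, would presumably need a more specific pattern.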
   

Or, using SQL with df and filenamesDF as defined above:

library(sqldf)

DF2 <- sqldf("select a.date, df.date date, df.lon, df.lat
  from filenamesDF as a left join df
  on abs(a.date - df.date) = (select min(abs(a.date - df.date)) from df)")
names(DF2)[2] <- "date2"
DF2

Either of these gives

        date      date2   lon    lat
1 2000-10-15 2000-10-20 23.12 -32.61
2 2000-10-31 2000-10-20 23.12 -32.61
3 2000-11-16 2000-12-10 23.08 -32.53
4 2000-12-18 2000-12-10 23.08 -32.53
5 2000-12-02 2000-12-10 23.08 -32.53
6 2000-07-27 2000-07-26 23.05 -32.56

Note

Inputs from question

filenames <- c("mosaic_NDVI_2000_10_15.tif", "mosaic_NDVI_2000_10_31.tif", 
  "mosaic_NDVI_2000_11_16.tif", "mosaic_NDVI_2000_12_18.tif", 
  "mosaic_NDVI_2000_12_2.tif",  "mosaic_NDVI_2000_7_27.tif")

df <- data.frame(date = c("2000-7-26", "2000-10-20", "2000-12-10"),
  lon = c(23.05, 23.12, 23.08),
  lat = c(-32.56, -32.61, -32.53)
)

shghm On

This code will transform your filenames into your desired vector which you can use to continue your work:

filenames <- c("mosaic_NDVI_2000_10_15.tif", "mosaic_NDVI_2000_10_31.tif", "mosaic_NDVI_2000_11_16.tif", "mosaic_NDVI_2000_12_18.tif", "mosaic_NDVI_2000_12_2.tif",  "mosaic_NDVI_2000_7_27.tif")

fname1 <- unlist(lapply(strsplit(filenames, split = "NDVI_"), '[[', 2)) # cut away the first part before the date
fname2 <- unlist(lapply(strsplit(fname1, split = ".", fixed = TRUE), '[[', 1)) # cut away the file ending
filename_dates <- gsub(pattern = "_", replacement = "-", fname2, fixed = TRUE) # exchange the underscore with a dash

It may not be the most elegant code, but it should do the job...
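
The three steps above could also be collapsed into one regular expression. The following is only a sketch, assuming every file ends in _YYYY_M_D.tif. The pattern and the stopifnot check are illustrative additions, not part of the answer.

```r
filenames <- c("mosaic_NDVI_2000_10_15.tif", "mosaic_NDVI_2000_10_31.tif",
               "mosaic_NDVI_2000_11_16.tif", "mosaic_NDVI_2000_12_18.tif",
               "mosaic_NDVI_2000_12_2.tif", "mosaic_NDVI_2000_7_27.tif")

# Capture year, month and day; month and day may have one or two digits
pattern <- "^.*_(\\d{4})_(\\d{1,2})_(\\d{1,2})\\.tif$"

# Fail early if any filename does not follow the assumed naming scheme
stopifnot(all(grepl(pattern, filenames)))

# Rebuild the captured fields as "YYYY-M-D" and convert to Date
filename_dates <- as.Date(sub(pattern, "\\1-\\2-\\3", filenames))
filename_dates
# [1] "2000-10-15" "2000-10-31" "2000-11-16" "2000-12-18" "2000-12-02" "2000-07-27"
```

Converting to Date right away also normalises the single-digit months and days, so no separate lubridate::as_date call is needed afterwards.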

Robert Hijmans On

With

filenames <- c("mosaic_NDVI_2000_10_15.tif", "mosaic_NDVI_2000_10_31.tif", 
               "mosaic_NDVI_2000_11_16.tif", "mosaic_NDVI_2000_12_18.tif",
               "mosaic_NDVI_2000_12_2.tif", "mosaic_NDVI_2000_7_27.tif")

df <- data.frame(
  date = c("2000-7-26", "2000-10-20", "2000-12-10"),
  lon = c(23.05, 23.12, 23.08), 
  lat = c(-32.56, -32.61, -32.53)
)
df$date <- as.Date(df$date)

You can get the dates from the filenames with

dates <- as.Date(filenames, "mosaic_NDVI_%Y_%m_%d.tif")

You can get the row in the data.frame with the nearest date like this

i <- sapply(dates, \(d) which.min(abs(df$date - d)))
i
# [1] 2 2 3 3 3 1

These correspond to

df$date[i]
#[1] "2000-10-20" "2000-10-20" "2000-12-10" "2000-12-10" "2000-12-10" "2000-07-26"
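
The question asks for the opposite direction, the nearest image for each row of df. The same idea might be sketched as below, with the roles of dates and df$date swapped. The column names filename, image_date and days_apart are made up for illustration.

```r
filenames <- c("mosaic_NDVI_2000_10_15.tif", "mosaic_NDVI_2000_10_31.tif",
               "mosaic_NDVI_2000_11_16.tif", "mosaic_NDVI_2000_12_18.tif",
               "mosaic_NDVI_2000_12_2.tif", "mosaic_NDVI_2000_7_27.tif")
df <- data.frame(
  date = as.Date(c("2000-7-26", "2000-10-20", "2000-12-10")),
  lon = c(23.05, 23.12, 23.08),
  lat = c(-32.56, -32.61, -32.53)
)
dates <- as.Date(filenames, "mosaic_NDVI_%Y_%m_%d.tif")

# For each observation date, the index of the image with the nearest date
j <- sapply(df$date, \(d) which.min(abs(as.numeric(dates - d))))
j
# [1] 6 1 4

df$filename   <- filenames[j]   # hypothetical column: file to extract from
df$image_date <- dates[j]       # hypothetical column: date of that file
df$days_apart <- as.integer(abs(dates[j] - df$date))
df$days_apart
# [1] 1 5 8
```

Note that 2000-12-10 is equally far (8 days) from 2000-12-18 and 2000-12-02. which.min keeps the first of the two in filenames order, so a different tie-breaking rule would need to be coded explicitly.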