as.Date returns different formats if sapply or not


I have a data frame with a date column that I need to convert into a format R recognizes as a date.

> dataframe
        Date        Sum
1   06/09/15       2.51
2   06/09/15       3.75
3   06/09/15       3.50
...

I first converted it using sapply:

> dataframe$Date2<-sapply(dataframe$Date,as.Date,format="%m/%d/%y")

This returned the date as the number of days from Jan 1, 1970:

> dataframe
        Date        Sum      Date2
1   06/09/15       2.51      16595
2   06/09/15       3.75      16595
3   06/09/15       3.50      16595
...

Later on I tried converting it without sapply:

> dataframe$Date3<-as.Date(dataframe$Date,format="%m/%d/%y")

This, in turn, returned

> dataframe
        Date        Sum      Date2       Date3
1   06/09/15       2.51      16595  2015-06-09
2   06/09/15       3.75      16595  2015-06-09
3   06/09/15       3.50      16595  2015-06-09
...

These are two very different, apparently incompatible formats. Why does sapply return one format (days since the origin), while doing without it returns another (%Y-%m-%d)?

Now, obviously I could just ignore one method and go forth never using sapply with as.Date but I'd like to know why it reads differently. I am also struggling to convert the Date3 vector into the Date2 format.

Thus, I have two questions:

  1. Why does sapply provide a different date format?
  2. How do I convert a date-recognizable sequence (such as mm/dd/yyyy) into the number of days since 1 Jan 1970?

There are 3 answers

Tim Biegeleisen On

Here is an answer to the second part of your original question. To obtain the number of days since the epoch (1 Jan 1970) for a date in the format mm/dd/yyyy you can use the as.Date() function:

some.date <- as.Date("06/17/2015", "%m/%d/%Y")
days.since.epoch <- unclass(some.date)

> days.since.epoch
[1] 16603

Internally, R stores the date object some.date in terms of the number of days since the epoch (1 Jan 1970), and calling unclass() reveals this internal representation.
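Applied to a whole column, the same idea might look like the sketch below. The data frame is rebuilt from the three rows shown in the question, so its contents are an assumption; as.Date() is vectorised, so sapply is not needed.

```r
# Rebuild the three sample rows from the question (assumed data).
dataframe <- data.frame(
  Date = c("06/09/15", "06/09/15", "06/09/15"),
  Sum  = c(2.51, 3.75, 3.50),
  stringsAsFactors = FALSE
)

# Parse the whole column in one vectorised call; the result has class "Date".
dataframe$Date3 <- as.Date(dataframe$Date, format = "%m/%d/%y")

# unclass() strips the "Date" class and exposes the stored day counts.
dataframe$Date2 <- unclass(dataframe$Date3)

print(dataframe)
```

Both columns hold the same underlying numbers; only Date3 carries the class that makes R print them as dates.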

Sarina On

When working with dates I love to use lubridate, as it is in my eyes much easier to use and much more intuitive than the base functions.
Your second question could be answered with the following code:

library(lubridate)
# as.numeric() drops the difftime class, leaving the plain day count
dataframe$Date2<-as.numeric(difftime(dataframe$Date3,dmy("01-01-1970"),units="days"))

Depending on whether you want 1 January 1970 to count as day 1 rather than day 0, you may have to add +1 at the end of this line.

I don't really work with sapply and tapply directly (I prefer to use plyr for this) so I can't help with your first question.

svendvn On

1.

If you don't use the argument simplify=FALSE, sapply will use the command unlist to transform the answer from a list into a vector. unlist coerces the list elements to be of common type. From the manual:

Where possible the list elements are coerced to a common mode during the unlisting, and so the result often ends up as a character vector. Vectors will be coerced to the highest type of the components in the hierarchy NULL < raw < logical < integer < double < complex < character < list < expression: pairlists are treated as lists.

Because Date is not part of this hierarchy, unlist cannot coerce to Date. Instead it drops the class attribute, and what remains is the underlying storage: a Date is a plain number (the count of days since 1 Jan 1970) with a class attribute on top. That is why the result is numeric rather than character.
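A small sketch of the difference, using two made-up date strings in the question's format. The names x, via_sapply and via_lapply are only for illustration.

```r
# Two sample strings in the question's mm/dd/yy format (assumed data).
x <- c("06/09/15", "06/10/15")

# sapply() simplifies its list result with unlist(), which drops attributes
# (including the class), so only the underlying numbers survive.
via_sapply <- sapply(x, as.Date, format = "%m/%d/%y", USE.NAMES = FALSE)
print(class(via_sapply))

# lapply() keeps each element as a Date; do.call(c, ...) combines them
# with the Date method of c(), which preserves the class.
via_lapply <- do.call(c, lapply(x, as.Date, format = "%m/%d/%y"))
print(class(via_lapply))

# The vectorised call gives the same dates directly, without any apply.
direct <- as.Date(x, format = "%m/%d/%y")
print(all(via_lapply == direct))
```

In practice the vectorised call is the simplest choice; the lapply route is only worth it when each element needs different handling.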

2.

To convert a Date to the number of days since 1 Jan 1970, you can use as.numeric:

> today <- Sys.Date()
> today
[1] "2019-04-16"
> as.numeric(today)
[1] 18002

and to go back

> as.Date(18002, origin="1970-01-01")
[1] "2019-04-16"
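The same round trip works on a whole vector, for example a column of day counts like Date2 in the question. This is a minimal sketch; the values in days are chosen for illustration.

```r
# Day counts since 1 Jan 1970 (illustrative values).
days <- c(16595, 16596, 18002)

# Pass origin explicitly; older R versions require it for numeric input.
dates <- as.Date(days, origin = "1970-01-01")

# Print in the mm/dd/yyyy layout used in the question.
print(format(dates, "%m/%d/%Y"))

# Converting back recovers the original numbers.
print(as.numeric(dates))
```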