I have a data frame with a date column that I need to convert into a format R recognizes as a date.
> dataframe
Date Sum
1 06/09/15 2.51
2 06/09/15 3.75
3 06/09/15 3.50
...
I first converted it using sapply:
> dataframe$Date2<-sapply(dataframe$Date,as.Date,format="%m/%d/%y")
This returned the date as the number of days from Jan 1, 1970:
> dataframe
Date Sum Date2
1 06/09/15 2.51 16595
2 06/09/15 3.75 16595
3 06/09/15 3.50 16595
...
Later on I tried converting it without sapply:
> dataframe$Date3<-as.Date(dataframe$Date,format="%m/%m/%d")
This, in turn, returned
> dataframe
Date Sum Date2 Date3
1 06/09/15 2.51 16595 2015-09-15
2 06/09/15 3.75 16595 2015-09-15
3 06/09/15 3.50 16595 2015-09-15
...
These are two very different, apparently incompatible formats. Why does sapply return one format (days since the origin), while doing without it returns another (%Y-%m-%d)?
Now, obviously I could just ignore one method and never use sapply with as.Date, but I'd like to know why the results differ. I am also struggling to convert the Date3 vector into the Date2 format.
Thus, I have two questions:
- Why does sapply provide a different date format?
- How do I convert a date-recognizable sequence (such as mm/dd/yyyy) into the number of days since 1 Jan 1970?
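On the first question, a short sketch may help. It assumes the difference comes from sapply() simplifying its result into a plain vector, which discards the Date class and leaves only the underlying day counts. The dates vector below is a made-up stand-in for dataframe$Date.

```r
# Hypothetical stand-in for dataframe$Date
dates <- c("06/09/15", "06/09/15", "06/09/15")

# sapply() collapses the per-element results into a plain vector,
# and the Date class attribute is lost along the way
via_sapply <- sapply(dates, as.Date, format = "%m/%d/%y")
class(via_sapply)   # "numeric"

# as.Date() is already vectorised, so calling it directly keeps the class
direct <- as.Date(dates, format = "%m/%d/%y")
class(direct)       # "Date"

# Both hold the same day counts underneath
same_days <- all(unname(via_sapply) == as.numeric(direct))

# The class can be restored on the sapply() result by supplying the origin
restored <- as.Date(unname(via_sapply), origin = "1970-01-01")
```

Because as.Date() accepts a whole vector, the sapply() wrapper is not needed here in the first place.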
Here is an answer to the second part of your original question. To obtain the number of days since the epoch (1 Jan 1970) for a date in the format mm/dd/yyyy, you can use the as.Date() function. Internally, R stores the resulting date object (call it some.date) as the number of days since the epoch (1 Jan 1970), and calling unclass() on it reveals this internal representation.
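A minimal sketch of that conversion follows. The literal date string and the four-digit-year format %m/%d/%Y are assumptions chosen to match the mm/dd/yyyy pattern; some.date is the name used in the answer.

```r
# Parse a mm/dd/yyyy string (example value, chosen for illustration)
some.date <- as.Date("06/09/2015", format = "%m/%d/%Y")

# Strip the Date class to expose the stored day count
days <- unclass(some.date)
days                    # 16595

# as.numeric() gives the same number
as.numeric(some.date)

# For a whole column with two-digit years, as in the question (assumed layout):
# dataframe$Date2 <- as.numeric(as.Date(dataframe$Date, format = "%m/%d/%y"))
```

The same as.numeric() call should also turn a Date column such as Date3 into the day-count form of Date2.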