Can someone help me with the following.
I am trying to parse column 'date2' in the format mdy (Month Day Year)
date1 date2 date2b date3 date4 date5
<chr> <chr> <chr> <chr> <chr> <chr>
1 5/13/2013 22/9/2012 22-Sep-12 40958 2012/28/08 2010-25-11
2 5/4/2013 26/2/2012 26-Feb-12 41271 2012/15/4 2010-24-02
3 5/15/2013 29/11/2012 29-Nov-12 40942 2/3/2012 2010-28-3
4 4/17/2013 3/2/2012 3-Feb-12 40954 2/15/2012 6/19/2010
5 12/20/2013 3/20/2012 20-Mar-12 40944 2012/14/11 5/11/2010
6 26/02/2013 29/11/2012 15-Aug-03 22/9/2012 2012/05/06 26/02/2013
dput(data)
structure(list(date1 = c("5/13/2013", "5/4/2013", "5/15/2013",
"4/17/2013", "12/20/2013", "26/02/2013", "4/17/2013"), date2 = c("22/9/2012",
"26/2/2012", "29/11/2012", "3/2/2012", "3/20/2012", "29/11/2012",
"2/8/2012"), date2b = c("22-Sep-12", "26-Feb-12", "29-Nov-12",
"3-Feb-12", "20-Mar-12", "15-Aug-03", "12/17/2010"), date3 = c("40958",
"41271", "40942", "40954", "40944", "22/9/2012", "14.05.2013"
), date4 = c("2012/28/08", "2012/15/4", "2/3/2012", "2/15/2012",
"2012/14/11", "2012/05/06", "14.05.2012"), date5 = c("2010-25-11",
"2010-24-02", "2010-28-3", "6/19/2010", "5/11/2010", "26/02/2013",
"18/11/2010")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-7L))
My understanding is that R should first try to format it as mdy (see code below), and then (if not possible if the day is >12) as dmy.
> parse_date_time(data$date2, orders = c("mdy", "dmy"))
[1] "2012-09-22 UTC" "2012-02-26 UTC" "2012-11-29 UTC" "2012-02-03 UTC" "2012-03-20 UTC"
[6] "2012-11-29 UTC" "2012-08-02 UTC"
dput(parse_date_time(data$date2, orders = c("mdy", "dmy")))
structure(c(1348272000, 1330214400, 1354147200, 1328227200, 1332201600,
1354147200, 1343865600), class = c("POSIXct", "POSIXt"), tzone = "UTC")
Therefore, the value 3/2/2012 and 2/8/2012 should be parsed as March 3rd, 2012 and Feb 2nd, 2012. The result is, however, Feb 3rd, 2012 and Aug 2nd 2012.
What am I doing wrong?
Code Example:
library(readxl)
library(lubridate)
# data <- read_excel("./data/Data.xlsx")
data <- structure(list(date1 = c("5/13/2013", "5/4/2013", "5/15/2013",
"4/17/2013", "12/20/2013", "26/02/2013", "4/17/2013"),
date2 = c("22/9/2012", "26/2/2012", "29/11/2012",
"3/2/2012", "3/20/2012", "29/11/2012", "2/8/2012"),
date2b = c("22-Sep-12", "26-Feb-12", "29-Nov-12", "3-Feb-12",
"20-Mar-12", "15-Aug-03", "12/17/2010"),
date3 = c("40958", "41271", "40942", "40954",
"40944", "22/9/2012", "14.05.2013"),
date4 = c("2012/28/08", "2012/15/4", "2/3/2012", "2/15/2012",
"2012/14/11", "2012/05/06", "14.05.2012"),
date5 = c("2010-25-11", "2010-24-02", "2010-28-3", "6/19/2010",
"5/11/2010", "26/02/2013","18/11/2010")),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,-7L))
# Print the first 6 rows of the data
head(data)
parse_date_time(data$date2, orders = c("mdy", "dmy"))
Output:
[1] "2012-09-22 UTC" "2012-02-26 UTC" "2012-11-29 UTC" "2012-02-03 UTC" "2012-03-20 UTC"
[6] "2012-11-29 UTC" "2012-08-02 UTC"
Thank you very much in advance!!
I tried to use the function: parse_date_time(ds$date2, orders = c('mdy', 'dmy')) Unfortunately, it is not working
Use the
select_formatsargument to determine the presedence:I multiplied by 10 just to increase the output of
myd. The results is as needed ieMarch 02and Feb 08`Here is a quick example of what the
select_formatdoes;