Date Format mdy vs dmy

79 views Asked by At

Can someone help me with the following.

I am trying to parse column 'date2' in the format mdy (Month Day Year)

enter image description here

  date1      date2      date2b    date3     date4      date5     
  <chr>      <chr>      <chr>     <chr>     <chr>      <chr>     
1 5/13/2013  22/9/2012  22-Sep-12 40958     2012/28/08 2010-25-11
2 5/4/2013   26/2/2012  26-Feb-12 41271     2012/15/4  2010-24-02
3 5/15/2013  29/11/2012 29-Nov-12 40942     2/3/2012   2010-28-3 
4 4/17/2013  3/2/2012   3-Feb-12  40954     2/15/2012  6/19/2010 
5 12/20/2013 3/20/2012  20-Mar-12 40944     2012/14/11 5/11/2010 
6 26/02/2013 29/11/2012 15-Aug-03 22/9/2012 2012/05/06 26/02/2013

dput(data)
structure(list(date1 = c("5/13/2013", "5/4/2013", "5/15/2013", 
"4/17/2013", "12/20/2013", "26/02/2013", "4/17/2013"), date2 = c("22/9/2012", 
"26/2/2012", "29/11/2012", "3/2/2012", "3/20/2012", "29/11/2012", 
"2/8/2012"), date2b = c("22-Sep-12", "26-Feb-12", "29-Nov-12", 
"3-Feb-12", "20-Mar-12", "15-Aug-03", "12/17/2010"), date3 = c("40958", 
"41271", "40942", "40954", "40944", "22/9/2012", "14.05.2013"
), date4 = c("2012/28/08", "2012/15/4", "2/3/2012", "2/15/2012", 
"2012/14/11", "2012/05/06", "14.05.2012"), date5 = c("2010-25-11", 
"2010-24-02", "2010-28-3", "6/19/2010", "5/11/2010", "26/02/2013", 
"18/11/2010")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-7L))

My understanding is that R should first try to format it as mdy (see code below), and then (if not possible if the day is >12) as dmy.

enter image description here

    > parse_date_time(data$date2, orders = c("mdy", "dmy"))
[1] "2012-09-22 UTC" "2012-02-26 UTC" "2012-11-29 UTC" "2012-02-03 UTC" "2012-03-20 UTC"
[6] "2012-11-29 UTC" "2012-08-02 UTC"

dput(parse_date_time(data$date2, orders = c("mdy", "dmy")))
structure(c(1348272000, 1330214400, 1354147200, 1328227200, 1332201600, 
1354147200, 1343865600), class = c("POSIXct", "POSIXt"), tzone = "UTC")

Therefore, the value 3/2/2012 and 2/8/2012 should be parsed as March 3rd, 2012 and Feb 2nd, 2012. The result is, however, Feb 3rd, 2012 and Aug 2nd 2012.

What am I doing wrong?

Code Example:

library(readxl)
library(lubridate)

# data <- read_excel("./data/Data.xlsx")
data <- structure(list(date1 = c("5/13/2013", "5/4/2013", "5/15/2013", 
                         "4/17/2013", "12/20/2013", "26/02/2013", "4/17/2013"), 
               date2 = c("22/9/2012", "26/2/2012", "29/11/2012", 
                         "3/2/2012", "3/20/2012", "29/11/2012", "2/8/2012"), 
               date2b = c("22-Sep-12", "26-Feb-12", "29-Nov-12", "3-Feb-12", 
                          "20-Mar-12", "15-Aug-03", "12/17/2010"), 
               date3 = c("40958", "41271", "40942", "40954", 
                         "40944", "22/9/2012", "14.05.2013"), 
               date4 = c("2012/28/08", "2012/15/4", "2/3/2012", "2/15/2012",
                         "2012/14/11", "2012/05/06", "14.05.2012"), 
               date5 = c("2010-25-11", "2010-24-02", "2010-28-3", "6/19/2010", 
                         "5/11/2010", "26/02/2013","18/11/2010")), 
          class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,-7L))
# Print the first 6 rows of the data
head(data)
parse_date_time(data$date2, orders = c("mdy", "dmy"))

Output:

[1] "2012-09-22 UTC" "2012-02-26 UTC" "2012-11-29 UTC" "2012-02-03 UTC" "2012-03-20 UTC"
[6] "2012-11-29 UTC" "2012-08-02 UTC"

Thank you very much in advance!!

I tried to use the function: parse_date_time(ds$date2, orders = c('mdy', 'dmy')) Unfortunately, it is not working

1

There are 1 answers

0
Onyambu On

Use the select_formats argument to determine the presedence:

fn <- function(x,...)names(x)[which.max(grepl('m.*d',names(x))*10)]
parse_date_time(data$date2, c('mdy', 'dmy'), select_formats = fn)

[1] "2012-09-22 UTC" "2012-02-26 UTC" "2012-11-29 UTC" "2012-03-02 UTC" "2012-03-20 UTC" "2012-11-29 UTC" "2012-02-08 UTC"

I multiplied by 10 just to increase the output of myd. The results is as needed ie March 02 and Feb 08`


Here is a quick example of what the select_format does;

parse_date_time('3/2/2012', orders = c("dmy", "mdy"))
[1] "2012-02-03 UTC" # Feb 3rd

parse_date_time('3/2/2012', orders = c("dmy", "mdy"), select_formats = fn)
[1] "2012-03-02 UTC" # March 2nd