Lubridate fails to parse dates in "weekday, month day, year" format

Derek Corcoran

Question

I downloaded a database from a website, and its date column is formatted like this:

x <- c("Fri, Mar 1, 2019", "Sat, Mar 2, 2019", "Sun, Mar 3, 2019", "Mon, Mar 4, 2019", "Tue, Mar 5, 2019", "Wed, Mar 6, 2019", "Thu, Mar 7, 2019", "Fri, Mar 8, 2019", "Sat, Mar 9, 2019", "Sun, Mar 10, 2019", "Mon, Mar 11, 2019", "Tue, Mar 12, 2019", "Wed, Mar 13, 2019", "Thu, Mar 14, 2019", "Fri, Mar 15, 2019", "Sat, Mar 16, 2019", "Sun, Mar 17, 2019", "Mon, Mar 18, 2019", "Tue, Mar 19, 2019", "Wed, Mar 20, 2019", "Thu, Mar 21, 2019", "Fri, Mar 22, 2019", "Sat, Mar 23, 2019", "Sun, Mar 24, 2019", "Mon, Mar 25, 2019",  "Tue, Mar 26, 2019", "Wed, Mar 27, 2019", "Thu, Mar 28, 2019", "Fri, Mar 29, 2019", "Sat, Mar 30, 2019", "Sun, Mar 31, 2019")

It contains the dates from March 1 to March 31. I am trying to convert it to Date format, so I used the mdy function from lubridate:

library("lubridate")
mdy(x)

This returned the following vector:

 [1] "2019-03-01" "2019-03-02" "2019-03-20" "2019-04-20" "2019-05-20" "2019-03-06"
 [7] "2019-03-07" "2019-03-08" "2019-03-09" "2019-10-20" "2019-11-20" "2019-12-20"
[13] "2019-03-13" "2019-03-14" "2019-03-15" "2019-03-16" "2019-03-17" "2019-03-18"
[19] "2019-03-19" "2019-03-20" "2019-03-21" "2019-03-22" "2019-03-23" "2019-03-24"
[25] "2019-03-25" "2019-03-26" "2019-03-27" "2019-03-28" "2019-03-29" "2019-03-30"
[31] "2019-03-31"

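As you can see, most of the dates are correct, but parsing fails for the 4th, 5th, 10th, 11th and 12th of the month, where the day is read as if it were the month (the 3rd is wrong as well: it comes back as 2019-03-20). I have tried several solutions, but none has worked so far.

Some possible solutions that did not work

Using a regular expression to remove the weekday from the character vector:

I thought one way around this would be to drop the weekday part of each string, so I tried removing everything before the first comma, but I could not do it cleanly: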

library(stringr)
y <- str_extract(x, ",.*$")  # keep everything from the first comma onward
y 
 [1] ", Mar 1, 2019"  ", Mar 2, 2019"  ", Mar 3, 2019"  ", Mar 4, 2019" 
 [5] ", Mar 5, 2019"  ", Mar 6, 2019"  ", Mar 7, 2019"  ", Mar 8, 2019" 
 [9] ", Mar 9, 2019"  ", Mar 10, 2019" ", Mar 11, 2019" ", Mar 12, 2019"
 [13] ", Mar 13, 2019" ", Mar 14, 2019" ", Mar 15, 2019" ", Mar 16, 2019"
 [17] ", Mar 17, 2019" ", Mar 18, 2019" ", Mar 19, 2019" ", Mar 20, 2019"
 [21] ", Mar 21, 2019" ", Mar 22, 2019" ", Mar 23, 2019" ", Mar 24, 2019"
 [25] ", Mar 25, 2019" ", Mar 26, 2019" ", Mar 27, 2019" ", Mar 28, 2019"
 [29] ", Mar 29, 2019" ", Mar 30, 2019" ", Mar 31, 2019"

But now, when I use mdy, the first 12 days all come out wrong:

mdy(y)

[1] "2019-01-20" "2019-02-20" "2019-03-20" "2019-04-20" "2019-05-20" "2019-06-20"
[7] "2019-07-20" "2019-08-20" "2019-09-20" "2019-10-20" "2019-11-20" "2019-12-20"
[13] "2019-03-13" "2019-03-14" "2019-03-15" "2019-03-16" "2019-03-17" "2019-03-18"
[19] "2019-03-19" "2019-03-20" "2019-03-21" "2019-03-22" "2019-03-23" "2019-03-24"
[25] "2019-03-25" "2019-03-26" "2019-03-27" "2019-03-28" "2019-03-29" "2019-03-30"
[31] "2019-03-31"

Any ideas on how to solve this?

Session info

I added the sessionInfo() output as requested:

R version 3.4.4 (2018-03-15) 
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=es_CL.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=es_CL.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=es_CL.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=es_CL.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] stringr_1.3.1   dplyr_0.7.6     rvest_0.3.2     xml2_1.2.0      XML_3.98-1.16  
[6] lubridate_1.7.4

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.18     rstudioapi_0.7   knitr_1.20       bindr_0.1.1     
 [5] magrittr_1.5     tidyselect_0.2.4 R6_2.2.2         rlang_0.2.2     
 [9] httr_1.3.1       tools_3.4.4      pacman_0.4.6     selectr_0.4-1    
 [13] htmltools_0.3.6  yaml_2.2.0       rprojroot_1.3-2  digest_0.6.17   
 [17] assertthat_0.2.0 tibble_1.4.2     crayon_1.3.4     bindrcpp_0.2.2    
 [21] purrr_0.2.5      curl_3.2         glue_1.3.0       evaluate_0.11    
 [25] rmarkdown_1.10   stringi_1.2.4    pillar_1.3.0     compiler_3.4.4  
 [29] backports_1.1.2  pkgconfig_2.0.2 

Derek Corcoran

As @duckmayr suspected, this was a locale issue. As the sessionInfo output above shows, my locale was set as follows:

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=es_CL.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=es_CL.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=es_CL.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=es_CL.UTF-8 LC_IDENTIFICATION=C  
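
One quick way to see why LC_TIME matters here is to print the weekday and month abbreviations R produces under the active locale. The following is a minimal base R sketch; `time_names` is a hypothetical helper name, not part of lubridate. Under a Spanish locale, the abbreviations for Tuesday (martes) and March (marzo) would both be expected to start with "mar", which is the kind of collision that can mislead a parser reading "Mar".

```r
# Hypothetical helper: list the weekday and month abbreviations
# that the current LC_TIME locale produces.
time_names <- function() {
  week   <- as.Date("2019-03-03") + 0:6            # Sunday through Saturday
  months <- as.Date(sprintf("2019-%02d-01", 1:12)) # first day of each month
  list(
    locale   = Sys.getlocale("LC_TIME"),
    weekdays = format(week, "%a"),
    months   = format(months, "%b")
  )
}

time_names()
```

If the abbreviations printed differ from the English ones in the data, the strings are being matched against the wrong set of names.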

Everything was fixed once I changed LC_TIME to en_US.UTF-8, like this:

Sys.setlocale("LC_TIME", 'en_US.UTF-8')

After that, mdy works fine. I hope this helps anyone who runs into a similar problem in the future.
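
Note that Sys.setlocale changes LC_TIME for the whole session. Where that is undesirable, a sketch along the following lines switches the locale only while parsing and restores it afterwards. It uses base R's as.Date with the "C" locale (which uses English names) rather than lubridate; `parse_en_dates` is a hypothetical name, and the format string assumes every element looks like "Fri, Mar 1, 2019".

```r
# Hypothetical helper: parse "Fri, Mar 1, 2019"-style strings using English
# day/month names, regardless of the session's LC_TIME setting.
parse_en_dates <- function(x) {
  old <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)  # restore the old locale on exit
  Sys.setlocale("LC_TIME", "C")                       # "C" uses English names
  as.Date(x, format = "%a, %b %d, %Y")
}

x <- c("Fri, Mar 1, 2019", "Mon, Mar 4, 2019", "Tue, Mar 12, 2019")
parse_en_dates(x)
```

lubridate's parsers also accept a `locale` argument, so a call such as `mdy(x, locale = "en_US.UTF-8")` may give the same per-call effect, assuming that locale is installed on the system.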
