当我检查这个链接时,我仍然在努力将格式化的 sas 文件导入 R,因为我有一个格式化的 .sas7bdat 文件(附在这里)但是当我尝试将它导入时,R
我注意到所有格式都丢失了。我使用了 2 个不同的代码:
## Code 1:
##========
library(haven)
data <- read_sas("C:/Users/mmr2011/OneDrive/R codes/df_nsclc1.sas7bdat", NULL)
## Code 2:
##========
library(sas7bdat)
data("sas7bdat.sources")
data<-read.sas7bdat("C:/Users/mmr2011/OneDrive/R codes/df_nsclc1.sas7bdat", debug= F)
table(data$SEX) # gives me 1 and 2 instead of males and females
# 1 2
#880916 799960
# Then I tried this code (as I have sas catalog folder named format so I added that to my prior code; formats.sas7bcat) as follows
#===============================================================================
data<- read_sas("C:/Users/mmr2011/OneDrive/OneDrive/R codes/df_nsclc1.sas7bdat", catalog_file = "C:/Users/mmr2011/OneDrive/OneDrive/R codes/formats.sas7bcat")
# table(data$SEX)
# 1 2
#50190 66064
我正在使用 SAS 目录文件夹Windows
,如下所示(截图 5)。此外,它是可用的here
任何建议将不胜感激
我认为您遇到的问题很可能是误解了 R 标签的工作方式。
当我使用以下 SAS 代码时:
libname temp 'h:\temp\';
proc format lib=temp;
value sexf
1='Female'
2='Male'
;
value racef
1='Black'
2='Asian'
3='White'
4='Other'
;
value hispf
1='Of Hispanic Origin'
2='Not of Hispanic Origin'
;
quit;
options fmtsearch=(temp);
data temp.rtest;
input sex race hisp;
format sex sexf. race racef. hisp hispf.;
datalines;
1 1 1
2 1 1
1 2 1
2 2 1
1 3 1
2 3 1
1 4 1
2 4 1
1 1 2
2 1 2
1 2 2
2 2 2
1 3 2
2 3 2
1 4 2
2 4 2
;;;;
run;
And then use the following R code:
library(haven)
data <- read_sas("H:/temp/rtest.sas7bdat", catalog_file="H:/temp/formats.sas7bcat")
print(data)
It works as expected - the console prints the labelled text.
# A tibble: 16 x 3
sex race hisp
<dbl+lbl> <dbl+lbl> <dbl+lbl>
1 1 [Female] 1 [Black] 1 [Of Hispanic Origin]
2 2 [Male] 1 [Black] 1 [Of Hispanic Origin]
3 1 [Female] 2 [Asian] 1 [Of Hispanic Origin]
4 2 [Male] 2 [Asian] 1 [Of Hispanic Origin]
5 1 [Female] 3 [White] 1 [Of Hispanic Origin]
6 2 [Male] 3 [White] 1 [Of Hispanic Origin]
7 1 [Female] 4 [Other] 1 [Of Hispanic Origin]
8 2 [Male] 4 [Other] 1 [Of Hispanic Origin]
9 1 [Female] 1 [Black] 2 [Not of Hispanic Origin]
10 2 [Male] 1 [Black] 2 [Not of Hispanic Origin]
11 1 [Female] 2 [Asian] 2 [Not of Hispanic Origin]
12 2 [Male] 2 [Asian] 2 [Not of Hispanic Origin]
13 1 [Female] 3 [White] 2 [Not of Hispanic Origin]
14 2 [Male] 3 [White] 2 [Not of Hispanic Origin]
15 1 [Female] 4 [Other] 2 [Not of Hispanic Origin]
16 2 [Male] 4 [Other] 2 [Not of Hispanic Origin]
However, if I view it in RStudio's viewer by double-clicking on the dataset in the Data pane, it doesn't, and that is what you pasted into the question (a picture of that). I don't believe that's supported (variable labels are, meaning column header labels, but not value labels); if you want to verify that you may want to ask a new question specifically mentioning that, with the code here cleaned up (you're welcome to use my example code).
您可能想要做的是将值标签转换为factor
s。这可以通过几种方式完成;在标记的包文档中有一些关于为什么的讨论,这是您可以使用的一件事,但有几种方法。同样,如果您自己无法弄清楚,这将是一个很好的单独问题。因素是 R 通常如何管理此类事情(即分类变量)。
本文收集自互联网,转载请注明来源。
如有侵权,请联系 [email protected] 删除。
我来说两句