How do I import a .dat file?

托宾兹

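I want to import faminc99.dat, which is inside the compressed file faminc99.dat.Z on the NBER website: http://data.nber.org/psid/supp/

However, I tried read.table and read.delim with several different values of sep, and the imported data always ends up with only one variable. I'm not sure why. Can anyone shed some light on this?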

杰伊

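In addition to @sindri_baldur's answer: use faminc_99.zip, which contains the same file faminc99.dat. The advantage is that you can process the file directly in R.

The file is fixed-width, and the widths= differ from column to column. Working out the values without a codebook takes a bit of detective work, but the codebook at your link does in fact contain "Width:" information! Alternatively, we can open faminc99.dat in a text editor and build a representative line from the 48 distinguishable values to get a vector x: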

x <- c("06933"," 4"," 2"," 213800.0"," 200000.0"," 13800","     0","    0","    0","     0","xyrid","      0","htdck","      0","      0","     0","      0","0","0","0","     0","     0","xyrid","     0","asgnd","    0","xscid","    0","xscid","     0","asgnd","     0","asgnd","    0","asgnd","    0","asgnd","     0","asgnd","200000","asgnd","  0","  0"," 45"," 69"," 2667"," 0915"," 4.197")

We can work out the character length of each field with nchar:

wdt <- nchar(x)

This gives us wdt, the widths we need, which we can plug into read.fwf:

## lk <- "http://data.nber.org/psid/supp/faminc99.dat.Z"  ## .Z archive, not used here
lk <- "http://data.nber.org/psid/supp/faminc_99.zip"

temp <- tempfile()  ## path for a temporary file

download.file(lk, temp, mode = "wb")  ## "wb" keeps the zip intact on Windows
r <- read.fwf(unz(temp, "faminc99.dat"), wdt)  ## read straight from the zip

unlink(temp)  ## delete the temporary file
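To try the nchar-to-read.fwf idea without downloading anything, here is a minimal sketch on made-up data. The three-field layout, the values, and the names lines, x_demo, wdt_demo and demo are illustrative assumptions, not taken from faminc99.dat.

```r
# Toy fixed-width data: three fields per line, no delimiter between them.
lines <- c("00001 19 25500.0",
           "00002 47 27060.0",
           "00003  6 11718.0")

# A representative line split by hand into its fields (leading spaces kept).
x_demo <- c("00001", " 19", " 25500.0")
wdt_demo <- nchar(x_demo)  # field widths: 5, 3, 8

# Sanity check: the widths should add up to the length of every line.
stopifnot(all(nchar(lines) == sum(wdt_demo)))

# Write to a temporary file and read it back as fixed-width.
tmp <- tempfile(fileext = ".dat")
writeLines(lines, tmp)
demo <- read.fwf(tmp, widths = wdt_demo)
unlink(tmp)  # delete the temporary file

demo
```

The same stopifnot check can be run against a few lines of the real file (read with readLines) to confirm that the hand-built widths cover each line exactly.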

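In the character variables, some elements consist only of spaces, which we may want to turn into NA (I think restricting this to r[sapply(r, is.character)] isn't strictly necessary here).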

r[] <- lapply(r, function(z) {z[grep("^\\s*$", z)] <- NA;z})
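A small sketch of why the restriction to character columns can probably be skipped: numbers never match the blank-only pattern, so numeric columns pass through unchanged. The data frame d is a made-up example, and grepl is used instead of grep only to get a logical index.

```r
# Toy data frame mixing a numeric column with a character column
# that contains blank-only entries.
d <- data.frame(a = c(1, 0, 3),
                b = c("     ", "asgnd", ""),
                stringsAsFactors = FALSE)

# Same replacement as above, written with grepl() for a logical index.
d[] <- lapply(d, function(z) { z[grepl("^\\s*$", z)] <- NA; z })

d$b         # NA "asgnd" NA
class(d$a)  # still "numeric"
```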

Result

head(r)
#    V1 V2 V3    V4    V5   V6    V7 V8    V9 V10  V11  V12  V13 V14 V15  V16  V17
# 1  1 19 14 25500 25500    0     0  0     0   0 <NA>    0 <NA>   0   0    0    0
# 2  2 47 41 27060 22260 4800     0  0     0   0 <NA>    0 <NA>   0   0    0    0
# 3  3 47 41 11718     0 5400     0  0  6318   0 <NA>    0 <NA>   0   0    0    0
# 4  4 26 21 73928 73428  500     0  0     0   0 <NA> 7114 <NA>   0   0 3557 3557
# 5  5 37 32 32760 19000 4800     0  0  8960   0 <NA>    0 <NA>   0   0    0    0
# 6  6 29 24 30430  3000 3940 11085  0 12405   0 <NA>    0 <NA>   0   0    0    0
#   V18 V19 V20   V21   V22  V23 V24  V25 V26  V27 V28  V29 V30  V31 V32  V33 V34
# 1   0   0   0 25000 25000 <NA>   0 <NA>   0 <NA>   0 <NA>   0 <NA>   0 <NA>   0
# 2   0   0   0 20900 20800 <NA> 100 <NA>   0 <NA>   0 <NA>   0 <NA>   0 <NA>   0
# 3   0   0   0     0     0 <NA>   0 <NA>   0 <NA>   0 <NA>   0 <NA>   0 <NA>   0
# 4   1   0   1 56970 56970 <NA>   0 <NA>   0 <NA>   0 <NA>   0 <NA>   0 <NA>   0
# 5   0   0   0     0     0 <NA>   0 <NA>   0 <NA>   0 <NA>   0 <NA>   0 <NA>   0
# 6   0   0   0     0     0 <NA>   0 <NA>   0 <NA>   0 <NA>   0 <NA>   0 <NA>   0
#    V35 V36  V37 V38  V39   V40  V41 V42 V43 V44 V45  V46   V47    V48
# 1 <NA>   0 <NA>   0 <NA>     0 <NA> 430  69   0   0 1922  8480 25.278
# 2 <NA>   0 <NA>   0 <NA>  1360 <NA> 310 628 663 319 4087 17088 18.344
# 3 <NA>   0 <NA>   0 <NA>     0 <NA>   0   0   0   0 2338  9344 25.921
# 4 <NA>   0 <NA>   0 <NA>  9344 <NA> 152 219 281 339 4283 19453 15.649
# 5 <NA>   0 <NA>   0 <NA> 15000 <NA>   0   0 372 857 2476  9853 34.565
# 6 <NA>   0 <NA>   0 <NA>     0 <NA>   0   0   0   0 6360 27715  5.825

where

str(r)
# 'data.frame': 6997 obs. of  48 variables:
# $ V1 : int  1 2 3 4 5 6 7 8 9 10 ...
# $ V2 : int  19 47 47 26 37 29 6 8 39 45 ...
# $ V3 : int  14 41 41 21 32 24 4 5 34 39 ...
# $ V4 : num  25500 27060 11718 73928 32760 ...
# $ V5 : num  25500 22260 0 73428 19000 ...
# $ V6 : int  0 4800 5400 500 4800 3940 0 200 0 0 ...
# $ V7 : int  0 0 0 0 0 11085 0 0 0 0 ...
# $ V8 : int  0 0 0 0 0 0 0 0 0 0 ...
# $ V9 : int  0 0 6318 0 8960 12405 0 0 8760 0 ...
# $ V10: int  0 0 0 0 0 0 0 0 0 0 ...
# $ V11: chr  NA NA NA NA ...
# $ V12: int  0 0 0 7114 0 0 0 0 0 0 ...
# $ V13: chr  NA NA NA NA ...
# $ V14: int  0 0 0 0 0 0 0 0 0 0 ...
# $ V15: int  0 0 0 0 0 0 0 0 0 0 ...
# $ V16: int  0 0 0 3557 0 0 0 0 0 0 ...
# $ V17: int  0 0 0 3557 0 0 0 0 0 0 ...
# $ V18: int  0 0 0 1 0 0 0 0 0 0 ...
# $ V19: int  0 0 0 0 0 0 0 0 0 0 ...
# $ V20: int  0 0 0 1 0 0 0 0 0 0 ...
# $ V21: int  25000 20900 0 56970 0 0 34000 24825 0 34500 ...
# $ V22: int  25000 20800 0 56970 0 0 28000 24825 0 34500 ...
# $ V23: chr  NA NA NA NA ...
# $ V24: int  0 100 0 0 0 0 0 0 0 0 ...
# $ V25: chr  NA NA NA NA ...
# $ V26: int  0 0 0 0 0 0 0 0 0 0 ...
# $ V27: chr  NA NA NA NA ...
# $ V28: int  0 0 0 0 0 0 0 0 0 0 ...
# $ V29: chr  NA NA NA NA ...
# $ V30: int  0 0 0 0 0 0 0 0 0 0 ...
# $ V31: chr  NA NA NA NA ...
# $ V32: int  0 0 0 0 0 0 3000 0 0 0 ...
# $ V33: chr  NA NA NA NA ...
# $ V34: int  0 0 0 0 0 0 0 0 0 0 ...
# $ V35: chr  NA NA NA NA ...
# $ V36: int  0 0 0 0 0 0 3000 0 0 0 ...
# $ V37: chr  NA NA NA NA ...
# $ V38: int  0 0 0 0 0 0 0 0 0 0 ...
# $ V39: chr  NA NA NA NA ...
# $ V40: int  0 1360 0 9344 15000 0 0 0 0 0 ...
# $ V41: chr  NA NA NA NA ...
# $ V42: int  430 310 0 152 0 0 706 14 0 133 ...
# $ V43: int  69 628 0 219 0 0 398 187 0 858 ...
# $ V44: int  0 663 0 281 372 0 984 0 0 0 ...
# $ V45: int  0 319 0 339 857 0 769 0 0 0 ...
# $ V46: int  1922 4087 2338 4283 2476 6360 2667 2932 1648 3268 ...
# $ V47: int  8480 17088 9344 19453 9853 27715 10915 13120 7818 16246 ...
# $ V48: num  25.3 18.3 25.9 15.6 34.6 ...
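The columns come back with the default names V1 to V48. If names from the codebook are wanted, read.fwf accepts col.names. A minimal sketch on toy data follows; the names id, age and income are hypothetical placeholders, not the real variable names of faminc99.dat.

```r
# Toy fixed-width file with three fields of widths 5, 3 and 8.
tmp <- tempfile(fileext = ".dat")
writeLines(c("00001 19 25500.0",
             "00002 47 27060.0"), tmp)

# col.names supplies one name per field (hypothetical names here).
toy <- read.fwf(tmp, widths = c(5, 3, 8),
                col.names = c("id", "age", "income"))
unlink(tmp)  # delete the temporary file

names(toy)
```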
