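I want to import faminc99.dat from the compressed file faminc99.dat.Z on the NBER website: http://data.nber.org/psid/supp/
However, I have tried read.table and read.delim with several different values of sep, and the imported data always ends up with only 1 variable. I'm not sure why. Can anyone shed some light on this?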
In addition to @sindri_baldur's answer: use faminc_99.zip, which contains the same file faminc99.dat. The advantage is that you can handle the file directly in R.
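The widths= values differ from field to field, and working them out without a codebook is a little fiddly, but the codebook you linked actually lists a "Width:" entry for each field! Alternatively, we can open faminc99.dat in a text editor and build a representative row from the 48 distinguishable values found there, which gives a vector x: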
x <- c("06933"," 4"," 2"," 213800.0"," 200000.0"," 13800"," 0"," 0"," 0"," 0","xyrid"," 0","htdck"," 0"," 0"," 0"," 0","0","0","0"," 0"," 0","xyrid"," 0","asgnd"," 0","xscid"," 0","xscid"," 0","asgnd"," 0","asgnd"," 0","asgnd"," 0","asgnd"," 0","asgnd","200000","asgnd"," 0"," 0"," 45"," 69"," 2667"," 0915"," 4.197")
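We can get the number of characters in each field with nchar (leading spaces count toward the width, so they must be kept exactly as they appear in the file):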
wdt <- nchar(x)
This gives wdt, the vector of field widths that we can pass to read.fwf.
## lk <- "http://data.nber.org/psid/supp/faminc99.dat.Z" ## .Z archive, format not recognized
lk <- "http://data.nber.org/psid/supp/faminc_99.zip"
temp <- tempfile() ## path for a temporary file
download.file(lk, temp) ## save the zip archive there
r <- read.fwf(unz(temp, "faminc99.dat"), wdt) ## read the .dat straight from the archive
unlink(temp) ## delete the temporary file
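To see how the widths vector drives read.fwf without downloading anything, here is a minimal sketch on made-up data. The three-field layout and the values in toy_lines are assumptions for illustration only, not the real faminc99.dat layout.

```r
# Hypothetical representative row with three fields (widths 5, 3 and 6).
toy_x <- c("06933", "  4", " 13800")
toy_wdt <- nchar(toy_x)  # leading spaces are counted as part of each field

# Two made-up fixed-width records that follow the same layout.
toy_lines <- c("06933  4 13800",
               "00001 19 25500")

# read.fwf accepts a connection, so a textConnection stands in for the file.
toy <- read.fwf(textConnection(toy_lines), widths = toy_wdt)
str(toy)
```

Each line is cut at the cumulative widths, so a wrong width in any one field shifts every field after it.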
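In the character variables, some elements consist only of whitespace, and we may want to turn those into NA. (I don't think restricting this to r[sapply(r, is.character)] is strictly necessary here.)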
r[] <- lapply(r, function(z) {z[grep("^\\s*$", z)] <- NA;z})
head(r)
# V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17
# 1 1 19 14 25500 25500 0 0 0 0 0 <NA> 0 <NA> 0 0 0 0
# 2 2 47 41 27060 22260 4800 0 0 0 0 <NA> 0 <NA> 0 0 0 0
# 3 3 47 41 11718 0 5400 0 0 6318 0 <NA> 0 <NA> 0 0 0 0
# 4 4 26 21 73928 73428 500 0 0 0 0 <NA> 7114 <NA> 0 0 3557 3557
# 5 5 37 32 32760 19000 4800 0 0 8960 0 <NA> 0 <NA> 0 0 0 0
# 6 6 29 24 30430 3000 3940 11085 0 12405 0 <NA> 0 <NA> 0 0 0 0
# V18 V19 V20 V21 V22 V23 V24 V25 V26 V27 V28 V29 V30 V31 V32 V33 V34
# 1 0 0 0 25000 25000 <NA> 0 <NA> 0 <NA> 0 <NA> 0 <NA> 0 <NA> 0
# 2 0 0 0 20900 20800 <NA> 100 <NA> 0 <NA> 0 <NA> 0 <NA> 0 <NA> 0
# 3 0 0 0 0 0 <NA> 0 <NA> 0 <NA> 0 <NA> 0 <NA> 0 <NA> 0
# 4 1 0 1 56970 56970 <NA> 0 <NA> 0 <NA> 0 <NA> 0 <NA> 0 <NA> 0
# 5 0 0 0 0 0 <NA> 0 <NA> 0 <NA> 0 <NA> 0 <NA> 0 <NA> 0
# 6 0 0 0 0 0 <NA> 0 <NA> 0 <NA> 0 <NA> 0 <NA> 0 <NA> 0
# V35 V36 V37 V38 V39 V40 V41 V42 V43 V44 V45 V46 V47 V48
# 1 <NA> 0 <NA> 0 <NA> 0 <NA> 430 69 0 0 1922 8480 25.278
# 2 <NA> 0 <NA> 0 <NA> 1360 <NA> 310 628 663 319 4087 17088 18.344
# 3 <NA> 0 <NA> 0 <NA> 0 <NA> 0 0 0 0 2338 9344 25.921
# 4 <NA> 0 <NA> 0 <NA> 9344 <NA> 152 219 281 339 4283 19453 15.649
# 5 <NA> 0 <NA> 0 <NA> 15000 <NA> 0 0 372 857 2476 9853 34.565
# 6 <NA> 0 <NA> 0 <NA> 0 <NA> 0 0 0 0 6360 27715 5.825
where
str(r)
# 'data.frame': 6997 obs. of 48 variables:
# $ V1 : int 1 2 3 4 5 6 7 8 9 10 ...
# $ V2 : int 19 47 47 26 37 29 6 8 39 45 ...
# $ V3 : int 14 41 41 21 32 24 4 5 34 39 ...
# $ V4 : num 25500 27060 11718 73928 32760 ...
# $ V5 : num 25500 22260 0 73428 19000 ...
# $ V6 : int 0 4800 5400 500 4800 3940 0 200 0 0 ...
# $ V7 : int 0 0 0 0 0 11085 0 0 0 0 ...
# $ V8 : int 0 0 0 0 0 0 0 0 0 0 ...
# $ V9 : int 0 0 6318 0 8960 12405 0 0 8760 0 ...
# $ V10: int 0 0 0 0 0 0 0 0 0 0 ...
# $ V11: chr NA NA NA NA ...
# $ V12: int 0 0 0 7114 0 0 0 0 0 0 ...
# $ V13: chr NA NA NA NA ...
# $ V14: int 0 0 0 0 0 0 0 0 0 0 ...
# $ V15: int 0 0 0 0 0 0 0 0 0 0 ...
# $ V16: int 0 0 0 3557 0 0 0 0 0 0 ...
# $ V17: int 0 0 0 3557 0 0 0 0 0 0 ...
# $ V18: int 0 0 0 1 0 0 0 0 0 0 ...
# $ V19: int 0 0 0 0 0 0 0 0 0 0 ...
# $ V20: int 0 0 0 1 0 0 0 0 0 0 ...
# $ V21: int 25000 20900 0 56970 0 0 34000 24825 0 34500 ...
# $ V22: int 25000 20800 0 56970 0 0 28000 24825 0 34500 ...
# $ V23: chr NA NA NA NA ...
# $ V24: int 0 100 0 0 0 0 0 0 0 0 ...
# $ V25: chr NA NA NA NA ...
# $ V26: int 0 0 0 0 0 0 0 0 0 0 ...
# $ V27: chr NA NA NA NA ...
# $ V28: int 0 0 0 0 0 0 0 0 0 0 ...
# $ V29: chr NA NA NA NA ...
# $ V30: int 0 0 0 0 0 0 0 0 0 0 ...
# $ V31: chr NA NA NA NA ...
# $ V32: int 0 0 0 0 0 0 3000 0 0 0 ...
# $ V33: chr NA NA NA NA ...
# $ V34: int 0 0 0 0 0 0 0 0 0 0 ...
# $ V35: chr NA NA NA NA ...
# $ V36: int 0 0 0 0 0 0 3000 0 0 0 ...
# $ V37: chr NA NA NA NA ...
# $ V38: int 0 0 0 0 0 0 0 0 0 0 ...
# $ V39: chr NA NA NA NA ...
# $ V40: int 0 1360 0 9344 15000 0 0 0 0 0 ...
# $ V41: chr NA NA NA NA ...
# $ V42: int 430 310 0 152 0 0 706 14 0 133 ...
# $ V43: int 69 628 0 219 0 0 398 187 0 858 ...
# $ V44: int 0 663 0 281 372 0 984 0 0 0 ...
# $ V45: int 0 319 0 339 857 0 769 0 0 0 ...
# $ V46: int 1922 4087 2338 4283 2476 6360 2667 2932 1648 3268 ...
# $ V47: int 8480 17088 9344 19453 9853 27715 10915 13120 7818 16246 ...
# $ V48: num 25.3 18.3 25.9 15.6 34.6 ...
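The blank-to-NA step used above can be tried on a small made-up data frame. The column names and values here are hypothetical; the sketch only illustrates that whitespace-only strings become NA while the numeric columns keep their types.

```r
# Hypothetical data frame mixing integer, character and numeric columns.
toy_na <- data.frame(id = 1:3,
                     flag = c("asgnd", "     ", ""),
                     amt = c(0, 200000, 13800),
                     stringsAsFactors = FALSE)

# Same replacement as above: entries that are empty or whitespace-only become NA.
toy_na[] <- lapply(toy_na, function(z) {z[grep("^\\s*$", z)] <- NA; z})
str(toy_na)
```

Assigning into toy_na[] keeps the object a data frame, and columns without any blank entries are returned unchanged.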