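I want to merge several columns into a single column that holds a semicolon-separated list (something like a dictionary in Python).
Basically, I have this data frame (blank cells are missing values):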
ID  Event Category  Start Time  End Time   Account No.  Dosage    Doctor's_ID
1   Stroke          1/1/2011
1   Admitted        1/6/2011               24287939               5487
1   Diagnosed       1/25/2011
6   Diagnosed       1/1/2011
6   Drug A          1/2/2011    1/10/2011               "high"
6   Drug B          1/7/2011    1/20/2011  35287930     "medium"
10  Drug A          1/3/2011    1/6/2011                "low"
10  Drug B          1/9/2011    1/13/2011               "high"
10  Stroke          1/8/2011
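I want to create an attributes column that merges several of these columns into one, with the entries separated by semicolons.
The output file (a plain text file is fine) would look like this: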
ID  Event Category  Start Time  End Time   attributes
1   Stroke          1/1/2011
1   Admitted        1/6/2011               Account No.="24287939"; Doctor's_ID="5487"
1   Diagnosed       1/25/2011
6   Diagnosed       1/1/2011
6   Drug A          1/2/2011    1/10/2011  Dosage="high"
6   Drug B          1/7/2011    1/20/2011  Account No.="35287930"; Dosage="medium"
10  Drug A          1/3/2011    1/6/2011   Dosage="low"
10  Drug B          1/9/2011    1/13/2011  Dosage="high"
10  Stroke          1/8/2011
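My goal is to write a text file in which the columns are separated by tabs ("\t") and the attribute data (the last column) is a list whose entries are separated by ";".
For more details on the desired output, see http://www.cs.umd.edu/hcil/eventflow/manual/chapter_start.html#1.4
How can I do this in R?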
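One option is to use the apply function and pass it the last 3 columns row by row. The nice thing about apply is that each row is handed to the function as a named vector whose names match the column names.
All that remains is to combine each name with its value using paste, and then merge the resulting pieces into a single string using the collapse = ";" argument of paste0. The solution would be: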
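# Keep the first four columns; build Attribute from the last three, row by row
cbind(df[1:4], Attribute =
  apply(df[, 5:7], 1, function(x)
    # one name=value pair per non-missing entry, joined with ";"
    paste0(paste(names(x[!is.na(x)]), x[!is.na(x)], sep = "="),
           collapse = ";")))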
#    ID  Event.Category  Start.Time  End.Time   Attribute
# 1  1   Stroke          1/1/2011    <NA>
# 2  1   Admitted        1/6/2011    <NA>       Account.No.=24287939;Doctor.s_ID=5487
# 3  1   Diagnosed       1/25/2011   <NA>
# 4  6   Diagnosed       1/1/2011    <NA>
# 5  6   Drug A          1/2/2011    1/10/2011  Dosage=high
# 6  6   Drug B          1/7/2011    1/20/2011  Account.No.=35287930;Dosage=medium
# 7  10  Drug A          1/3/2011    1/6/2011   Dosage=low
# 8  10  Drug B          1/9/2011    1/13/2011  Dosage=high
# 9  10  Stroke          1/8/2011    <NA>
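To see what the anonymous function does with a single row, the paste step can be tried on one named vector in isolation. This is only a small sketch: the vector x is made up to mimic how row 6 would look once apply has converted it to character, and keep, pairs and attr_string are illustrative names.

```r
# Hypothetical stand-in for one row as apply() would pass it:
# a named character vector with NA for the missing entries
x <- c(Account.No. = "35287930", Dosage = "medium", Doctor.s_ID = NA)

keep <- !is.na(x)                                     # TRUE for non-missing entries
pairs <- paste(names(x)[keep], x[keep], sep = "=")    # one "name=value" string per entry
attr_string <- paste0(pairs, collapse = ";")          # join the pairs into one string

print(pairs)
print(attr_string)
```

When every entry in a row is missing, pairs has length zero and the collapsed result is an empty string, which is why rows such as 1 and 3 show a blank Attribute.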
Data:
df <- read.table(text =
'ID "Event Category" "Start Time" "End Time" "Account No." Dosage Doctor\'s_ID
1 Stroke 1/1/2011 NA NA NA NA
1 Admitted 1/6/2011 NA 24287939 NA 5487
1 Diagnosed 1/25/2011 NA NA NA NA
6 Diagnosed 1/1/2011 NA NA NA NA
6 "Drug A" 1/2/2011 1/10/2011 NA "high" NA
6 "Drug B" 1/7/2011 1/20/2011 35287930 "medium" NA
10 "Drug A" 1/3/2011 1/6/2011 NA "low" NA
10 "Drug B" 1/9/2011 1/13/2011 NA "high" NA
10 Stroke 1/8/2011 NA NA NA NA',
stringsAsFactors = FALSE, header = TRUE)
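The output above differs slightly from the format asked for in the question: the column names have been altered by read.table (Account.No. instead of Account No.), the values are unquoted, and nothing is written to a file. The following is a rough sketch of one way to get closer, not the only approach. It assumes a small three-row subset of the data built by hand; df2, make_attributes and out_file are names invented for this illustration, and the ID columns are stored as character so that no number formatting is applied when apply converts the rows.

```r
# Small hand-built subset of the question's data (an assumption for illustration).
# check.names = FALSE keeps the original column names such as "Account No."
df2 <- data.frame(
  ID               = c(1, 1, 6),
  `Event Category` = c("Stroke", "Admitted", "Drug B"),
  `Start Time`     = c("1/1/2011", "1/6/2011", "1/7/2011"),
  `End Time`       = c(NA, NA, "1/20/2011"),
  `Account No.`    = c(NA, "24287939", "35287930"),
  Dosage           = c(NA, NA, "medium"),
  `Doctor's_ID`    = c(NA, "5487", NA),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Build name="value" pairs for the non-missing entries of one row,
# joined with "; ". sprintf() returns a zero-length vector when there is
# nothing to format, so an all-missing row collapses to "".
make_attributes <- function(x) {
  keep <- !is.na(x)
  pairs <- sprintf('%s="%s"', names(x)[keep], x[keep])
  paste(pairs, collapse = "; ")
}

# First four columns plus the new attributes column
out <- cbind(df2[1:4], attributes = apply(df2[, 5:7], 1, make_attributes))

# Tab-separated text file; quote = FALSE leaves the embedded double quotes
# untouched and na = "" writes missing values as empty fields
out_file <- tempfile(fileext = ".txt")
write.table(out, out_file, sep = "\t", quote = FALSE, na = "", row.names = FALSE)

print(out)
```

Replacing tempfile() with a real path gives a file that can be inspected in a text editor; whether the target program accepts exactly this layout should be checked against its manual.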