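I saw an answer to a similar question for Python but not for R, and since Python isn't in my wheelhouse I'm asking again at the risk of redundancy. The data variable "PublicFilings" contains multiple values that I would like to split into 4 new variables. The three basic outputs are listed below, but there will be different combinations of counts for judgments, liens and suits. Bankruptcies is a yes/no value, but I would like it as binary. Any thoughts on a simple approach for the data frame? Id can be used as the primary key. The "No Data" case among the initial outputs, not being able to simply split on commas, and wanting to convert yes/no to binary have me stumped.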
Existing Data
Id  PublicFilings
1   Bankruptcies: No, Judgments: 0, Liens: 0, Suits: 0
2   Bankruptcies: Yes, Judgments: 0, Liens: 0, Suits: 0
3   No Data
"No Data" means that no match was made to an entity and no public filings data was returned.
Converted Data
Id  Bankruptcies  Judgments  Liens  Suits
1   0             0          0      0
2   1             0          0      0
3   Null          Null       Null   Null
df1 <-
  structure(list(TranId = 1:3,
                 Name = c("ACME Five,", "ACME", "WALMART"),
                 Check = c("1234", "1234", "1235"),
                 Entity = c("55555", "55551", "55556"),
                 Match = c("0", "0", "0"),
                 Score = c("50", "60", "NA"),
                 Date = c("2019-01-01", "2019-01-02", "2019-01-02"),
                 PublicFilings = c("Bankruptcies: No, Judgments: 0, Liens: 10, Suits: 0",
                                   "Bankruptcies: Yes, Judgments: 0, Liens: 0, Suits: 0",
                                   "No Data"),
                 Controls = c("2015", "2015", "1998"),
                 NumEmpoyees = c("5", "8", "6"),
                 LOB = c("Retail, Food", "Retail, Food", "Retail, All"),
                 PayScore = c("40", "42", "NA"),
                 Primary = c("CEO", "CEO", "CFO"),
                 STARTYear = c("1982", "1982", "1965"),
                 SpecEvent = c("0", "0", "0"),
                 Filings = c("0", "0", "1"),
                 PayExp = c("", "", "1")),
            class = "data.frame", row.names = c(NA, -3L))
View(df1)
library(dplyr)
library(tidyr)
df1 %>%
  separate_rows(PublicFilings, sep = ",\\s+") %>%
  separate(PublicFilings, into = c("key", "value"), sep = ":\\s+") %>%
  mutate(key = na_if(key, "No Data"),
         value = as.integer(value %in% c("Yes", "1"))) %>%
  pivot_wider(names_from = key, values_from = value) %>%
  select(-`NA`)
View(df1)
# A tibble: 3 x 20
  TranId Name  Check Entity Match Score Date  Controls NumEmpoyees LOB   PayScore Primary STARTYear SpecEvent Filings
   <int> <chr> <chr> <chr>  <chr> <chr> <chr> <chr>    <chr>       <chr> <chr>    <chr>   <chr>     <chr>     <chr>
1      1 ACME~ 1234  55555  0     50    2019~ 2015     5           Reta~ 40       CEO     1982      0         0
2      2 ACME  1234  55551  0     60    2019~ 2015     8           Reta~ 42       CEO     1982      0         0
3      3 WALM~ 1235  55556  0     NA    2019~ 1998     6           Reta~ NA       CFO     1965      0         1
# ... with 5 more variables: PayExp <chr>, Bankruptcies <int>, Judgments <int>, Liens <int>, Suits <int>
Warning message:
Expected 2 pieces. Missing pieces filled with `NA` in 1 rows [9].
One option is to split 'PublicFilings' into 'long' format with separate_rows, then create two columns with separate, and finally reshape back to 'wide' format with pivot_wider.
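library(dplyr)
library(tidyr)
df1 %>%
  # one row per "key: value" pair; "No Data" has no comma and stays a single row
  separate_rows(PublicFilings, sep = ",\\s+") %>%
  # split each pair into a key column and a value column
  separate(PublicFilings, into = c("key", "value"), sep = ":\\s+") %>%
  # "No Data" becomes an NA key; value is 1 for "Yes" or "1", otherwise 0
  mutate(key = na_if(key, "No Data"),
         value = as.integer(value %in% c("Yes", "1"))) %>%
  # back to one row per Id, one column per key
  pivot_wider(names_from = key, values_from = value) %>%
  # drop the column created by the NA key
  select(-`NA`)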
#  Id Bankruptcies Judgments Liens Suits
#1  1            0         0     0     0
#2  2            1         0     0     0
#3  3           NA        NA    NA    NA
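One caveat with the pipeline above: `value %in% c("Yes", "1")` turns every value into 0/1, so a count such as `Liens: 10` comes out as 0. If the Judgments, Liens and Suits counts need to be kept and only Bankruptcies made binary, a variant along these lines might work. This is only a sketch: the data frame `filings` and the counts in it are made up for illustration, and it assumes the only non-numeric values are `Yes` and `No`. It also handles `No Data` differently, by filtering those rows out and joining back on `Id` instead of dropping an `NA` column.

```r
library(dplyr)
library(tidyr)

# Hypothetical sample data; the counts are invented for illustration
filings <- data.frame(
  Id = 1:3,
  PublicFilings = c("Bankruptcies: No, Judgments: 2, Liens: 10, Suits: 0",
                    "Bankruptcies: Yes, Judgments: 0, Liens: 0, Suits: 1",
                    "No Data"),
  stringsAsFactors = FALSE
)

# Parse only the rows that actually contain filings
parsed <- filings %>%
  filter(PublicFilings != "No Data") %>%
  separate_rows(PublicFilings, sep = ",\\s+") %>%
  separate(PublicFilings, into = c("key", "value"), sep = ":\\s+") %>%
  # map Yes/No to "1"/"0", leave the counts untouched, then convert to integer
  mutate(value = as.integer(case_when(value == "Yes" ~ "1",
                                      value == "No"  ~ "0",
                                      TRUE           ~ value))) %>%
  pivot_wider(names_from = key, values_from = value)

# Join back so that Ids with "No Data" get NA in every new column
result <- filings %>%
  select(Id) %>%
  left_join(parsed, by = "Id")

print(result)
```

With the real data, the other columns of the data frame could be kept by joining `parsed` back onto the full data frame rather than onto `select(Id)`.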
data
df1 <- structure(list(Id = 1:3,
                      PublicFilings = c("Bankruptcies: No, Judgments: 0, Liens: 0, Suits: 0",
                                        "Bankruptcies: Yes, Judgments: 0, Liens: 0, Suits: 0",
                                        "No Data")),
                 class = "data.frame", row.names = c(NA, -3L))
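As for the warning shown in the question (`Expected 2 pieces. Missing pieces filled with NA in 1 rows [9]`): it appears to come from `separate()`. The `No Data` entry contains no `: ` to split on, and after `separate_rows()` that entry is the 9th row. The warning is harmless here, and passing `fill = "right"` to `separate()` should silence it. A small sketch on the same two-column sample data (the names `df_small` and `long` are only for illustration):

```r
library(dplyr)
library(tidyr)

df_small <- data.frame(
  Id = 1:3,
  PublicFilings = c("Bankruptcies: No, Judgments: 0, Liens: 0, Suits: 0",
                    "Bankruptcies: Yes, Judgments: 0, Liens: 0, Suits: 0",
                    "No Data"),
  stringsAsFactors = FALSE
)

long <- df_small %>%
  separate_rows(PublicFilings, sep = ",\\s+") %>%
  # fill = "right" pads the missing second piece with NA without warning
  separate(PublicFilings, into = c("key", "value"), sep = ":\\s+", fill = "right")

nrow(long)   # 4 + 4 + 1 rows in long format
long[9, ]    # the "No Data" row, with an NA value
```

Note also that the pipeline does not modify `df1` in place. To keep the new columns or inspect them with `View()`, assign the result first, for example to a new object such as `df2`.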