How to parse or split a string variable into multiple new variables in R

Ricky

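I have seen an answer to a similar question for Python but not for R, and since Python is not in my wheelhouse I am asking here even at the risk of redundancy. The data variable "PublicFilings" contains multiple values that I want to split into 4 new variables. The three basic outputs are listed below, but there will be different combinations of counts for judgments, liens, and suits. A bankruptcy is simply yes or no, but I want it as a binary. Any thoughts on a simple approach for the data frame? Id can be used as the primary key. What is throwing me off is the combination of "No Data" as a possible value, not being able to simply split on commas, and wanting to convert yes to binary.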

Existing Data 
Id   PublicFilings 
1    Bankruptcies: No, Judgments: 0, Liens: 0, Suits: 0 
2    Bankruptcies: Yes, Judgments: 0, Liens: 0, Suits: 0 
3    No Data

"No Data" means there was no match to the entity and no public filings data was returned.

Converted Data 
Id Bankruptcies Judgments Liens Suits 
1  0             0         0     0 
2  1             0         0     0 
3 Null           Null      Null  Null



df1 <-
  structure(list(TranId = 1:3, 
                 Name = c("ACME Five,","ACME","WALMART"),
                 Check = c("1234","1234","1235"), 
                 Entity = c("55555","55551","55556"),
                 Match =c("0","0","0"),
                 Score = c("50","60","NA"),
                 Date = c("2019-01-01", "2019-01-02","2019-01-02"),
                 PublicFilings = c("Bankruptcies: No, Judgments: 0, Liens: 10, Suits: 0", 
                                   "Bankruptcies: Yes, Judgments: 0, Liens: 0, Suits: 0", 
                                   "No Data"),
                 Controls =c("2015","2015","1998"),
                 NumEmpoyees = c("5","8","6"),
                 LOB = c("Retail, Food","Retail, Food","Retail, All"),
                 PayScore = c("40","42","NA"),
                 Primary = c("CEO","CEO","CFO"),
                 STARTYear = c("1982","1982","1965"),
                 SpecEvent = c("0","0","0"),
                 Filings =c("0","0","1"),
                 PayExp =c("","","1"
                 )), class = "data.frame", row.names = c(NA, -3L))

View(df1)


library(dplyr)
library(tidyr)
df1 %>%
  separate_rows(PublicFilings, sep = ",\\s+") %>%
  separate(PublicFilings, into = c("key", "value"), sep=":\\s+") %>%
  mutate(key = na_if(key, "No Data"),
         value = as.integer(value %in%  c("Yes", "1"))) %>%
  pivot_wider(names_from = key, values_from = value) %>%
  select(-`NA`)
View(df1)  # df1 itself is unchanged here, because the pipeline result above was not assigned

    # A tibble: 3 x 20
  TranId Name  Check Entity Match Score Date  Controls NumEmpoyees LOB   PayScore Primary STARTYear SpecEvent Filings
   <int> <chr> <chr> <chr>  <chr> <chr> <chr> <chr>    <chr>       <chr> <chr>    <chr>   <chr>     <chr>     <chr>  
1      1 ACME~ 1234  55555  0     50    2019~ 2015     5           Reta~ 40       CEO     1982      0         0      
2      2 ACME  1234  55551  0     60    2019~ 2015     8           Reta~ 42       CEO     1982      0         0      
3      3 WALM~ 1235  55556  0     NA    2019~ 1998     6           Reta~ NA       CFO     1965      0         1      
# ... with 5 more variables: PayExp <chr>, Bankruptcies <int>, Judgments <int>, Liens <int>, Suits <int>
Warning message:
Expected 2 pieces. Missing pieces filled with `NA` in 1 rows [9].
akrun

One option is to split 'PublicFilings' into 'long' format with separate_rows, then create two columns with separate, and then reshape into 'wide' format with pivot_wider

library(dplyr)
library(tidyr)
df1 %>%
     separate_rows(PublicFilings, sep = ",\\s+") %>%
     separate(PublicFilings, into = c("key", "value"), sep=":\\s+") %>%
     mutate(key = na_if(key, "No Data"),
           value = as.integer(value %in%  c("Yes", "1"))) %>%
     pivot_wider(names_from = key, values_from = value) %>%
     select(-`NA`)
#    Id Bankruptcies Judgments Liens Suits
#1  1            0         0     0     0
#2  2            1         0     0     0
#3  3           NA        NA    NA    NA

Data

df1 <- structure(list(Id = 1:3, PublicFilings = c("Bankruptcies: No, Judgments: 0, Liens: 0, Suits: 0", 
"Bankruptcies: Yes, Judgments: 0, Liens: 0, Suits: 0", "No Data"
)), class = "data.frame", row.names = c(NA, -3L))
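The pipeline above codes every value as 0 or 1 through `value %in% c("Yes", "1")`, so a count such as `Liens: 10` (which appears in the fuller data frame in the question) would come out as 0. Below is a minimal sketch of a variant that keeps the counts and only maps Yes/No to 1/0. It assumes every value is either "Yes", "No", or an integer written as text; `df2` and `res` are illustrative names, not part of the original post. Passing `fill = "right"` to `separate` is one way to avoid the "Expected 2 pieces" warning raised by the "No Data" row.

```r
library(dplyr)
library(tidyr)

# Illustrative data: same shape as df1 above, but with a count greater than 1
df2 <- data.frame(
  Id = 1:3,
  PublicFilings = c("Bankruptcies: No, Judgments: 0, Liens: 10, Suits: 0",
                    "Bankruptcies: Yes, Judgments: 0, Liens: 0, Suits: 0",
                    "No Data"),
  stringsAsFactors = FALSE
)

res <- df2 %>%
  # one row per "key: value" pair
  separate_rows(PublicFilings, sep = ",\\s+") %>%
  # "No Data" has no colon; fill = "right" puts NA in 'value' without a warning
  separate(PublicFilings, into = c("key", "value"), sep = ":\\s+",
           fill = "right") %>%
  mutate(key = na_if(key, "No Data"),
         # map Yes/No to "1"/"0", leave count strings as they are, then convert
         value = as.integer(recode(value, Yes = "1", No = "0"))) %>%
  pivot_wider(names_from = key, values_from = value) %>%
  # drop the placeholder column created by the "No Data" row
  select(-`NA`)

res
```

With this data, Liens for Id 1 is kept as 10, Bankruptcies is 0 for Id 1 and 1 for Id 2, and all four new columns are NA for Id 3.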
