R: split a data frame column on newlines

Black Hat

I am trying to split the strings in a column on the newline character "\n". Here is a data frame, sample_data:

 sample_data <- data.frame(ID=c('[email protected]', '[email protected]'),
                  Changes=c('3 max cost changes
  productxyz > pb100  > a : Max cost decreased from $0.98 to $0.83
  productxyz > pb2  > a : Max cost decreased from $1.07 to $0.91
  productxyz > pb2  > b : Max cost decreased from $0.65 to $0.55', 
                            '2 max cost changes
  productabc > pb1000  > d : Max cost decreased from $1.07 to $0.91
  productabc > pb1000  > x : Max cost decreased from $1.44 to $1.22'), stringsAsFactors=FALSE)

My goal is to extract the prices into separate columns and get a result set like this:

ID              Prev_Price    New_Price
[email protected]     $0.98            $0.83
[email protected]     $1.07            $0.91
[email protected]     $0.65            $0.55
[email protected]    $1.07            $0.91
[email protected]    $1.44            $1.22

I have tried the tidyr package, but my result is full of NAs.

library(tidyr)
vars <- c("Prev_Price","New_Price")
# attempt that yields NAs
separate(sample_data, Changes, into = vars, sep = "[A-Za-z]+from", extra= "drop")

Any help would be much appreciated.

Thanks!

akrun

Try

df1$ID <- df1$ID[df1$ID!=''][cumsum(df1$ID!='')]
library(stringi)
setNames(data.frame(df1$ID, do.call(rbind,stri_extract_all(df1$Changes, 
       regex='\\$\\d*'))), c('ID', 'Prev_Price', 'New_Price'))
 #   ID Prev_Price New_Price
 #1  A        $20       $10
 #2  A        $11       $10
 #3  B        $13       $12
 #4  B        $15       $12
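
The first line above fills each blank ID with the last non-empty one. A minimal base R sketch of that idiom, on a made-up vector (`ids` and the intermediate names are only for illustration):

```r
# Illustrative input: blank entries should inherit the previous non-blank ID
ids <- c("A", "", "B", "")

non_empty <- ids != ""         # TRUE FALSE TRUE FALSE
idx <- cumsum(non_empty)       # 1 1 2 2: index of the most recent non-blank ID
filled <- ids[non_empty][idx]  # pick the non-blank IDs by that running index

print(filled)
```

This assumes the first element is non-empty; a leading blank would give an index of 0 and be dropped.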

Or

library(tidyr)
extract(df1, Changes, into=c('Prev_Price', 'New_Price'), 
          '[^$]*(\\$\\d*)[^$]*(\\$\\d*)')
#   ID Prev_Price New_Price
#1  A        $20       $10
#2  A        $11       $10
#3  B        $13       $12
#4  B        $15       $12

Or

library(data.table)#v1.9.5+
setDT(df1)[, c('Prev_Price', 'New_Price') := tstrsplit(Changes, 
                                 '[A-Za-z ]+')[-1]][]
#   ID              Changes Prev_Price New_Price
#1:  A down from $20 to $10        $20       $10
#2:  A down from $11 to $10        $11       $10
#3:  B down from $13 to $12        $13       $12
#4:  B down from $15 to $12        $15       $12

Note: the 'Changes' column can be removed.

Or using only base R methods

data.frame(ID=df1$ID, read.table(text=gsub('[^$]*(\\$\\d+)', ' \\1 ', 
   df1$Changes),col.names=c('Prev_Price', 'New_Price'), 
                    stringsAsFactors=FALSE))
 #   ID Prev_Price New_Price
 #1  A        $20       $10
 #2  A        $11       $10
 #3  B        $13       $12
 #4  B        $15       $12

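Update

If the elements are in the same cell, one option is to use the devel version of data.table, i.e. v1.9.5+. It can be installed from here.

Here, we apply the same code to 'Changes' (splitting with tstrsplit(Changes, ..)), then melt the output into long format by specifying measure.vars as a list and, if needed, order by 'ID' and remove the unwanted column ('variable').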

 melt(
   setDT(df2)[, paste0('V',1:4) := tstrsplit(Changes,
           '[A-Za-z ]+')[-1]][,-2, with=FALSE],
      id.var='ID', measure=list(c('V1', 'V3'), c('V2', 'V4')), 
        value.name=c('Prev_Price', 'New_Price'))[order(ID)][, variable:=NULL]
  #    ID Prev_Price New_Price
  #1:  A        $20       $10
  #2:  A        $11       $10
  #3:  B        $13       $12
  #4:  B        $15       $12
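
To see why `[-1]` and the `V1`/`V3`, `V2`/`V4` pairing are used, here is a rough base R illustration of the split step, with `strsplit` standing in for `tstrsplit` (my substitution, only for illustration; `parts` and `pairs` are made-up names):

```r
# One cell of df2$Changes, holding two price changes
x <- "down from $20 to $10 down from $11 to $10"

# Splitting on runs of letters/spaces leaves an empty first piece,
# because the string starts with a match ("down from ")
parts <- strsplit(x, "[A-Za-z ]+")[[1]]
print(parts)

# Drop the empty piece; the rest alternates Prev_Price, New_Price
pairs <- matrix(parts[-1], ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("Prev_Price", "New_Price")))
print(pairs)
```

So after dropping the first piece, positions 1 and 3 are the previous prices and positions 2 and 4 are the new prices, which is what the two `measure` groups encode.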

Or we can use gsub as before, and then convert to 'long' format using reshape from base R

 d1 <- data.frame(ID=df2$ID,read.table(text=gsub('[^$]*(\\$\\d+)',
                 ' \\1 ', df2$Changes)))

colnames(d1)[-1] <- paste0(c('Prev_Price.', 'New_Price.'), 
                          rep(1:2,each=2))
reshape(d1, idvar='ID', varying=2:ncol(d1), sep=".", direction='long')
#    ID time Prev_Price New_Price
#A.1  A    1        $20       $10
#B.1  B    1        $13       $12
#A.2  A    2        $11       $10
#B.2  B    2        $15       $12

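Update 2

For the new dataset ('df3'), we can use stri_extract_all_regex to extract, from the 'Changes' column, every $ followed by digits including the decimal point ('\\$[0-9.]+'). We then use Map to pair the 'ID' column with the list output of stri_extract_all_regex, convert each list element to a matrix (because the alternating elements need to go into different columns), and rbind the pieces with do.call(rbind, ..).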

library(stringi)
res <- do.call(rbind,
       Map(function(x,y) data.frame(x,matrix(y, ncol=2, byrow=TRUE, 
           dimnames=list(NULL, c("Prev_Price", "New_Price")))),
        df3$ID, stri_extract_all_regex(df3$Changes, '\\$[0-9.]+')))
row.names(res) <- NULL
res
#              x Prev_Price New_Price
#1  [email protected]      $0.98     $0.83
#2  [email protected]      $1.07     $0.91
#3  [email protected]      $0.65     $0.55
#4 [email protected]      $1.07     $0.91
#5 [email protected]      $1.44     $1.22
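
If you would rather avoid the stringi dependency, the same idea can probably be sketched in base R by splitting on "\n" first and then pulling the prices out of each line with `regmatches`. The email addresses in the question are obscured, so the IDs below are placeholders, and `df3` is rebuilt here on the assumption that it has the same shape as the question's data.

```r
# Assumed stand-in for the question's data; the IDs are placeholders
df3 <- data.frame(
  ID = c("user1@example.com", "user2@example.com"),
  Changes = c(
    paste("3 max cost changes",
          "  productxyz > pb100  > a : Max cost decreased from $0.98 to $0.83",
          "  productxyz > pb2  > a : Max cost decreased from $1.07 to $0.91",
          "  productxyz > pb2  > b : Max cost decreased from $0.65 to $0.55",
          sep = "\n"),
    paste("2 max cost changes",
          "  productabc > pb1000  > d : Max cost decreased from $1.07 to $0.91",
          "  productabc > pb1000  > x : Max cost decreased from $1.44 to $1.22",
          sep = "\n")),
  stringsAsFactors = FALSE)

# Split each cell into its lines
lines_per_id <- strsplit(df3$Changes, "\n", fixed = TRUE)

res <- do.call(rbind, Map(function(id, lines) {
  lines <- lines[grepl("\\$", lines)]               # drop the "N max cost changes" header
  prices <- regmatches(lines, gregexpr("\\$[0-9.]+", lines))
  m <- do.call(rbind, prices)                       # one row per line: previous, new
  data.frame(ID = id, Prev_Price = m[, 1], New_Price = m[, 2],
             stringsAsFactors = FALSE)
}, df3$ID, lines_per_id))
row.names(res) <- NULL
print(res)
```

This assumes every line that contains a `$` has exactly two prices; lines with a different count would need extra handling.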

Data

df1 <- structure(list(ID = c("A", "", "B", ""), 
 Changes = c("down from $20 to $10", 
"down from $11 to $10", "down from $13 to $12", "down from $15 to $12"
)), .Names = c("ID", "Changes"), class = "data.frame", 
row.names = c(NA, -4L))

df2 <- data.frame(ID=c('A', 'B'),
   Changes=c('down from $20 to $10 down from $11 to $10', 
  'down from $13 to $12 down from $15 to $12'), stringsAsFactors=FALSE)
