R: Split a data frame by new lines in a column

Blackhat

I am trying to split a string in a column on the newline character "\n". Here is the data frame sample_data:

 sample_data <- data.frame(ID=c('[email protected]', '[email protected]'),
                  Changes=c('3 max cost changes
  productxyz > pb100  > a : Max cost decreased from $0.98 to $0.83
  productxyz > pb2  > a : Max cost decreased from $1.07 to $0.91
  productxyz > pb2  > b : Max cost decreased from $0.65 to $0.55', 
                            '2 max cost changes
  productabc > pb1000  > d : Max cost decreased from $1.07 to $0.91
  productabc > pb1000  > x : Max cost decreased from $1.44 to $1.22'), stringsAsFactors=FALSE)

My goal is to extract the prices into columns and get a result set like this:

ID              Prev_Price    New_Price
[email protected]     $0.98            $0.83
[email protected]     $1.07            $0.91
[email protected]     $0.65            $0.55
[email protected]    $1.07            $0.91
[email protected]    $1.44            $1.22

I have tried using the tidyr package, but my result is full of NAs.

vars <- c("Prev_Price","New_Price")
separate(sample_data, Changes, into = vars, sep = "[A-Za-z]+from", extra= "drop")

Any help would be greatly appreciated.

Thanks!

akrun

Try

df1$ID <- df1$ID[df1$ID!=''][cumsum(df1$ID!='')]
library(stringi)
setNames(data.frame(df1$ID, do.call(rbind,stri_extract_all(df1$Changes, 
       regex='\\$\\d*'))), c('ID', 'Prev_Price', 'New_Price'))
 #   ID Prev_Price New_Price
 #1  A        $20       $10
 #2  A        $11       $10
 #3  B        $13       $12
 #4  B        $15       $12
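The first line of the block above fills the blank ID entries with the last non-empty value before the prices are extracted. A minimal sketch of that idiom on a standalone vector; the names id, non_empty, idx, and filled are illustrative and do not come from the original answer:

```r
# Blank entries mean "same ID as the row above" (same layout as df1$ID)
id <- c("A", "", "B", "")

non_empty <- id != ""        # TRUE FALSE TRUE FALSE
idx <- cumsum(non_empty)     # 1 1 2 2: running count of non-empty IDs seen so far

# Index the non-empty values by that running count to repeat each one downwards
filled <- id[non_empty][idx]
print(filled)
```

This relies on the first entry being non-empty; a leading blank would give an index of 0 and silently drop that row.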

Or

library(tidyr)
extract(df1, Changes, into=c('Prev_Price', 'New_Price'), 
          '[^$]*(\\$\\d*)[^$]*(\\$\\d*)')
#   ID Prev_Price New_Price
#1  A        $20       $10
#2  A        $11       $10
#3  B        $13       $12
#4  B        $15       $12

Or

library(data.table)#v1.9.5+
setDT(df1)[, c('Prev_Price', 'New_Price') := tstrsplit(Changes, 
                                 '[A-Za-z ]+')[-1]][]
#   ID              Changes Prev_Price New_Price
#1:  A down from $20 to $10        $20       $10
#2:  A down from $11 to $10        $11       $10
#3:  B down from $13 to $12        $13       $12
#4:  B down from $15 to $12        $15       $12

Note: the "Changes" column can be removed.
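As a sketch of that removal, assuming the data.table package is installed, the column can be dropped by reference with := NULL once the prices have been split out. The object dt below is a small illustrative stand-in, not the df1 from the answer:

```r
library(data.table)

# Illustrative two-row table in the same shape as df1
dt <- data.table(ID = c("A", "B"),
                 Changes = c("down from $20 to $10", "down from $13 to $12"))

# Split on runs of letters/spaces; the first piece is empty, so drop it with [-1]
dt[, c("Prev_Price", "New_Price") := tstrsplit(Changes, "[A-Za-z ]+")[-1]]

# Remove the original text column by reference
dt[, Changes := NULL]
print(dt)
```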

Or using only base R methods

data.frame(ID=df1$ID, read.table(text=gsub('[^$]*(\\$\\d+)', ' \\1 ', 
   df1$Changes),col.names=c('Prev_Price', 'New_Price'), 
                    stringsAsFactors=FALSE))
 #   ID Prev_Price New_Price
 #1  A        $20       $10
 #2  A        $11       $10
 #3  B        $13       $12
 #4  B        $15       $12
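To see why the base R version works, it may help to look at the intermediate string that gsub hands to read.table. The sketch below applies the same pattern to a single value; the names x and cleaned are illustrative:

```r
x <- "down from $20 to $10"

# Replace "everything up to and including a $-amount" with just the amount,
# padded by spaces, so only whitespace-separated prices remain
cleaned <- gsub("[^$]*(\\$\\d+)", " \\1 ", x)
print(cleaned)

# read.table then splits on whitespace into one column per price
print(read.table(text = cleaned, stringsAsFactors = FALSE))
```

Note that \\d+ stops at a decimal point, so this pattern suits whole-dollar amounts like those in df1.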

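Update

If the elements are in the same cell, one option is to use the devel version of data.table, i.e. v1.9.5+. The original answer linked to its installation instructions.

Here, we apply the same code to "Changes" (splitting with tstrsplit(Changes, ..)), then melt the output to long format by specifying measure.vars as a list. If needed, we can order by "ID" and remove the unwanted column ("variable").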

 melt(
   setDT(df2)[, paste0('V',1:4) := tstrsplit(Changes,
           '[A-Za-z ]+')[-1]][,-2, with=FALSE],
      id.var='ID', measure=list(c('V1', 'V3'), c('V2', 'V4')), 
        value.name=c('Prev_Price', 'New_Price'))[order(ID)][, variable:=NULL]
  #    ID Prev_Price New_Price
  #1:  A        $20       $10
  #2:  A        $11       $10
  #3:  B        $13       $12
  #4:  B        $15       $12
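The pairing of V1 with V3 and V2 with V4 in the melt call follows from the order in which the split pieces come out. A small base R sketch using strsplit on one df2-style value shows that order; the names x and parts are illustrative:

```r
x <- "down from $20 to $10 down from $11 to $10"

# Same split pattern as in the tstrsplit call: runs of letters and spaces
parts <- strsplit(x, "[A-Za-z ]+")[[1]]
print(parts)

# The leading empty piece is what [-1] drops; the rest alternate
# previous price, new price, previous price, new price
prev_price <- parts[-1][c(1, 3)]   # corresponds to V1 and V3
new_price  <- parts[-1][c(2, 4)]   # corresponds to V2 and V4
```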

Or we can use gsub as before and then convert to long format using reshape from base R

 d1 <- data.frame(ID=df2$ID,read.table(text=gsub('[^$]*(\\$\\d+)',
                 ' \\1 ', df2$Changes)))

colnames(d1)[-1] <- paste0(c('Prev_Price.', 'New_Price.'), 
                          rep(1:2,each=2))
reshape(d1, idvar='ID', varying=2:ncol(d1), sep=".", direction='long')
#    ID time Prev_Price New_Price
#A.1  A    1        $20       $10
#B.1  B    1        $13       $12
#A.2  A    2        $11       $10
#B.2  B    2        $15       $12

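Update2

For the new dataset ("df3"), we can use stri_extract_all_regex to extract each $ followed by digits, including the decimal point ('\\$[0-9.]+'), from the "Changes" column. We then use Map to combine the first column with the list output of stri_extract_all_regex, converting each list element to a matrix (because the alternating elements need to go into different columns), and finally rbind the pieces (do.call(rbind, ...)).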

library(stringi)
res <- do.call(rbind,
       Map(function(x,y) data.frame(x,matrix(y, ncol=2, byrow=TRUE, 
           dimnames=list(NULL, c("Prev_Price", "New_Price")))),
        df3$ID, stri_extract_all_regex(df3$Changes, '\\$[0-9.]+')))
row.names(res) <- NULL
res
#              x Prev_Price New_Price
#1  [email protected]      $0.98     $0.83
#2  [email protected]      $1.07     $0.91
#3  [email protected]      $0.65     $0.55
#4 [email protected]      $1.07     $0.91
#5 [email protected]      $1.44     $1.22
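The same steps can be sketched in base R alone, with regmatches and gregexpr in place of stri_extract_all_regex. Since df3 is not defined in the answer, the version below is an assumed stand-in built from the question's text, with the placeholder IDs user1 and user2 instead of the redacted e-mail addresses; res_base is likewise an illustrative name:

```r
# Assumed stand-in for df3: placeholder IDs, Changes text taken from the question
df3 <- data.frame(
  ID = c("user1", "user2"),
  Changes = c(
    paste("3 max cost changes",
          "productxyz > pb100  > a : Max cost decreased from $0.98 to $0.83",
          "productxyz > pb2  > a : Max cost decreased from $1.07 to $0.91",
          "productxyz > pb2  > b : Max cost decreased from $0.65 to $0.55",
          sep = "\n"),
    paste("2 max cost changes",
          "productabc > pb1000  > d : Max cost decreased from $1.07 to $0.91",
          "productabc > pb1000  > x : Max cost decreased from $1.44 to $1.22",
          sep = "\n")),
  stringsAsFactors = FALSE)

# One character vector of "$x.xx" matches per row of df3
prices <- regmatches(df3$Changes, gregexpr("\\$[0-9.]+", df3$Changes))

# Matches alternate previous/new, so fill a two-column matrix row by row
res_base <- do.call(rbind, Map(function(id, p) {
  m <- matrix(p, ncol = 2, byrow = TRUE)
  data.frame(ID = id, Prev_Price = m[, 1], New_Price = m[, 2],
             stringsAsFactors = FALSE)
}, df3$ID, prices))
row.names(res_base) <- NULL
print(res_base)
```

Because the pattern matches across the whole string, the embedded "\n" characters never need to be split on explicitly.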

data

df1 <- structure(list(ID = c("A", "", "B", ""), 
 Changes = c("down from $20 to $10", 
"down from $11 to $10", "down from $13 to $12", "down from $15 to $12"
)), .Names = c("ID", "Changes"), class = "data.frame", 
row.names = c(NA, -4L))

df2 <- data.frame(ID=c('A', 'B'),
   Changes=c('down from $20 to $10 down from $11 to $10', 
  'down from $13 to $12 down from $15 to $12'), stringsAsFactors=FALSE)


Edited on 2021-03-30
