I have data in this format:
b1 1995 1
b1 2007 0.1
b2 1974 0.1
b2 1974 0.6
b2 1975 0.3
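I want to sum the values in column 3 for rows that share the same values in columns 1 and 2.
I wrote code that sums the values, but I don't know how to print one line per group.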
use strict;
use warnings;
use Data::Dumper;
my $file=shift;
open (DATA, '<', $file) or die "Cannot open $file: $!";
my %score_by_year;
while ( my $line = <DATA> )
{
my ($protein, $year, $score) = split /\s+/, $line;
$score_by_year{$year} +=$score;
print "$protein\t$year\t$score_by_year{$year}\n";
}
close DATA;
So my code produces this output:
b1 1995 1
b1 2007 0.1
b2 1974 0.1
b2 1974 0.7
b2 1975 0.3
whereas the expected output is:
b1 1995 1
b1 2007 0.1
b2 1974 0.7
b2 1975 0.3
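To preserve the input order, record each (protein, year) pair the first time it appears, accumulate the sums in a hash keyed by both columns, and print only after all the input has been read: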
use strict;
use warnings;
my @sequence;
my %scores_by_year;
while (<DATA>) {
my ($protein, $year, $score) = split;
if (not exists $scores_by_year{$protein}{$year}) {
push @sequence, [$protein, $year];
}
$scores_by_year{$protein}{$year} += $score;
}
for my $protein_year (@sequence) {
my($protein, $year)= @$protein_year;
print join("\t", $protein, $year, $scores_by_year{$protein}{$year}), "\n";
}
__DATA__
b1 1995 1
b1 2007 0.1
b2 1974 0.1
b2 1974 0.6
b2 1975 0.3
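
The answer above reads from the __DATA__ section, while the question reads from a file named on the command line. As a rough sketch of how the same idea might be adapted to that case, the grouping can be moved into a small subroutine. The name sum_by_group and the tab-joined composite key are assumptions made for this illustration, not part of the original answer, and the sketch assumes the first two columns contain no tab characters.

```perl
use strict;
use warnings;

# Hypothetical helper (name chosen for this sketch): sums column 3 for each
# (protein, year) pair and returns tab-separated lines in first-seen order.
sub sum_by_group {
    my @lines = @_;
    my (@order, %total);
    for my $line (@lines) {
        # split ' ' ignores leading whitespace and the trailing newline
        my ($protein, $year, $score) = split ' ', $line;
        next unless defined $score;    # skip blank or incomplete lines
        # Composite key; assumes the fields themselves contain no tabs
        my $key = join "\t", $protein, $year;
        push @order, $key unless exists $total{$key};
        $total{$key} += $score;
    }
    return map { "$_\t$total{$_}" } @order;
}

# Read from the file named on the command line, as in the question.
if (@ARGV) {
    my $file = shift;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    print "$_\n" for sum_by_group(<$fh>);
    close $fh;
}
```

Because nothing is printed until every line has been read, each group appears exactly once with its final total. The question's code printed inside the loop and keyed the hash on the year alone, which is why a partial sum showed up for the first b2 1974 row.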