MapReduce: average calculation not working

fnas

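I am trying to use this MapReduce code to compute an average, but for some reason the average is not calculated correctly. The idea is to compute the average movie rating for each year.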

Mapper code:

public class AverageRatingMapper extends Mapper<LongWritable, Text, Text, DoubleWritable>
{
    //initialize the writable datatype variables
    private final static DoubleWritable tempWritable = new DoubleWritable(0);
    private Text ReleaseYear = new Text();
    
    //Override the original map methods
    @Override
    //map takes in three parameters
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException
    {
        //creating an array called line of type String
        //Split the values in the fields and store each value in one array element
        String[] line = value.toString().split("\t");
        
        //create a variable Year of type String and store the 4th element (index 3) of the array line; this contains the year in the data file
        String Year = line[3];
        
        //Set the value of the year object created from the Text class to be the value of the Year read from the data file
        ReleaseYear.set(Year);
        
        //Create a variable, temp, of type double: parse the 15th element (index 14) of the line array from String to double and store it in temp
        double temp = Double.parseDouble(line[14].trim());
        
        //Store temp in tempWritable
        tempWritable.set(temp);
        
        //Emit Year and the average rating in tempWritable to the Reducer class
        context.write(ReleaseYear, tempWritable);
        
        
    }

}

Reducer code:

public class AverageRatingReducer extends Reducer<Text, DoubleWritable, Text, DoubleWritable>
{
    //Create an arraylist of type double called ratingList
    ArrayList<Double> ratingList = new ArrayList<Double>();
    
    //Override reduce method
    @Override
    //reduce takes in three parameters
    public void reduce(Text key, Iterable<DoubleWritable> values, Context context)
            throws IOException, InterruptedException
    {
        //create a variable SumofRating of type double and initialize it to 0
        double SumofRating = 0.0;
        
        //use a for loop to store ratings in the ratingList array and sum the ratings in the SumofRating variable
        for(DoubleWritable value : values)
        {
            ratingList.add(value.get());
            
            //calculate the cumulative sum
            SumofRating = SumofRating + value.get();
            
        }
        
        //get the number of rating in the arrayList
        int size = ratingList.size();
        
        //calculate the average rating
        double averageRating = SumofRating/size;
        
        //Emit the year and the average rating to the output file
        context.write(key, new DoubleWritable(averageRating));
    }
 
}

And the main class:

public class AverageRating 
{
    
    public static void main(String[] args) throws Exception
    {
        //Create an object, conf, from the configuration class
        Configuration conf = new Configuration();
        if (args.length != 3)
        {
            System.err.println("Usage: MeanTemperature <input path> <output path>");
            System.exit(-1);
        }
        
        //create an object, job,  from the Job class
        Job job;
        
        //configure the parameters for the job
        job = Job.getInstance(conf, "Average Rating");
        
        //specify the driver class in the JAR file
        job.setJarByClass(AverageRating.class);
        
        
        //setting the input and output paths for the job
        FileInputFormat.addInputPath(job, new Path(args[1]));
        FileOutputFormat.setOutputPath(job, new Path(args[2]));
        
        //Set the mapper and reducer for the job
        job.setMapperClass(AverageRatingMapper.class);
        job.setReducerClass(AverageRatingReducer.class);
        
        //Set the key class (Text) and value class (DoubleWritable) for the job output data
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);
        
        //Delete output if it exists
        FileSystem hdfs = FileSystem.get(conf);
        Path outputDir = new Path(args[2]);
        if(hdfs.exists(outputDir))
        {
            hdfs.delete(outputDir, true);
        }
        
        System.exit(job.waitForCompletion(true) ? 0 : 1);
        
    }

}

Sample of the data, as requested:

ID         Title  Year  Average rating
tt0000009  XYZ    1911  6.9
tt0001892  PQR    1912  6.2
tt0002154  ABC    1912  8.2
tt0000458  JKL    1913  6.3
tt0015263  TGH    1913  7.1
tt0000053  PLO    1912  4.9

Note: there are more columns; I have only included the important ones.

The result is correct for the first year, but the results for the second and third years are completely wrong.

  • 1911 6.9
  • 1912 5.25
  • 1913 2.2

Can someone please help me?

Coursal

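You don't actually need the ArrayList in the reduce function, because each call to reduce receives all the values grouped under a given key (so in this case, each call sees all the ratings for one year).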
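To make the grouping concrete, here is a minimal plain-Java sketch that imitates in memory what the shuffle does before reduce is called. It does not use Hadoop at all: the class name GroupingSketch and the helper averagePerYear are made up for illustration, and the input pairs are the year and rating columns of the sample rows in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration, not Hadoop code: imitates shuffle grouping in memory.
class GroupingSketch {

    // Groups ratings by year (roughly what the shuffle does), then averages each
    // group using only variables local to that group (what one reduce call should do).
    static Map<String, Double> averagePerYear(String[][] yearRatingPairs) {
        Map<String, List<Double>> grouped = new TreeMap<>();
        for (String[] pair : yearRatingPairs) {
            grouped.computeIfAbsent(pair[0], k -> new ArrayList<>())
                   .add(Double.parseDouble(pair[1]));
        }

        Map<String, Double> averages = new TreeMap<>();
        for (Map.Entry<String, List<Double>> entry : grouped.entrySet()) {
            double sum = 0.0;   // reset for every key
            int count = 0;      // reset for every key
            for (double rating : entry.getValue()) {
                sum += rating;
                count++;
            }
            averages.put(entry.getKey(), sum / count);
        }
        return averages;
    }

    public static void main(String[] args) {
        // Year and rating columns from the sample rows in the question.
        String[][] sample = {
            {"1911", "6.9"}, {"1912", "6.2"}, {"1912", "8.2"},
            {"1913", "6.3"}, {"1913", "7.1"}, {"1912", "4.9"}
        };
        averagePerYear(sample)
            .forEach((year, avg) -> System.out.println(year + "\t" + avg));
    }
}
```

Because the values for one year already arrive together, the sum and the count can live in local variables and nothing has to be remembered from one key to the next.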

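Moreover, declaring the ArrayList outside the body of the reduce function is asking for trouble, because you only use the list to count the values that reduce has already been handed. My guess is that the list is never emptied between reduce calls, so it keeps filling up with the ratings of the next key (that is, the next year). That is why the first key-value pair is correct but everything after it is not: the number of elements in the list, and therefore the divisor, keeps growing.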
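The guess above can be checked without a cluster. The following is a rough plain-Java stand-in for the reducer, under the assumption that one reducer instance handles the years one after another; SharedListSketch, buggyReduce and fixedReduce are hypothetical names, and the ratings are taken from the sample rows in the question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the reducer in the question; plain Java, no Hadoop.
class SharedListSketch {

    // Instance field: it outlives a single reduce call, like ratingList in the question.
    private final List<Double> ratingList = new ArrayList<>();

    // Mirrors the question's logic: divides by the size of the shared list,
    // which still holds the ratings of every earlier key.
    double buggyReduce(double[] ratings) {
        double sum = 0.0;
        for (double rating : ratings) {
            ratingList.add(rating);
            sum += rating;
        }
        return sum / ratingList.size();
    }

    // Same loop, but the counter is local, so it starts at zero for every key.
    double fixedReduce(double[] ratings) {
        double sum = 0.0;
        int count = 0;
        for (double rating : ratings) {
            sum += rating;
            count++;
        }
        return sum / count;
    }

    public static void main(String[] args) {
        // Ratings for 1911, 1912 and 1913 from the sample rows, in key order.
        double[][] ratingsByYear = { {6.9}, {6.2, 8.2, 4.9}, {6.3, 7.1} };
        SharedListSketch reducer = new SharedListSketch();
        for (double[] ratings : ratingsByYear) {
            System.out.println("buggy=" + reducer.buggyReduce(ratings)
                    + " fixed=" + reducer.fixedReduce(ratings));
        }
    }
}
```

On the sample rows the buggy version divides by 1, 4 and 6 instead of 1, 3 and 2, so the first year comes out right and the later ones come out too low. The last value is about 2.23, close to the 2.2 reported for 1913; the sample is only part of the data, so the other figures need not match exactly.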

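You can keep things more conventional by simply counting the values (here, the ratings given for a particular year) in a plain int variable named numOfRatings, and then dividing by that variable to get the average rating for each year.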

public static class AverageRatingReducer extends Reducer<Text, DoubleWritable, Text, DoubleWritable>
{
    public void reduce(Text key, Iterable<DoubleWritable> values, Context context) throws IOException, InterruptedException 
    {
        // running sum and count of the ratings for this key (year);
        // both are local, so they start from zero on every call to reduce
        double sumOfRating = 0.0;
        int numOfRatings = 0;

        // add each rating to the sum and count it
        for(DoubleWritable value : values)
        {
            sumOfRating += value.get();
            numOfRatings++;
        }
                
        // calculate the average rating
        double averageRating = sumOfRating/numOfRatings;
        
        // emit the year and the average rating to the output file
        context.write(key, new DoubleWritable(averageRating));
    }
}
