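I have the data below and need to group it by two columns, then sum the other two fields. Assume the four columns are named OS, device, view, click. Basically, for each combination of OS and device, I want to know how many views and how many clicks it has.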
(2,3346,1,)
(3,3953,1,1)
(25,4840,1,1)
(2,94840,1,1)
(14,0526,1,1)
(37,4864,1,)
(2,7353,1,)
This is what I have so far:
A is data: OS,device,view,click
B = GROUP A BY (OS,device);
Result = FOREACH B {
GENERATE group AS OS,device, SUM(view) AS visits, SUM(click) AS clicks;};
dump Result;
This does not work. The error message is: Projected field [OS] does not exist in schema: group:tuple(OS:int,device:long),B:bag{:tuple(OS:int,device:long,view:int,click:int)}.
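You are missing FLATTEN. After GROUP, the two keys are packed into a single tuple named group, so OS and device cannot be referenced directly, and the summed fields have to be addressed through the bag (A.view, A.click). Here is tested code: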
A = LOAD '/user/root/pig_data' using PigStorage(',') AS (OS, device, view, click);
B = GROUP A BY (OS, device);
RESULT = FOREACH B GENERATE FLATTEN(group) AS (OS, device), SUM(A.view) as views, SUM(A.click) as clicks;
dump RESULT;
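To see why FLATTEN is needed, it can help to inspect the grouped relation. The sketch below is a variation on the code above, not part of the tested answer: it assumes the same input path, declares the column types that appear in the error message (OS:int, device:long, view:int, click:int), and uses the hypothetical alias RESULT2. DESCRIBE should show that B has two fields, a tuple called group holding the keys and a bag called A holding the original rows. As an alternative to FLATTEN, each key can be projected out of the group tuple by name.

```pig
-- Assumption: same file as above; types are taken from the schema in the error message.
-- Note that loading device as long drops leading zeros (e.g. 0526 is read as 526).
A = LOAD '/user/root/pig_data' USING PigStorage(',')
    AS (OS:int, device:long, view:int, click:int);

B = GROUP A BY (OS, device);

-- Print the schema of B: a 'group' tuple with the keys, plus a bag 'A' with the rows.
DESCRIBE B;

-- Alternative to FLATTEN(group): pull each key out of the group tuple by name.
-- RESULT2 is a hypothetical alias used only for this illustration.
RESULT2 = FOREACH B GENERATE
    group.OS     AS OS,
    group.device AS device,
    SUM(A.view)  AS views,
    SUM(A.click) AS clicks;

DUMP RESULT2;
```

Both forms give one row per (OS, device) pair. FLATTEN(group) is shorter when there are many keys, while group.OS makes it explicit where each field comes from. Rows with an empty click field load as null, and SUM skips null values.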