Academic conundrum: derived proportions without a self-join

MatBailie

We have data arriving in the following structure:

entity_id   entity_value   category_id   category_weight   group_id   group_weight
    1            100            11               6            101          4
    1            100            11               6            102          3
    1            100            12               5            102          3
    1            100            12               5            103          2
    1            100            13               6            101          4

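An entity can belong to any combination of categories and any combination of groups; there is no implicit relationship between category membership and group membership.

The data is redundant but consistent: if one row shows category 11 with a weight of 6, then every row for category 11 shows a weight of 6. The same applies to groups and their weights.

A row is uniquely identified by {entity_id, category_id, group_id}.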


The aim is to apportion each entity's value across all of its rows according to the various weights: first by category, then by group.


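Step 1: Apportion by category

Entity 1 is associated with 3 categories {11,12,13} with weights {6,5,6}

Apportion 100 * (6 / (6 + 5 + 6)) to category 11 => 35.29
Apportion 100 * (5 / (6 + 5 + 6)) to category 12 => 29.41
Apportion 100 * (6 / (6 + 5 + 6)) to category 13 => 35.29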
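The step 1 arithmetic can be reproduced with a few lines of Python. This is only a sketch for checking the numbers, not part of the original post: the `rows` list copies the sample table, and `category_shares` is a hypothetical helper name. The key point is that each category weight is counted once per entity, even though the weight is repeated on several rows.

```python
# Sample rows from the question:
# (entity_id, entity_value, category_id, category_weight, group_id, group_weight)
rows = [
    (1, 100, 11, 6, 101, 4),
    (1, 100, 11, 6, 102, 3),
    (1, 100, 12, 5, 102, 3),
    (1, 100, 12, 5, 103, 2),
    (1, 100, 13, 6, 101, 4),
]

def category_shares(rows):
    """Step 1: split each entity's value across its distinct categories."""
    weights = {}  # (entity_id, category_id) -> category_weight, deduplicated
    values = {}   # entity_id -> entity_value
    for entity_id, entity_value, category_id, category_weight, _, _ in rows:
        weights[(entity_id, category_id)] = category_weight
        values[entity_id] = entity_value

    totals = {}   # entity_id -> sum of its distinct category weights
    for (entity_id, _), weight in weights.items():
        totals[entity_id] = totals.get(entity_id, 0) + weight

    return {
        (entity_id, category_id): values[entity_id] * weight / totals[entity_id]
        for (entity_id, category_id), weight in weights.items()
    }

shares = category_shares(rows)
for (entity_id, category_id), share in sorted(shares.items()):
    print(entity_id, category_id, round(share, 2))
# 1 11 35.29
# 1 12 29.41
# 1 13 35.29
```

Summing the weights over the raw rows instead would give 6 + 6 + 5 + 5 + 6 = 28 rather than 17, which is why the deduplication matters.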

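Step 2: Apportion those results by group

Entity 1, category 11 is associated with groups {101,102} with weights {4,3}

Apportion 35.29 * (4 / (4 + 3)) to group 101 => 20.17
Apportion 35.29 * (3 / (4 + 3)) to group 102 => 15.12

Entity 1, category 12 is associated with groups {102,103} with weights {3,2}

Apportion 29.41 * (3 / (3 + 2)) to group 102 => 17.65
Apportion 29.41 * (2 / (3 + 2)) to group 103 => 11.76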

Entity 1, category 13 is associated with group {101} with weight {4}

Apportion 35.29 * (4 / (4)) to group 101 => 35.29
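Both steps together can be sketched in Python to check the per-row results. As before, this is an illustrative sketch rather than the poster's method: `rows` copies the sample table and `apportion` is a hypothetical helper name.

```python
from collections import defaultdict

# (entity_id, entity_value, category_id, category_weight, group_id, group_weight)
rows = [
    (1, 100, 11, 6, 101, 4),
    (1, 100, 11, 6, 102, 3),
    (1, 100, 12, 5, 102, 3),
    (1, 100, 12, 5, 103, 2),
    (1, 100, 13, 6, 101, 4),
]

def apportion(rows):
    """Apportion each entity's value by category, then by group."""
    cat_weight = {}                 # (entity, category) -> weight, deduplicated
    group_total = defaultdict(int)  # (entity, category) -> sum of its group weights
    for e, _, c, cw, _, gw in rows:
        cat_weight[(e, c)] = cw
        group_total[(e, c)] += gw

    cat_total = defaultdict(int)    # entity -> sum of its distinct category weights
    for (e, _), cw in cat_weight.items():
        cat_total[e] += cw

    return {
        (e, c, g): v * (cw / cat_total[e]) * (gw / group_total[(e, c)])
        for e, v, c, cw, g, gw in rows
    }

result = apportion(rows)
for key in sorted(result):
    print(key, round(result[key], 2))
```

Because this carries the unrounded category share into step 2, the last digit can differ from the hand-worked figures: the row for category 11, group 102 comes out as 15.13 here, whereas multiplying the already rounded 35.29 gives 15.12. The five results sum to the original 100.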


I can do step 2 with window functions. Nice and tidy, no self-join.

Step 1, however, seems to need sub-queries and a self-join.

For example... http://sqlfiddle.com/#!18/be890/1

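-- Assumes the weight and value columns are non-integer (e.g. DECIMAL);
-- with INT columns SQL Server would truncate the divisions.
SELECT
  sample.entity_id,
  sample.category_id,
  sample.group_id,
  sample.entity_value   AS original_value,
  -- value * category share (step 1) * group share within the category (step 2)
  sample.entity_value
  * (sample.category_weight / entity.total_category_weight)
  * (sample.group_weight    / SUM(sample.group_weight) OVER (PARTITION BY sample.entity_id, sample.category_id))
    AS apportioned_value
FROM
(
  -- Total of the distinct category weights per entity
  SELECT
    entity_id,
    SUM(category_weight)   AS total_category_weight
  FROM
  (
    -- Collapse the redundant rows to one weight per (entity, category)
    SELECT
      entity_id,
      category_id,
      MAX(category_weight)   AS category_weight
    FROM
      sample
    GROUP BY
      entity_id,
      category_id
  )
    entity_category
  GROUP BY
    entity_id
)
  entity
INNER JOIN
  sample
    ON sample.entity_id = entity.entity_id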

Is there a tidier way that doesn't need a self-join?

MatBailie
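-- Assumes the weight columns are non-integer (e.g. DECIMAL), so the
-- divisions are not truncated.
SELECT
  entity_id,
  category_id,
  group_id,
  entity_value   AS original_value,
  entity_value
  * (category_weight / SUM(scaled_cat_weight) OVER (PARTITION BY entity_id             ))
  * (group_weight    / SUM(group_weight     ) OVER (PARTITION BY entity_id, category_id))
    AS apportioned_value
FROM
(
  -- Spread each category weight evenly over the rows that repeat it, so that
  -- summing scaled_cat_weight per entity counts every category exactly once.
  SELECT
    *,
    category_weight / COUNT(*) OVER (PARTITION BY entity_id, category_id)   AS scaled_cat_weight
  FROM
    sample
)
  scaled
ORDER BY
  entity_id,
  category_id,
  group_id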
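The query above relies on one observation: if a category weight w is repeated on n rows, then w / n summed over those n rows gives w back, so the per-entity sum of scaled_cat_weight equals the sum of the distinct category weights. For entity 1 that is 6/2 + 6/2 + 5/2 + 5/2 + 6/1 = 17. The following is a rough way to check it against the sample data using Python's built-in sqlite3 module. It is a sketch under assumptions: the table definition is invented here (the weight columns are declared REAL so the divisions are not integer divisions), and it needs a SQLite build with window-function support (3.25 or later), which may behave differently from the SQL Server instance behind the fiddle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table definition; REAL columns avoid integer division.
conn.execute("""
    CREATE TABLE sample (
        entity_id INTEGER, entity_value REAL,
        category_id INTEGER, category_weight REAL,
        group_id INTEGER, group_weight REAL
    )
""")
conn.executemany(
    "INSERT INTO sample VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 100, 11, 6, 101, 4),
        (1, 100, 11, 6, 102, 3),
        (1, 100, 12, 5, 102, 3),
        (1, 100, 12, 5, 103, 2),
        (1, 100, 13, 6, 101, 4),
    ],
)

query = """
SELECT
  entity_id,
  category_id,
  group_id,
  entity_value
  * (category_weight / SUM(scaled_cat_weight) OVER (PARTITION BY entity_id))
  * (group_weight    / SUM(group_weight) OVER (PARTITION BY entity_id, category_id))
    AS apportioned_value
FROM
(
  SELECT
    *,
    category_weight / COUNT(*) OVER (PARTITION BY entity_id, category_id) AS scaled_cat_weight
  FROM
    sample
)
  scaled
ORDER BY
  entity_id,
  category_id,
  group_id
"""

# Map (entity_id, category_id, group_id) -> apportioned_value
result = {(e, c, g): value for e, c, g, value in conn.execute(query)}
for key, value in result.items():
    print(key, round(value, 2))
```

The apportioned values should sum back to the entity's original value of 100, with each row equal to 100 * (category weight / 17) * (group weight / group total for that category).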
