Cassandra partition key growth limit?

localdata01

What does it mean that my partition grows large? I thought Cassandra could handle very large sizes. Why do they use 2 partition keys in this example?

And what do I do if the partition is still too large even with both partition keys?

[Image: the example with the two data models]

Manish Khandelwal

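The example you gave shows one way to prevent a partition from becoming too large. In Cassandra, the partition key (part of the primary key) is used to group similar sets of rows.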

In the data model on the left, user_id is the partition key, which means every video interaction by that user is stored in the same partition. As the comment in the example notes, if a user is active and has 1000 interactions a day, then after 60 days (2 months) there will be 60000 rows for that user. This may exceed the partition size Cassandra can handle comfortably (in terms of the amount of data stored in a single partition).
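To make the growth concrete, here is a minimal Python sketch of the arithmetic. The function name `rows_in_partition` is made up for illustration, and it assumes a constant number of interactions per day with nothing ever deleted or expired.

```python
def rows_in_partition(interactions_per_day: int, days: int) -> int:
    """Rows accumulated in one user's partition when user_id alone is the partition key.

    Assumes a constant daily rate and no deletes or TTL (simplifying assumptions).
    """
    return interactions_per_day * days


# The partition keeps growing for as long as the user stays active.
for days in (1, 30, 60, 365):
    print(f"{days:>3} days -> {rows_in_partition(1000, days)} rows")
```

The point is that the partition has no natural upper bound: its size depends only on how long and how actively the user keeps using the service.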

There are several ways to keep a partition from growing too big. For example, you can:

  1. Make another column of the table part of the partition key. This is what the example above does: video_id is made part of the partition key along with user_id.

  2. Bucketing - This strategy is generally used for time series data: you split one partition key into multiple buckets. For example, if date is your partition key, you can create 24 buckets, date_1, date_2, ..., date_24. You have now divided the partition key into smaller partition keys, and hence divided one big partition into 24 small partitions.
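The two strategies above can be sketched in plain Python by simulating how rows are grouped into partitions. This is only an illustration, not Cassandra code: the key functions, the sample data (one user, 1000 interactions in a day spread over 50 videos) and the hour-of-day rule for choosing a bucket are all assumptions of mine, not something the example prescribes.

```python
from collections import Counter
from datetime import datetime


def key_by_user(user_id, video_id, ts):
    # Original model: user_id alone is the partition key.
    return (user_id,)


def key_by_user_and_video(user_id, video_id, ts):
    # Strategy 1: add another column (video_id) to the partition key.
    return (user_id, video_id)


def key_by_date_bucket(user_id, video_id, ts):
    # Strategy 2: date is the partition key, split into date_1 .. date_24.
    # Picking the bucket from the hour of day is an assumption for this sketch.
    return (f"{ts:%Y-%m-%d}_{ts.hour + 1}",)


def partition_sizes(events, key_fn):
    """Count how many rows land in each partition for a given key function."""
    return Counter(key_fn(*event) for event in events)


# Hypothetical sample: one user, 1000 interactions on one day, 50 different videos.
events = [("user_1", f"video_{i % 50}", datetime(2024, 1, 1, i % 24)) for i in range(1000)]

for key_fn in (key_by_user, key_by_user_and_video, key_by_date_bucket):
    sizes = partition_sizes(events, key_fn)
    print(f"{key_fn.__name__}: {len(sizes)} partitions, largest has {max(sizes.values())} rows")
```

Keep in mind the trade-off: a query has to supply the full partition key, so with user_id plus video_id you can no longer read all of a user's interactions from a single partition, and with 24 buckets reading a whole day means querying all 24 partitions.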

The main idea is to keep your partitions from growing too big. This is a data modeling technique you should be aware of when creating a data model for Cassandra.

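If your partitions are still too large, you need to restructure the data model using one of the available data modeling techniques. To do that, I suggest you understand your data, estimate its growth rate, calculate the estimated partition size, and refine the data model if it does not meet your partition size requirements.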
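A rough way to run that estimate is sketched below. All numbers are placeholders: the 500-byte average row and the 1-year retention are hypothetical, and the 100 MB target is a commonly cited rule of thumb that I am assuming here, not a hard limit. The calculation also ignores storage overhead and compression, so treat the result as a ballpark figure only.

```python
import math

# Assumed target partition size (rule of thumb, not a hard Cassandra limit).
TARGET_PARTITION_BYTES = 100 * 1024 * 1024


def estimate_partition_bytes(rows_per_day: int, days_retained: int, avg_row_bytes: int) -> int:
    """Very rough partition size: rows per day * retention in days * average row size."""
    return rows_per_day * days_retained * avg_row_bytes


def buckets_needed(partition_bytes: int, target_bytes: int = TARGET_PARTITION_BYTES) -> int:
    """Smallest number of equal buckets that brings each partition under the target."""
    return max(1, math.ceil(partition_bytes / target_bytes))


# Hypothetical workload: 1000 rows a day, kept for a year, about 500 bytes per row.
size = estimate_partition_bytes(rows_per_day=1000, days_retained=365, avg_row_bytes=500)
print(f"estimated partition size: {size / 1024 / 1024:.1f} MiB")
print(f"buckets needed: {buckets_needed(size)}")
```

If the estimate stays well under your target for the busiest key you expect, the model is probably fine as it is; if not, it tells you roughly how finely the partition key has to be split.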
