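The example you gave is one way to keep a partition from growing too large. In Cassandra, the partition key (part of the primary key) is used to group a set of related rows together.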
Here, in the data model on the left, user_id is the partition key, which means every video interaction by that user is placed in the same partition. As mentioned in the example's comment, if a user is active and has 1,000 interactions daily, then in 60 days (2 months) you will have 60,000 rows for that user. This may exceed the partition size Cassandra tolerates (in terms of the amount of data stored in a single partition).
There are several ways to keep a partition from growing too big. For example, you can:
1. Make another column from that table part of the partition key. This is done in the example above: video_id is made part of the partition key along with user_id.
2. Bucketing. This strategy is generally used for time series data, where you split one partition key into multiple buckets. For example, if date is your partition key, you can create 24 buckets: date_1, date_2, ..., date_24. You have now divided your partition key into smaller partition keys, and hence one big partition into 24 small partitions.
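The two techniques above can be sketched in Python by treating a partition as the group of rows that share the same partition key value. This is only an illustration, not Cassandra code: the row shape, the helper names (partition_sizes, by_user, by_user_and_video, by_user_and_date_bucket), the sample data, and the choice to bucket by hour of day are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical interaction rows: (user_id, video_id, event_time).
# One user, 5 videos, one interaction per minute for a full day (1440 rows).
start = datetime(2024, 1, 1)
rows = [
    ("user_1", f"video_{i % 5}", start + timedelta(minutes=i))
    for i in range(24 * 60)
]

def partition_sizes(rows, key_fn):
    """Count rows per partition, where key_fn maps a row to its partition key."""
    return Counter(key_fn(row) for row in rows)

# Original model: user_id alone is the partition key.
def by_user(row):
    user_id, _video_id, _event_time = row
    return (user_id,)

# Technique 1: add another column (video_id) to the partition key.
def by_user_and_video(row):
    user_id, video_id, _event_time = row
    return (user_id, video_id)

# Technique 2: bucketing. Assumption: the bucket is the hour of day plus 1,
# which yields keys shaped like date_1 .. date_24.
def by_user_and_date_bucket(row):
    user_id, _video_id, event_time = row
    bucket = event_time.hour + 1
    return (user_id, f"{event_time.date().isoformat()}_{bucket}")

single = partition_sizes(rows, by_user)
composite = partition_sizes(rows, by_user_and_video)
bucketed = partition_sizes(rows, by_user_and_date_bucket)

# Print (number of partitions, rows in the largest partition) for each model.
print(len(single), max(single.values()))
print(len(composite), max(composite.values()))
print(len(bucketed), max(bucketed.values()))
```

With this made-up data, the single-key model keeps all 1,440 rows in one partition, while the other two spread them over 5 and 24 partitions. The trade-off is that a read now has to supply the extra key component (the video_id or the bucket) to address a partition.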
The main idea is to keep your partitions from growing too big. This is a data modeling technique you should be aware of when creating a data model for Cassandra.
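If you still end up with large partitions, you need to restructure the data model using the various data modeling techniques available. For this, I suggest understanding your data, estimating its growth rate, calculating the estimated partition size, and then refining your data model if it does not meet the partition size requirement.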
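A back-of-the-envelope version of that estimate might look like the sketch below. It reuses the 1,000 interactions per day and 60 days from the example, but the average row size of 200 bytes is a made-up figure, and the 100 MB target is an often-quoted rule of thumb rather than a hard Cassandra limit. The function names are hypothetical, and the calculation ignores storage overhead and compression, so treat the result as a rough order of magnitude.

```python
def estimate_partition(rows_per_day, days, avg_row_bytes):
    """Return (row_count, size_in_bytes) for one partition after `days` days."""
    row_count = rows_per_day * days
    return row_count, row_count * avg_row_bytes

def days_until_limit(rows_per_day, avg_row_bytes, limit_bytes):
    """Whole days of growth that still fit within limit_bytes."""
    return limit_bytes // (rows_per_day * avg_row_bytes)

AVG_ROW_BYTES = 200             # assumption: measure this on your real data
TARGET_BYTES = 100 * 1024**2    # assumption: 100 MB rule-of-thumb target

row_count, size_bytes = estimate_partition(
    rows_per_day=1000, days=60, avg_row_bytes=AVG_ROW_BYTES
)
print(row_count, size_bytes)
print(days_until_limit(1000, AVG_ROW_BYTES, TARGET_BYTES))
```

If the projected size or the number of days until the target is reached is unacceptable for your retention period, that is the signal to revisit the partition key.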