SQLalchemy:迭代 + 计数和 func.count() 结果之间的差异

独自的

我有一系列这样定义的类(注意:它们通过 扩展了一个已经存在的数据库automap,所以对下面列的引用可能不会反映这里的类):

class VariantAssociation(Base):

    __tablename__ = "sample_variant_association"

    vid = Column(Integer, ForeignKey("variants.variant_id"),
                primary_key=True, index=True)
    sid = Column(Integer, ForeignKey("samples.sample_id"),
                primary_key=True, index=True)

    vdepth = Column(Integer, index=True)
    valt_depth = Column(Integer, index=True)
    gt = Column(Text)
    gt_type = Column(Integer)
    fraction = Column(Float, index=True)

    variant = relationship("Variant", back_populates="samples")
    sample = relationship("Samples", back_populates="variants")


class Variant(Base):

    __tablename__ = "variants"

    variant_id = Column(Integer, primary_key=True)
    info = deferred(Column(LargeBinary))

    samples = relationship("VariantAssociation",
                        back_populates="variant")

    def __repr__(self):

        data = "<Variant {chrom}:{start}-{end} {gene} {ref}/{alt} {type}>"

        return data.format(chrom=self.chrom,
                        start=self.start,
                        end=self.end,
                        gene=self.gene,
                        ref=self.ref,
                        alt=self.alt,
                        type=self.type)


class Samples(Base):

    __tablename__ = "samples"

    sample_id = Column(Integer, primary_key=True, index=True)
    name = Column(Text, index=True)
    variants = relationship("VariantAssociation",
                            back_populates="sample")

它们组装在一个相当复杂的查询中,但在这里精简了:

query = session.query(Variant).join(VariantAssociation.variant_id).join(Samples)
query = query.filter(VariantAssociation.vdepth >= 60)

现在,我想计算两列的组合:refalt

我认为这很简单:

query = query.with_entities(Variant.ref, Variant.alt, 
    func.count()).distinct().group_by(gemini.Variant.ref, gemini.Variant.alt)

其中产生(一行示例):

('A', 'C', 308)

但是,如果我只是迭代查询并计数:

from collections import defaultdict, Counter
counts  = defaultdict(Counter)
for row in query.with_entities(Variant.ref, Variant.alt):
    counts[f"{row.ref}>{row.alt}"].update(["present"])

给我

'A>C': Counter({'present': 155})

一半的我发现通过count我知道后者是正确的,而不是前者。但我想使用前者,因为后者可能非常慢(大型 SQLite 数据库)。

我是否错误地设置了计数?

编辑:根据要求,完整查询count(包括来自数据库本身的更多过滤器)

SELECT DISTINCT variants.ref AS variants_ref, variants.alt AS variants_alt, count(*) AS count_1 
FROM variants JOIN sample_variant_association ON variants.variant_id = sample_variant_association.vid JOIN
samples ON samples.sample_id = sample_variant_association.sid 
WHERE sample_variant_association.gt_type != ? AND variants.impact NOT IN (?, ?, ?, ?) AND
sample_variant_association.vdepth >= ? AND sample_variant_association.fraction >= ? AND variants.chrom NOT IN (?,
?) AND variants.aaf_1kg_eur < ? AND variants.type = ? AND sample_variant_association.fraction >= ? AND
sample_variant_association.vdepth >= ? GROUP BY variants.ref, variants.alt

还有一个用于迭代的:

    SELECT DISTINCT variants.ref AS variants_ref, variants.alt AS variants_alt 
FROM variants JOIN sample_variant_association ON variants.variant_id = sample_variant_association.vid JOIN
samples ON samples.sample_id = sample_variant_association.sid 
WHERE sample_variant_association.gt_type != ? AND variants.impact NOT IN (?, ?, ?, ?) AND
sample_variant_association.vdepth >= ? AND sample_variant_association.fraction >= ? AND variants.chrom NOT IN (?,
?) AND variants.aaf_1kg_eur < ? AND variants.type = ? AND sample_variant_association.fraction >= ? AND
sample_variant_association.vdepth >= ?

编辑 2:我将此追溯到基本查询中重复的变体 ID 的存在:

query.with_entities(gemini.Variant.variant_id).count()
18288
query.with_entities(gemini.Variant.variant_id).distinct().count()
14437

所以这个问题与我最初的想法不同。以某种方式在循环中考虑了重复记录,但不考虑func.count().

独自的

使用子查询有效,首先删除重复项:

id_subquery = query.with_entities(Variant.variant_id).distinct().subquery()

然后获取实际数据:

c_query = session.query(Variant.ref, Variant.alt, func.count(1))
c_query = c_query.filter(Variant.variant_id.in_(id_subquery))
c_query = c_query.group_by(Variant.ref, Variant.alt)

c_query.first()
('A', 'C', 155)

本文收集自互联网,转载请注明来源。

如有侵权,请联系 [email protected] 删除。

编辑于
0

我来说两句

0 条评论
登录 后参与评论

相关文章

备份MBR时,计数= 2的dd和对应块大小的count = 1之间是否存在差异?

C ++中func()和(* this).func()之间的区别

(func)()和(func).call(window)之间的区别

func.count(distinct(...))的结果与distinct()。count()的结果不同

PHP的foreach和for之间的迭代差异

在sqlalchemy中为func.count列定义名称

带过滤器的SQLAlchemy func.count

SQLAlchemy-在查询中过滤func.count

如何使用SQLAlchemy addcolumn包含func.count值

DFT和FFT(幅度)结果之间的差异

ss 和 t 定位之间的结果差异?

为什么 call_user_func 和 call_user_func_array() 返回类型之间存在差异

std :: count和std :: find之间的性能差异

count = count + n&1 和 count += n&1 之间的差异

curl 结果和 Python 请求结果之间的差异

迭代,ddply和计数

快速保留计数和参考计数之间的差异

time.sleep() 和迭代直到秒过去之间的差异

JDK 1.6和1.7之间的迭代器差异

列表和迭代器之间可能存在的差异

C ++(LAPACK,sgels)和Python(Numpy,lstsq)结果之间的差异

时间结果中运行次数和循环次数之间的差异

结果中均值和高斯滤波器之间的差异

EMR-Presto和Athena之间的查询结果差异

检测秘密和检测秘密挂钩结果之间的差异

如何计算日期之间的差异;限制和排序结果?

CSS选择器和querySelector函数之间的结果差异

R和SPSS线性模型结果之间的差异

Always模块和组合模块中的总和结果之间的差异