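I have a set of highly nested MongoDB documents, and I want to count the subdocuments that match a given condition (edit: within each document). For example: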
{"_id":{"chr":"20","pos":"14371","ref":"A","alt":"G"},
"studies":[
{
"study_id":"Study1",
"samples":[
{
"sample_id":"NA00001",
"formatdata":[
{"GT":"1|0","GQ":48,"DP":8,"HQ":[51,51]}
]
},
{
"sample_id":"NA00002",
"formatdata":[
{"GT":"0|0","GQ":48,"DP":8,"HQ":[51,51]}
]
}
]
}
]
}
{"_id":{"chr":"20","pos":"14372","ref":"T","alt":"AA"},
"studies":[
{
"study_id":"Study3",
"samples":[
{
"sample_id":"SAMPLE1",
"formatdata":[
{"GT":"1|0","GQ":48,"DP":8,"HQ":[51,51]}
]
},
{
"sample_id":"SAMPLE2",
"formatdata":[
{"GT":"1|0","GQ":48,"DP":8,"HQ":[51,51]}
]
}
]
}
]
}
{"_id":{"chr":"20","pos":"14373","ref":"C","alt":"A"},
"studies":[
{
"study_id":"Study3",
"samples":[
{
"sample_id":"SAMPLE3",
"formatdata":[
{"GT":"0|0","GQ":48,"DP":8,"HQ":[51,51]}
]
},
{
"sample_id":"SAMPLE7",
"formatdata":[
{"GT":"0|0","GQ":48,"DP":8,"HQ":[51,51]}
]
}
]
}
]
}
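I want to know how many subdocuments contain GT: "1|0". In this example that is 1 in the first document, 2 in the second and 0 in the third. I have tried $unwind and the aggregate function, but I'm clearly not doing it right. When I try to count the subdocuments by the "GT" field, mongo complains: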
db.collection.aggregate([{$group: {"$studies.samples.formatdata.GT":1,_id:0}}])
because the group field name cannot contain ".". So if I leave the path out:
db.collection.aggregate([{$group: {"$GT":1,_id:0}}])
it complains because "$GT cannot be an operator name".
Any ideas?
You need $unwind to work with the arrays, and here it has to be applied three times, once per level of nesting:
db.collection.aggregate([
// Unwind the arrays to get at the nested fields
{ "$unwind": "$studies" },
{ "$unwind": "$studies.samples" },
{ "$unwind": "$studies.samples.formatdata" },
// Group the results to get the count per GT value (across the whole collection)
{ "$group": {
"_id": "$studies.samples.formatdata.GT",
"count": { "$sum": 1 }
}}
])
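To make the effect of the three $unwind stages concrete, here is a minimal plain-JavaScript sketch that mimics them (and the $group) over in-memory copies of the sample documents. It runs in Node without MongoDB; the `unwind` helper is hypothetical and only approximates the default $unwind behaviour, it is not how the server implements it. Fields other than GT are dropped from the sample data for brevity.

```javascript
// In-memory copies of the three sample documents (only GT kept in formatdata)
const docs = [
  { _id: { chr: "20", pos: "14371", ref: "A", alt: "G" },
    studies: [{ study_id: "Study1", samples: [
      { sample_id: "NA00001", formatdata: [{ GT: "1|0" }] },
      { sample_id: "NA00002", formatdata: [{ GT: "0|0" }] },
    ] }] },
  { _id: { chr: "20", pos: "14372", ref: "T", alt: "AA" },
    studies: [{ study_id: "Study3", samples: [
      { sample_id: "SAMPLE1", formatdata: [{ GT: "1|0" }] },
      { sample_id: "SAMPLE2", formatdata: [{ GT: "1|0" }] },
    ] }] },
  { _id: { chr: "20", pos: "14373", ref: "C", alt: "A" },
    studies: [{ study_id: "Study3", samples: [
      { sample_id: "SAMPLE3", formatdata: [{ GT: "0|0" }] },
      { sample_id: "SAMPLE7", formatdata: [{ GT: "0|0" }] },
    ] }] },
];

// Hypothetical mimic of $unwind on a dotted path: emit one copy of each
// document per array element, with the array replaced by that element.
// Missing, null or empty arrays produce no output (default $unwind behaviour);
// a non-array value is treated as a one-element array.
function unwind(input, path) {
  const keys = path.split(".");
  const out = [];
  for (const doc of input) {
    let value = doc;
    for (const k of keys) value = value == null ? undefined : value[k];
    if (value == null) continue;
    const items = Array.isArray(value) ? value : [value];
    for (const item of items) {
      const copy = JSON.parse(JSON.stringify(doc)); // deep copy of plain data
      let parent = copy;
      for (const k of keys.slice(0, -1)) parent = parent[k];
      parent[keys[keys.length - 1]] = item;
      out.push(copy);
    }
  }
  return out;
}

let rows = unwind(docs, "studies");
rows = unwind(rows, "studies.samples");
rows = unwind(rows, "studies.samples.formatdata");

// Equivalent of { $group: { _id: "$studies.samples.formatdata.GT", count: { $sum: 1 } } }
const byGT = {};
for (const row of rows) {
  const gt = row.studies.samples.formatdata.GT;
  byGT[gt] = (byGT[gt] || 0) + 1;
}

console.log(rows.length); // 6: one row per formatdata entry
console.log(byGT);        // { '1|0': 3, '0|0': 3 }
```

Because the group key is only the GT value, the counts are totals for the whole collection (3 of each), not per document; getting per-document counts requires putting the document's `_id` into the group key as well.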
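Ideally you also want to filter the input. You can do this with $match both before and after the $unwind stages, using a regular expression ($regex) to match documents where the value at that path starts with "1".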
db.collection.aggregate([
// Match first to exclude documents where no array member matches
{ "$match": { "studies.samples.formatdata.GT": /^1/ } },
// Unwind the arrays to get at the nested fields
{ "$unwind": "$studies" },
{ "$unwind": "$studies.samples" },
{ "$unwind": "$studies.samples.formatdata" },
// Match again to keep only the matching subdocuments
{ "$match": { "studies.samples.formatdata.GT": /^1/ } },
// Group by document and GT value to get the matched count per document
{ "$group": {
"_id": {
"_id": "$_id",
"key": "$studies.samples.formatdata.GT"
},
"count": { "$sum": 1 }
}}
])
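The same per-document result can be checked without a database. This is a minimal sketch, assuming the sample documents above: instead of re-implementing $unwind it uses nested loops, which visit exactly the rows the three $unwind stages would produce, and it builds the same `{ _id, key }` group key. The function name `countPerDocument` is hypothetical.

```javascript
// In-memory copies of the three sample documents (only GT kept in formatdata)
const docs = [
  { _id: { chr: "20", pos: "14371", ref: "A", alt: "G" },
    studies: [{ study_id: "Study1", samples: [
      { sample_id: "NA00001", formatdata: [{ GT: "1|0" }] },
      { sample_id: "NA00002", formatdata: [{ GT: "0|0" }] },
    ] }] },
  { _id: { chr: "20", pos: "14372", ref: "T", alt: "AA" },
    studies: [{ study_id: "Study3", samples: [
      { sample_id: "SAMPLE1", formatdata: [{ GT: "1|0" }] },
      { sample_id: "SAMPLE2", formatdata: [{ GT: "1|0" }] },
    ] }] },
  { _id: { chr: "20", pos: "14373", ref: "C", alt: "A" },
    studies: [{ study_id: "Study3", samples: [
      { sample_id: "SAMPLE3", formatdata: [{ GT: "0|0" }] },
      { sample_id: "SAMPLE7", formatdata: [{ GT: "0|0" }] },
    ] }] },
];

// Hypothetical nested-loop equivalent of:
//   $match -> 3x $unwind -> $match -> $group on { _id, key } with $sum: 1
// Documents with no matching GT produce no rows, just as the first $match
// removes them from the pipeline.
function countPerDocument(input, pattern) {
  const results = [];
  for (const doc of input) {
    const counts = new Map(); // GT value -> matching entries in this document
    for (const study of doc.studies) {
      for (const sample of study.samples) {
        for (const fd of sample.formatdata) {
          if (pattern.test(fd.GT)) counts.set(fd.GT, (counts.get(fd.GT) || 0) + 1);
        }
      }
    }
    for (const [key, count] of counts) {
      results.push({ _id: { _id: doc._id, key }, count });
    }
  }
  return results;
}

const results = countPerDocument(docs, /^1/);
console.log(results.map(r => [r._id._id.pos, r._id.key, r.count]));
// [ [ '14371', '1|0', 1 ], [ '14372', '1|0', 2 ] ]
```

Two things this makes visible: the third document is absent from the output rather than reported with a count of 0, and /^1/ would also match values such as "1|1" (each distinct value becomes its own key). If only "1|0" should count, a narrower pattern such as /^1\|0$/ here, or the plain string "1|0" in the $match stages, is stricter.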
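Note that in all cases, the entries prefixed with "$" are "variables" that refer to properties (field paths) of the document. They are used as "values" on the right-hand side. The "keys" on the left-hand side must be given as plain string keys; you cannot use a variable to name a key.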
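To illustrate that rule against the two attempts from the question, here is a small sketch with a hypothetical helper, `groupKeyProblems`, which is not a MongoDB API. It only checks the key-naming constraints that the question's error messages point at (an output field name in $group may not start with "$" or contain "."), plus the presence of `_id`; the server performs other checks too, such as requiring each non-`_id` value to be an accumulator expression, which this sketch does not model.

```javascript
// Hypothetical helper (not part of MongoDB): reports $group output field
// names that break the rule described above. Keys on the left must be plain
// names; "$..." field paths are only allowed as values on the right.
function groupKeyProblems(groupSpec) {
  const problems = [];
  if (!("_id" in groupSpec)) problems.push("_id is required");
  for (const key of Object.keys(groupSpec)) {
    if (key === "_id") continue;
    if (key.startsWith("$")) problems.push(`${key}: cannot start with "$"`);
    if (key.includes(".")) problems.push(`${key}: cannot contain "."`);
  }
  return problems;
}

// The two $group specs from the question, and the one used in the answer
const attempts = {
  dotted: { "$studies.samples.formatdata.GT": 1, _id: 0 },
  dollar: { "$GT": 1, _id: 0 },
  answer: { _id: "$studies.samples.formatdata.GT", count: { $sum: 1 } },
};

for (const [name, spec] of Object.entries(attempts)) {
  console.log(name, groupKeyProblems(spec));
}
```

In the working stage the field path appears only as the value of `_id`, and the output field gets a plain name (`count`).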