Parallelizing nested loops with GNU Parallel

Posted by Paul in Dev

Paul

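I am working in Bash. I have a series of nested for loops that iteratively search for the presence of barcodes from three lists of 96 barcode sequences each. My goal is to find every unique combination of barcodes, of which there are 96x96x96 (884,736) possible combinations.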

for barcode1 in "${ROUND1_BARCODES[@]}";
do
grep -B 1 -A 2 "$barcode1" "$FASTQ_R" > ROUND1_MATCH.fastq
echo barcode1.is.$barcode1 >> outputLOG

    if [ -s ROUND1_MATCH.fastq ]
    then

        # Now we will look for the presence of ROUND2 barcodes in our reads containing barcodes from the previous step
        for barcode2 in "${ROUND2_BARCODES[@]}";
        do
        grep -B 1 -A 2 "$barcode2" ROUND1_MATCH.fastq > ROUND2_MATCH.fastq

            if [ -s ROUND2_MATCH.fastq ]
            then

                # Now we will look for the presence of ROUND3 barcodes in our reads containing barcodes from the previous step 
                for barcode3 in "${ROUND3_BARCODES[@]}";
                do
                grep -B 1 -A 2 "$barcode3" ./ROUND2_MATCH.fastq | sed '/^--/d' > ROUND3_MATCH.fastq

                # If matches are found we will write them to an output .fastq file iteratively labelled with an ID number
                if [ -s ROUND3_MATCH.fastq ]
                then
                mv ROUND3_MATCH.fastq results/result.$count.2.fastq
                fi

                count=$((count + 1))
                done
            fi
        done
    fi
done

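The code works, and I am able to extract the sequences for each barcode combination. However, I think processing of large files could be sped up by parallelizing this loop structure. I know I can use GNU Parallel to do this, but I am struggling to nest the parallelization.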

# Parallelize nested loops
now=$(date +"%T")
echo "Beginning STEP1.2: PARALLEL Demultiplex using barcodes. Current time : $now" >> outputLOG

mkdir ROUND1_PARALLEL_HITS
parallel -j 6 'grep -B 1 -A 2 -h {} SRR6750041_2_smalltest.fastq > ROUND1_PARALLEL_HITS/{#}_ROUND1_MATCH.fastq' ::: "${ROUND1_BARCODES[@]}"

mkdir ROUND2_PARALLEL_HITS
parallel -j 6 'grep -B 1 -A 2 -h {} ROUND1_PARALLEL_HITS/*.fastq > ROUND2_PARALLEL_HITS/{#}_{/.}.fastq' ::: "${ROUND2_BARCODES[@]}"

mkdir ROUND3_PARALLEL_HITS
parallel -j 6 'grep -B 1 -A 2 -h {} ROUND2_PARALLEL_HITS/*.fastq > ROUND3_PARALLEL_HITS/{#}_{/.}.fastq' ::: "${ROUND3_BARCODES[@]}"

mkdir parallel_results
parallel -j 6 'mv {} parallel_results/result_{#}.fastq' ::: ROUND3_PARALLEL_HITS/*.fastq

How can I use parallel to recreate the nested structure of the for loops?

Ole Tange

Parallelize only the inner loop:

for barcode1 in "${ROUND1_BARCODES[@]}";
do
grep -B 1 -A 2 "$barcode1" "$FASTQ_R" > ROUND1_MATCH.fastq
echo barcode1.is.$barcode1 >> outputLOG

    if [ -s ROUND1_MATCH.fastq ]
    then

        # Now we will look for the presence of ROUND2 barcodes in our reads containing barcodes from the previous step
        for barcode2 in "${ROUND2_BARCODES[@]}";
        do
        grep -B 1 -A 2 "$barcode2" ROUND1_MATCH.fastq > ROUND2_MATCH.fastq
            if [ -s ROUND2_MATCH.fastq ]
            then
                # Now we will look for the presence of ROUND3 barcodes in our reads containing barcodes from the previous step 
                doit() {
                    grep -B 1 -A 2 "$1" ./ROUND2_MATCH.fastq | sed '/^--/d'
                }
                export -f doit
                parallel -j0 doit {} '>' results/$barcode1-$barcode2-{} ::: "${ROUND3_BARCODES[@]}"
                # TODO remove files with 0 length
            fi
        done
    fi
done
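
The TODO in the loop above matters because the shell creates each redirect target before grep runs, so barcode combinations with no matching reads leave empty files behind. The following is a minimal sketch of one way to clean them up afterwards, assuming the outputs live in results/ as in the loop above. The function name remove_empty_results is hypothetical, and -delete is a find extension available in GNU and BSD find rather than part of POSIX.

```shell
# Sketch: remove zero-length result files left by barcode combinations
# that matched no reads.
# Assumption: outputs are in "results" (as in the loop above) unless a
# different directory is passed as the first argument.
remove_empty_results() {
    local dir="${1:-results}"
    # -type f : regular files only
    # -size 0 : files that are exactly zero bytes long
    # -delete : remove each match (GNU/BSD find extension)
    find "$dir" -type f -size 0 -delete
}
```

Called once after the outer loop has finished, for example as remove_empty_results results, this should leave only the combinations that actually produced reads.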


Edited on 2020-11-24
