How can I extract a portion of text from all lines of a file?

arteagavskiy

I have these sequences:

0,<|endoftext|>ERRDLLRFKH:GAGCGCCGCGACCTGTTACGATTTAAACAC<|endoftext|>
1,<|endoftext|>RRDLLRFKHG:CGCCGCGACCTGTTACGATTTAAACACGGC<|endoftext|>
2,<|endoftext|>RDLLRFKHGD:CGCGACCTGTTACGATTTAAACACGGCGAC<|endoftext|>
3,<|endoftext|>DLLRFKHGDS:GACCTGTTACGATTTAAACACGGCGACAGT<|endoftext|>

And I'd like to get only the amino acid sequences, like this:

ERRDLLRFKH:
RRDLLRFKHG:
RDLLRFKHGD:
DLLRFKHGDS:

I have written this script so far:

with open("example_val.txt") as f:
    for line in f:
        if line.startswith(""):
            line = line[:-1]
        print(line.split(":", 1))

However, the output still contains the original sequences. Please give me some advice.

johann

Regex solution:

import re

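with open("example_val.txt") as f:
    # findall returns every match in the file as a list of strings
    matches = re.findall(r"(?<=>)[a-zA-Z]*:", f.read())

# print one extracted sequence per line
print("\n".join(matches))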

Regex Explanation:

  • (?<=>) : a positive lookbehind; it requires a > character immediately before the match without including it in the result
  • [a-zA-Z]*: : matches zero or more letters (a-z or A-Z) followed by a colon
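
As a quick check, the pattern can be tried on in-memory data instead of the file. This is only a sketch that assumes the lines look exactly like the sample in the question; the names `sample` and `matches` are made up for illustration.

```python
import re

# Two lines copied from the question's sample data.
sample = (
    "0,<|endoftext|>ERRDLLRFKH:GAGCGCCGCGACCTGTTACGATTTAAACAC<|endoftext|>\n"
    "1,<|endoftext|>RRDLLRFKHG:CGCCGCGACCTGTTACGATTTAAACACGGC<|endoftext|>\n"
)

# The lookbehind requires a ">" right before the match but does not capture it.
matches = re.findall(r"(?<=>)[a-zA-Z]*:", sample)
print(matches)  # ['ERRDLLRFKH:', 'RRDLLRFKHG:']
```

The trailing `>` at the end of each line produces no match, because it is followed by a newline rather than by letters and a colon.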

Test in Regex101 : regex101.com/r/qVGCYF/1
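
For comparison, a line-by-line approach closer to the script in the question is also possible. Note that `line.split(":", 1)` returns a list with both halves of the line, so printing it shows the whole sequence again; only the part before the colon is wanted. The sketch below is an alternative, not part of the answer above; the helper name `extract_peptide` is hypothetical, and it assumes every line has a `>` somewhere before the first `:`.

```python
def extract_peptide(line):
    """Return the text between the last '>' before the first ':' and that ':'
    (colon kept), or None if the line does not have that shape."""
    head, sep, _ = line.partition(":")   # head is everything before the first ':'
    if not sep or ">" not in head:
        return None
    return head.rsplit(">", 1)[1] + ":"  # keep only what follows the last '>'


# Sample lines copied from the question; with a real file, iterate over
# open("example_val.txt") instead.
lines = [
    "0,<|endoftext|>ERRDLLRFKH:GAGCGCCGCGACCTGTTACGATTTAAACAC<|endoftext|>",
    "3,<|endoftext|>DLLRFKHGDS:GACCTGTTACGATTTAAACACGGCGACAGT<|endoftext|>",
]

for line in lines:
    result = extract_peptide(line)
    if result is not None:
        print(result)
```

Processing one line at a time avoids reading the whole file into memory, which may matter for large sequence files.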
