Modify the range on every iteration of a for loop

Arnaud 'KaRn1zC'

I have a groups.txt file that contains ortholog groups, each listing species and gene IDs. It looks like this:

OG_117996: R_baltica_p|32476565 V_spinosum_v|497645257
OG_117997: R_baltica_p|32476942 S_pleomorpha_s|374317197
OG_117998: R_baltica_p|32477405 V_bacterium_v|198258541

I wrote a function that builds a list of every species in the whole file (66 in total), called listOfAllSpecies. I now need a function that gives me all the groups that contain 1 species from these 66, then all the groups that contain 2 species from these 66, and so on.

To simplify:

OG_1: A|1 A|3 B|1 C|2
OG_2: A|4 B|6
OG_3: C|8 B|9 A|10

and I need to get in this example:

(species) A,B (are in groups) OG_1, OG_2, OG_3
(species) A,C (are in groups) OG_1, OG_3
(species) B,C (are in groups) OG_1, OG_3
(species) A,B,C (are in groups) OG_1, OG_3
(species) B (is in groups) OG_1, OG_2, OG_3

I thought to try

for species in range(start, end=None):         
    if end == None:           
        start = 0
        end = start + 1

to get the first species in my listOfAllSpecies and tell me which OG_XXXX groups contain it, then take the first and second species, and so on until all 66 species are included. How do I modify the range within the for loop, or is there a different way to do this?

Here is my current code, with the functions I already have but without the part I am asking about:

import sys

if len(sys.argv) != 2:
    print("Error, file name to open is missing")
    sys.exit(1)

def readGroupFile(groupFileName):
    # Maps group name -> {species: [gene IDs]}
    dict_gene_taxonomy = {}
    fh = open(groupFileName, "r")

    for line in fh:
        liste = line.split(": ")
        groupName = liste[0]
        genesAsString = liste[1]
        dict_taxon = {}
        liste_gene = genesAsString.split()

        for item in liste_gene:
            taxonomy_gene = item.split("|")
            taxonomy = taxonomy_gene[0]
            geneId = taxonomy_gene[1]

            if taxonomy not in dict_taxon:
                dict_taxon[taxonomy] = []

            dict_taxon[taxonomy].append(geneId)

        dict_gene_taxonomy[groupName] = dict_taxon
    fh.close()
    return dict_gene_taxonomy


def showListOfAllSpecies(dictio):
    # Collect each species once, in order of first appearance
    listAllSpecies = []
    for groupName in dictio:
        dictio_in_dictio = dictio[groupName]
        for speciesName in dictio_in_dictio:
            if speciesName not in listAllSpecies:
                listAllSpecies.append(speciesName)
    return listAllSpecies

dico = readGroupFile(sys.argv[1])
listAllSpecies = showListOfAllSpecies(dico)
Rob Grant

Not sure if this is exactly what you want, but it's a start :)

from itertools import combinations

# Assume input is a list of strings called input_list
input_list = ['OG_1: A|1 A|3 B|1 C|2','OG_2: A|4 B|6','OG_3: C|8 B|9 A|10']

# Create a dict mapping each OG to its species, and a set of all species
rels = {}
species = set()

# Populate the dict
for item in input_list:
    params = item.split(': ')
    og = params[0]
    raw_species = params[1].split()
    s = [rs.split('|')[0] for rs in raw_species]
    rels[og] = s

    # Record every species seen in this group
    species.update(s)

# Get the possible combinations of species, from size 1 up to all of them
# (a set has no fixed order, so the order of the combinations may vary)
combos = [c for limit in range(1, len(species) + 1) for c in combinations(species, limit)]

def combo_in_og(combo, og):
    # True only if every species in the combination occurs in the group
    for item in combo:
        if item not in rels[og]:
            return False
    return True

# Loop over the combinations and print
for combo in combos:
    valid_ogs = []
    for og in rels:
        if combo_in_og(combo, og):
            valid_ogs.append(og)
    print('(species) ' + ','.join(combo) + ' (are in groups) ' + ', '.join(valid_ogs))

Produces:

(species) C (are in groups) OG_1, OG_3
(species) A (are in groups) OG_1, OG_2, OG_3
(species) B (are in groups) OG_1, OG_2, OG_3
(species) C,A (are in groups) OG_1, OG_3
(species) C,B (are in groups) OG_1, OG_3
(species) A,B (are in groups) OG_1, OG_2, OG_3
(species) C,A,B (are in groups) OG_1, OG_3
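
On the literal question about the range: a range cannot be changed from inside the loop that iterates over it, but the "first species, then first and second, and so on" idea can be written by looping over the end index and slicing the list. The following is only a sketch under that reading of the question; groups_with_all and groups_for_growing_prefixes are hypothetical helper names, and the example dict mimics the structure returned by readGroupFile.

```python
def groups_with_all(species_subset, dict_gene_taxonomy):
    """Return the names of groups that contain every species in species_subset."""
    wanted = set(species_subset)
    return sorted(
        name
        for name, taxa in dict_gene_taxonomy.items()
        if wanted <= set(taxa)  # set(taxa) is the set of species keys
    )

def groups_for_growing_prefixes(all_species, dict_gene_taxonomy):
    """Check the first species, then the first two, and so on up to the full list."""
    results = []
    for end in range(1, len(all_species) + 1):
        prefix = all_species[:end]  # the slice grows by one species per iteration
        results.append((prefix, groups_with_all(prefix, dict_gene_taxonomy)))
    return results

# Hypothetical data in the same shape as the dict built by readGroupFile
example = {
    'OG_1': {'A': ['1', '3'], 'B': ['1'], 'C': ['2']},
    'OG_2': {'A': ['4'], 'B': ['6']},
    'OG_3': {'C': ['8'], 'B': ['9'], 'A': ['10']},
}

for prefix, groups in groups_for_growing_prefixes(['A', 'B', 'C'], example):
    print(','.join(prefix), '->', ', '.join(groups))
```

This only visits 66 prefixes for 66 species, so it is cheap, but it covers far fewer subsets than the full set of combinations above.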

Just a warning: what you're trying to do will start to take forever with large enough numbers of inputs, because the number of species combinations grows as 2^N. With your 66 species that is roughly 7.4 × 10^19 combinations. You can't get around it (that's what the problem demands), but it's there.
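
If it turns out that only the species sets that actually occur in the file matter, one possible way to sidestep the enumeration is to index the groups by their exact species set and answer individual combination queries on demand. This is a different technique from the combinations loop above, sketched under that assumption; groups_by_species_set and groups_containing are hypothetical names, and rels has the same shape as in the code above.

```python
from collections import defaultdict

def groups_by_species_set(rels):
    """Map each distinct species set to the groups that have exactly that set."""
    by_set = defaultdict(list)
    for og, species_list in rels.items():
        by_set[frozenset(species_list)].append(og)
    return dict(by_set)

def groups_containing(combo, rels):
    """Return the groups whose species include every species in combo."""
    wanted = set(combo)
    return sorted(og for og, species_list in rels.items() if wanted <= set(species_list))

# Same structure as rels above: OG name -> list of species (duplicates allowed)
rels = {
    'OG_1': ['A', 'A', 'B', 'C'],
    'OG_2': ['A', 'B'],
    'OG_3': ['C', 'B', 'A'],
}

by_set = groups_by_species_set(rels)
for species_set, ogs in by_set.items():
    print(','.join(sorted(species_set)), '->', ', '.join(ogs))

print(groups_containing(('B', 'C'), rels))
```

The index has at most one entry per group, so its size is bounded by the number of lines in the file rather than by 2^N.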
