izaak_pyzaak - 1 year ago 53

Python Question

I am trying to generate a list of all possible DNA sequences of length four from the four characters `A`, `T`, `C` and `G`, with repeats allowed (so `AAAA` should be included).

I have looked at `itertools.combinations_with_replacement(iterable, r)`; however, the output changes depending on the order of the input string, i.e. `itertools.combinations_with_replacement('ATCG', 4)` gives different results to `itertools.combinations_with_replacement('ATGC', 4)`.
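For context, `combinations_with_replacement` produces unordered selections (multisets), so four letters taken four at a time give only 35 tuples, not 256 sequences. Reordering the input changes the order of the letters inside each tuple, but, as this minimal sketch suggests, the multisets selected are the same either way:

```python
import itertools

# Each tuple is a selection of 4 letters (repeats allowed), listed in input order.
combos_atcg = list(itertools.combinations_with_replacement('ATCG', 4))
combos_atgc = list(itertools.combinations_with_replacement('ATGC', 4))

print(len(combos_atcg))  # 35, i.e. C(4 + 4 - 1, 4)

# The tuples differ in letter order, but sorting each one shows that
# both calls select exactly the same multisets of letters.
same = {tuple(sorted(t)) for t in combos_atcg} == {tuple(sorted(t)) for t in combos_atgc}
print(same)  # True
```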

Because of this, I attempted to combine `itertools.combinations_with_replacement(iterable, r)` with `itertools.permutations()`, passing each output of `itertools.permutations()` to `itertools.combinations_with_replacement()`:

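```
import itertools

def allCombinations(s, strings):
    # Every ordering of the input characters, e.g. ('A', 'T', 'C', 'G')
    perms = list(itertools.permutations(s, 4))
    allCombos = []
    for perm in perms:
        # Combinations with replacement, taken in this ordering
        combo = list(itertools.combinations_with_replacement(perm, 4))
        allCombos.append(combo)
    for combos in allCombos:
        for tup in combos:
            # Join each tuple of characters into a string, e.g. 'AATC'
            strings.append("".join(str(x) for x in tup))
```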

However, running `allCombinations('ATCG', li)` with `li = []` and then deduplicating with `list(set(li))` still does not give every possible sequence.
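To see why the attempt falls short: `combinations_with_replacement(perm, 4)` only emits tuples whose letters are non-decreasing in `perm`'s order, so each letter always appears in a single contiguous run. A string such as `ATAT`, where a letter reappears after a different one, cannot be produced under any ordering. This sketch rewrites the same idea as a set comprehension (not the asker's exact function) and counts what it reaches:

```python
import itertools

# Same idea as the attempt: for every ordering of the alphabet, collect the
# combinations with replacement, then deduplicate the joined strings via a set.
reached = {
    ''.join(tup)
    for perm in itertools.permutations('ATCG', 4)
    for tup in itertools.combinations_with_replacement(perm, 4)
}

print(len(reached))       # 136, not 4 ** 4 == 256
print('ATAT' in reached)  # False: 'A' reappears after 'T'
```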

There must be an easy way to do this, maybe generating a power set and then filtering for length 4?

Answer

You can achieve this by using `itertools.product`. It gives the Cartesian product of the passed iterables:
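Conceptually, the Cartesian product of four copies of the alphabet is the same as four nested loops, one per position in the sequence. A minimal sketch of that equivalence (the name `nested` is only for illustration):

```python
import itertools

a = 'ACTG'

# Four nested loops: one loop per position in the length-4 sequence,
# with the last position varying fastest.
nested = [(w, x, y, z) for w in a for x in a for y in a for z in a]

# itertools.product yields the same tuples in the same order.
print(nested == list(itertools.product(a, repeat=4)))  # True
print(len(nested))  # 256
```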

```
import itertools

a = 'ACTG'
print(len(list(itertools.product(a, a, a, a))))
# or, even better, as @ayhan commented:
# print(len(list(itertools.product(a, repeat=4))))
# Output: 256
```
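The count follows from the product rule: each of the `k` positions independently takes one of `len(a)` letters, giving `len(a) ** k` sequences. A short check for a few lengths, assuming the same four-letter alphabet, counting lazily instead of building each list:

```python
import itertools

a = 'ACTG'

# Count the product for lengths 1..5 without materialising the tuples.
counts = {k: sum(1 for _ in itertools.product(a, repeat=k)) for k in range(1, 6)}
print(counts)  # {1: 4, 2: 16, 3: 64, 4: 256, 5: 1024}
```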

But it returns tuples, so if you are looking for strings:

```
for output in itertools.product(a, repeat=4):
    print(''.join(output))
# Output:
# AAAA
# AAAC
# ...
# GGGG
```
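If you need the sequences as a list of strings rather than printed output, a small helper can wrap the same `product` call. The name `all_sequences` and its parameters are hypothetical, chosen for this sketch:

```python
import itertools

def all_sequences(alphabet='ACTG', length=4):
    """Return every string of `length` characters drawn from `alphabet`
    (repeats allowed), in the order itertools.product yields them."""
    return [''.join(chars) for chars in itertools.product(alphabet, repeat=length)]

seqs = all_sequences()
print(len(seqs))           # 256
print(seqs[:3], seqs[-1])  # ['AAAA', 'AAAC', 'AAAT'] GGGG
```

Because the output order follows the order of `alphabet`, passing `'ATCG'` instead gives the same 256 strings, just in a different order.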

Source (Stackoverflow)