piRSquared - 11 months ago 31

Python Question

I have a dataframe of zeros and ones. I want to treat each column as if its values were a binary representation of an integer. What is the easiest way to make this conversion?

I want this:

`df = pd.DataFrame([[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 1]])`

`print(df)`

0 1 2

0 1 0 1

1 1 1 0

2 0 1 1

3 0 0 1

converted to:

0 12

1 6

2 11

dtype: int64

As efficiently as possible.

Answer

You can create a string from the column values and then use `int(binary_string, base=2)` to convert it to an integer:

```
df.apply(lambda col: int(''.join(str(v) for v in col), 2))
Out[6]:
0 12
1 6
2 11
dtype: int64
```

I'm not sure about efficiency. Multiplying each row by the relevant power of 2 and then summing probably takes better advantage of fast NumPy operations, but the string approach is probably more convenient.
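The powers-of-2 idea mentioned above could be sketched roughly as follows. This is only a sketch, not benchmarked, and it assumes the first row is the most significant bit (which matches the expected output in the question). The names `powers` and `result` are chosen here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 1]])

# Weight for each row: the first row is the most significant bit,
# so for 4 rows the weights are [8, 4, 2, 1].
powers = 2 ** np.arange(len(df) - 1, -1, -1)

# A dot product of the weights with the 0/1 matrix multiplies each row
# by its power of 2 and sums down every column in one vectorized step.
result = pd.Series(powers.dot(df.to_numpy()), index=df.columns)
print(result)
```

For the example frame this gives 12, 6 and 11, the same values as the string approach. One caveat: the weights are fixed-width integers, so a frame with more rows than the integer type has bits (roughly 63 for 64-bit integers) would overflow, whereas the `int(..., 2)` approach works with arbitrary-size Python integers.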