vishes_shell - 1 month ago 13

Python Question

I have some list of data, for example

`some_data = [1, 2, 4, 1, 6, 23, 3, 56, 6, 2, 3, 5, 6, 32, 2, 12, 5, 3, 2]`

and I want to get a fixed number of unique values (I don't care which ones I get), and I also want the result to be a `set`.

I know that I can build a `set` from `some_data`, convert it to a `list`, slice that, and turn the slice back into a `set`:

`set(list(set(some_data))[:5])  # doesn't look so friendly`

I understand that `set` does not have `__getitem__`, so I can't slice a `set` directly.

And I completely understand that a `set` is unordered, which is why slicing a `set` is not supported.

Possible options are:

- ordered-set
- using a `dict` with `None` values:

`set(list(dict(map(lambda x: (x, None), some_data)).keys())[:2])  # not that great`

Answer

Sets are iterable. If you *really* don't care which items from your set are selected, you can use `itertools.islice` to get an iterator that will yield a specified number of items (whichever ones come first in the iteration order). Pass the iterator to the `set` constructor and you've got your subset without using any extra lists:

```
import itertools
some_data = [1, 2, 4, 1, 6, 23, 3, 56, 6, 2, 3, 5, 6, 32, 2, 12, 5, 3, 2]
big_set = set(some_data)
small_set = set(itertools.islice(big_set, 5))
```

While this is what you've asked for, I'm not sure you should really use it. Sets may iterate in a very deterministic order, so if your data often contains many similar values, you may end up selecting a very similar subset every time you do this. This is especially bad when the data consists of integers (as in the example), which hash to themselves. Consecutive integers will very frequently appear in order when iterating a set. With the code above, only `32` is out of order in `big_set` (using Python 3.5), so `small_set` is `{32, 1, 2, 3, 4}`. If you added `0` to your data, you'd almost always end up with `{0, 1, 2, 3, 4}` even if the dataset grew huge, since those values will always fill up the first five slots in the set's hash table.
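If you want to see this for yourself, here is a minimal sketch that takes the subset twice and compares the results. The `take_some` helper is a name I made up for illustration, not part of any library, and the exact items you get depend on your Python implementation and version:

```python
import itertools


def take_some(values, n):
    """Return a set of up to n items, whichever come first in iteration order."""
    return set(itertools.islice(values, n))


some_data = [1, 2, 4, 1, 6, 23, 3, 56, 6, 2, 3, 5, 6, 32, 2, 12, 5, 3, 2]
big_set = set(some_data)

first = take_some(big_set, 5)
second = take_some(big_set, 5)

# Iterating an unmodified set visits its items in the same order every time,
# so both calls select exactly the same five items.
print(first == second)  # True
print(first)            # contents depend on the interpreter, so no output shown
```

If `n` is larger than the number of unique values, `islice` simply stops early and you get the whole set back.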

To avoid such deterministic sampling, you can use `random.sample` as suggested by jprockbelly.
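A rough sketch of that alternative, assuming you are fine with building one temporary list. Since Python 3.11, `random.sample` only accepts a sequence, so the set is converted with `list()` first:

```python
import random

some_data = [1, 2, 4, 1, 6, 23, 3, 56, 6, 2, 3, 5, 6, 32, 2, 12, 5, 3, 2]
big_set = set(some_data)

# random.sample picks k distinct positions from the sequence, so the result
# never contains duplicates as long as the input values are unique.
# list(big_set) is needed because sets are not sequences.
small_set = set(random.sample(list(big_set), 5))

print(small_set)  # five distinct values, usually different on each run
```

Note that `random.sample` raises `ValueError` if you ask for more items than the population contains, so you may want `min(5, len(big_set))` as the sample size when the data can be small.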