Martín Fixman - 2 years ago
Python Question

# How can I slice each element of a numpy array of strings?

Numpy has some very useful string operations, which vectorize the usual Python string operations.

Compared to these operations and to pandas.str, the numpy strings module seems to be missing a very important one: the ability to slice each string in the array. For example,

a = numpy.array(['hello', 'how', 'are', 'you'])
numpy.char.sliceStr(a, slice(1, 3))
>>> numpy.array(['el', 'ow', 're', 'ou'])

Am I missing some obvious method in the module with this functionality? Otherwise, is there a fast vectorized way to achieve this?
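For reference, the behavior being asked for can be written as a plain Python loop. This is only a minimal baseline sketch, not a vectorized solution; the name slice_each is hypothetical and not part of numpy.

```python
import numpy as np

def slice_each(a, start, end):
    # Hypothetical helper: applies ordinary Python slicing to every element.
    # It loops in Python, so it is a correctness reference, not a fast path.
    return np.array([s[start:end] for s in a])

a = np.array(['hello', 'how', 'are', 'you'])
result = slice_each(a, 1, 3)  # each element reduced to characters 1..2
```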

Here's a vectorized approach -

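import numpy as np

def slicer_vectorized(a,start,end):
    # Assumes `a` holds byte strings (dtype 'S'), as Python 2 string literals produce
    b = a.view('S1').reshape(len(a),-1)[:,start:end]
    return np.fromstring(b.tostring(),dtype='S'+str(end-start))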
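The function above relies on a byte-string array. Under Python 3, string literals produce a unicode array (dtype 'U', four bytes per character), and the binary mode of np.fromstring and ndarray.tostring are deprecated in recent NumPy releases. The following is a sketch of the same view-and-reshape idea for that case, not the answer's original code. The name slicer_vectorized_unicode is hypothetical, and it assumes a 1-D array with a fixed-width 'U' dtype and a slice that selects at least one character column.

```python
import numpy as np

def slicer_vectorized_unicode(a, start, end):
    # Hypothetical variant; assumes a 1-D array with a fixed-width 'U' dtype.
    a = np.ascontiguousarray(a)
    # Number of characters stored per element (e.g. 5 for '<U5').
    width = a.dtype.itemsize // np.dtype('U1').itemsize
    # View each element as a row of single characters, then slice the columns.
    chars = a.view('U1').reshape(len(a), width)[:, start:end]
    # The column slice is not contiguous; copy it so it can be re-viewed.
    chars = np.ascontiguousarray(chars)
    # Merge each row of single characters back into one string per element.
    return chars.view('U' + str(chars.shape[1])).ravel()

a = np.array(['hello', 'how', 'are', 'you'])
result = slicer_vectorized_unicode(a, 1, 3)
```

Using chars.shape[1] instead of end-start for the output width keeps the result well-formed when end runs past the stored string width.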

Sample run -

In [68]: a = np.array(['hello', 'how', 'are', 'you'])

In [69]: slicer_vectorized(a,1,3)
Out[69]:
array(['el', 'ow', 're', 'ou'],
      dtype='|S2')

In [70]: slicer_vectorized(a,0,3)
Out[70]:
array(['hel', 'how', 'are', 'you'],
      dtype='|S3')

Runtime test -

I timed all the approaches posted by other authors that I could run on my end, along with the vectorized approach from earlier in this post.

Here are the timings -

In [53]: # Setup input array
    ...: a = np.array(['hello', 'how', 'are', 'you'])
    ...: a = np.repeat(a,10000)
    ...:

In [54]: %timeit slicer(1, 3)(a)
10 loops, best of 3: 23.5 ms per loop