Adam_MOD - 9 months ago

Python Question

I'm trying to assimilate a bunch of information into a usable array like this:

```
from os import walk
import numpy
import matplotlib.pyplot as plt

ndata = []
rawdata = []
fdata = []
pdata = []

for (dirpath, dirnames, filenames) in walk('E:/Machin Lerning/Econ/full_set'):
    ndata.extend(filenames)

for i in ndata:
    currfile = open('E:/Machin Lerning/Econ/full_set/' + str(i), 'r')
    rawdata.append(currfile.read().splitlines())
    currfile.close()
rawdata = numpy.array(rawdata)

for order, file in enumerate(rawdata[:10]):
    for i in rawdata[order]:
        r = i.split(',')
        pdata.append(r)
    fdata.append(pdata)
    pdata = []
fdata = numpy.array(fdata)

plt.figure(1)
plt.plot(fdata[:,1,3])
```

EDIT: After printing `fdata.shape` when using the first 10 txt files

`for order,file in enumerate(rawdata[:10]):`

I see it is `(10, 500, 7)`. But if I do not limit the size of this, and instead say

`for order,file in enumerate(rawdata):`

then `fdata.shape` is just `(447,)`.

It seems like this happens whenever I increase the number of elements I look through in the `rawdata` array to above 13... It's not any specific location either - I changed it to

`for order,file in enumerate(rawdata[11:24]):`

and that worked fine. aaaaahhh
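A minimal sketch of the collapse, with made-up stand-in rows instead of my real files: as soon as one sub-list has a different length, `numpy.array` stops building a 3d array.

```python
import numpy

# three fake "files", each 5 rows of 7 fields -> a uniform nested list
good = [[['x'] * 7 for _ in range(5)] for _ in range(3)]
print(numpy.array(good).shape)                # (3, 5, 7)

# the same data with one row dropped from one file -> ragged nesting
bad = [list(f) for f in good]
bad[1] = bad[1][:-1]
print(numpy.array(bad, dtype=object).shape)   # (3,) -- an object array of lists
```

(On older numpy the ragged call produces the `(447,)`-style object array even without `dtype=object`; newer releases require it.)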

In case it's useful, here's a sample of what the text files look like:

```
20080225,A,31.42,31.79,31.2,31.5,30575
20080225,AA,36.64,38.95,36.48,38.85,225008
20080225,AAPL,118.59,120.17,116.664,119.74,448847
```

Answer Source

Looks like `fdata` is an array, and the error is in `fdata[:,1,3]`. That tries to index `fdata` with 3 indices: the slice, 1, and 3. But if `fdata` is a 2d array, this will produce exactly this error - `too many indices`.

When you get 'indexing' errors, figure out the `shape` of the offending array. Don't just guess. Add a debug statement: `print(fdata.shape)`.
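A quick sketch of why the number of indices matters (small made-up arrays, not your data):

```python
import numpy as np

a3 = np.array([[[1, 2]], [[3, 4]]])   # shape (2, 1, 2) -- 3d
a2 = np.array([[1, 2], [3, 4]])       # shape (2, 2)    -- 2d

print(a3[:, 0, 1])                    # three indices on a 3d array: fine
try:
    a2[:, 0, 1]                       # three indices on a 2d array
except IndexError as e:
    print(e)                          # too many indices ...
```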

===================

Taking your file sample, as a list of lines:

```
In [822]: txt=b"""20080225,A,31.42,31.79,31.2,31.5,30575
...: 20080225,AA,36.64,38.95,36.48,38.85,225008
...: 20080225,AAPL,118.59,120.17,116.664,119.74,448847 """
In [823]: txt=txt.splitlines()
In [826]: fdata=[]
In [827]: pdata=[]
```

read one 'file':

```
In [828]: for i in txt:
     ...:     r=i.split(b',')
     ...:     pdata.append(r)
     ...: fdata.append(pdata)
In [829]: fdata
Out[829]:
[[[b'20080225', b'A', b'31.42', b'31.79', b'31.2', b'31.5', b'30575 '],
....]]]
In [830]: np.array(fdata)
Out[830]:
array([[[b'20080225', b'A', b'31.42', b'31.79', b'31.2', b'31.5',
b'30575 '],
...]]],
dtype='|S8')
In [831]: _.shape
Out[831]: (1, 3, 7)
```

Read an 'identical' file:

```
In [832]: for i in txt:
     ...:     r=i.split(b',')
     ...:     pdata.append(r)
     ...: fdata.append(pdata)
In [833]: len(fdata)
Out[833]: 2
In [834]: np.array(fdata).shape
Out[834]: (2, 6, 7)
In [835]: np.array(fdata).dtype
Out[835]: dtype('S8')
```

Note the dtype - a string of 8 characters. Since one value per line is a string, it can't convert the whole thing to numbers.
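Assuming you only need the numeric columns, one way past the string dtype is to slice off the date and ticker columns and convert the rest - a sketch using the sample rows:

```python
import numpy as np

rows = [[b'20080225', b'A', b'31.42', b'31.79', b'31.2', b'31.5', b'30575'],
        [b'20080225', b'AA', b'36.64', b'38.95', b'36.48', b'38.85', b'225008']]
arr = np.array(rows)              # dtype '|S8': every cell is a byte string
nums = arr[:, 2:].astype(float)   # keep the 5 numeric columns as floats
print(arr.dtype, nums.shape)      # |S8 (2, 5)
```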

Now read a slightly different 'file' (one less line, one less value)

```
In [836]: txt1=b"""20080225,A,31.42,31.79,31.2,31.5,30575
...: 20080225,AA,36.64,38.95,36.48,38.85 """
In [837]: txt1=txt1.splitlines()
In [838]: for i in txt1:
     ...:     r=i.split(b',')
     ...:     pdata.append(r)
     ...: fdata.append(pdata)
In [839]: len(fdata)
Out[839]: 3
In [840]: np.array(fdata).shape
Out[840]: (3, 8)
In [841]: np.array(fdata).dtype
Out[841]: dtype('O')
```

Now let's add an 'empty' file - no rows, so `pdata` is `[]`:

```
In [842]: fdata.append([])
In [843]: np.array(fdata).shape
Out[843]: (4,)
In [844]: np.array(fdata).dtype
Out[844]: dtype('O')
```

Array shape and dtype have totally changed. It can no longer create a uniform 3d array from the lines.

The shape after 10 files, `(10, 500, 7)`, means 10 files, 500 lines each, 7 columns per line. But one or more files of the full 447 are different. My last iteration suggests at least one is empty.
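So a quick diagnostic (with a hypothetical stand-in for the 447-file list) is to print each file's line count before calling `np.array`, and inspect any that differ:

```python
# stand-in for rawdata: most "files" have 500 lines, one short, one empty
rawdata = [['20080225,A,1,2,3,4,5'] * 500 for _ in range(5)]
rawdata[2] = rawdata[2][:100]   # a truncated file
rawdata[4] = []                 # an empty file

counts = [len(lines) for lines in rawdata]
odd = [i for i, n in enumerate(counts) if n != 500]
print(odd)                      # [2, 4] -- the files to inspect
```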