BigBoy1337 - 6 months ago
Python Question

How do I load a list of images into an array for each channel in Numpy?

I want an array X of shape (n_samples, n_cols, n_rows, n_channels) for the L channel and an array y of shape (n_samples, n_cols, n_rows, n_channels) for the ab channels.

I have tried

import glob
from skimage import io, color
import numpy as np

def loadfunc(files):
    for fl in files:
        img = color.rgb2lab(io.imread(fl))
        L = img[:,:,:1]    # lightness channel
        ab = img[:,:,1:]   # a and b color channels
        yield L,ab

X,y = np.fromiter(loadfunc(glob.glob('path/to/images/*.png')),float)


and I get this error: ValueError: setting an array element with a sequence.

I figure this must be a fairly common operation (any time someone wants to load image data into a NumPy array), so what am I missing?

Answer

np.fromiter requires that you state the dtype, and with dtype=float every value produced by the iterable must be a single float. Your loadfunc yields a tuple of two arrays, (L, ab), for each file, which is why NumPy raises "setting an array element with a sequence". If instead loadfunc yields a single NumPy array per image, you can use each array's flat attribute to get an iterator over its flattened values, concatenate those iterators with itertools.chain.from_iterable, and pass the result to np.fromiter:

def loadfunc(files):
    for fl in files:
        img = skcolor.rgb2lab(skio.imread(fl)[..., :3])
        yield img

arrs = loadfunc(files)
Z = np.fromiter(IT.chain.from_iterable(arr.flat for arr in arrs), dtype=float)
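
To see why the original attempt fails and why the flattened version works, here is a small self-contained sketch that uses random arrays as stand-ins for the Lab images. The 4x5 image size and the helper name split_pairs are illustrative assumptions, not part of the original code:

```python
import itertools as IT
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-ins for three 4x5 Lab images with 3 channels each
images = [rng.random((4, 5, 3)) for _ in range(3)]

def split_pairs(imgs):
    """Mimic the question's loadfunc: yield an (L, ab) tuple per image."""
    for img in imgs:
        yield img[..., :1], img[..., 1:]

# Each item is a tuple of arrays, not a float, so np.fromiter rejects it
failure = None
try:
    np.fromiter(split_pairs(images), dtype=float)
except (ValueError, TypeError) as exc:
    failure = exc
print(type(failure).__name__)

# Chaining the .flat iterators yields one float at a time, which np.fromiter accepts
flat = np.fromiter(IT.chain.from_iterable(img.flat for img in images), dtype=float)
print(flat.shape)  # (180,)
```

The 180 values are 3 images x 4 x 5 x 3 channels, laid out image by image.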

Since np.fromiter returns a 1D array, you then need to reshape it, where (h, w, n) is the shape of a single image:

Z = Z.reshape(len(files), h, w, n)

Note that this relies on every image having the same shape. Finally, to put the L values in X and the ab values in y:

X = Z[..., :1]
y = Z[..., 1:]
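
A quick sanity check of the reshape and slicing steps, again with random arrays standing in for the images (the sizes are arbitrary assumptions). Reshaping the 1D result recovers each image intact because .flat walks an array in C (row-major) order, which is the same order reshape uses by default:

```python
import itertools as IT
import numpy as np

rng = np.random.default_rng(1)
n_images, h, w, n = 3, 4, 5, 3
# Hypothetical stand-ins for Lab images that all share one shape
images = [rng.random((h, w, n)) for _ in range(n_images)]

Z = np.fromiter(IT.chain.from_iterable(img.flat for img in images), dtype=float)
# .flat and reshape both use C (row-major) order, so each image comes back intact
Z = Z.reshape(n_images, h, w, n)

X = Z[..., :1]  # L channel
y = Z[..., 1:]  # a and b channels

print(X.shape, y.shape)  # (3, 4, 5, 1) (3, 4, 5, 2)
```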

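Putting it all together:

import glob
import itertools as IT
import numpy as np
import skimage.io as skio
import skimage.color as skcolor

def loadfunc(files):
    """Yield each image converted to the Lab color space."""
    for fl in files:
        # [..., :3] drops an alpha channel if present; this assumes RGB or
        # RGBA images (a grayscale image is 2D and would need converting first)
        img = skcolor.rgb2lab(skio.imread(fl)[..., :3])
        yield img

files = glob.glob('path/to/images/*.png')
arrs = loadfunc(files)
# Peek at the first image to learn the per-image shape (rows, cols, channels)
first = next(arrs)
h, w, n = first.shape

# Stream the flattened values of every image, one at a time, into a 1D array
Z = np.fromiter(IT.chain.from_iterable(
    arr.flat for arr in IT.chain([first], arrs)), dtype=float)
# Restore the (n_samples, h, w, n) layout; assumes every image has the same shape
Z = Z.reshape(len(files), h, w, n)
X = Z[..., :1]   # L channel, shape (n_samples, h, w, 1)
y = Z[..., 1:]   # a and b channels, shape (n_samples, h, w, 2)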

Regarding the question in the comments:

If I wanted to do extra processing to L and ab, where would I do that?

I believe in separating the loading of the data from its processing. Keeping the two functions distinct leaves open the possibility of passing data from different sources to the same processing function. If you put both the loading and the processing (such as a KNN classification of the ab values) into loadfunc, then there is no way to reuse the KNN classification code without loading the data from files.
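
A minimal sketch of that separation. The functions split_lab and process_ab are hypothetical; process_ab stands in for the KNN step and only computes a per-image mean of the a and b channels as a placeholder, not a classifier. Because both take arrays rather than file names, they can be exercised on synthetic data without touching the disk:

```python
import numpy as np

def split_lab(Z):
    """Split an (n_samples, h, w, 3) Lab array into L and ab parts."""
    return Z[..., :1], Z[..., 1:]

def process_ab(ab):
    """Hypothetical processing step standing in for e.g. a KNN classifier.

    Here it just returns the mean a and b value of each image, giving an
    array of shape (n_samples, 2).
    """
    return ab.mean(axis=(1, 2))

# The same functions work on arrays from any source: files, a database,
# or (as here) synthetic data generated in memory.
rng = np.random.default_rng(2)
Z = rng.random((4, 8, 8, 3))
X, y = split_lab(Z)
features = process_ab(y)
print(features.shape)  # (4, 2)
```

With the file-loading code above, the same call would simply be process_ab(y) on the y array built from the images.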


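The code can be simplified using np.stack, which joins a sequence of same-shaped arrays along a new axis. Stacking along axis=0 puts the samples first, matching the requested (n_samples, n_cols, n_rows, n_channels) order, so no change to the axis order is needed; stacking along axis=-1 would instead give (n_cols, n_rows, n_channels, n_samples). Recent NumPy versions reject a generator here, so the generator is wrapped in list():

import glob
import numpy as np
import skimage.io as skio
import skimage.color as skcolor

def loadfunc(files):
    """Yield each image converted to the Lab color space."""
    for fl in files:
        # [..., :3] drops an alpha channel if present (assumes RGB or RGBA input)
        img = skcolor.rgb2lab(skio.imread(fl)[..., :3])
        yield img

files = glob.glob('path/to/images/*.png')
# np.stack needs a sequence such as a list, so materialize the generator first
Z = np.stack(list(loadfunc(files)), axis=0)   # shape (n_samples, h, w, 3)
X = Z[..., :1]   # L channel
y = Z[..., 1:]   # a and b channels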

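This code is simpler, and therefore preferable to the np.fromiter version above. Like that version, it requires every image to have the same shape; np.stack raises a ValueError if the shapes differ.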
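
To check that the two approaches agree, here is a sketch on random stand-in arrays (the shapes are arbitrary assumptions). Stacking along axis=0 puts the samples first and reproduces the reshaped np.fromiter result exactly, while axis=-1 puts the samples last:

```python
import itertools as IT
import numpy as np

rng = np.random.default_rng(3)
images = [rng.random((4, 5, 3)) for _ in range(2)]  # hypothetical Lab stand-ins

# np.fromiter route: stream flattened values, then reshape to sample-first
Z_iter = np.fromiter(IT.chain.from_iterable(a.flat for a in images), dtype=float)
Z_iter = Z_iter.reshape(len(images), 4, 5, 3)

# np.stack route: pass a list (a sequence), not a generator
Z_first = np.stack(images, axis=0)   # (n_samples, h, w, n)
Z_last = np.stack(images, axis=-1)   # (h, w, n, n_samples)

print(Z_first.shape, Z_last.shape)      # (2, 4, 5, 3) (4, 5, 3, 2)
print(np.array_equal(Z_first, Z_iter))  # True
```

If you already have a samples-last array, np.moveaxis(Z_last, -1, 0) converts it to the samples-first layout.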