jbaums jbaums - 5 months ago 38
Python Question

gdal_calc amin fails when passing more than 23 input files

I've written an R function that calls

gdal_calc.py
to calculate the pixel-wise minimum value across a
RasterStack
(series of input raster files). I've done this because it's much faster than
raster::min
for large rasters. The function works well for up to 23 files, but throws a warning when passing 24 or more, returning an output raster filled with zeroes.

Since R is just preparing a system call to
python gdal_calc.py
, this question is not specific to R and I encourage python/numpy aficionados to read on.

Here's the function. The structure of the eventual
gdal_calc
call is shown in the warning message thrown by the problematic usage at the bottom of this post.

gdal_min <- function(infile, outfile, return_raster=FALSE, overwrite=FALSE) {
require(rgdal)
if(return_raster) require(raster)
# infile: The multiband raster file (or a vector of paths to multiple
# raster files) for which to calculate cell minima.
# outfile: Path to raster output file.
# return_raster: (logical) Should the output raster be read back into R?
# overwrite: (logical) Should outfile be overwritten if it exists?
gdal_calc <- Sys.which('gdal_calc.py')
if(gdal_calc=='') stop('gdal_calc.py not found on system.')
if(file.exists(outfile) & !overwrite)
stop("'outfile' already exists. Use overwrite=TRUE to overwrite.",
call.=FALSE)
nbands <- sapply(infile, function(x) nrow(attr(GDALinfo(x), 'df')))
if(length(infile) > 26 || nbands > 26) stop('Maximum number of inputs is 26.')
if(length(nbands) > 1 & any(nbands > 1))
warning('One or more rasters have multiple bands. First band used.')

if(length(infile)==1) {
inputs <- paste0('-', LETTERS[seq_len(nbands)], ' ', infile, ' --',
LETTERS[seq_len(nbands)], '_band ', seq_len(nbands), collapse=' ')
n <- nbands
} else {
inputs <- paste0('-', LETTERS[seq_along(nbands)], ' ', infile, ' --',
LETTERS[seq_along(nbands)], '_band 1', collapse=' ')
n <- length(infile)
}

message('Calculating minimum and writing to ', basename(outfile))
system2('python',
args=c(gdal_calc, inputs,
sprintf("--outfile=%s", outfile),
sprintf('--calc="amin([%s], axis=0)"',
paste0(LETTERS[seq_len(n)], collapse=',')),
'--co="COMPRESS=LZW"',
if(overwrite) '--overwrite'),
stdout=FALSE
)
if(return_raster) raster(outfile) else invisible(NULL)
}


And here is a dummy raster grid, written to
'test.tif'
in the present working directory. For simplicity, and to demonstrate that the problem is not due to a peculiar individual raster, I pass the same input file several times to gdal_calc, rather than passing a number of different files.

library(raster)
writeRaster(raster(matrix(runif(100), 10)), 'test.tif')


The function works fine for 23 input files:

r <- gdal_min(rep('test.tif', 23), f_out <- tempfile(fileext='.tif'),
return_raster=TRUE)
plot(r)


... but not for ≥ 24:

r <- gdal_min(rep('test.tif', 24), f_out <- tempfile(fileext='.tif'),
return_raster=TRUE)
## Calculating minimum and writing to file2310254bcc.tif
## Warning message:
## running command '"python" C:\Python33\Scripts\GDAL_C~1.PY -A test.tif --A_band 1 -B test.tif --B_band 1 -C test.tif --C_band 1 -D test.tif --D_band 1 -E test.tif --E_band 1 -F test.tif --F_band 1 -G test.tif --G_band 1 -H test.tif --H_band 1 -I test.tif --I_band 1 -J test.tif --J_band 1 -K test.tif --K_band 1 -L test.tif --L_band 1 -M test.tif --M_band 1 -N test.tif --N_band 1 -O test.tif --O_band 1 -P test.tif --P_band 1 -Q test.tif --Q_band 1 -R test.tif --R_band 1 -S test.tif --S_band 1 -T test.tif --T_band 1 -U test.tif --U_band 1 -V test.tif --V_band 1 -W test.tif --W_band 1 -X test.tif --X_band 1 --outfile=C:\Users\John\AppData\Local\Temp\RtmpK4heMM\file2310254bcc.tif --calc="amin([A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X], axis=0)" --co="COMPRESS=LZW"' had status 1





The traceback given when executing the latter directly from the terminal is:

"python" C:\Python33\Scripts\GDAL_C~1.PY -A test.tif --A_band 1 -B test.tif --B_band 1 -C test.tif --C_band 1 -D test.tif --D_band 1 -E test.tif --E_band 1 -F test.tif --F_band 1 -G test.tif --G_band 1 -H test.tif --H_band 1 -I test.tif --I_band 1 -J test.tif --J_band 1 -K test.tif --K_band 1 -L test.tif --L_band 1 -M test.tif --M_band 1 -N test.tif --N_band 1 -O test.tif --O_band 1 -P test.tif --P_band 1 -Q test.tif --Q_band 1 -R test.tif --R_band 1 -S test.tif --S_band 1 -T test.tif --T_band 1 -U test.tif --U_band 1 -V test.tif --V_band 1 -W test.tif --W_band 1 -X test.tif --X_band 1 --outfile=C:\Users\John\AppData\Local\Temp\RtmpK4heMM\file2310254bcc.tif --calc="amin([A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X], axis=0)" --co="COMPRESS=LZW"

0.. evaluation of calculation amin([A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X], axis=0) failed
Traceback (most recent call last):
File "c:\Python33\lib\site-packages\numpy\core\fromnumeric.py", line 2208, in amin
amin = a.min
AttributeError: 'list' object has no attribute 'min'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Python33\Scripts\GDAL_C~1.PY", line 329, in <module>
main()
File "C:\Python33\Scripts\GDAL_C~1.PY", line 326, in main
doit(opts, args)
File "C:\Python33\Scripts\GDAL_C~1.PY", line 275, in doit
myResult = eval(opts.calc)
File "<string>", line 1, in <module>
File "c:\Python33\lib\site-packages\numpy\core\fromnumeric.py", line 2211, in amin
out=out, keepdims=keepdims)
File "c:\Python33\lib\site-packages\numpy\core\_methods.py", line 29, in _amin
return umr_minimum(a, axis, None, out, keepdims)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()


Again, this runs as I had expected when removing the reference to
X
.

I am using
Python 3.3.5
,
gdal 1.11.1
, and
numpy 1.9.1
on Win 10 x64.

Why might this be happening?

Answer

The issue you were running into was that the X variable from the calc input was colliding with the variable created by a loop in the doit function:

for X in range(0,nXBlocks):

It appears this has already been fixed by the gdal developers (in not yet released code). It now evals the calc input in a private namespace (a dictionary), rather than in the function's local namespace, so the X variable from the loop doesn't conflict with the X value from the input any more.