m.devrees - 6 months ago
Linux Question

How to batch resize millions of images to fit a max width and height?

The situation



I'm looking for a way to batch-resize approximately 15 million images of different file types to fit a certain bounding box resolution (in this case the image(s) cannot be bigger than 1024*1024), without distorting the image and thus retaining the correct aspect ratio. All files are currently located on a Linux server on which I have sudo access, so if I need to install anything, I'm good to go.

Things I've tried



After dabbling around with some tools under Windows (Adobe Photoshop and others), I am no longer willing to run this on my own machine, since it makes the machine virtually unusable while it is processing. Considering the size of this job, I'm really looking for some command-line magic to run it directly on Linux, but my endeavors with ImageMagick so far haven't given me anything to work with, as I'm getting nothing but errors.
To be honest, ImageMagick's documentation could use some work... or someone should put in the effort to build a good web interface for generating these mythical image conversion one-liners.

Required output format



I need each image to keep its filename and format and to be resized so that it fits inside a certain maximum dimension, for example 1024*1024, meaning:


  • a JPG of 2048*1024 becomes a JPG of 1024*512 at 75% quality

  • a PNG of 1024*2048 becomes a PNG of 512*1024



The resulting image should contain no additional transparent pixels to fill up the remaining pixels; I'm just looking for a way to convert the images to a limited resolution.

Thanks for any help!

Answer

The best way I found to convert millions of images like these is to create a simple bash script that finds all the images and converts them one by one, like the one listed below.

I use nano to edit the script. If you don't have nano, install it with "apt-get install nano" on Ubuntu/Debian or "yum install nano" on CentOS/CloudLinux (for other distributions, use Google), but you're free to use any editor you want.

Bash script

First, create the bash script by starting your favorite editor (mine's nano):

nano -w ~/imgconv.sh

Then fill it with this content:

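#!/bin/bash
# Shrink every JPEG/PNG below the current directory to fit inside 1024x1024.
# The trailing \> means "only shrink images larger than this": smaller images
# are not enlarged, and the aspect ratio is always preserved.
find ./ -type f -iname "*.jpeg" -exec mogrify -verbose -format jpeg -layers Dispose -resize 1024x1024\> -quality 75% {} \;
find ./ -type f -iname "*.jpg" -exec mogrify -verbose -format jpg -layers Dispose -resize 1024x1024\> -quality 75% {} \;
find ./ -type f -iname "*.png" -exec mogrify -verbose -format png -alpha on -layers Dispose -resize 1024x1024\> {} \;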

Then all you need to do is make it executable with chmod +x ~/imgconv.sh and run it from the top-level images directory; it resizes the images there and in all subdirectories:

cd /var/www/webshop.example.com/public_html/media/
~/imgconv.sh

That should start the conversion process.
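Since mogrify overwrites the originals in place, it can pay to check first what the three find commands will match. A minimal sketch of such a dry run, assuming GNU find (for -printf); count_images is a hypothetical helper name and not part of ImageMagick:

```shell
#!/bin/bash
# count_images [DIR]: print how many files each pattern of the script matches.
# Hypothetical dry-run helper; it only reads the directory tree.
count_images() {
    local dir=${1:-.} pattern n
    for pattern in '*.jpeg' '*.jpg' '*.png'; do
        # -iname matches case-insensitively, like the script does.
        # Printing one dot per file keeps the count right for odd filenames.
        n=$(find "$dir" -type f -iname "$pattern" -printf '.' | wc -c)
        printf '%s %d\n' "$pattern" "$n"
    done
}
```

Running count_images from the main images directory prints one line per pattern with the number of matching files, which also gives a rough idea of how long the real run will take.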

Explanation

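The script uses find to locate every file whose extension is .jpeg, in any capitalization, and runs a command on each one:

find ./ -type f -iname "*.jpeg" -exec <COMMAND> {} \;

The -exec parameter runs <COMMAND> once per file, substituting the filename for {}; the \; marks the end of the command. Here the command is the appropriate mogrify job. The \> flag in the geometry tells ImageMagick to shrink only images larger than 1024x1024 and never to enlarge smaller ones; the aspect ratio is preserved either way:

mogrify -verbose -format jpeg -layers Dispose -resize 1024x1024\> -quality 75% <### the filename goes here, in this case *.jpeg ###>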
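To make the effect of the \> flag concrete, the arithmetic can be sketched in plain bash. This illustrates the documented behaviour (shrink only if larger, keep the aspect ratio) and is not ImageMagick's actual code; fit_box is a hypothetical name, and its rounding may differ from ImageMagick's by a pixel in edge cases.

```shell
#!/bin/bash
# fit_box WIDTH HEIGHT MAXW MAXH: print the size an image ends up with
# when it may only shrink to fit inside MAXW x MAXH.
fit_box() {
    local w=$1 h=$2 maxw=$3 maxh=$4
    # Already inside the box: the ">" flag leaves the dimensions untouched.
    if (( w <= maxw && h <= maxh )); then
        echo "${w}x${h}"
        return
    fi
    # Pick the tighter of the two scale factors maxw/w and maxh/h.
    # Comparing maxw*h with maxh*w avoids floating point.
    local nw nh
    if (( maxw * h <= maxh * w )); then
        nw=$maxw
        nh=$(( (h * maxw + w / 2) / w ))   # width is the limiting side
    else
        nh=$maxh
        nw=$(( (w * maxh + h / 2) / h ))   # height is the limiting side
    fi
    # Never let a dimension round down to zero.
    (( nw < 1 )) && nw=1
    (( nh < 1 )) && nh=1
    echo "${nw}x${nh}"
}

fit_box 2048 1024 1024 1024   # 1024x512
fit_box 1024 2048 1024 1024   # 512x1024
fit_box 800 600 1024 1024     # 800x600
```

The first two calls reproduce the examples from the question; the third shows that an image already inside the box keeps its size.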

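If you want to avoid re-doing files you have already converted, you can tell find to pick up only files that haven't been modified recently by adding the -mtime option. Note that find counts age in whole 24-hour periods and ignores the fraction, so -mtime +1 matches files last modified at least two days ago; use -mtime +0 to match everything modified more than 24 hours ago. With -mtime +1 the script looks like this: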

#!/bin/bash
# Same as before, but -mtime +1 skips files modified within the last two days.
find ./ -type f -mtime +1 -iname "*.jpeg" -exec mogrify -verbose -format jpeg -layers Dispose -resize 1024x1024\> -quality 75% {} \;
find ./ -type f -mtime +1 -iname "*.jpg" -exec mogrify -verbose -format jpg -layers Dispose -resize 1024x1024\> -quality 75% {} \;
find ./ -type f -mtime +1 -iname "*.png" -exec mogrify -verbose -format png -alpha on -layers Dispose -resize 1024x1024\> {} \;
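Because find counts file age in whole 24-hour periods and drops the fraction, it is worth checking what an -mtime test really selects before relying on it. A small experiment, assuming GNU touch (for -d with a relative date) and GNU find (for -printf); the file names are made up:

```shell
#!/bin/bash
# Create three files of different ages in a scratch directory.
demo=$(mktemp -d)
touch -d '2 hours ago'  "$demo/fresh.jpg"
touch -d '36 hours ago' "$demo/yesterday.jpg"
touch -d '3 days ago'   "$demo/old.jpg"

# -mtime +0: more than 0 full days old, i.e. modified over 24 hours ago.
plus0=$(find "$demo" -type f -mtime +0 -printf '%f\n' | sort)
# -mtime +1: more than 1 full day old, i.e. modified at least 48 hours ago.
plus1=$(find "$demo" -type f -mtime +1 -printf '%f\n' | sort)

echo "-mtime +0 matches:" $plus0   # old.jpg yesterday.jpg
echo "-mtime +1 matches:" $plus1   # old.jpg
rm -rf "$demo"
```

The 36-hour-old file counts as one full day old, so it is picked up by -mtime +0 but not by -mtime +1.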

Performance

A really simple way to use more cores is to send each find command to the background by adding a & at the end of each line, which runs the three jobs side by side. Another way is to use GNU Parallel: by default it runs one job per CPU core, and its -X parameter packs as many filenames as will fit into each mogrify call, so the job gets done many times quicker.
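As a sketch of the multi-core idea, here is a variant built on xargs -P rather than GNU Parallel (a deliberate substitution, since xargs is usually already installed). It assumes GNU findutils and coreutils (for nproc); run_parallel is a hypothetical helper name and the batch size of 50 is an arbitrary choice.

```shell
#!/bin/bash
# run_parallel DIR PATTERN COMMAND [ARGS...]
# Run COMMAND on all files under DIR matching PATTERN (case-insensitive),
# at most 50 files per invocation, with one worker per CPU core.
run_parallel() {
    local dir=$1 pattern=$2
    shift 2
    # -print0 / -0 keep filenames with spaces or newlines intact.
    # -r skips the command entirely when nothing matched.
    find "$dir" -type f -iname "$pattern" -print0 |
        xargs -0 -r -n 50 -P "$(nproc)" "$@"
}

# Intended use, from the main images directory (options as in the script):
#   run_parallel . '*.jpg' mogrify -format jpg -layers Dispose -resize '1024x1024>' -quality 75%
#   run_parallel . '*.png' mogrify -format png -alpha on -layers Dispose -resize '1024x1024>'
```

With GNU Parallel installed, the same pipeline would end in something like parallel -0 -X mogrify ... instead of the xargs call. Passing each batch of files to a single mogrify call also saves the cost of starting one process per image.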

But no matter what kind of parallelization technique you use, be sure to do it only on your own system and not on a shared disk system where your production platform resides, since running at full tilt will saturate the hardware or hypervisor and slow down everything else on it.

This job is going to take a while, so be sure to set up a screen session beforehand, or a terminal that sends keep-alive (noop) packets so the connection doesn't time out. On my system, it churned through about 5000 files per minute, so 15 million files should take roughly 50 hours (15,000,000 / 5000 = 3000 minutes)... sounds like a job to run over the weekend.
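The estimate is simple arithmetic and easy to redo with your own throughput. The numbers below are the ones from this thread (15 million files, about 5000 files per minute); eta_hours is a hypothetical helper name:

```shell
#!/bin/bash
# eta_hours TOTAL_FILES FILES_PER_MINUTE: print the expected run time
# in whole hours, rounded up.
eta_hours() {
    local total=$1 per_minute=$2
    local per_hour=$(( per_minute * 60 ))
    echo $(( (total + per_hour - 1) / per_hour ))
}

eta_hours 15000000 5000   # 50
```

To keep the job alive after you disconnect, start a named session with screen -S imgconv, launch the script inside it, detach with Ctrl-A followed by D, and come back later with screen -r imgconv.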

Just be sure to keep the file extensions apart by writing a separate command for each one; piling all the options together and having mogrify apply them to every image format won't work.

ImageMagick is a powerful tool in the right hands.