Thursday, October 28, 2010

Use multiple cores for shell scripts in Ubuntu

So you want to use all your CPU cores for some shell-based batch processing task. It seems there should already be an app for that, and there is: parallel, in the moreutils package in Ubuntu. But it's not that easy to use the target file argument in a shell script. (Note: there is more than one utility with this name, GNU parallel being another; I'm referring to the one that ships with Ubuntu.)

For example, I wanted to use all cores for this operation:

for i in svg/*.svg; do f=$(basename "$i"); \
  inkscape-devel --without-gui \
  --export-background=white \
  --export-ps "ps/${f%%svg}ps" "$i"; \
  echo "$f"; done
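
To make the output-name handling in that loop concrete, here is a small sketch of what basename and the ${f%%svg}ps expansion do to a single path. The file name svg/logo.svg is made up for illustration; it isn't from my actual data.

```shell
# Hypothetical input path, for illustration only
i="svg/logo.svg"

# basename strips the directory part: svg/logo.svg -> logo.svg
f=$(basename "$i")

# ${f%%svg} removes the trailing "svg", leaving "logo."; appending "ps"
# gives the PostScript name. For a literal suffix like this, % and %%
# behave the same.
out="ps/${f%%svg}ps"

echo "$out"
```

So each SVG ends up as a .ps file of the same base name in the ps directory.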

parallel has an -i flag, which replaces {} with the target argument ($i in the above), but only where {} is surrounded by spaces and not quoted, which is hardly convenient for scripting. This simple wrapper (saved in a file called task, made executable, and placed somewhere on your $PATH) gets around that problem:

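#!/bin/sh
# helper for parallel
#
# usage: task 'shell-pattern' 'shell commands'

GLOB="$1"
shift
SCRIPT=$(mktemp)
# write the commands to a temporary script; parallel passes each file as $1
printf '#!/bin/sh\n%s\n' "$*" >"$SCRIPT"
chmod +x "$SCRIPT"
# $GLOB is deliberately unquoted so the shell expands the pattern here
parallel "$SCRIPT" -- $GLOB
rm -f "$SCRIPT"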
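
To see why the wrapped commands receive the file name as their first argument, here is a rough sketch of the same temp-script trick with a plain loop standing in for parallel, so it runs on a machine without moreutils. The function name run_each and the two file names are made up for illustration, and the script is run through sh explicitly rather than executed directly.

```shell
# Sketch only: run_each is a hypothetical stand-in for the task wrapper.
# It runs the jobs one after another instead of in parallel.
run_each() {
    body="$1"
    shift
    script=$(mktemp)
    # Same idea as the wrapper: dump the commands into a temporary script
    printf '#!/bin/sh\n%s\n' "$body" >"$script"
    for arg in "$@"; do
        # Each file name is handed to the script as its first argument ($1)
        sh "$script" "$arg"
    done
    rm -f "$script"
}

# Made-up file names; the commands only compute the output name
result=$(run_each 'f=$(basename "$1"); echo "${f%%svg}ps"' svg/a.svg svg/b.svg)
echo "$result"
```

Because the commands sit in their own script file, the file name arrives as an ordinary positional parameter and can be quoted or embedded in other words freely, which is exactly what the {} replacement doesn't allow.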

So now you can use $1 (not $i) in your shell code without any complications. The above example becomes:

task 'svg/*.svg' 'f=$(basename "$1"); inkscape-devel \
--without-gui --export-background=white --export-ps \
"ps/${f%%svg}ps" "$1"; echo "$f"'

...and running on four cores it's much quicker :-)