Thursday, October 28, 2010

Use multiple cores for shell scripts in Ubuntu

So you want to use all your CPU cores for some shell-based batch processing task. It seems there should already be an app for that, and there is: parallel, in the moreutils package in Ubuntu. But it's not that easy to use the target file argument in a shell script. (Note: there is more than one utility called parallel; I'm referring to the one that ships with Ubuntu in moreutils.)

For example, I wanted to use all cores for this operation:

for i in svg/*.svg; do f=$(basename "$i"); \
  inkscape-devel --without-gui \
  --export-background=white \
  --export-ps "ps/${f%%svg}ps" "$i"; \
  echo "$f"; done

parallel has an -i flag which replaces {} with the target argument ($i in the above), but only where {} is surrounded by spaces and not quoted. That's hardly convenient for scripting. This simple wrapper (saved in a file called task, made executable and placed somewhere on your $PATH) gets around the problem:

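#!/bin/sh
# helper for parallel
#
# usage: task 'shell-pattern' 'shell commands'

GLOB="$1"
shift
SCRIPT=$(mktemp)
# printf '%s' writes the commands verbatim; echo may interpret backslashes in some shells
printf '%s\n' '#!/bin/sh' "$*" >"$SCRIPT"
chmod +x "$SCRIPT"
# $GLOB is deliberately unquoted so the shell expands the pattern here
parallel "$SCRIPT" -- $GLOB
rm "$SCRIPT"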
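
To see why this works, here is a rough sketch of the wrapper's mechanism with parallel taken out. The file names svg/a.svg and svg/b.svg are made up for illustration, and a plain for loop stands in for parallel, so the jobs run one after another here rather than side by side.

```shell
# Sketch of the wrapper's mechanism, minus parallel itself.
SCRIPT=$(mktemp)
# The shell code is stored as a standalone script; the single quotes
# keep $1 literal until the script itself runs.
printf '%s\n' '#!/bin/sh' 'f=$(basename "$1"); echo "${f%svg}ps"' >"$SCRIPT"
chmod +x "$SCRIPT"
# Each file name arrives in the script as $1. parallel would start
# these invocations concurrently instead of in sequence.
result=$(for i in svg/a.svg svg/b.svg; do "$SCRIPT" "$i"; done)
rm "$SCRIPT"
echo "$result"
```

Because the commands live in their own script, the file name is an ordinary positional parameter and can be quoted or embedded in other words freely.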

So now you can use $1 (not $i) in your shell code without any complications. The above example becomes:

task 'svg/*.svg' 'f=$(basename "$1"); inkscape-devel \
--without-gui --export-background=white --export-ps \
"ps/${f%%svg}ps" "$1"; echo "$f"'

...and running on four cores it's much quicker :-)

2 comments:

  1. With GNU Parallel you can do it without the helper script:

    parallel inkscape-devel --without-gui --export-background=white --export-ps ps/{/.}.ps {}\; echo {} ::: svg/*.svg

    Watch the intro video to learn more: http://www.youtube.com/watch?v=OpaiGYxkSuQ

  2. Thanks tange. Since the original post I've come to think xargs is also a good way of using multiple cores in Ubuntu. And when it comes to running inkscape in automation scripts, you really should use inkscape's --shell flag, which allows processing multiple files without running inkscape more than once. That could still be parallelized.

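
For the xargs route mentioned in the second comment, something along these lines should work. This is only a sketch: cp stands in for the real inkscape command, the scratch directory and the file names are invented for the demo, and the -0 and -P options are GNU/BSD extensions rather than POSIX.

```shell
# Sketch only: cp is a stand-in for the real conversion command, and
# the svg/ and ps/ directories are created in a scratch location.
workdir=$(mktemp -d)
mkdir "$workdir/svg" "$workdir/ps"
for n in 1 2 3 4; do echo '<svg/>' >"$workdir/svg/file$n.svg"; done

# Number of online cores (supported by glibc and macOS getconf).
cores=$(getconf _NPROCESSORS_ONLN)

(
  cd "$workdir" || exit 1
  # -0: NUL-separated names, -n 1: one file per job,
  # -P: how many jobs may run at the same time
  printf '%s\0' svg/*.svg |
    xargs -0 -n 1 -P "$cores" sh -c '
      f=$(basename "$1")
      cp "$1" "ps/${f%svg}ps"   # replace with the conversion command
      echo "$f"
    ' sh
)
```

As with the task wrapper, each file name reaches the inner shell as $1, so no {} substitution is needed.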