Download multiple files from different machines in parallel

This article is related to my previous post “Run commands simultaneously on different servers”. Once I had figured out how to run a command on several machines at once, I wanted to collect my load testing tool's results from all of them to build an aggregated report.

I use two standard Linux command-line utilities, parallel and SCP, to solve my problem.

First, you create a file, here called jobs, with all the commands you want to execute in parallel, one per line:

scp [email protected]:"/root/results.*.bin" .
scp [email protected]:"/root/results.*.bin" .
scp [email protected]:"/root/results.*.bin" .
scp [email protected]:"/root/results.*.bin" .
scp [email protected]:"/root/results.*.bin" .
cp "/root/results.loadtesting.bin" /root/reports/
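If the list of machines changes, the jobs file can be generated instead of typed by hand. The following is a minimal sketch under a few assumptions: the host names are hypothetical placeholders, the root user is assumed because the results live under /root, and the script overwrites any existing file named jobs in the current directory.

```shell
#!/bin/sh
# Hypothetical host names; replace them with your own load generators.
hosts="loadgen-1.example.com loadgen-2.example.com loadgen-3.example.com"

# Start with an empty jobs file (this overwrites an existing one).
: > jobs

# Append one scp command per host. The remote user "root" is an assumption.
for host in $hosts; do
    printf 'scp root@%s:"/root/results.*.bin" .\n' "$host" >> jobs
done

# Show the generated commands; nothing is executed yet.
cat jobs
```

The script only writes the file, so you can review the commands before handing them to parallel.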

Next, you run the parallel command and feed it the file with the commands on standard input.

parallel -j 6 < jobs

The -j flag sets the number of jobs run in parallel. I tend to set it equal to the number of commands, which is six here. If the list changes often, you can use wc -l and awk to compute the number for you:

parallel -j "$(wc -l jobs | awk '{print $1}')" < jobs
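To see what the command substitution evaluates to, here is a small sketch that uses a stand-in file of harmless commands. The file name jobs.example is hypothetical, and the final command is printed rather than executed.

```shell
#!/bin/sh
# Hypothetical stand-in for the jobs file: three harmless commands.
printf '%s\n' 'echo one' 'echo two' 'echo three' > jobs.example

# wc -l prints the line count followed by the file name;
# awk keeps only the first field, the count.
njobs="$(wc -l jobs.example | awk '{print $1}')"

# Print the resulting command instead of running it.
echo "parallel -j $njobs < jobs.example"
```

Keep in mind that wc -l counts newline characters, so blank lines in the file raise the count and a last line without a trailing newline is not counted.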

“Faster” downloads

Initially, I downloaded the files to my local workstation to aggregate them, but that was relatively slow. I therefore recommend using one of the load testing instances inside your cloud or datacenter to collect the files. On some occasions the files were several hundred MB in size, so you want to take advantage of the network speed within your cloud and avoid waiting for downloads to your local machine.

Make sure you enable forwarding of the authentication agent connection using the -A ssh option when you ssh to the instance you want to aggregate the files on. Forwarding the authentication agent makes it possible for the SCP sessions you start using parallel to authenticate.
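After logging in with ssh -A, a quick check on the aggregating instance can tell you whether a forwarded agent is visible before you start the jobs. This is only a sketch: the function name agent_forwarding_status is made up for illustration, and it relies on sshd setting the SSH_AUTH_SOCK environment variable in sessions where agent forwarding is active.

```shell
#!/bin/sh
# Hypothetical helper: reports whether an SSH agent socket is visible
# in the current session.
agent_forwarding_status() {
    if [ -z "${SSH_AUTH_SOCK:-}" ]; then
        echo "no agent socket: reconnect with ssh -A"
        return 1
    fi
    echo "agent socket available at $SSH_AUTH_SOCK"
}

# Run the check; on failure, explain the consequence instead of aborting.
agent_forwarding_status || echo "scp jobs cannot use your forwarded keys in this session"
```

If the socket is present, running ssh-add -l should additionally list the keys that the forwarded agent offers.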

PS: when you aggregate the files on one of the load-generating instances, you might want to replace that instance's scp instruction with a local mv or cp command, as in the last line of the jobs file above. The parallel command isn’t restricted to scp. 😉

Even though I use SCP to accomplish the task, I read this week that SCP is considered deprecated 😅, so I recommend using rsync or sftp instead. You can read more about the details on LWN.net.
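For simple "copy from remote to here" lines, switching to rsync can be as small as rewriting the command name in the jobs file. The sketch below assumes rsync is installed on both ends and uses hypothetical host names and file names. It only rewrites the file and does not contact any server.

```shell
#!/bin/sh
# Hypothetical jobs file in the same shape as the one in this article.
cat > jobs.scp.example <<'EOF'
scp root@loadgen-1.example.com:"/root/results.*.bin" .
scp root@loadgen-2.example.com:"/root/results.*.bin" .
cp "/root/results.loadtesting.bin" /root/reports/
EOF

# Rewrite only the lines that start with "scp "; the local cp line is
# left untouched. rsync -a copies in archive mode over ssh.
sed 's/^scp /rsync -a /' jobs.scp.example > jobs.rsync.example

# Review the result before feeding it to parallel.
cat jobs.rsync.example
```

The rewritten file can then be passed to parallel in the same way as before. Lines that rely on scp-specific flags would need to be translated by hand.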