Minor Note on ‘dd’ Write Performance

DavidSchmitt

Nov 20, 2009

CC (ASA 2.5)

This is a minor note on 'dd' write performance

Today, I was cleaning out some old logical volumes. Since they resided on rented hard disks, I chose to overwrite them with zeroes to avoid leaving traces of data on someone else’s disks. The first thing that came to my mind was this:

dd if=/dev/zero of=/dev/vg/lv

Since I had ten logical volumes, I also ran ten instances of dd in parallel. They were on a RAID and I was decommissioning the server, so I didn’t really care about performance. Speaking of which, I like to spy on running processes; call me a techno-voyeur if you want!
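For reference, running several dd instances side by side could look roughly like the sketch below. It is not the exact invocation used here: the scratch files under a temporary directory are stand-ins for the logical volumes (so the loop is harmless to try), and the names `workdir` and `lv1` to `lv3` are made up for illustration.

```shell
#!/bin/sh
# Sketch: zero several targets in parallel, then wait for all of them.
# Scratch files stand in for the logical volumes (hypothetical names).
workdir=$(mktemp -d)

for i in 1 2 3; do
    # Each dd runs in the background. 'count' is needed because a regular
    # file, unlike a block device, never becomes "full".
    # 128 blocks of dd's default 512 bytes = 64 KiB per file.
    dd if=/dev/zero of="$workdir/lv$i" count=128 2>/dev/null &
done

wait    # block until every background dd has exited

# Sum up what was written; $(( )) normalizes wc's whitespace padding.
total=$(($(cat "$workdir"/lv* | wc -c)))
echo "bytes written: $total"
rm -rf "$workdir"
```

On a real block device, dd simply stops with "No space left on device" once the volume has been overwritten, so no count is needed there.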

Anyway, vmstat was telling me that the system was chugging along nicely, reading (sic!) and writing approximately 16MB/s each. Something was clearly wrong. Note the “bi” and “bo” columns, denoting kB/s read and written:

david@hetz:~$ vmstat 10
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
[...]
 0 12   6416 270676 1601220   9628    0    0 16628 16553 4481 7638  1 21 26 52
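To pull the two interesting numbers out of such a row, something like the following awk sketch works. It assumes the column layout shown above (bi is the 9th field, bo the 10th) and treats 1 MB as 1024 kB; other vmstat versions may order their columns differently.

```shell
#!/bin/sh
# The sample row from the vmstat output above, copied verbatim.
row=' 0 12   6416 270676 1601220   9628    0    0 16628 16553 4481 7638  1 21 26 52'

# In this layout bi is field 9 and bo is field 10 (both in kB/s).
# LC_ALL=C keeps the decimal point independent of the locale.
rates=$(printf '%s\n' "$row" | LC_ALL=C awk \
    '{ printf "read %.1f MB/s, write %.1f MB/s", $9 / 1024, $10 / 1024 }')
echo "$rates"
```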

I had forgotten to specify the block size which dd should use to make the transfer. Without it, dd falls back to its default of 512 bytes per write. And if the program is writing the zeroes in pieces that small, the destination has to be read before it is modified, since the kernel cannot (and should not!) guess that the rest of the block it is caching will be overwritten too.
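The default block size is easy to check on a scratch file. The sketch below writes four blocks once without bs and once with an explicit bs=4096, then compares the resulting file sizes; the 4096 is only an illustrative value, not a claim about the block size of the system described here.

```shell
#!/bin/sh
scratch=$(mktemp)

# No bs= given: four blocks of dd's default size (512 bytes in POSIX).
dd if=/dev/zero of="$scratch" count=4 2>/dev/null
default_bytes=$(($(wc -c < "$scratch")))

# Explicit bs=4096: four blocks of 4 KiB each.
dd if=/dev/zero of="$scratch" bs=4096 count=4 2>/dev/null
explicit_bytes=$(($(wc -c < "$scratch")))

echo "default: $default_bytes bytes, bs=4096: $explicit_bytes bytes"
rm -f "$scratch"
```

Four default-sized blocks come out at 2048 bytes, which confirms the 512-byte default.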

To test my hypothesis, I restarted the processes and told dd to use nice 1MB-sized blocks. I had ten processes to keep the hardware busy, so that shouldn’t cause any problems.

dd if=/dev/zero of=/dev/vg/lv bs=1M

Indeed, in this configuration, the system stabilized at a nice 55MB/s of writes. The kernel was able to recognize that the 1MB-sized writes would cover complete blocks and that their content would be overwritten. No need to load them beforehand:

david@hetz:~$ vmstat 10
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
[...]
 0 12   6416  16308 1749688   5312    0    0     0 55644  551 1576  0 23  1 76

While I was waiting for the last process to finish, I noticed that throughput had risen to 60MB/s:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
[...]
 1  3   6416  15336 1750492   6088    0    0      4 62892  545  232  0 28 32 39

To summarize: having only one process running is a little over 10% faster than having ten processes running, and using a non-trivial block size is more than three times faster than specifying none at all.
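The ratios behind that summary can be recomputed from the “bo” values in the three vmstat samples. This is just the arithmetic, with variable names chosen for illustration:

```shell
#!/bin/sh
# bo values (kB/s) taken from the three vmstat samples above.
no_bs=16553        # ten processes, no bs given
ten_procs=55644    # ten processes, bs=1M
one_proc=62892     # last remaining process, bs=1M

# LC_ALL=C keeps the decimal point independent of the locale.
summary=$(LC_ALL=C awk -v a="$no_bs" -v b="$ten_procs" -v c="$one_proc" \
    'BEGIN { printf "bs=1M vs default: %.2fx, one vs ten processes: %.2fx", b / a, c / b }')
echo "$summary"
```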