I purchased a virtual server with 8 vCPUs, 16 GB of memory, and a 500 GB SSD volume (backed by Ceph RBD). I then used fio to test the server's I/O performance. To better understand the fio results, I also used blktrace during the test to capture the block layer I/O trace.
seqwrite
fio --filename=/dev/vdc --ioengine=libaio --bs=4k --rw=write --size=8G --iodepth=64 --numjobs=8 --direct=1 --runtime=960 --name=seqwrite --group_reporting
[Image: fio output for seqwrite] [Image: parsed blktrace output for seqwrite]
randread
fio --filename=/dev/vdc --ioengine=libaio --bs=4k --rw=randread --size=8G --iodepth=64 --numjobs=8 --direct=1 --runtime=960 --name=randread --group_reporting
[Image: fio output for randread] [Image: parsed blktrace output for randread]
What I am trying to understand is the difference at the block layer between seqwrite and randread.
- Why does randread spend a large portion of its time in I2D when seqwrite does not?
- Why doesn't randread have any Q2M?
Did you realise that each of your 8 jobs (numjobs=8) is overwriting the same area as the other jobs? This means the block layer may be able to throw subsequent requests away if an overwrite for the same region comes in close enough (which is somewhat likely in the sequential case)...
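If the overlap is not intended, one possible way to give each job its own region is fio's offset_increment option, which starts each clone at offset + offset_increment * job index. The sketch below is only an illustration: the 8G increment is an assumption chosen to match --size=8G (8 jobs x 8G = 64G, which fits on the 500G volume), and the function names are made up for this example.

```shell
# Sketch: the seqwrite job from the question, with each clone on its own region.
# Assumption: --offset_increment=8G matches --size=8G so the regions are disjoint.
# Defining the function runs nothing; calling it overwrites data on the device.
seqwrite_disjoint() {
    fio --filename="${1:-/dev/vdc}" --ioengine=libaio --bs=4k --rw=write \
        --size=8G --offset_increment=8G --iodepth=64 --numjobs=8 --direct=1 \
        --runtime=960 --name=seqwrite --group_reporting
}

# Print the start offset each clone would get: base offset 0 plus
# increment * job index. Arguments: number of jobs, increment in G.
job_offsets() {
    i=0
    while [ "$i" -lt "$1" ]; do
        echo "job $i starts at $((i * $2))G"
        i=$((i + 1))
    done
}
```

Running job_offsets 8 8 lists the eight start offsets, and seqwrite_disjoint /dev/vdc would launch the actual test. With disjoint regions, any remaining difference between the two traces is less likely to come from jobs overwriting each other.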
It's hard to back merge random I/O with existing queued I/O as it's often discontiguous!
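A rough way to cross-check the merge behaviour outside of blktrace is to compare the kernel's per-device merge counters before and after each run. This is a sketch that assumes the guest exposes /sys/block/vdc/stat in the standard layout (field 2 is read merges, field 6 is write merges); merge_counts is a hypothetical helper name.

```shell
# Print the read and write merge counters from a block device stat file.
# Assumption: standard sysfs stat layout, where field 2 = read merges
# and field 6 = write merges.
merge_counts() {
    awk '{ print "read_merges=" $2, "write_merges=" $6 }' "$1"
}
```

Call merge_counts /sys/block/vdc/stat before and after a fio run and subtract the values. If the explanation above holds, the randread run should add few or no read merges.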