From: eric@tantalus.nrl.navy.mil (Eric Youngdale)
Subject: Re: SCSI Performance (Yet Again)
Date: Mon, 23 Aug 1993 20:19:17 GMT
In article <PCG.93Aug22210201@decb.aber.ac.uk> pcg@aber.ac.uk (Piercarlo Grandi) writes:
>Actually one would want to have an idea of how good each of these
>components is *in isolation*;
>
>1) the generic SCSI code
>
>2) the HA specific SCSI code
>
>3) the disk SCSI code
>
>4) the filesystem code
>
>5) the cache handling code.
>
A number of people have been commenting on the iozone benchmark, and, as the
poster above points out, it appears that people are trying to benchmark
several things at once.
First of all, it has been observed that the write numbers are much higher
than the read numbers. This has been attributed to the buffer cache not being
completely flushed to disk when reading starts, which inflates the write
numbers and depresses the read numbers. I believe this explanation is correct,
and I would like to point out that pl12 now has an ioctl to flush and
invalidate the buffer cache for a specific device. It is not quite fsync(),
but if you control the testing conditions it amounts to the same thing. If
someone wants to play with this, feel free. I suspect the read and write
numbers would be much closer if you were to try this.
The ioctl was actually added for a different purpose: writing patterns to a
disk and then reading them back in search of bad sectors. This could have been
done by implementing raw devices, but for our purposes the ioctl was much
quicker, and in fact it would also be faster at locating bad sectors.
Next, make sure you know that the file you use with iozone should be at
*least* 2.5 times the size of the system's buffer cache; otherwise the reads
may in fact be satisfied from the buffer cache rather than from the disk.
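The 2.5x rule is simple arithmetic. As a sketch, a hypothetical helper that rounds up to whole megabytes is shown below; the cache sizes in the usage loop are illustrative values, not measurements from this thread. On modern Linux kernels the page cache can grow to use most free RAM, so total memory is the safer input there (an assumption beyond the original rule).

```python
import math

# Rule of thumb from the post: the iozone file should be at least
# 2.5 times the size of the buffer cache.
CACHE_FACTOR = 2.5


def min_iozone_file_mb(buffer_cache_mb):
    """Smallest whole number of megabytes that satisfies the 2.5x rule."""
    return math.ceil(buffer_cache_mb * CACHE_FACTOR)


if __name__ == "__main__":
    for cache in (2, 4, 16):
        print(f"{cache} MB cache -> file of at least {min_iozone_file_mb(cache)} MB")
```

By this rule, a 40 MB test file, like the one in the results quoted at the end of this post, is large enough for buffer caches up to 16 MB.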
Then I would like to point out that filesystem fragmentation and free-space
fragmentation can be a very important component of the iozone results. If you
want to measure the raw speed of the SCSI subsystem, start with a fresh
partition, or run iozone directly on a naked partition (e.g. /dev/sda1; this
will wipe out the contents of the partition). If you run iozone on an active
partition that you have used for some time, you will probably get a more
accurate indication of real-life throughput, but you are measuring more of a
composite number.
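A read-only measurement is a useful complement when looking at a naked partition, because reading destroys nothing. The sketch below (the function name `sequential_read_rate` is hypothetical) times a straight sequential read of a file or of a device node such as /dev/sda1, which requires read permission on the device. It does not bypass the cache: a file that was just written, like the scratch file in the demo, will mostly be served from memory, so the caveats about flushing and file size still apply.

```python
import os
import tempfile
import time


def sequential_read_rate(path, block_size=16384):
    """Read `path` from start to end; return (bytes_read, kB/s) with 1 kB = 1024 bytes."""
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.perf_counter()
        while True:
            chunk = os.read(fd, block_size)
            if not chunk:  # end of file or device
                break
            total += len(chunk)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    rate = (total / 1024) / elapsed if elapsed > 0 else float("inf")
    return total, rate


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "scratch.bin")
        with open(p, "wb") as f:
            f.write(os.urandom(1 << 20))
        nbytes, kbs = sequential_read_rate(p)
        print(f"read {nbytes} bytes at {kbs:.0f} kB/s")
```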
Next, be aware that performance can be somewhat better with a larger iozone
block size, because this reduces the number of actual syscalls. (As a
pathological example, consider what would happen with a block size of 1, even
on a ramdisk.)
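The syscall argument is easy to see by counting calls. This hypothetical helper writes a fixed amount of data in chunks of a given size and returns how many os.write() calls it made, each of which is a separate write(2) system call. It only counts calls; it does not time them.

```python
import os
import tempfile


def count_write_calls(path, total_bytes, block_size):
    """Write total_bytes of zeros to path in block_size chunks; return the number of write() calls."""
    buf = bytes(block_size)
    calls = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        remaining = total_bytes
        while remaining > 0:
            written = os.write(fd, buf[:min(block_size, remaining)])
            calls += 1
            remaining -= written
    finally:
        os.close(fd)
    return calls


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "blocks.bin")
        for bs in (1, 1024, 4096, 16384):
            print(bs, count_write_calls(p, 64 * 1024, bs))
```

For the 40 MB file in the results quoted at the end of this post, 1024-byte blocks mean 40,960 write() calls, versus 2,560 with 16384-byte blocks (taking 1 MB as 2^20 bytes).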
Finally, be aware that you can get better performance by using filesystems
with larger allocation block sizes. I have had only limited success with this:
there is apparently some bug in ext2 that causes performance to drop
significantly on filesystems with allocation block sizes other than 1024. One
of these days I will dig in and find the bottleneck. Nonetheless, I have
measured performance on a naked partition (bypassing the filesystem), and I
have gotten numbers in excess of 1.0 MB/sec with a 1542B.
-Eric
In closing, here are some numbers we got last fall when testing some new SCSI
code (which is all in the kernel now). These were all on a *fresh* Minix
partition; I would expect ext2 or xia to give comparable numbers. I can think
of no reason why the numbers would be much different now than they were last
fall.
>iozone gave good results. I tried three block lengths with a 40 MB file and
>the results were:
>
>block (bytes)   write      read
>   1024         800 kB/s   650 kB/s
>   4096         700 kB/s   750 kB/s
>  16384         800 kB/s   800 kB/s
-- "When Gregor Samsa woke up one morning from unsettling dreams, he found himself changed in his bed into a lawyer."