From: Eric Youngdale (eric@tantalus.nrl.navy.mil)
Date: 11/30/92


From: eric@tantalus.nrl.navy.mil (Eric Youngdale)
Subject: Re: Why are hard disk reads so much slower than writes? 
Date: Tue, 1 Dec 1992 04:38:20 GMT

In article <ByJysC.ILM@news.cso.uiuc.edu> jy10033@ehsn11.cen.uiuc.edu (Joshua M Yelon) writes:
>Juan writes:
>
>>>I have run iozone v. 1.16, a disk benchmark, on my system; I have
>>>a 386SX25 CPU, 4MB RAM, and two 65MB RLL hard disk drives. When
>>>(as the documentation suggests) I use a test file larger than the
>>>cache, iozone tells me that the effective writing speed is approximately
>>>315,000 Bytes per second, and the reading speed approximately
>>>53,000 Bytes per second.

[...]

>I would imagine that if iozone were performing lots of "read(fd, buf, 8192)"
>calls, then it would effectively cause this behavior:
>
> * iozone calls read(fd, buf, 8192)
> * kernel context-switches in.
> * kernel fetches a disk page (8192 bytes) from disk into cache.
> * kernel copies 8192 bytes from cache into user buffer.
> * iozone context switches back in.

        This is a pretty good description.

> * iozone calls read(fd, buf, 4096).
> * kernel context-switches in.
> * kernel fetches a disk page (8192 bytes) from disk into cache.
> * kernel copies 4096 bytes from cache into user buffer.
> * iozone context switches back in.
> * iozone calls read(fd, buf, 4096).
> * kernel context-switches in.
> * kernel copies 4096 bytes (already in cache) into user buffer.
> * iozone context switches back in.

        This is also a good description.
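
        The two sequences quoted above can be checked with a toy model. This
is only a sketch: it counts read() calls and disk fetches and moves no real
data. The names PageCache and simulate are made up for illustration, and the
8192 byte page size is taken from the quoted description, not from any
particular kernel.

```python
PAGE_SIZE = 8192  # disk page size used in the quoted description


class PageCache:
    """Toy model: counts read() calls and disk fetches; holds no real data."""

    def __init__(self):
        self.cached_pages = set()  # page numbers already in the cache
        self.disk_fetches = 0
        self.read_calls = 0

    def read(self, offset, length):
        """Simulate read(fd, buf, length) starting at byte `offset`."""
        self.read_calls += 1
        first = offset // PAGE_SIZE
        last = (offset + length - 1) // PAGE_SIZE
        for page in range(first, last + 1):
            if page not in self.cached_pages:
                self.disk_fetches += 1  # page has to come from the disk
                self.cached_pages.add(page)
        return length


def simulate(total_bytes, request_size):
    """Read total_bytes sequentially; return (read_calls, disk_fetches)."""
    cache = PageCache()
    offset = 0
    while offset < total_bytes:
        offset += cache.read(offset, min(request_size, total_bytes - offset))
    return cache.read_calls, cache.disk_fetches
```

In this model, halving the request size doubles the number of read() calls
but leaves the number of disk fetches unchanged, which is the point of the
quoted description.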

>In other words, the smaller block size results in more context-switch
>overhead, but not more disk IO. That wouldn't slow things down by an
>immense factor, unless interleave came into play: if the second disk
>read were even a moment too late, the disk would have to "go around"
>again.

        It is not just the context switching; there is more to it than that.
With SCSI disks, a lot of time is spent in arbitration and so forth. If
we need to read 8192 bytes, then we can request 16 sectors in 1 I/O operation.
If we request 8192 bytes in 512 byte chunks, we make 16 requests to get the 16
sectors. In 0.99 the kernel will sense if you are reading sequentially, and
request sectors that it does not need immediately, so that when the next read
comes along we already have the data on hand.
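
        A rough sketch of the effect of read-ahead on the number of I/O
requests follows. It is a toy model, not the 0.99 code: the class name
ReadAheadDisk, the 16 sector window, and the rule "read ahead only when this
sector directly follows the previous one" are all assumptions made for the
example.

```python
SECTOR_SIZE = 512         # bytes per sector, as in the text
READ_AHEAD_SECTORS = 16   # hypothetical window, not the real kernel value


class ReadAheadDisk:
    """Toy model that counts I/O requests sent to the disk."""

    def __init__(self, read_ahead=True):
        self.read_ahead = read_ahead
        self.cached = set()        # sector numbers already in memory
        self.next_expected = None  # sector that would follow the last read
        self.io_requests = 0

    def read_sector(self, sector):
        sequential = (sector == self.next_expected)
        self.next_expected = sector + 1
        if sector in self.cached:
            return  # satisfied from memory, no I/O request
        # One request fetches either a single sector or a whole window.
        count = READ_AHEAD_SECTORS if (self.read_ahead and sequential) else 1
        self.io_requests += 1
        self.cached.update(range(sector, sector + count))


def sequential_read(num_sectors, read_ahead):
    """Read sectors 0..num_sectors-1 in order; return the request count."""
    disk = ReadAheadDisk(read_ahead)
    for sector in range(num_sectors):
        disk.read_sector(sector)
    return disk.io_requests
```

Without read-ahead, sequential_read(16, False) issues 16 requests for the 16
sectors, matching the 512 byte case above. With read-ahead enabled the model
issues far fewer, because most sectors are already in memory when asked for.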

        I am not as sure how the overhead works with an IDE disk, but I know
that making individual requests for 1024 byte chunks of a file on a SCSI disk
can lead to excruciatingly slow I/O. Just by bunching the reads into larger
requests, people have seen a factor of 5 improvement in iozone results.
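
        From the application side, bunching reads just means passing a larger
size to read(). The sketch below counts how many read calls it takes to get
through the same file with two chunk sizes. It is not how iozone itself is
written; make_test_file and count_reads are hypothetical helpers, and on a
modern system with a large cache the timing difference may be small even
though the call counts differ.

```python
import os
import tempfile


def make_test_file(size):
    """Create a temporary file of `size` bytes and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"x" * size)
    return path


def count_reads(path, chunk_size):
    """Read the whole file with os.read(); return (bytes_read, read_calls).

    The final call that returns no data (end of file) is counted as well.
    """
    flags = os.O_RDONLY | getattr(os, "O_BINARY", 0)  # O_BINARY: Windows only
    fd = os.open(path, flags)
    total = calls = 0
    try:
        while True:
            data = os.read(fd, chunk_size)
            calls += 1
            if not data:
                break
            total += len(data)
    finally:
        os.close(fd)
    return total, calls
```

Calling count_reads(path, 1024) and count_reads(path, 8192) on the same file
returns the same byte total, but the smaller chunk size needs many more
calls, each of which is a trip into the kernel.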

>Although neither Juan nor Eric said it explicitly, I suppose that the
>read/write speed difference is supposed to be caused by the write-back
>cache's tendency to cluster writes, much like read-ahead has the
>effect of clustering reads? Right?

        Correct. Usually when we write, we just copy the data into the buffer
and mark the block as being dirty. It is only when we sync the disk that the
data is actually written back. In this case, we can bunch many more writes
because we do not need to do anything after each block is written back. In the
case of a read, we need to keep track of which blocks we have requested, and
copy the data out of the buffer cache once the disk read is complete. The fact
that we have to keep track of this will tend to limit the size of an individual
read somewhat.
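
        The write side can be sketched as a toy write-back cache. The class
below is illustrative only: WriteBackCache is a made-up name, the "disk" is
a dictionary, and merging runs of adjacent dirty blocks into one request at
sync time is an assumption about how the bunching might be done, not a
description of the actual kernel code.

```python
class WriteBackCache:
    """Toy write-back buffer cache that counts disk write requests."""

    def __init__(self):
        self.buffers = {}       # block number -> data held in memory
        self.dirty = set()      # blocks modified since the last sync
        self.disk = {}          # stand-in for the physical disk
        self.disk_requests = 0

    def write(self, block, data):
        """Copy data into the buffer and mark the block dirty; no disk I/O."""
        self.buffers[block] = data
        self.dirty.add(block)

    def sync(self):
        """Write dirty blocks back, one request per run of adjacent blocks."""
        run = []
        for block in sorted(self.dirty):
            if run and block != run[-1] + 1:
                self._flush(run)  # gap found: the current run ends here
                run = []
            run.append(block)
        if run:
            self._flush(run)
        self.dirty.clear()

    def _flush(self, run):
        self.disk_requests += 1   # one request covers the whole run
        for block in run:
            self.disk[block] = self.buffers[block]
```

Writing 16 adjacent blocks and then calling sync() costs a single disk
request in this model, and no request at all is made until sync() is called.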

-Eric

-- 
Eric Youngdale