From: oreillym@tartarus.uwa.edu.au (Michael O'Reilly) Subject: Re: Better FS's (was Re: New .96c , disappointing... : ( ) Date: Tue, 7 Jul 1992 13:34:27 GMT
drew@ophelia.cs.colorado.edu (Drew Eckhardt) writes:
: In article <1992Jul6.113801.101@uniwa.uwa.edu.au> oreillym@tartarus.uwa.edu.au (Michael O'Reilly) writes:
: [Lots of my stuff deleted. ]
: I've been playing with this (meaning it works, is too kludgy and
: being changed) on my system at home, and your implementation sugestion
: is poor.
:
: Managing buffer cache pages in the same way as the rest of the
: pages is a Bad Idea (tm), for several reasons.
:
: 1. When you find a page, you will also have to look at the
: buffer structures to see if a buffer in it is locked -
: you can't discard a locked buffer.
:
: 2. This breaks the LRU replacement mechanism used for the buffer cache,
: and you stand a good chance of discarding a recently
: accessed buffer, causing it to be reread from disk.
Hmm. Maybe I wasn't clear. What I meant was that when get_free_page() is
called, the memory manager runs down it's list of places to get pages
from. I.e. it first looks for a free page. Failing that it
calls shrink_buffers(), failing that it,
calls swap_out_page(), and failing that it dies. ;)
Seriously, I don't think swap_out_page should fail.
: Currently, Linux does not use LRU for pages. Instead,
: it walks through the user process page tables,
: and pages the first one it sees to disk.
You are right. A better scheme is needed. MAYBE something along the
lines of a 'working set'. I don't know that strict LRU for pages is
worth the overhead. (I am not sure what the overhead is!! :)
: I changed swap_out() to call shrink_buffers() before swapping
: pages, where shrink_buffers() uses an LRU algorithm as it should.
: This would be easier if the BLOCK_SIZE was 4096 bytes, same
: as page size.
This is what I am implementing now. I wouldn't mind seeing the code you
have because 4K buffers still need to work with the 1K minix file
system... :(
: Otherwise, you free the four least recently
: used buffers, and consolodate the remaining buffers from
: those pages into pages with empty slots.
:
: The memcpy() of 3k is fairly cheap..
True enough. Depends on how often it is done, which shouldn't be very
often at all.
: There are a few parameters that control buffer allocation :
: 1. minimum, maximum number of buffers
: 2. prefer_new - if num_free_pages <= prefer_new,
: then the oldest buffer is replaced, rather
: than allocating a new one from VM.
:
: Currently, these are set by a modified buffer_init()
: routine called from main.c in the startup code,
: but I plan to add a syscall to set these.
:
:
: We should look at implementing a decent LRU algorithm for linux
: paging - although I'm no VM expert
:
: There are two bits in the page table / directory entries that
: are for "user" use, a counter can be placed in there.
:
: When you swap_out(), and can't shrink buffer cache, you should
: probably update a timeout this field based on the accessed
: bit, in all present directory entries, decrementing by 1
: each time accessed is cleared, incrememnting when accessed
: is set, clearing accessed.
:
: I don't know how efficient it would be to extend this to
: the individual page table entries, but you should definately
: get something, because pages in idle tasks would be swapped
: before pages in present tasks.
Why not just look in process that haven't been scheduled recently? Would
this be cheaper to keep track off?
This doesn't help where you have N (N > 1) process all CPU intensive,
with some of them running out of memory. I think that something along
the lines of a 'working set' would be nice. I.e. you measure the number
of page fault per second, and if it's too high , you give that process a
larger share of memory, takeing it from the process that has the lowest
number of page faults.
This would lead to all processes haveing a fairly equal number of page
faults . The question is weather the total number of page fault per unit
time would be lower than before....
: Also, it might be nice to swap (swap != page) tasks that are
: idle > X seconds, like BSD does. This would free up
: core for buffers.
I think SunOS does something like this but is done in a slightly odd
way. SunOS (from memory here. take with a grain of salt) swaps any
process that hasn't been scheduled in the last 20 seconds, but doesn't
swap it to disk as such, it just marks all the pages as being free. If
something needs the page, THEN it gets swapped, but because the page is
marked free, these pages be used in preference to anything else bar
truely free pages. I think. ;)
: [Stuff about my mistake in kernel alloc()ing deleted]
: >re framented blocks:
: >The bitmap of free disk space is stored with a 512 byte granuality, so
: >files smaller than 4K get a fragment (Need an extra 3 bit in the inode
: >to implement this. ;). Yes, you do read the entire block into memory,
: >all 4K of it, even if you only want 1/2 a K. This make nillo difference
: >to the performance, and , given that the filesystem should strive from
: >some locality, is just the same as read-ahead.
: >
:
: This is a good idea - you get the advantages of page granularity
: block size (ie, simpler buffer cache management, larger reads from
: the block device and less overhead), without the penalty in
: disk usage.
: [lots more stuff deleted that wasn't commented on ]
Michael