The program user/kalloctest stresses xv6’s memory allocator: three processes grow and shrink their address spaces, resulting in many calls to kalloc and kfree. kalloc and kfree obtain kmem.lock. kalloctest prints (as “#fetch-and-add”) the number of loop iterations in acquire due to attempts to acquire a lock that another core already holds, for the kmem lock and a few other locks. The number of loop iterations in acquire is a rough measure of lock contention.
Some hints
- You can use the constant NCPU from kernel/param.h.
- Let freerange give all free memory to the CPU running freerange.
- The function cpuid returns the current core number, but it's only safe to call it and use its result when interrupts are turned off. You should use push_off() and pop_off() to turn interrupts off and on.
- Have a look at the snprintf function in kernel/sprintf.c for string formatting ideas. It is OK to just name all locks "kmem" though.
Answer
This part of the lab is fairly simple. The goal is to avoid, as far as possible, the situation where thread B spins trying to acquire a lock that thread A already holds. To that end, we redesign (improve) the original allocator: give every CPU its own free list, each with its own lock, so that a core normally allocates from its private list and no longer competes with other CPUs for a single shared one.
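A minimal sketch of that layout in kernel/kalloc.c. The kmem[cpu] naming matches the kalloc fragment below; the kinit and kfree bodies are my reconstruction of the standard approach, not necessarily the author's exact code:

```c
extern char end[];   // first address after kernel code, defined by kernel.ld

struct {
  struct spinlock lock;
  struct run *freelist;
} kmem[NCPU];        // one free list, each with its own lock, per CPU

void
kinit()
{
  for(int i = 0; i < NCPU; i++)
    initlock(&kmem[i].lock, "kmem");  // the hint says one shared name is OK
  // Per the hint, freerange hands all free memory to the CPU running it:
  // it calls kfree on every page, and kfree files pages under cpuid().
  freerange(end, (void*)PHYSTOP);
}

void
kfree(void *pa)
{
  struct run *r;

  if(((uint64)pa % PGSIZE) != 0 || (char*)pa < end || (uint64)pa >= PHYSTOP)
    panic("kfree");

  memset(pa, 1, PGSIZE);   // fill with junk to catch dangling references
  r = (struct run*)pa;

  push_off();              // cpuid() is only stable with interrupts off
  int cpu = cpuid();
  acquire(&kmem[cpu].lock);
  r->next = kmem[cpu].freelist;
  kmem[cpu].freelist = r;
  release(&kmem[cpu].lock);
  pop_off();
}
```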
The catch is that when, say, CPU 0's free list runs empty, it can no longer serve allocations for the threads running on it. In that case it has to turn to the other CPUs and "steal" some free pages from their lists.
The tail of the modified kalloc() looks like this:

```c
  r = kmem[cpu].freelist;
  if(r)
    kmem[cpu].freelist = r->next;
  release(&kmem[cpu].lock); // release the lock
  if(r)
    memset((char*)r, 5, PGSIZE); // fill with junk
  return (void*)r;
}
```
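The beginning of kalloc() is cut off above; a full sketch with the steal path, under the same assumptions as before, could look like this. Stealing one page at a time is my choice for simplicity (stealing a batch also works). Note that we never hold two kmem locks at once, which rules out deadlock between two CPUs stealing from each other:

```c
void *
kalloc(void)
{
  struct run *r;

  push_off();
  int cpu = cpuid();

  // Fast path: take a page from our own free list.
  acquire(&kmem[cpu].lock);
  r = kmem[cpu].freelist;
  if(r)
    kmem[cpu].freelist = r->next;
  release(&kmem[cpu].lock);

  // Slow path: our list is empty, steal a page from another CPU.
  if(!r){
    for(int i = 0; i < NCPU; i++){
      if(i == cpu)
        continue;
      acquire(&kmem[i].lock);
      r = kmem[i].freelist;
      if(r)
        kmem[i].freelist = r->next;
      release(&kmem[i].lock);
      if(r)
        break;
    }
  }

  pop_off();

  if(r)
    memset((char*)r, 5, PGSIZE); // fill with junk
  return (void*)r;
}
```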
With that, the memory-allocator half of the lab is done:
```
== Test running kalloctest ==
$ make qemu-gdb
(186.2s)
== Test kalloctest: test1 ==
kalloctest: test1: OK
== Test kalloctest: test2 ==
kalloctest: test2: OK
== Test kalloctest: test3 ==
kalloctest: test3: OK
```
2. Buffer cache (hard)
Question requirements
If multiple processes use the file system intensively, they will likely contend for bcache.lock, which protects the disk block cache in kernel/bio.c. bcachetest creates several processes that repeatedly read different files in order to generate contention on bcache.lock.
You will likely see different output, but the number of acquire loop iterations for the bcache lock will be high. If you look at the code in kernel/bio.c, you’ll see that bcache.lock protects the list of cached block buffers, the reference count (b->refcnt) in each block buffer, and the identities of the cached blocks (b->dev and b->blockno).
Modify the block cache so that the number of acquire loop iterations for all locks in the bcache is close to zero when running bcachetest. Ideally the sum of the counts for all locks involved in the block cache should be zero, but it’s OK if the sum is less than 500. Modify bget and brelse so that concurrent lookups and releases for different blocks that are in the bcache are unlikely to conflict on locks (e.g., don’t all have to wait for bcache.lock). You must maintain the invariant that at most one copy of each block is cached.
Some hints
- It is OK to use a fixed number of buckets and not resize the hash table dynamically. Use a prime number of buckets (e.g., 13) to reduce the likelihood of hashing conflicts.
- Searching in the hash table for a buffer and allocating an entry for that buffer when the buffer is not found must be atomic.
- Remove the list of all buffers (bcache.head etc.) and instead time-stamp buffers using the time of their last use (i.e., using ticks in kernel/trap.c). With this change brelse doesn't need to acquire the bcache lock, and bget can select the least-recently used block based on the time-stamps.
- It is OK to serialize eviction in bget (i.e., the part of bget that selects a buffer to re-use when a lookup misses in the cache).
- Your solution might need to hold two locks in some cases; for example, during eviction you may need to hold the bcache lock and a lock per bucket. Make sure you avoid deadlock.
- When replacing a block, you might move a struct buf from one bucket to another bucket, because the new block hashes to a different bucket. You might have a tricky case: the new block might hash to the same bucket as the old block. Make sure you avoid deadlock in that case.
- Some debugging tips: implement bucket locks but leave the global bcache.lock acquire/release at the beginning/end of bget to serialize the code. Once you are sure it is correct without race conditions, remove the global locks and deal with concurrency issues. You can also run make CPUS=1 qemu to test with one core.
For reference, here are the relevant pieces of the original kernel/bio.c:

```c
struct {
  struct spinlock lock;
  struct buf buf[NBUF];

  // Linked list of all buffers, through prev/next.
  // Sorted by how recently the buffer was used.
  // head.next is most recent, head.prev is least.
  struct buf head;
} bcache;
```
```c
// Look through buffer cache for block on device dev.
// If not found, allocate a buffer.
// In either case, return locked buffer.
static struct buf*
bget(uint dev, uint blockno)
{
  struct buf *b;

  acquire(&bcache.lock);

  // Is the block already cached?
  for(b = bcache.head.next; b != &bcache.head; b = b->next){
    if(b->dev == dev && b->blockno == blockno){
      b->refcnt++;
      release(&bcache.lock);
      acquiresleep(&b->lock);
      return b;
    }
  }

  // Not cached.
  // Recycle the least recently used (LRU) unused buffer.
  for(b = bcache.head.prev; b != &bcache.head; b = b->prev){
    if(b->refcnt == 0) {
      b->dev = dev;
      b->blockno = blockno;
      b->valid = 0;
      b->refcnt = 1;
      release(&bcache.lock);
      acquiresleep(&b->lock);
      return b;
    }
  }
  panic("bget: no buffers");
}
```
```c
// Release a locked buffer.
// Move to the head of the most-recently-used list.
void
brelse(struct buf *b)
{
  if(!holdingsleep(&b->lock))
    panic("brelse");

  releasesleep(&b->lock);

  acquire(&bcache.lock);
  b->refcnt--;
  if (b->refcnt == 0) {
    // no one is waiting for it.
    b->next->prev = b->prev;
    b->prev->next = b->next;
    b->next = bcache.head.next;
    b->prev = &bcache.head;
    bcache.head.next->prev = b;
    bcache.head.next = b;
  }

  release(&bcache.lock);
}
```
Now the modified kernel/bio.c. The file's header comment is unchanged:

```c
// Buffer cache.
//
// The buffer cache is a linked list of buf structures holding
// cached copies of disk block contents.  Caching disk blocks
// in memory reduces the number of disk reads and also provides
// a synchronization point for disk blocks used by multiple processes.
//
// Interface:
// * To get a buffer for a particular disk block, call bread.
// * After changing buffer data, call bwrite to write it to disk.
// * When done with the buffer, call brelse.
// * Do not use the buffer after calling brelse.
// * Only one process at a time can use a buffer,
//     so do not keep them longer than necessary.
```
```c
// bucket number for bufmap
#define NBUFMAP_BUCKET 13
// hash function for bufmap
#define BUFMAP_HASH(dev, blockno) ((((dev)<<27)|(blockno))%NBUFMAP_BUCKET)
```
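Only bufmap_locks actually appears in the fragments below; the rest of this sketch (bufmap, eviction_lock, and a uint lastuse field added to struct buf in kernel/buf.h) is my guess at the remaining pieces the design needs:

```c
struct {
  struct buf buf[NBUF];
  struct spinlock eviction_lock;   // serializes the eviction path in bget

  // Hash table: NBUFMAP_BUCKET singly-linked lists of cached buffers,
  // chained through b->next, each bucket protected by its own lock.
  struct buf bufmap[NBUFMAP_BUCKET];
  struct spinlock bufmap_locks[NBUFMAP_BUCKET];
} bcache;
```

A matching binit would initialize every bucket lock and chain all NBUF buffers into bucket 0 to start; since every buffer begins with refcnt and lastuse at 0, the first few evictions simply redistribute them across buckets.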
```c
// Look through buffer cache for block on device dev.
// If not found, allocate a buffer.
// In either case, return locked buffer.
static struct buf*
bget(uint dev, uint blockno)
{
  struct buf *b;

  uint key = BUFMAP_HASH(dev, blockno);

  acquire(&bcache.bufmap_locks[key]);
```
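The fragment stops here. Sketched under the same assumptions, the rest of bget follows the hints: a lookup holds only its bucket lock, while a miss falls back to an eviction path serialized by eviction_lock. The bucket lock is released before taking eviction_lock, so no bucket lock is ever held while waiting on it (deadlock avoidance), which is why the bucket must be re-checked afterwards:

```c
  // Is the block already cached?
  for(b = bcache.bufmap[key].next; b; b = b->next){
    if(b->dev == dev && b->blockno == blockno){
      b->refcnt++;
      release(&bcache.bufmap_locks[key]);
      acquiresleep(&b->lock);
      return b;
    }
  }

  // Not cached. Release the bucket lock before taking eviction_lock,
  // then re-check the bucket: another process may have cached the block
  // during the window in which we held no lock at all.
  release(&bcache.bufmap_locks[key]);
  acquire(&bcache.eviction_lock);

  for(b = bcache.bufmap[key].next; b; b = b->next){
    if(b->dev == dev && b->blockno == blockno){
      acquire(&bcache.bufmap_locks[key]);   // refcnt is bucket-protected
      b->refcnt++;
      release(&bcache.bufmap_locks[key]);
      release(&bcache.eviction_lock);
      acquiresleep(&b->lock);
      return b;
    }
  }

  // Still not cached. Holding eviction_lock (all evictions are
  // serialized), scan every bucket for the least-recently-used buffer
  // with refcnt == 0, keeping only the current best candidate's bucket
  // lock held so nobody can grab that buffer behind our back.
  struct buf *before_lru = 0;   // predecessor of the LRU candidate
  int holding = -1;             // index of the bucket lock we still hold
  for(int i = 0; i < NBUFMAP_BUCKET; i++){
    int newfound = 0;
    acquire(&bcache.bufmap_locks[i]);
    for(b = &bcache.bufmap[i]; b->next; b = b->next){
      if(b->next->refcnt == 0 &&
         (!before_lru || b->next->lastuse < before_lru->next->lastuse)){
        before_lru = b;
        newfound = 1;
      }
    }
    if(newfound){
      if(holding != -1)
        release(&bcache.bufmap_locks[holding]);
      holding = i;
    } else {
      release(&bcache.bufmap_locks[i]);
    }
  }
  if(!before_lru)
    panic("bget: no buffers");
  b = before_lru->next;

  // Move b into bucket `key`, unless it already lives there (the tricky
  // same-bucket case from the hints: don't acquire the same lock twice).
  if(holding != key){
    before_lru->next = b->next;           // unlink from the old bucket
    release(&bcache.bufmap_locks[holding]);
    acquire(&bcache.bufmap_locks[key]);
    b->next = bcache.bufmap[key].next;    // link into the new bucket
    bcache.bufmap[key].next = b;
  }

  b->dev = dev;
  b->blockno = blockno;
  b->refcnt = 1;
  b->valid = 0;
  release(&bcache.bufmap_locks[key]);
  release(&bcache.eviction_lock);
  acquiresleep(&b->lock);
  return b;
}
```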
```c
// Write b's contents to disk.  Must be locked.
void
bwrite(struct buf *b)
{
  if(!holdingsleep(&b->lock))
    panic("bwrite");
  virtio_disk_rw(b, 1);
}
```
```c
// Release a locked buffer.
// No LRU list to maintain any more; just time-stamp the buffer.
void
brelse(struct buf *b)
{
  if(!holdingsleep(&b->lock))
    panic("brelse");
```
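This fragment is also cut short; the remainder only needs the buffer's bucket lock (lastuse is the assumed timestamp field from the struct sketch above, and ticks comes from kernel/trap.c, as the hints suggest):

```c
  releasesleep(&b->lock);

  uint key = BUFMAP_HASH(b->dev, b->blockno);

  acquire(&bcache.bufmap_locks[key]);
  b->refcnt--;
  if(b->refcnt == 0)
    b->lastuse = ticks;   // time-stamp instead of re-linking an MRU list
  release(&bcache.bufmap_locks[key]);
}
```

bpin and bunpin also touch b->refcnt, so they should take the same bucket lock in place of bcache.lock.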
With these changes, make grade passes everything:

```
== Test running kalloctest ==
$ make qemu-gdb
(195.1s)
== Test kalloctest: test1 ==
kalloctest: test1: OK
== Test kalloctest: test2 ==
kalloctest: test2: OK
== Test kalloctest: test3 ==
kalloctest: test3: OK
== Test kalloctest: sbrkmuch ==
$ make qemu-gdb
kalloctest: sbrkmuch: OK (29.8s)
== Test running bcachetest ==
$ make qemu-gdb
(61.1s)
== Test bcachetest: test0 ==
bcachetest: test0: OK
== Test bcachetest: test1 ==
bcachetest: test1: OK
== Test usertests ==
$ make qemu-gdb
usertests: OK (161.1s)
== Test time ==
time: OK
Score: 80/80
```
3. Summary
- Don't share if you don't have to.
- Start with a few coarse-grained locks.
- Instrument your code: which locks are preventing parallelism?
- Use fine-grained locks only as needed for parallel performance.