| Summary: | Kernel SMP Issue CAN-2005-0001 (Vendor-Sec) | ||
|---|---|---|---|
| Product: | Gentoo Security | Reporter: | Sune Kloppenborg Jeppesen (RETIRED) <jaervosz> |
| Component: | Kernel | Assignee: | Gentoo Security <security> |
| Status: | RESOLVED DUPLICATE | ||
| Severity: | normal | ||
| Priority: | High | ||
| Version: | unspecified | ||
| Hardware: | All | ||
| OS: | All | ||
| Whiteboard: | CLASSIFIED | ||
| Package list: | Runtime testing required: | --- | |
Fix provided.
Description: Fix expand_stack() SMP race
Two threads sharing the same VMA can race in expand_stack(), resulting in incorrect VMA
size accounting and possibly an "uncovered-by-VMA" pte leak.
The fix is to check whether the stack has already been expanded after acquiring a lock which
guarantees exclusivity (page_table_lock in v2.4 and the anon_vma lock in v2.6).
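The re-check-under-lock pattern described above can be sketched in user space. This is only an illustration, not kernel code: `struct toy_vma`, `toy_expand_stack()` and the `total_pages` counter are hypothetical names, and a pthread mutex stands in for page_table_lock / the anon_vma lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/* Hypothetical stand-in for a grows-down VMA (not the kernel struct). */
struct toy_vma {
    pthread_mutex_t lock;       /* plays the role of the exclusive lock */
    unsigned long vm_start;
    unsigned long vm_end;
    unsigned long total_pages;  /* size accounting, like mm->total_vm */
};

/* Grow the VMA downwards so that it covers 'address'. Returns 0 on success. */
int toy_expand_stack(struct toy_vma *vma, unsigned long address)
{
    unsigned long grow;

    assert(vma != NULL);
    address &= PAGE_MASK;

    pthread_mutex_lock(&vma->lock);

    /* Already expanded by another thread while we waited for the lock? */
    if (vma->vm_start <= address) {
        pthread_mutex_unlock(&vma->lock);
        return 0;
    }

    /* Safe now: vm_start > address, so the subtraction cannot wrap. */
    grow = (vma->vm_start - address) >> PAGE_SHIFT;
    vma->vm_start = address;
    vma->total_pages += grow;

    pthread_mutex_unlock(&vma->lock);
    return 0;
}
```

Without the early return, a second caller holding a stale fault address would move vm_start back up and add a wrapped-around `grow` to the accounting.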
v2.4:
--- linux-2.4.28.orig/include/linux/mm.h 2005-01-07 09:12:48.000000000 -0200
+++ linux-2.4.28/include/linux/mm.h 2005-01-07 14:51:20.595060272 -0200
@@ -647,12 +647,19 @@
unsigned long grow;
/*
- * vma->vm_start/vm_end cannot change under us because the caller is required
- * to hold the mmap_sem in write mode. We need to get the spinlock only
- * before relocating the vma range ourself.
+ * vma->vm_start/vm_end cannot change under us because the caller
+ * is required to hold the mmap_sem in read mode. We need the
+ * page_table_lock lock to serialize against concurrent expand_stacks.
*/
address &= PAGE_MASK;
spin_lock(&vma->vm_mm->page_table_lock);
+
+ /* already expanded while we were spinning? */
+ if (vma->vm_start <= address) {
+ spin_unlock(&vma->vm_mm->page_table_lock);
+ return 0;
+ }
+
grow = (vma->vm_start - address) >> PAGE_SHIFT;
if (vma->vm_end - address > current->rlim[RLIMIT_STACK].rlim_cur ||
((vma->vm_mm->total_vm + grow) << PAGE_SHIFT) > current->rlim[RLIMIT_AS].rlim_cur) {
v2.6:
--- linux-2.6.10-mm1.orig/mm/mmap.c 2005-01-05 15:58:26.000000000 -0200
+++ linux-2.6.10-mm1/mm/mmap.c 2005-01-07 14:47:05.894780600 -0200
@@ -1373,6 +1373,13 @@
*/
address += 4 + PAGE_SIZE - 1;
address &= PAGE_MASK;
+
+ /* already expanded while waiting for anon_vma lock? */
+ if (vma->vm_end >= address) {
+ anon_vma_unlock(vma);
+ return 0;
+ }
+
grow = (address - vma->vm_end) >> PAGE_SHIFT;
/* Overcommit.. */
@@ -1432,6 +1439,12 @@
return -ENOMEM;
anon_vma_lock(vma);
+ /* already expanded while waiting for anon_vma lock? */
+ if (vma->vm_start <= address) {
+ anon_vma_unlock(vma);
+ return 0;
+ }
+
/*
* vma->vm_start/vm_end cannot change under us because the caller
* is required to hold the mmap_sem in read mode. We need the
_
Alternative RH fix (http://rhn.redhat.com/errata/RHBA-2004-550.html):
+
+ /* check if another thread has already expanded the stack */
+ if (address >= vma->vm_start) {
+ spin_unlock(&vma->vm_mm->page_table_lock);
+ vm_validate_enough("exiting expand_stack - NOTHING TO DO");
+ return 0;
+ }
+
Disclosure is set to 20050112.
This will be handled on a new bug as this one is CLASSIFIED and should _never_ be opened.
I have found an exploitable flaw in the page fault handler, however only in the SMP case. The problem is this:

    [A] down_read(&mm->mmap_sem);

        vma = find_vma(mm, address);
        if (!vma)
                goto bad_area;
        if (vma->vm_start <= address)
                goto good_area;
        if (!(vma->vm_flags & VM_GROWSDOWN))
                goto bad_area;
        if (error_code & 4) {
                /*
                 * accessing the stack below %esp is always a bug.
                 * The "+ 32" is there due to some instructions (like
                 * pusha) doing post-decrement on the stack and that
                 * doesn't show up until later..
                 */
                if (address + 32 < regs->esp)
                        goto bad_area;
        }
        if (expand_stack(vma, address))
    [B]         goto bad_area;

An exploitable race scenario looks as follows:

1) one thread issues down_write on the sem (remap, madvise, ...)
2) two other threads fault below a VM_GROWSDOWN segment (note that they can fault anywhere below vm_start since esp is arbitrary) and sleep in [A]
3) the first thread releases the sem and the two others run again; both find the same VMA but:

    thread1 ----------F1----------[ VMA ]
    thread2 ---------------F2-----[ VMA ]

where F1/F2 are the fault addresses. If timed carefully we get:

    thread1 expands stack to F1, installs pte1
    thread2 expands stack to F2, installs pte2

resulting in pte1 not covered by the VMA. Techniques like in mremap_pte can be applied to further exploit this condition.

Please do not argue that the race window is small - I have seen even smaller windows opening like a barn :-]

The critical section is only from [A] to [B]; we do not care about the timings of handle_mm_fault etc., since the VMA is later consulted only for page flags. Note that this also races with ptrace/proc etc. (everything using access_process_vm/get_user_pages).
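The interleaving described above can be replayed deterministically in a small user-space model. This is a simplified sketch, not the kernel code: `struct toy_vma`, `expand_unchecked()`, `expand_checked()` and `replay_race()` are hypothetical names, the starting VMA range is arbitrary, and the two "threads" are just two calls made in the unlucky order, each using the fault address it picked before the other one ran.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SIZE 4096UL
#define PAGE_MASK (~(PAGE_SIZE - 1))

/* Hypothetical stand-in for a grows-down VMA. */
struct toy_vma {
    unsigned long vm_start;
    unsigned long vm_end;
};

/* Model of the unpatched path: vm_start is moved with no re-check. */
static void expand_unchecked(struct toy_vma *vma, unsigned long address)
{
    vma->vm_start = address & PAGE_MASK;
}

/* Model of the patched path: do nothing if someone already expanded. */
static void expand_checked(struct toy_vma *vma, unsigned long address)
{
    address &= PAGE_MASK;
    if (vma->vm_start <= address)
        return;
    vma->vm_start = address;
}

static bool covered(const struct toy_vma *vma, unsigned long address)
{
    return vma->vm_start <= address && address < vma->vm_end;
}

/*
 * Replay: thread1 expands to f1, then thread2 expands to f2 (f1 < f2,
 * both below the original vm_start). Returns true if the page that
 * thread1 faulted in at f1 is still covered by the VMA afterwards.
 */
bool replay_race(bool patched, unsigned long f1, unsigned long f2)
{
    struct toy_vma vma = { 0x100000UL, 0x200000UL };
    void (*expand)(struct toy_vma *, unsigned long) =
        patched ? expand_checked : expand_unchecked;

    assert(f1 < f2 && f2 < vma.vm_start);

    expand(&vma, f1);   /* thread1: stack now starts at F1, pte1 installed */
    expand(&vma, f2);   /* thread2: acts on its stale view of the VMA */

    return covered(&vma, f1);
}
```

In the unchecked variant the second call moves vm_start back up to F2, so the page at F1 falls outside the VMA; in the checked variant the second call returns early and F1 stays covered.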