1. #65楼:From my tests it seems that the problem in the Xen paravirt spinlock implementation is the fact that they re-enable interrupts (xen upcall event channel for that vcpu) during the hypercall to poll for the spinlock irq.
3. #79楼:After finally having a breakthrough in understanding the source of the lockup and further discussions upstream, the proper turns out to be to change the way waiters are woken when a spinlock gets freed.
4. #86楼:There is currently a Precise kernel in proposed that will contain the first approach on fixing this (which is not to enable interrupts during the hv call). This should get replaced by the upstream fix (which is to wake up all spinners and not only the first found).
bug发生过程分析
来自Patchwork [25/58] xen: Send spinlock IPI to all waiters:
1. CPU n tries to schedule task x away and goes into a slow wait for the runq lock of CPU n-# (must be one with a lower number).
2. CPU n-#, while processing softirqs, tries to balance domains and goes into a slow wait for its own runq lock (for updating some records). Since this is a spin_lock_irqsave in softirq context, interrupts will be re-enabled for the duration of the poll_irq hypercall used by Xen.
3. Before the runq lock of CPU n-# is unlocked, CPU n-1 receives an interrupt (e.g. endio) and when processing the interrupt, tries to wake up task x. But that is in schedule and still on_cpu, so try_to_wake_up goes into a tight loop.
4. The runq lock of CPU n-# gets unlocked, but the message only gets sent to the first waiter, which is CPU n-# and that is busily stuck.
5. CPU n-# never returns from the nested interruption to take and release the lock because the scheduler uses a busy wait. And CPU n never finishes the task migration because the unlock notification only went to CPU n-#.