We use the setpriority() system call with no modification to set the number of tickets that a task holds.
We've left most of the code in the linux scheduler unmodified. Jobs such as handling the bottom half, handling interrupt logic, handling timers, etc. need no modification. Most of our changes are in the piece of code that is responsible for choosing the next task to run from the list of runnable tasks. This is now handled via the simplest possible implemenation of lottery scheduling.
First we run through the list of runnable tasks, and grant them all compensation tickets according to how much of their quantum they used last time they ran. Once the ticket counts have been updated, we sum the total number of tickets held by all runnable tasks. We then choose a random number between 1 and the total number of tickets. Finally, we again walk down the list of runnable tasks, subtracting the number of tickets held by the current task from our random number at each step. We select that task that takes us to zero or below.
The mechanism described above is sufficient for choosing tasks to run. However one other small modification is required in order to be sure that the scheduler is run at all. In the standard linux scheduler, the scheduling routine is only called when the task_struct.need_resched field of the currently running task is set to 1. This most commonly happens in the update_process_times() routine which is called every 10ms by the timer interrupt. update_process_times() normally only resets the need_resched when the counter for the current task has dropped below zero. We have modified this code so that the field is set every time the timer goes off. This guarantees that the scheduler will be called at least once every 10ms.
The end result of this modification is that our lottery scheduler causes many more context switches than the standard linux scheduler. (Which frequently waits as long as 200 ms before rescheduling.) This has some negative impact on throughput, but it increases system responsiveness.