OS: Kernel Scheduler

Scheduler Tuning

Kernel multitasking (Linux) memungkinkan lebih dari satu proses untuk berada pada satu saat dan setiap proses di mungkinkan untuk jalan seperti dia satu-satunya yang ada pada sistem. Banyak thread dari banyak proses seperti berjalan pada saat yang sama. Scheduler memungkinkan ini terjadi dan scheduler berjalan dalam thread dan dibangunkan oleh interupsi waktu, kernel thread lain, atau system call.

kernel/sched.c : di kernel source tree dapat di mengatur nilai ini untuk men-tune scheduler.

MIN_TIMESLICE (minimum timeslice yang akan di peroleh sebuah task)
MAX_TIMESLICE (maximun timeslice yang akan di peroleh sebuah task)
Nilai rata-rata timeslice di tentukan dengan cara me-rata-rata nilai MIN & MAX. Menaikan nilai MIN dan MAx akan menaikan panjang timeslice secara umum. Menaikan panjang TimeSlice akan menaikan EFFICIENCY karena akan lebih context switch (BAGUS untuk HPC dan server systems)
PRIO_BONUS_RATIO : adalah pertengahan dari total range prioritas dimana sebuah task dapat menerima sebagai bonus atau hukuman dalam kalkulasi prioritas dinamis. (Nilai default adalah 25). Jika nilai tinggi, menggunakan nice() untuk menset prioritas statik menjadi kurang effektif sedangkan jika nolai rendah, menset prioritas statik lebih effektif.
MAX_SLEEP_AVG : semakin besar nilainya, semakin lama sebuah task harus sleep sebelum diperhitungkan untuk di aktifkan. Meningkatkan nilai ini akan melukai interaktifitas, akan tetapi untuk beban non-interaktif, kesamaan antar task lebih di sukai.
STARVATION_LIMIT : Mengurangi nilai ini akan melukai interaktifitas karena task akan lebih sering memrelakan CPU time, dan menaikan nilai ini akan menaikan performance interaktif dengan mengorbankan task non-interaktif.

Schedulers : menjaga kebijakan scheduling thread, termasuk kapan, untuk berapa lama, dan untuk beberapa kasus dimana (untuk kernel SMP dimana menjadi penting) sebuah thread akan di jalankan.

SMP Scheduling (scheduling pada beberapa CPU)
SMT Scheduling (Hyper Threading atau Symetric Mult-Thread scheduling)
NUMA Scheduling(Non Uniform Memory Access yang artinya scheduler dapat berjalan di image single system yang tersebar pada beberapa node atau mesin secara fisik. (satu instance kernel pada beberapa mesin secara fisik , HPC) There is aconcept of a scheduler domain that contains all CPU's in the system. CPU's are divided into groups.
Soft Real Time Scheduling : schedule tasks that have strict timing requirements. Real Time tasks are assigned special scheduling modes and the scheduler gives them priority over another task on the system. RT scheduling modes include FIFO (first in first out) and Round Robin. This effectively ignores non RT tasks on the system.

Context Switches : is the process of switching from one thread of executuion to another.

Source Code Organization

arch/ : Architecture specific code
include/ :Header files
kernel/ : Main Kernel Code (non architecture specific)
mm/ : Kernel's memory management code

Programs & Processes

Program : A combination of instructions and data put together to perform a task when executed.
Process : An instance of a program. (An abstraction to embody the state of a program in execution) A process is simply a group of threads that share something called a thread group id (TGID).
Threads : A process can have multiple threads of execution that work together to accomplish its goals. Only 1 thread can be executed on a CPU at any given time.All threads are simply processes.
CPU Bound Threads : threads that spend alot of time doing cpu computations. (ie HPC or alot of number crunching)
I/O Bound Threads : threads that spend time waiting on slow I/O (ie read from disk)

Scheduling Goals

Efficiency : it must try to allow as much real work as possible to be done while staying within the restraints of other requirements. (ie Context Switching is expensive and allowing processes to run for longer periods of time increases efficency) Efficency suffers for the sake of other goals such as interactivity, because interactivity typicaly imples more context switches.
Interactivity : An example of interactivity is a mouse click or a keystroke. Such events require a quick response or more context switches. Many context switches give the impression of immediate response at the cost of efficiency.
Fairness and Starvation : Tasks should be treated with a certain degree of fairness, including the stipulation that no thread starves. Starvation happens when a thread is not allowed to run for an unacceptabilty long period of time due to the prioritization of other threads. Fairness does mean that every thread should ever starve or be able to trick the scheduler into giving a higher priority ot more CPU than it out have.

Scheduler Performance Tuning

There is no universal ideal for schedule performance. There is no single goal that the scheduler can strive for. An example is the give and take of a very interactive desktop sistem that requires a high level of interactivity (context switching) and HPC systems where less context switching results increases efficiency. With perceived performance (multi tasking and interactive desktops), the current executing thread must give up the processor before it's timeslice is up so the context can switch to a mouse click or a key stroke. A server system on the other hand is less concerned with perceived performance and seeks actual performance. In HPC, systems are solving very complex and large problems (takes days on end) and context switching is not as important and hurts efficiency. Hence the art of scheduler tuning.

   Linux 2.4 O(n) algorithm :old and inefficient legacy linux scheduler. The algorithms execution time grows linearly sd the input size grows.
   Linux 2.6 O(1) Alorithm : New Linux scheduler (rewriiten by Ingo Molnar) -tries to create a constant upper bound for the running time of the algorithm or the algorithm is guaranteed to complete in a certain amount of time regardless of the size of the input.
       O(1) algorithm :
       Each CPU is assigned a primitve called a runqueue and it contains 2 priority array. All tasks begin on the active priority array and when they run out of their timeslice, they are moved to a expired priority array. When there are no more tasks on the priority array, it is swapped with the expired array.
       Only 1 task may modify a CPU runqueue at any given time.
       The highest priority task on the system is always scheduled first and if multiple tasks exist at the same priority level, it uses round robin.
       All tasks have a static priority (unix nice value).
       The 2.6 scheduler rewards I/) bound tasks and punishes CPU bound tasks by adding or subtracting from a task's static priority.
       Calculating TimeSlices :TimeSlices are calculated by simply scaling a task's static priority onto the possibile timeslice range and making sure a certain minimum and maximum timeslice is enforeced.
       WaitQueues : are essentially a list of tasks waiting for some event or condition to occur(waiting for I/O etc).
       Schedule() : is the main scheduler function. You cannot preempt the Schedule() function when it is running
       On SMP machines, ther are migration threads that run with high priority and they make sure that runqueues across all CPU's are balanced.

The Future of Linux Schedulers

Swappable Kernels : Being able to switch schedulers (more than 1 scheduler and you get to choose based the system's function.
Scheduler Mode :means breaking scheduler work loads into categories, and allowing root users to pick scheduler behavior of a system dynamically. Using sysctl and /etc/sysctl.conf to create dynamic changes to the kernel's scheduler on the fly.

Referensi

http://www.dsm.fordham.edu/~mathai/scheduler.html

OS: Kernel Scheduler

Contents