OS: Understanding System Calls

From OnnoWiki

Source: http://www.linuxchix.org/content/courses/kernel_hacking/lesson7

When you read device driver code, you will start to wonder, "How does the function foo_read() get called?" Or you may ask, "When I type cat /proc/cpuinfo, how does the cpuinfo() function get called?"

Once the kernel has finished booting, control flow changes from a straightforward "Which function is called next?" to being driven by system calls, exceptions, and interrupts. Let's look at how a system call is carried out.

What is a system call?

In the most literal sense, a system call (commonly known as a "syscall") is an instruction, similar to the "add" or "jump" instruction. At a higher level, a system call is the way a user-level program asks the operating system to do something on its behalf. If you are a programmer and you need to read from a file, you use a system call to ask the operating system to read that file for you.
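As a small illustration of a user-level program asking the operating system to read data for it, here is a minimal sketch in C. It assumes a POSIX system; the helper name read_back_through_pipe is hypothetical and not part of the original lesson. It writes a message into a pipe and then uses read() to get it back, so the example does not depend on any particular file existing.

```c
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (an illustration, not from the original lesson):
 * pushes msg through a pipe and reads it back with read().
 * Returns the number of bytes read, or -1 on error.
 */
ssize_t read_back_through_pipe(const char *msg, char *buf, size_t buflen)
{
    int fds[2];
    ssize_t n;
    size_t len = strlen(msg);

    if (buflen == 0 || pipe(fds) == -1)
        return -1;

    /* write() is itself a system call: the kernel copies msg into the pipe. */
    if (write(fds[1], msg, len) != (ssize_t)len) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    close(fds[1]);

    /* read() asks the kernel to copy up to buflen - 1 bytes back to us. */
    n = read(fds[0], buf, buflen - 1);
    close(fds[0]);

    if (n >= 0)
        buf[n] = '\0';  /* terminate so the result can be used as a string */
    return n;
}
```

The program never touches the pipe's storage directly. Every transfer goes through the kernel by way of the write() and read() system calls.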

System calls in more detail

A system call works as follows. First, the user program sets up the arguments for the system call. One of these arguments is the system call number. Note that all of this is done automatically by library functions unless you are writing in assembly language. Once all the arguments are set up, the program executes the "system call" instruction. This instruction causes an exception: an event that makes the processor jump to a new address and start executing the code there.

The instructions at the new address save the user program's state, work out which system call you want, call the kernel function that implements that system call, restore the user program's state, and return control to the user program. A system call is one of the ways in which the functions defined in a device driver end up being called.
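The role of the system call number can be seen from user space without writing assembly. The sketch below assumes Linux with glibc, which provides the generic syscall() wrapper and the SYS_getpid constant; the function name getpid_by_number is hypothetical. It passes the number explicitly, which is what the ordinary getpid() library function does for you behind the scenes.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: performs the getpid system call by passing its
 * number to the generic syscall() wrapper instead of calling getpid().
 * Assumes Linux with glibc.
 */
long getpid_by_number(void)
{
    /* SYS_getpid is the system call number; getpid takes no other arguments. */
    return syscall(SYS_getpid);
}
```

Both getpid_by_number() and getpid() should report the same process ID, because they end up in the same kernel function.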


That was the whirlwind tour of how a system call works. Next, we'll go into minute detail for those who are curious about exactly how the kernel does all this. Don't worry if you don't quite understand all of the details - just remember that this is one way that a function in the kernel can end up being called, and that no magic is involved. You can trace the control flow all the way through the kernel - with difficulty sometimes, but you can do it.

A system call example

This is a good place to start showing code to go along with the theory. We'll follow the progress of a read() system call, starting from the moment the system call instruction is executed. The PowerPC architecture will be used as an example for the architecture specific part of the code. On the PowerPC, when you execute a system call, the processor jumps to the address 0xc00. The code at that location is defined in the file: arch/ppc/kernel/head.S

It looks something like this:

/* System call */
        . = 0xc00
SystemCall:
        EXCEPTION_PROLOG
        EXC_XFER_EE_LITE(0xc00, DoSyscall)

/* Single step - not used on 601 */
        EXCEPTION(0xd00, SingleStep, SingleStepException, EXC_XFER_STD)
        EXCEPTION(0xe00, Trap_0e, UnknownException, EXC_XFER_EE)

What this code does is save some state and call another function called DoSyscall. Here's a more detailed explanation (feel free to skip this part):

EXCEPTION_PROLOG is a macro that handles the switch from user to kernel space, which requires things like saving the register state of the user process. EXC_XFER_EE_LITE is a macro invoked with the address of this routine and the address of the function DoSyscall. Eventually, some state will be saved and DoSyscall will be called. The next two lines set up two more exception vectors, at addresses 0xd00 and 0xe00.

EXC_XFER_EE_LITE looks like this:

#define EXC_XFER_EE_LITE(n, hdlr)       \
        EXC_XFER_TEMPLATE(n, hdlr, n+1, COPY_EE, transfer_to_handler, \
                          ret_from_except)

EXC_XFER_TEMPLATE is another macro, and the code looks like this:

#define EXC_XFER_TEMPLATE(n, hdlr, trap, copyee, tfer, ret)     \
        li      r10,trap;                                       \
        stw     r10,TRAP(r11);                                  \
        li      r10,MSR_KERNEL;                                 \
        copyee(r10, r9);                                        \
        bl      tfer;                                           \
i##n:                                                           \
        .long   hdlr;                                           \
        .long   ret

li stands for "load immediate", which means that a constant value known at compile time is loaded into a register. First, trap is loaded into register r10. On the next line, that value is stored at the address given by TRAP(r11). TRAP(r11) and the next two lines do some hardware-specific bit manipulation. After that we call the tfer function (i.e. the transfer_to_handler function), which does yet more housekeeping, and then transfers control to hdlr (i.e. DoSyscall). Note that the bl instruction leaves the address of the word that follows it in the link register, and transfer_to_handler loads the address of the handler from there, which is why you see .long DoSyscall instead of bl DoSyscall.

Now, let's look at DoSyscall. It's in the file:

arch/ppc/kernel/entry.S

Eventually, this function loads up the address of the syscall table and indexes into it using the system call number. The syscall table is what the OS uses to translate from a system call number to a particular system call. The system call table is named sys_call_table and defined in:

arch/ppc/kernel/misc.S

The syscall table contains the addresses of the functions that implement each system call. For example, the read() system call function is named sys_read. The read() system call number is 3, so the address of sys_read() is in the 4th entry of the system call table (since we start numbering the system calls with 0). We read the data from the address sys_call_table + (3 * word_size) and we get the address of sys_read().
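The table lookup described above can be modelled in ordinary user-space C as an array of function pointers indexed by the system call number. This is only a sketch under stated assumptions: the names demo_sys_call_table, demo_sys_read, and demo_do_syscall and the table size are hypothetical, and the real table is written in assembly. Only the number 3 for read() comes from the text.

```c
#include <errno.h>
#include <stddef.h>

#define DEMO_NR_READ 3   /* read() is system call number 3, as in the text */
#define DEMO_NR_MAX  8   /* toy table size, chosen for this sketch only */

typedef long (*demo_syscall_fn)(long arg0, long arg1, long arg2);

/* Stand-in for sys_read(): just reports how many bytes were requested. */
static long demo_sys_read(long fd, long buf, long count)
{
    (void)fd;
    (void)buf;
    return count;
}

/* Toy table: entry n holds the handler for system call number n. */
static const demo_syscall_fn demo_sys_call_table[DEMO_NR_MAX] = {
    [DEMO_NR_READ] = demo_sys_read,
};

/* Toy dispatcher: index the table by number and call the handler. */
long demo_do_syscall(unsigned long nr, long arg0, long arg1, long arg2)
{
    /* Numbers outside the table or with no handler are rejected. */
    if (nr >= DEMO_NR_MAX || demo_sys_call_table[nr] == NULL)
        return -ENOSYS;
    return demo_sys_call_table[nr](arg0, arg1, arg2);
}
```

Indexing demo_sys_call_table[3] is the C equivalent of reading the word at sys_call_table + (3 * word_size): the compiler multiplies the index by the size of one pointer.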

After DoSyscall has looked up the correct system call address, it transfers control to that system call. Let's look at where sys_read() is defined, in the file: fs/read_write.c

This function finds the file struct associated with the fd number you passed to the read() function. That structure contains a pointer to the function that should be used to read data from that particular kind of file. After doing some checks, it calls that file-specific read function in order to actually read the data from the file, and then returns. This file-specific function is defined somewhere else: in the socket code, filesystem code, or device driver code, for example. This is one of the points at which a specific kernel subsystem finally interfaces with the rest of the kernel. After our read function finishes, we return from sys_read() back to DoSyscall, which switches control to ret_from_except, which is defined in:

arch/ppc/kernel/entry.S

This checks for tasks that might need to be done before switching back to user mode. If nothing else needs to be done, we fall through to the restore function, which restores the user process's state and returns control back to the user program. There! Your read() call is done! If you're lucky, you even got your data back.
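To make the file-specific dispatch inside sys_read() more concrete, here is a user-space model in C. It is a sketch, not the kernel's code: every name beginning with toy_ is hypothetical, the structures are reduced to the bare minimum, and fd 3 is an arbitrary choice for the demo. It shows the pattern the text describes: look up the file struct by fd, then call the read function that the file points to.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct toy_file;

/* Per-file-type operations; only read is modelled here. */
struct toy_file_operations {
    long (*read)(struct toy_file *file, char *buf, size_t count);
};

struct toy_file {
    const struct toy_file_operations *f_op; /* how to read this kind of file */
    const char *data;                        /* backing data for the toy file */
    size_t pos;                              /* current read position */
};

/* A "driver" read function: copies bytes out of the backing string. */
static long toy_mem_read(struct toy_file *file, char *buf, size_t count)
{
    size_t left = strlen(file->data) - file->pos;

    if (count > left)
        count = left;
    memcpy(buf, file->data + file->pos, count);
    file->pos += count;
    return (long)count;
}

static const struct toy_file_operations toy_mem_fops = {
    .read = toy_mem_read,
};

#define TOY_MAX_FDS 4

/* Toy fd table: only fd 3 is open in this sketch. */
static struct toy_file toy_fd_table[TOY_MAX_FDS] = {
    [3] = { .f_op = &toy_mem_fops, .data = "cpu: demo\n", .pos = 0 },
};

/* Model of sys_read(): find the file, check it, call its read function. */
long toy_sys_read(unsigned int fd, char *buf, size_t count)
{
    struct toy_file *file;

    if (fd >= TOY_MAX_FDS)
        return -EBADF;
    file = &toy_fd_table[fd];
    if (file->f_op == NULL || file->f_op->read == NULL)
        return -EBADF;  /* empty slot: no open file behind this fd */
    return file->f_op->read(file, buf, count);
}
```

The dispatcher toy_sys_read() knows nothing about how the data is stored. Swapping in a different toy_file_operations would change the behaviour without touching it, which is how a device driver's read function gets called in the real kernel.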

You can explore syscalls further by putting printks at strategic places. Be sure to limit the amount of output from these printks. For example, if you add a printk to the sys_read() syscall, you should do something like this:

static int mycount = 0;

/* Log only the first 10 calls so the kernel log is not flooded. */
if (mycount < 10) {
        printk("sys_read called\n");
        mycount++;
}

Have fun!


References

Interesting Links