The Linux Kernel

INTRODUCTION

The Linux/GNU operating system has gained much popularity and recognition in recent years, mainly because it is open source: developers worldwide can modify and redistribute its code. The Linux kernel, the engine room of the system, oversees the entire operation of the operating system. In this article, I will dissect the Linux kernel's architecture and bring to light the various components that give it such control.

Brief History of Linux

It's always good to start from the roots of whatever you are working on, and I will treat the Linux kernel no differently. Linux began in 1991 as a pet project of Linus Torvalds. Its source code was published for anyone to study and modify, but its original license forbade commercial redistribution, so it was not yet free software.

At the time, Richard Stallman's GNU Project ("GNU's Not Unix") had already built much of a free operating system and had written the GNU General Public License (GPL). What GNU lacked was a working kernel: the program at the core of an operating system that oversees all operations and processes in the computer system.

Linux needed a free license, and GNU needed a kernel. In 1992 Torvalds relicensed Linux under the GPL, and combining the Linux kernel with the GNU tools produced Linux/GNU, a completely free operating system.

The operating system has changed a great deal over the years, but Linux/GNU remains one of the most widely used open-source operating systems.

Linux Kernel Architecture

Fig 1. Linux Kernel Architecture

Just as the nucleus is the control center of a cell, the Linux kernel is the heart of the Linux/GNU operating system. It is the layer that manages communication and the sharing of resources between a computer's hardware and the processes running on it.

I like to picture the kernel as the patty between two buns: user space on top and the hardware underneath. User space, at the very top, holds the user-mode applications and system programs that interact with the kernel. The GNU C Library (glibc) provides the interface that lets these applications talk to the kernel. Each user program runs separately in its own address space.

Fig 2.

At the bottom is the hardware, including the memory and the central processor, where requests from user space are ultimately carried out.

And at the center of it all is the Linux kernel, living in its own little world called kernel space. The Linux kernel has a monolithic architecture, in which the entire operating system runs in kernel space. All the components needed to provide services, such as system calls, process management, memory management, and input/output device handling, live in that single kernel address space.

Because of this architecture, components communicate directly through function calls rather than by passing messages, which keeps overhead low and lets processes execute faster.

However, because all these components share a single address space, a bug in one can affect the others, replacing a single component can be difficult, and a serious crash in any one service can bring down the whole kernel.

Components of the Linux kernel

Discussing the architecture of a sophisticated system like the Linux kernel means breaking it down into its core components to fully understand how it works internally.

System Call Interface

The system call interface is the boundary through which user space communicates with the kernel and, ultimately, the hardware. System calls take the form of functions such as read(), write(), and fork(), each representing a service the kernel can perform.

User-space programs invoke these calls whenever they need the kernel to do something on their behalf.

User space is not allowed to access the hardware directly; only kernel space has unrestricted access to hardware resources. When a program makes a system call, the CPU switches from user mode to kernel mode, and the kernel carries out the requested service, working with the hardware through the relevant driver where needed. When the work is done, the kernel returns the result to the program through the system call interface, and the CPU switches back to user mode.
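To make that round trip concrete, here is a minimal sketch in Python (the article has no code of its own, so using Python here is my assumption). On Linux, functions in Python's os module such as os.pipe(), os.write(), os.read(), and os.getpid() are thin wrappers that each end in the corresponding system call; the helper name syscall_roundtrip is hypothetical. Running the script under strace should show pipe2 (or pipe), write, and read among the many other calls the interpreter makes.

```python
import os


def syscall_roundtrip(message: bytes) -> bytes:
    """Pass bytes through a kernel pipe using the write() and read() system calls.

    Keep messages small (well under the pipe buffer, typically 64 KiB):
    nothing reads concurrently, so a huge write would block.
    """
    # pipe(): the kernel creates a one-way channel and returns two file descriptors.
    read_fd, write_fd = os.pipe()
    try:
        # write(): the kernel copies the bytes from this process into its pipe buffer.
        os.write(write_fd, message)
        # read(): the kernel copies them from the pipe buffer back into this process.
        return os.read(read_fd, len(message))
    finally:
        # close(): hand both descriptors back to the kernel.
        os.close(read_fd)
        os.close(write_fd)


print(syscall_roundtrip(b"hello kernel"))  # b'hello kernel'
print(os.getpid())  # getpid(): the kernel reports this process's ID
```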

Fig 3. System Calls

Process Management

Deciding which processes should run, keeping track of how many exist, choosing which to execute first, and pausing one so another can run are all part of the Linux kernel's job; the part of the kernel that makes these decisions is the scheduler. A good process management system is essential for the Linux/GNU operating system to function correctly.

Running a program, command, or application on a Linux system creates a new process. The process that starts it is called the parent process, and the new process is its child. A process can also create threads, which run inside the same memory space as the process that created them; a child process, by contrast, gets its own copy of the parent's address space.

A process asks the kernel to create a new process through the system call interface with fork(), usually followed by exec() to load a different program into the child, and a process ends with exit().
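Here is a minimal sketch of that life cycle, assuming Linux and Python's os module: os.fork() wraps fork(), os._exit() ends the child immediately through the exit system call, and os.waitpid() lets the parent collect the child's exit status. The helper run_child is hypothetical, and I leave out the exec() step to keep it short.

```python
import os


def run_child(exit_code: int) -> int:
    """Fork a child that exits with exit_code (0-255), then reap it and return that code."""
    pid = os.fork()  # fork(): the kernel duplicates the calling process
    if pid == 0:
        # Child branch: fork() returned 0 here.
        # os._exit() calls exit directly, skipping the parent's Python cleanup handlers.
        os._exit(exit_code)
    # Parent branch: fork() returned the child's PID.
    _pid, status = os.waitpid(pid, 0)  # block until the child ends, collect its status
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    raise RuntimeError("child did not exit normally")


print(run_child(7))  # 7
```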

These processes can be in any of five states: running and runnable, interruptible sleep, uninterruptible sleep, stopped, and zombie.

  • Running and runnable: The kernel treats these as one state and keeps such processes in the same run queue. The difference is that a running process is currently executing on a CPU, whereas a runnable process is ready to go and is waiting for the scheduler to give it CPU time.

  • Interruptible sleep: Some processes run until they need something external, such as input from the user or a response from a server. While they wait, they don't need the CPU, so they enter interruptible sleep. The kernel can then run other processes, and a signal can wake the sleeping process or terminate it safely. This makes efficient use of CPU time, since waiting processes no longer occupy it.

  • Uninterruptible sleep: Some processes wait for resources in situations where interrupting them could cause serious problems, so signals cannot wake them. For example, if a process is reading data from a network file system (NFS) and the connection drops, the process can be stuck in uninterruptible sleep until the connection is restored.

  • Stopped: A stopped process is a running process that has been suspended. Pressing Ctrl+Z in a terminal sends the SIGTSTP signal, which stops the foreground process, while the SIGSTOP signal stops a process unconditionally. A stopped process can be resumed with the SIGCONT signal or terminated with the SIGKILL signal.

  • Zombie: When a process finishes or is terminated, it doesn't disappear from the system immediately. Its entry remains until the parent process collects its exit status (with wait() or a similar call), and until that happens, the finished process stays in the zombie state.
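Most of these states can be watched directly, because the kernel reports each process's state as a single letter in /proc/<pid>/stat (R running or runnable, S interruptible sleep, D uninterruptible sleep, T stopped, Z zombie). The sketch below assumes Linux with /proc mounted and uses hypothetical helper names. It forks a sleeping child, stops and continues it, then kills it and observes the zombie before reaping it. I leave out the D state because it is hard to trigger on demand.

```python
import os
import signal
import time


def proc_state(pid: int) -> str:
    """Return the one-letter state the kernel reports in /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # Format: "pid (comm) state ...". comm may itself contain spaces or ')',
    # so locate the LAST ')' and skip the following space.
    return data[data.rindex(")") + 2]


def wait_for_state(pid: int, wanted: str, timeout: float = 5.0) -> str:
    """Poll until the process reaches `wanted` or the timeout expires."""
    deadline = time.monotonic() + timeout
    state = proc_state(pid)
    while state != wanted and time.monotonic() < deadline:
        time.sleep(0.01)
        state = proc_state(pid)
    return state


def demo() -> list:
    observed = []
    pid = os.fork()
    if pid == 0:
        time.sleep(30)  # child: sleeping on a timer is interruptible sleep
        os._exit(0)
    observed.append(wait_for_state(pid, "S"))
    os.kill(pid, signal.SIGSTOP)  # suspend the child
    observed.append(wait_for_state(pid, "T"))
    os.kill(pid, signal.SIGCONT)  # resume it; it goes back to sleeping
    observed.append(wait_for_state(pid, "S"))
    os.kill(pid, signal.SIGKILL)  # it dies, but the parent has not reaped it yet
    observed.append(wait_for_state(pid, "Z"))
    os.waitpid(pid, 0)  # reaping removes the zombie entry
    return observed


print(demo())  # expected on Linux: ['S', 'T', 'S', 'Z']
```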

Memory Management

The memory management component of the Linux kernel ensures efficient use of the computer's memory. It decides how much memory running processes and programs can allocate and access. The kernel gives each process the illusion of virtual memory: a large, private address space that can be bigger than the physical memory (RAM) actually available. Using a technique called paging, it divides virtual memory into fixed-size blocks called pages and maps them to physical memory only when a process actually uses them, instead of reserving everything up front.
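The arithmetic behind paging is simple enough to show directly. This sketch assumes 4096-byte pages (the common size on x86-64; os.sysconf reports the real value on your machine) and uses a toy dictionary as a hypothetical page table. The kernel's real page tables are multi-level structures walked by the hardware, so treat this as a model rather than the kernel's implementation.

```python
import os

# Page size the kernel uses on this machine (commonly 4096 bytes on x86-64).
PAGE_SIZE = os.sysconf("SC_PAGE_SIZE")


def split_address(addr: int, page_size: int = 4096) -> tuple:
    """Split a virtual address into (virtual page number, offset within the page)."""
    return divmod(addr, page_size)


def pages_needed(num_bytes: int, page_size: int = 4096) -> int:
    """Whole pages needed to back num_bytes of data, rounded up."""
    return -(-num_bytes // page_size)


def translate(addr: int, page_table: dict, page_size: int = 4096) -> int:
    """Toy translation: swap the virtual page for a physical frame, keep the offset.

    page_table is a hypothetical {virtual page number: physical frame number} map.
    """
    page, offset = split_address(addr, page_size)
    if page not in page_table:
        # In the real kernel this is a page fault: the page is allocated or
        # loaded on demand, or the process gets SIGSEGV for an invalid access.
        raise LookupError(f"page fault at virtual page {page:#x}")
    return page_table[page] * page_size + offset


print(PAGE_SIZE)
print(split_address(0x12345))                # (18, 837), i.e. page 0x12, offset 0x345
print(pages_needed(10_000))                  # 3
print(hex(translate(0x12345, {0x12: 0x7})))  # 0x7345
```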

When RAM fills up and programs need more memory, the kernel swaps: it moves inactive pages from RAM to swap space on disk, freeing physical memory for pages that are in active use.

The kernel also protects the system's memory by preventing a program from accessing memory that belongs to another program. When a program releases its memory or exits, the kernel reclaims that memory and makes it available to other programs.
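Isolation between processes is easy to observe with fork(): the child starts with a copy of the parent's address space (the kernel shares the pages copy-on-write and duplicates a page only when one side writes to it), so a change made in the child is invisible to the parent. A minimal sketch, assuming Linux; values_after_fork is a hypothetical name, and the pipe is there only to report the child's value back.

```python
import os


def values_after_fork() -> tuple:
    """Return (value seen by the child, value seen by the parent) after the child writes."""
    counter = [0]
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: this write lands in the child's private copy of the page.
        os.close(read_fd)
        counter[0] = 99
        os.write(write_fd, str(counter[0]).encode())
        os._exit(0)
    # Parent: read the child's report, then look at our own, untouched copy.
    os.close(write_fd)
    child_value = int(os.read(read_fd, 16))
    os.close(read_fd)
    os.waitpid(pid, 0)
    return child_value, counter[0]


print(values_after_fork())  # (99, 0)
```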

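If the kernel hits an unrecoverable error, such as corruption of its own memory structures, it triggers a "kernel panic" and halts rather than risk corrupting data; when the system simply runs out of memory, it invokes the out-of-memory (OOM) killer to free some. Memory management is as important as any other kernel component, because poor memory management would mean an inefficient operating system.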

Device Drivers

The Linux kernel lets the operating system communicate with hardware devices such as printers, keyboards, and monitors. It does this through pieces of code called device drivers.

Device drivers act as expert translators and intermediaries. They tell the operating system what kind of hardware device it is connected to, how to use it, and what instructions it accepts, and they pass those instructions to the device in a form it understands. Linux has a wide range of device drivers, which allows it to run on supercomputers, personal computers, servers, mainframes, mobile devices, and embedded devices.
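On Linux, programs usually reach a driver through a device file under /dev: the kernel forwards open(), read(), and write() on that file to the driver registered for its major and minor numbers. This sketch assumes the standard character devices /dev/urandom (served by the kernel's random-number driver) and /dev/null; read_from_device is a hypothetical helper.

```python
import os
import stat


def read_from_device(path: str = "/dev/urandom", size: int = 16) -> bytes:
    """Read up to `size` bytes from a character device; the kernel hands read() to its driver."""
    if not stat.S_ISCHR(os.stat(path).st_mode):
        raise ValueError(f"{path} is not a character device")
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.read(fd, size)
    finally:
        os.close(fd)


print(read_from_device().hex())          # 16 random bytes, different on every run
print(read_from_device("/dev/null", 8))  # b'': the null driver always reports end-of-file
rdev = os.stat("/dev/null").st_rdev
print(os.major(rdev), os.minor(rdev))    # 1 3 on Linux
```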

Device drivers can be included in the kernel in two ways: built directly into the kernel image, or compiled as loadable kernel modules (LKMs). LKMs can be loaded and unloaded while the kernel is running, for example with the modprobe and rmmod commands.
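You can see which modules are currently loaded by reading /proc/modules, the file lsmod formats for display. The parser below is a sketch assuming the usual whitespace-separated layout (name, size, reference count, users, state, address); the Module type and helper names are hypothetical, and the sample line is made up for illustration.

```python
from typing import List, NamedTuple, Optional


class Module(NamedTuple):
    name: str
    size: int                # memory used by the module, in bytes
    refcount: Optional[int]  # current users; None when the kernel prints "-"
    used_by: List[str]       # other modules that depend on this one
    state: str               # "Live", "Loading", or "Unloading"


def parse_module_line(line: str) -> Module:
    """Parse one line of /proc/modules (the same data lsmod displays)."""
    name, size, refcount, used_by, state, *_rest = line.split()
    users = [] if used_by == "-" else [u for u in used_by.split(",") if u]
    count = int(refcount) if refcount.isdigit() else None
    return Module(name, int(size), count, users, state)


def loaded_modules(path: str = "/proc/modules") -> List[Module]:
    """List the modules currently loaded into the running kernel."""
    with open(path) as f:
        return [parse_module_line(line) for line in f if line.strip()]


# Hypothetical sample line; real sizes and addresses differ on every machine.
sample = "mbcache 16384 1 ext4, Live 0x0000000000000000"
print(parse_module_line(sample))
# Module(name='mbcache', size=16384, refcount=1, used_by=['ext4'], state='Live')
```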

File System

Fig 4. File System

The Linux kernel manages and organizes files and directories, structuring the system so that processes can store, retrieve, and modify data as needed. The file system can be broken down into three layers: logical, virtual, and physical.

At the very top is the logical file system, which lets applications in user space work with files: opening, reading, modifying, closing, and deleting them. You can think of it as the front end of the file system. In the middle is the [Virtual File System(VFS)](kernel.org/doc/html/next/filesystems/vfs.ht.. that gives uniform access to all of the physical file systems on a machine. The VFS hides the differences between those file systems, so users can work with several of them at once through the same interface, which fosters compatibility and interoperability. At the bottom is the physical file system, which works directly with the storage blocks on the disk, retrieving, allocating, and managing file data in an organized way.
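Because the VFS gives every file system the same interface, a program uses identical open() and read() calls whether the data lives on disk or is generated by the kernel. A small sketch, assuming Linux with /proc mounted: the same hypothetical helper reads a temporary file and /proc/self/status, a virtual file that procfs produces on demand.

```python
import os
import tempfile


def read_first_line(path: str) -> str:
    """Read a file's first line with the same open()/read() calls on any file system."""
    with open(path) as f:
        return f.readline().rstrip("\n")


# 1. An ordinary file stored by a regular file system (ext4, XFS, tmpfs, ...).
with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
    tmp.write("stored on a real file system\n")
print(read_first_line(tmp.name))  # stored on a real file system
os.unlink(tmp.name)

# 2. A virtual file that procfs generates on the fly; nothing is stored on disk.
print(read_first_line("/proc/self/status"))  # e.g. a line starting with "Name:"
```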

Linux is known for its compatibility with other systems: it can mount partitions formatted with many different file systems, including ext4, ext3, XFS, Btrfs, and NTFS.
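The kernel lists everything currently mounted, along with each file system type, in /proc/mounts. Here is a sketch of a parser for that file, assuming its usual six-field layout (source, mount point, type, options, and two numeric fields); the Mount type and function names are hypothetical.

```python
import re
from typing import List, NamedTuple


class Mount(NamedTuple):
    source: str        # device or pseudo-source, e.g. /dev/sda1 or proc
    mount_point: str   # where it is attached in the directory tree
    fs_type: str       # ext4, xfs, btrfs, proc, tmpfs, ...
    options: List[str]


def _unescape(field: str) -> str:
    # The kernel writes spaces, tabs, etc. in paths as octal escapes such as "\040".
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def parse_mount_line(line: str) -> Mount:
    source, mount_point, fs_type, options, *_rest = line.split()
    return Mount(_unescape(source), _unescape(mount_point), fs_type, options.split(","))


def mounted_file_systems(path: str = "/proc/mounts") -> List[Mount]:
    with open(path) as f:
        return [parse_mount_line(line) for line in f if line.strip()]


for m in mounted_file_systems():
    print(f"{m.mount_point:<30} {m.fs_type}")
```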

The Linux file system follows a hierarchical tree structure, starting from the root directory ("/") and branching into various subdirectories. Some common directories a Linux user will encounter include:

  • /bin: Contains essential user binaries, the executable files behind common Linux commands

  • /etc: Contains system configurations

  • /home: Contains users' home directories and their personal files

  • /lib: Contains essential shared libraries needed by the executables in /bin to boot the system and run commands in the root file system

  • /usr: Mainly contains read-only commands, libraries, and data.

  • /var: Contains variable data like logs and databases.

Linux files and directories can be assigned permissions that determine which users can read, write, or execute them, and Linux provides commands for navigating and managing files, including cd (change directory), rm (remove), cp (copy), ls (list files), and touch (create an empty file or update a file's timestamps).
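Permissions are stored as mode bits that the chmod() system call changes and that ls -l displays as strings like -rw-r-----. A minimal sketch using a throwaway temporary file; the helper name is hypothetical.

```python
import os
import stat
import tempfile


def set_and_show_permissions(mode: int) -> str:
    """Apply `mode` to a scratch file and return the ls -l style permission string."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        os.chmod(path, mode)  # chmod(): ask the kernel to change the permission bits
        return stat.filemode(os.stat(path).st_mode)
    finally:
        os.unlink(path)


print(set_and_show_permissions(0o640))  # -rw-r-----  owner read/write, group read, others nothing
print(set_and_show_permissions(0o755))  # -rwxr-xr-x  typical for an executable
```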

Network Stack

The Linux kernel also contains a network stack, also called the Transmission Control Protocol/Internet Protocol (TCP/IP) stack, which includes the protocols, network configuration, routing tables, and socket APIs that enable communication over a network. The stack follows a layered architecture comprising the link, internet, transport, and application layers.

The link layer handles interaction with hardware through network device drivers and network interface cards (NICs), using protocols such as Ethernet. It wraps data into frames for transmission over the local network. The internet layer is where the Internet Protocol (IP) operates; it is responsible for routing packets across different networks. The two main IP versions are the older IPv4, with 32-bit addresses, and the newer IPv6, with 128-bit addresses.
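Two ideas from the internet layer can be checked with Python's standard ipaddress module: the sizes of IPv4 and IPv6 addresses, and the longest-prefix-match rule used to pick a route (the most specific matching prefix wins). The routing table here is hypothetical and the lookup is a toy model, not the kernel's routing code.

```python
import ipaddress


def address_size(addr: str) -> tuple:
    """Return (bits, bytes) for an IPv4 or IPv6 address."""
    ip = ipaddress.ip_address(addr)
    return ip.max_prefixlen, len(ip.packed)


# Hypothetical routing table: destination prefix -> where to send the packet.
ROUTES = {
    "0.0.0.0/0": "default via 192.0.2.1",
    "10.0.0.0/8": "eth1",
    "10.1.0.0/16": "eth2",
}


def pick_route(dest: str, routes: dict) -> str:
    """Longest-prefix match: among matching prefixes, the longest one wins."""
    addr = ipaddress.ip_address(dest)
    candidates = []
    for prefix, target in routes.items():
        net = ipaddress.ip_network(prefix)
        if addr in net:
            candidates.append((net.prefixlen, target))
    if not candidates:
        raise LookupError(f"no route to {dest}")
    return max(candidates, key=lambda c: c[0])[1]


print(address_size("192.0.2.1"))             # (32, 4)
print(address_size("2001:db8::1"))           # (128, 16)
print(pick_route("10.1.2.3", ROUTES))        # eth2
print(pick_route("10.9.9.9", ROUTES))        # eth1
print(pick_route("198.51.100.7", ROUTES))    # default via 192.0.2.1
```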

The transport layer manages the delivery of data between applications on different devices. Its main protocols are the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP). TCP provides reliable, ordered delivery: it detects lost or corrupted data and retransmits it. UDP only checksums its datagrams and does not acknowledge, reorder, or retransmit them, which makes it lighter and faster.
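UDP's connectionless model shows up directly in the socket API: there is no connect or accept step, and each sendto() call produces one independent datagram. A minimal sketch over the loopback interface (127.0.0.1), where loss is unlikely; udp_roundtrip is a hypothetical helper, and the 2-second timeout is an arbitrary choice.

```python
import socket


def udp_roundtrip(payload: bytes) -> bytes:
    """Send one UDP datagram to ourselves over loopback and return what arrives."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))  # port 0: the kernel picks a free port
        receiver.settimeout(2.0)         # UDP promises no delivery, so don't wait forever
        # No handshake: sendto() hands one self-contained datagram to the kernel.
        sender.sendto(payload, receiver.getsockname())
        data, _source = receiver.recvfrom(65535)
        return data
    finally:
        sender.close()
        receiver.close()


print(udp_roundtrip(b"ping"))  # b'ping'
```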

The application layer lives in user space, and applications reach the kernel's network stack through the socket API. Sockets let applications establish connections and send and receive data over a network without handling the underlying protocols themselves. Developers build application protocols on top of sockets, such as HTTP for web browsing, SMTP for email, and FTP for file transfer.
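Here is what the socket API looks like for TCP, as a single-threaded sketch over loopback. bind(), listen(), connect(), and accept() are the standard socket calls, and the kernel's TCP implementation does the handshake, buffering, and retransmission behind them. tcp_echo_once is a hypothetical helper, and it only suits small payloads because nothing reads concurrently.

```python
import socket


def tcp_echo_once(payload: bytes) -> bytes:
    """Send payload over a loopback TCP connection and read back the server's echo.

    Single-threaded for brevity, so keep payloads small (a few KB): the data
    has to fit in the kernel's socket buffers before anyone reads it.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: the kernel picks a free port
    server.listen(1)
    # connect(): the kernel performs the TCP three-way handshake for us.
    client = socket.create_connection(server.getsockname(), timeout=2.0)
    conn, _peer = server.accept()
    try:
        client.sendall(payload)
        client.shutdown(socket.SHUT_WR)  # tell the server we are done sending
        received = b""
        while chunk := conn.recv(4096):  # b"" means the client shut down its side
            received += chunk
        conn.sendall(received)           # echo it back
        conn.shutdown(socket.SHUT_WR)
        reply = b""
        while chunk := client.recv(4096):
            reply += chunk
        return reply
    finally:
        for sock in (client, conn, server):
            sock.close()


print(tcp_echo_once(b"hello over TCP"))  # b'hello over TCP'
```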

Conclusion

The Linux kernel has come a long way from its start as a pet project to become the core of some of the most widely used operating systems in the world. Its adaptability to different systems, its flexibility, and its modest resource requirements make it stand out even more. There are alternatives to the Linux kernel, such as the Windows NT kernel and the Zircon kernel, but Linux remains a leading choice because it is open source, is considered secure, and runs software written in almost any programming language.