
Uttarakhand Technical University, Dehradun

New Scheme of Examination as per AICTE Flexible Curricula


Computer Science and Engineering, VIII-Semester
CS 801 Advanced Operating Systems Course Objectives:

UNIT I: Overview of UNIX system calls. The anatomy of a system call and x86
mechanisms for system call implementation. How the MMU/memory translation,
segmentation, and hardware traps interact to create kernel–user context separation.
What makes virtualization work? The kernel execution and programming context.
Live debugging and tracing. Hardware and software support for debugging.

UNIT II: DTrace: programming, implementation/design, internals. Kprobes and SysTrace: Linux catching up. Linking and loading. Executable and Linkable Format
(ELF). Internals of linking and dynamic linking. Internals of effective spinlock
implementations on x86. OpenSolaris adaptive mutexes: rationale and
implementation optimization. Pre-emptive kernels. Effects of modern memory
hierarchies and related optimizations.

UNIT III: Process and thread kernel data structures, process table traversal, lookup,
allocation and management of new structures, /proc internals, optimizations. Virtual
File System and the layering of a file system call from API to driver. Object-
orientation patterns in kernel code; a review of OO implementation generics (C++
vtables, etc).

UNIT IV: OpenSolaris and Linux virtual memory and address space structures. Tying
top-down and bottom-up object and memory page lookups with the actual x86 page
translation and segmentation. How file operations, I/O buffering, and swapping all
converged to using the same mechanism. Kmem and Vmem allocators. OO
approach to memory allocation. Challenges of multiple CPUs and memory hierarchy.
Security: integrity, isolation, mediation, auditing. From MULTICS and MLS to modern
UNIX. SELinux type enforcement: design, implementation, and pragmatics. Kernel
hook systems and policies they enable. Trap systems and policies they enable.
Tagged architectures and multi-level UNIX.

UNIT V: ZFS overview. OpenSolaris boot environments and snapshots. OpenSolaris and UNIX System V system administration pragmatics: service startup,
dependencies, management, system updates. Overview of the kernel network stack
implementation. Path of a packet through a kernel. Berkeley Packet Filter
architecture. Linux Netfilter architecture.
UNIT I

Overview of UNIX system calls

UNIX system calls are used to manage the file system, control processes, and provide interprocess communication. The UNIX system interface consists of about 80 system calls (as UNIX evolves this number will increase). The following table lists about 40 of the more important system calls:

GENERAL CLASS SPECIFIC CLASS SYSTEM CALL


---------------------------------------------------------------------
File Structure Creating a Channel creat()
Related Calls open()
close()
Input/Output read()
write()
Random Access lseek()
Channel Duplication dup()
Aliasing and Removing link()
Files unlink()
File Status stat()
fstat()
Access Control access()
chmod()
chown()
umask()
Device Control ioctl()
---------------------------------------------------------------------
Process Related Process Creation and exec()
Calls Termination fork()
wait()
exit()
Process Owner and Group getuid()
geteuid()
getgid()
getegid()
Process Identity getpid()
getppid()
Process Control signal()
kill()
alarm()
Change Working Directory chdir()
----------------------------------------------------------------------
Interprocess Pipelines pipe()
Communication Messages msgget()
msgsnd()
msgrcv()
msgctl()
Semaphores semget()
semop()
Shared Memory shmget()
shmat()
shmdt()
----------------------------------------------------------------------

[NOTE: The system call interface is that aspect of UNIX that has changed the most since the
inception of the UNIX system. Therefore, when you write a software tool, you should protect that tool
by putting system calls in other subroutines within your program and then calling only those
subroutines. Should the next version of the UNIX system change the syntax and semantics of the
system calls you've used, you need only change your interface routines.]
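
As a hedged sketch of this wrapping idea in C: the helper names (xopen, xread) are purely illustrative, not part of any standard, and error handling is kept minimal.

#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

/* Interface routines: the rest of the tool calls only these wrappers, so a
 * change in system call syntax or semantics touches just this one place. */
static int xopen(const char *path, int flags)
{
    int fd = open(path, flags);              /* direct system call today */
    if (fd < 0) { perror(path); exit(1); }
    return fd;
}

static ssize_t xread(int fd, void *buf, size_t n)
{
    ssize_t got = read(fd, buf, n);
    if (got < 0) { perror("read"); exit(1); }
    return got;
}

int main(void)
{
    char buf[256];
    int fd = xopen("/etc/hostname", O_RDONLY);   /* example file */
    ssize_t n = xread(fd, buf, sizeof buf);
    write(STDOUT_FILENO, buf, (size_t)n);
    close(fd);
    return 0;
}
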
The anatomy of a System call and x86 mechanisms for system call
implementation.

Anatomy of a System call

NaCl syscalls are the interface between untrusted code and the trusted codebase. They are the
means by which a NaCl process can execute code outside the inner sandbox. This is kind of a big
deal, because the entire point of NaCl is to prevent untrusted code from getting out of the inner
sandbox. Accordingly, the design and implementation of the syscall interface is a crucial part of the
NaCl system.

The purpose of a syscall is to transfer control from an untrusted execution context to a trusted one, so
that the thread can execute trusted code. The details of this implementation vary from platform to
platform, but the general flow is the same. This figure shows the flow of control:

The syscall starts as a call from untrusted code to a trampoline, which is a tiny bit of code (less than
one NaCl bundle) that resides at the bottom of the untrusted address space. Each syscall has its own
trampoline, but all trampolines are identical--in fact, they're all generated by the loader from a simple
template. The trampoline does at most two things:

1. Exits the hardware sandbox (on non-SFI implementations) by restoring the original system value of
%ds.
2. Calls the untrusted-to-trusted context switch function (NaClSyscallSeg)
There are many ways of implementing user-to-kernel transitions, i.e. system calls, on x86. Let's first quickly review what system calls actually need to accomplish.

System call implementation

In modern operating systems there is a distinction between user mode (executing normal application code) and kernel mode (being able to touch system configuration and devices). System calls are the
way for applications to request services from the operating system kernel and bridge the gap. To
facilitate that, the CPU needs to provide a mechanism for applications to securely transition from user
to kernel mode.

Secure in this context means that the application cannot just jump to arbitrary kernel code, because
that would effectively allow the application to do what it wants on the system. The kernel must be able
to configure defined entry points and the system call mechanism of the processor must enforce these.
After the system call is handled, the operating system also needs to know where to return to in the
application, so the system call mechanism also has to provide this information.

I came up with four mechanisms that match this description that work for 64-bit environments. I’m
going to save the weirder ones that only work on 32-bit for another post. So we have:

1. Software Interrupts using the int instruction


2. Call Gates
3. Fast system calls using sysenter / sysexit
4. Fast system calls using syscall / sysret

Software interrupts are the oldest mechanism. The key idea is to use the same method to enter the
kernel as hardware interrupts do. In essence, it is still the mechanism that was introduced
with Protected Mode in 1982 on the 286, but even the earlier CPUs already had cruder versions of
this.

Because interrupt vector 0x80 can still be used to invoke system calls on 64-bit Linux, we are going to stick with this example:

The processor finds the kernel entry address by taking the interrupt vector number from
the int instruction and looking up the corresponding descriptor in the Interrupt Descriptor Table (IDT).
This descriptor will be an Interrupt or Trap Gate to kernel mode and it contains the pointer to the
handling function in the kernel.
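
As a concrete illustration, here is a minimal sketch (assuming the 32-bit Linux int 0x80 convention, where the system call number goes in EAX and the arguments in EBX, ECX, EDX; built with gcc -m32) of invoking the write system call directly:

#include <unistd.h>

int main(void)
{
    const char msg[] = "hello via int 0x80\n";
    long ret;
    /* __NR_write is 4 in the 32-bit syscall table; fd 1 is stdout. */
    asm volatile ("int $0x80"
                  : "=a" (ret)              /* return value comes back in EAX */
                  : "a" (4),                /* syscall number: write         */
                    "b" (1),                /* arg 1: file descriptor        */
                    "c" (msg),              /* arg 2: buffer                 */
                    "d" (sizeof msg - 1)    /* arg 3: length                 */
                  : "memory");
    return 0;
}
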

How the MMU/memory translation


The Memory Management Unit (MMU) performs translations.
The MMU contains the following:
The table walk unit, which contains logic that reads the translation tables from memory.
Translation Lookaside Buffers (TLBs), which cache recently used translations.
All memory addresses that are issued by software are virtual. These memory addresses are passed
to the MMU, which checks the TLBs for a recently used cached translation. If the MMU does not find
a recently cached translation, the table walk unit reads the appropriate table entry, or entries, from
memory, as shown here:
A virtual address must be translated to a physical address before a memory access can take place
(because we must know which physical memory location we are accessing). This need for translation
also applies to cached data, because on Armv6 and later processors, the data caches store data
using the physical address (addresses that are physically tagged). Therefore, the address must be
translated before a cache lookup can complete.

Note: Architecture is a behavioural specification. The caches must behave as if they are physically
tagged. An implementation might do something different, as long as this is not software-visible.

Table entry

The translation tables work by dividing the virtual address space into equal-sized blocks and by
providing one entry in the table per block.

Entry 0 in the table provides the mapping for block 0, entry 1 provides the mapping for block 1, and so
on. Each entry contains the address of a corresponding block of physical memory and the attributes
to use when accessing the physical address.

Table lookup

A table lookup occurs when a translation takes place. When a translation happens, the virtual address
that is issued by the software is split in two, as shown in this diagram:
This diagram shows a single-level lookup.

The upper-order bits, which are labelled 'Which entry' in the diagram, tell you which block entry to look
in and they are used as an index into the table. This entry block contains the physical address for the
virtual address.

The lower-order bits, which are labelled 'Offset in block' in the diagram, are an offset within that block
and are not changed by the translation.

Multilevel translation

In a single-level lookup, the virtual address space is split into equal-sized blocks. In practice, a
hierarchy of tables is used.

The first table (Level 1 table) divides the virtual address space into large blocks. Each entry in this
table can point to an equal-sized block of physical memory or it can point to another table which
subdivides the block into smaller blocks. We call this type of table a 'multilevel table'. Here we can see
an example of a multilevel table that has three levels:
In Armv8-A, the maximum number of levels is four, and the levels are numbered 0 to 3. This multilevel
approach allows both larger blocks and smaller blocks to be described. The characteristics of large
and small blocks are as follows:

Large blocks require fewer levels of reads to translate than small blocks. Plus, large blocks are more
efficient to cache in the TLBs.

Small blocks give software fine-grain control over memory allocation. However, small blocks are less
efficient to cache in the TLBs. Caching is less efficient because small blocks require multiple reads
through the levels to translate.

To manage this trade-off, an OS must balance the efficiency of using large mappings against the
flexibility of using smaller mappings for optimum performance.

Note: The processor does not know the size of the translation when it starts the table lookup. The
processor works out the size of the block that is being translated by performing the table walk.
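
To make the index/offset split concrete, here is a small hedged sketch in C that decomposes a virtual address the way a 4-level x86-64 walk with 4 KiB pages does (9 bits per level, 12-bit page offset); the example address is arbitrary:

#include <stdio.h>
#include <stdint.h>

/* Split a 48-bit x86-64 virtual address into the four 9-bit table
 * indices and the 12-bit page offset used by a 4-level, 4 KiB walk. */
int main(void)
{
    uint64_t va = 0x00007f1234567abcULL;    /* example virtual address */

    unsigned pml4 = (va >> 39) & 0x1ff;     /* level 0 (PML4) index    */
    unsigned pdpt = (va >> 30) & 0x1ff;     /* level 1 (PDPT) index    */
    unsigned pd   = (va >> 21) & 0x1ff;     /* level 2 (PD) index      */
    unsigned pt   = (va >> 12) & 0x1ff;     /* level 3 (PT) index      */
    unsigned off  =  va        & 0xfff;     /* offset within the page  */

    printf("PML4=%u PDPT=%u PD=%u PT=%u offset=0x%x\n",
           pml4, pdpt, pd, pt, off);
    return 0;
}
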

Segmentation
In operating systems, segmentation is a memory management technique in which memory is divided into variable-sized parts. Each part is known as a segment, which can be allocated to a process.

The details about each segment are stored in a table called a segment table. The segment table is stored in one (or more) of the segments.

The segment table mainly contains two pieces of information about each segment:

1. Base: the base address of the segment.

2. Limit: the length of the segment.


Why Segmentation is required?

Until now, we have been using paging as the main memory management technique. Paging is closer to the operating system than to the user. It divides all processes into pages regardless of the fact that a process may have related parts (for example, the functions of one module) that need to be loaded into the same page.

The operating system does not care about the user's view of the process. It may split the same function across different pages, and those pages may or may not be loaded into memory at the same time. This decreases the efficiency of the system.

It is better to use segmentation, which divides the process into segments. Each segment contains content of the same type; for example, the main function can be placed in one segment and the library functions in another segment.

Translation of Logical address into physical address by segment table

CPU generates a logical address which contains two parts:

1. Segment Number

2. Offset

For Example:

Suppose a 16-bit address is used, with 4 bits for the segment number and 12 bits for the segment offset. Then the maximum segment size is 4096 and the maximum number of segments that can be referred to is 16.

When a program is loaded into memory, the segmentation system tries to locate space that is large
enough to hold the first segment of the process, space information is obtained from the free list
maintained by memory manager. Then it tries to locate space for other segments. Once adequate space
is located for all the segments, it loads them into their respective areas.
The operating system also generates a segment map table for each program.

With the help of segment map tables and hardware assistance, the operating system can easily
translate a logical address into physical address on execution of a program.

The segment number is used as an index into the segment table. The limit of the respective segment is compared with the offset. If the offset is less than the limit, the address is valid; otherwise an error is raised because the address is invalid.

In the case of valid addresses, the base address of the segment is added to the offset to get the physical
address of the actual word in the main memory.

The above figure shows how address translation is done in case of segmentation.
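
A minimal sketch of this base/limit check in C; the segment-table layout and the values used are purely illustrative:

#include <stdio.h>
#include <stdint.h>

struct segment { uint32_t base; uint32_t limit; };

/* Illustrative segment table: 4-bit segment number, 12-bit offset. */
static struct segment seg_table[16] = {
    { 0x10000, 0x0800 },   /* segment 0 */
    { 0x24000, 0x1000 },   /* segment 1 */
};

int main(void)
{
    uint16_t logical = 0x1234;              /* segment 1, offset 0x234 */
    unsigned seg = (logical >> 12) & 0xf;
    unsigned off =  logical & 0xfff;

    if (off >= seg_table[seg].limit) {      /* limit check */
        printf("invalid address: offset 0x%x beyond limit\n", off);
        return 1;
    }
    uint32_t phys = seg_table[seg].base + off;   /* base + offset */
    printf("logical 0x%04x -> physical 0x%05x\n", logical, phys);
    return 0;
}
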

Advantages of Segmentation

1. No internal fragmentation

2. Average Segment Size is larger than the actual page size.

3. Less overhead

4. It is easier to relocate segments than entire address space.

5. The segment table is of lesser size as compared to the page table in paging.
Disadvantages

1. It can have external fragmentation.

2. It is difficult to allocate contiguous memory to variable-sized partitions.

3. Costly memory management algorithms.

Create kernel-user context separation


There are two modes of operation in the operating system to make sure it works correctly. These are
user mode and kernel mode.
They are explained as follows −

User Mode
The system is in user mode when the operating system is running a user application such as a text editor. The transition from user mode to kernel mode occurs when the application requests the help of the operating system, or when an interrupt or a system call occurs.

The mode bit is set to 1 in the user mode. It is changed from 1 to 0 when switching from user mode to
kernel mode.

Kernel Mode
The system starts in kernel mode when it boots and after the operating system is loaded, it executes
applications in user mode. There are some privileged instructions that can only be executed in kernel
mode.

Examples are interrupt handling instructions, input/output management, etc. If a privileged instruction is executed in user mode, it is illegal and a trap is generated.

The mode bit is set to 0 in the kernel mode. It is changed from 0 to 1 when switching from kernel mode
to user mode.
An image that illustrates the transition from user mode to kernel mode and back again is −

In the above image, the user process executes in user mode until it makes a system call. Then a system trap is generated and the mode bit is set to zero. The system call is executed in kernel mode. After the execution is completed, another system trap is generated and the mode bit is set to 1. Control returns to user mode and the process execution continues.
Necessity of Dual Mode (User Mode and Kernel Mode) in Operating System
The lack of a dual mode, i.e. user mode and kernel mode, in an operating system can cause serious problems. Some of these are −

 A running user program can accidentally wipe out the operating system by overwriting it with
user data.
 Multiple processes can write to the same device or location at the same time, with disastrous results.

What makes virtualization work ?

Operating system-based virtualization refers to an operating system feature in which the kernel enables the existence of multiple isolated user-space instances. The term also refers to the installation of virtualization software on a pre-existing operating system, which is then called the host operating system.
In this form of virtualization, a user installs the virtualization software in the existing operating system like any other program and uses it to create and operate various virtual machines. The virtualization software gives the user direct access to any of the created virtual machines. Because the host OS is what provides the required support for hardware devices, operating-system virtualization can run into hardware compatibility issues when a hardware driver is not available to the virtualization software.
Virtualization software is able to convert hardware IT resources that require unique software for operation into virtualized IT resources. As the host OS is a complete operating system in itself, many OS-based services are available, and the usual organizational management and administration tools can be used to manage the virtualization host.

Some major operating system-based services are mentioned below:


1. Backup and Recovery.
2. Security Management.
3. Integration to Directory Services.
Various major operations of Operating System Based Virtualization are described below:
1. Hardware capabilities can be employed, such as the network connection and CPU.
2. Connected peripherals with which it can interact, such as a webcam, printer, keyboard, or scanner.
3. Data that can be read or written, such as files, folders, and network shares.

The operating system may allow or deny access to such resources based on which program requests them and the user account in whose context it runs. The OS may also hide these resources, so that when a computer program enumerates them they do not appear in the enumeration results. Nevertheless, from a programming perspective, the computer program has interacted with those resources and the operating system has mediated that interaction.
With operating-system virtualization, or containerization, it is possible to run programs within containers, to which only parts of these resources are allocated. A program that expects to see the whole computer, once run inside a container, can only see the allocated resources and believes them to be all that is available. Several containers can be created on each operating system, and a subset of the computer's resources is allocated to each of them. Each container may contain many computer programs. These programs may run in parallel or separately, and may even interact with each other.
Operating system-based virtualization can raise demands and problems related to performance
overhead, such as:
1. The host operating system employs CPU, memory, and other hardware IT resources.
2. Hardware-related calls from guest operating systems need to navigate numerous layers to and from the hardware, which degrades overall performance.
3. Licenses are frequently essential for host operating systems, in addition to individual licenses
for each of their guest operating systems.

The kernel execution and programming context.

Kernel execution
The kernel execution configuration defines the dimensions of a grid and its blocks. Unique
coordinates in blockIdx and threadIdx variables allow threads of a grid to identify themselves and their
domains of data. It is the programmer’s responsibility to use these variables in kernel functions so that
the threads can properly identify the portion of the data to process. This model of programming
compels the programmer to organize threads and their data into hierarchical and multidimensional
organizations.

In the dictionary a kernel is a softer, usually edible part of a nut, seed, or fruit stone contained within its
shell such as “the kernel of a walnut”. It can also be the central or most important part of something
“this is the kernel of the argument”.

In computing the kernel is a computer program that is the core of a computer’s operating system,
with complete control over everything in the system.
The kernel is often one of the first programs loaded up on start-up before the boot loader.

“A boot loader is a type of program that loads and starts the boot time tasks and processes of an
operating system or the computer system. It enables loading the operating system within the computer
memory when a computer is started or booted up. A boot loader is also known as a boot manager or
bootstrap loader.”

You have probably heard the expression 'booting up' a system. Once loaded, the kernel translates the data-processing instructions for the central processing unit and handles memory and peripherals like keyboards, monitors and speakers.
The boot system for all standard computers and operating systems — image by Neosmart retrieved
the 27th of September.

I had an inkling that the kernel was important as part of the computer system operation, however I was
unsure of how it operates. As such I found more information about the Linux Kernel in particular.

Demystifying the Linux Kernel from Digilent blog retrieved the 27th of September.

“…the kernel is a barrier between applications, CPU, memory, and devices. Applications are what
people use all the time, with everything from video games to the Internet”

The Linux kernel is a free and open-source, monolithic, Unix-like operating system kernel. This can be
represented as such.
Somewhat simplified, the advantages are as follows.
 Since there is less software involved it is faster.
 As it is one single piece of software it should be smaller both in source and compiled forms.
 Less code generally means fewer bugs which can translate to fewer security problems.
All OS services run along with the main kernel thread, thus also residing in the same memory area.
The main disadvantages of monolithic kernels are:
 The dependencies between system components — a bug in a device driver might crash the entire
system
 Large kernels can become very difficult to maintain.
Most work in the monolithic kernel is done via system calls.

A system call is a way for programs to interact with the operating system. A computer program makes
a system call when it makes a request to the operating system’s kernel. System call provides the
services of the operating system to the user programs via Application Program Interface(API).
An article on The Geek Stuff illustrates the interaction between the computer hardware, OS kernel, system functions, application code and library functions:

In computer science, a library is a collection of non-volatile resources used by computer programs, often for software development.
What are application code and library code, and more importantly, what is the difference between the two?
According to Passion for Coding these can be defined as:
Library code is meant to be reusable in different applications, under different circumstances – extendable and adaptable without code changes.

Application code is used in one environment and can be changed to alter the behaviour. As an example of the difference, consider a sample logging mechanism, one version written as application code and one written as library code (see the sketch below).
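
A minimal, hedged sketch in C of that contrast; the names (app_log, liblog_*) and the callback-based design are illustrative, not taken from any particular project:

#include <stdio.h>
#include <time.h>

/* Application code: one fixed behaviour, edited in place when needs change. */
static void app_log(const char *msg)
{
    fprintf(stderr, "[myapp] %s\n", msg);   /* hard-wired destination and format */
}

/* Library code: behaviour is injected by the caller, so the library can be
 * reused in different applications without changing its source. */
typedef void (*log_sink)(const char *line);

static log_sink current_sink;

void liblog_init(log_sink sink) { current_sink = sink; }

void liblog_write(const char *msg)
{
    char line[256];
    snprintf(line, sizeof line, "%ld %s", (long)time(NULL), msg);
    if (current_sink)
        current_sink(line);                 /* the application decides where it goes */
}

int main(void)
{
    app_log("started");                     /* application-style logging */
    liblog_init(app_log);                   /* reuse: plug the app's sink into the library */
    liblog_write("started via library");
    return 0;
}
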
As such the difference between a microkernel and a monolithic kernel lies within the system calls as
well as the ‘kernel space’.

Image retrieved from Tech Difference on the 28th of September. A more detailed explanation can be
found at https://techdifferences.com/difference-between-microkernel-and-monolithic-kernel.html
The main differences are as follows:
1. The basic point on which microkernel and monolithic kernel are distinguished is that a microkernel implements user services and kernel services in different address spaces, whereas a monolithic kernel implements both user services and kernel services in the same address space.
2. A microkernel is small, as only kernel services reside in the kernel address space. A monolithic kernel is comparatively larger, because both kernel services and user services reside in the same address space.
3. Execution of a monolithic kernel is faster because communication between the application and the hardware is established directly via system calls. In contrast, execution of a microkernel is slower because communication between the application and the hardware goes through message passing.
4. A microkernel is easy to extend because a new service is added in the user address space, which is isolated from kernel space, so the kernel does not need to be modified. The opposite is the case for a monolithic kernel: if a new service is added, the entire kernel needs to be modified.
5. A microkernel is more secure than a monolithic kernel: if a service fails in a microkernel, the operating system remains unaffected. If a service fails in a monolithic kernel, the entire system fails.
6. Designing a monolithic kernel requires less code, which leads to fewer bugs. Designing a microkernel needs more code, which leads to more bugs.

Live debugging and tracing.


Debug and Trace enable you to monitor the application for errors and exceptions without the VS.NET IDE. In Debug mode the compiler inserts some debugging code inside the executable. As the debugging code is part of the executable, it runs on the same thread as the code and therefore does not give you the exact efficiency of the code. So for every executable DLL you will also see a debug file, as shown in the figure 'Debug Mode'. Trace works in both debug and release mode. The main advantage of using trace over debug is performance analysis, which cannot be done with debug. Trace runs on a different thread, so it does not impact the main code thread. There is also a fundamental difference in thinking about when to use trace and when to debug: tracing is a process of getting information about a program's execution, while debugging is about finding errors in the code.

Hardware and software support for debugging


Hardware Debugging : In computer programming and software development, debugging is the
process of finding and resolving bugs (defects or problems that prevent correct operation)
within computer programs, software, or systems.
Debugging tactics can involve interactive debugging, control flow analysis, unit testing, integration
testing, log file analysis, monitoring at the application or system level, memory dumps, and profiling.
Many programming languages and software development tools also offer programs to aid in
debugging, known as debuggers.

Software Debugging : Debugging is the process of detecting and removing existing and potential errors (also called 'bugs') in software code that can cause it to behave unexpectedly or crash. To prevent incorrect operation of a software system, debugging is used to find and resolve bugs or defects. When various subsystems or modules are tightly coupled, debugging becomes harder, as any change in one module may cause more bugs to appear in another. Sometimes it takes more time to debug a program than to code it.

Description: To debug a program, the user has to start with a problem, isolate the source code of the problem, and then fix it. The user is expected to know how to analyse the problem in order to fix it. When the bug is fixed, the software is ready to use again. Debugging tools (called debuggers) are used to identify coding errors at various development stages. They are used to reproduce the conditions in which the error occurred, then examine the program state at that time and locate the cause. Programmers can trace the program execution step by step by evaluating the values of variables, and can stop the execution wherever required to inspect or reset program variables. Some programming language packages provide a debugger for checking the code for errors while it is being written, as well as at run time.
Here’s the debugging process:

1. Reproduce the problem.

2. Describe the bug. Try to get as much input from the user to get the exact reason.

3. Capture the program snapshot when the bug appears. Try to get all the variable values and states
of the program at that time.

4. Analyse the snapshot based on the state and action. Based on that try to find the cause of the bug.

5. Fix the existing bug, but also check that any new bug does not occur.
UNIT II
DTrace
DTrace is a comprehensive dynamic tracing framework originally created by Sun
Microsystems for troubleshooting kernel and application problems on production systems in real time.
Originally developed for Solaris, it has since been released under the free Common Development and
Distribution License (CDDL) in OpenSolaris and its descendant illumos, and has been ported to
several other Unix-like systems.
DTrace can be used to get a global overview of a running system, such as the amount of memory,
CPU time, filesystem and network resources used by the active processes. It can also provide much
more fine-grained information, such as a log of the arguments with which a specific function is being
called, or a list of the processes accessing a specific file.

Command line examples

DTrace scripts can be invoked directly from the command line, providing one or more probes and
actions as arguments. Some examples:

# New processes with arguments
dtrace -n 'proc:::exec-success { trace(curpsinfo->pr_psargs); }'

# Files opened by process
dtrace -n 'syscall::open*:entry { printf("%s %s",execname,copyinstr(arg0)); }'

# Syscall count by program
dtrace -n 'syscall:::entry { @num[execname] = count(); }'

# Syscall count by syscall
dtrace -n 'syscall:::entry { @num[probefunc] = count(); }'

# Syscall count by process
dtrace -n 'syscall:::entry { @num[pid,execname] = count(); }'

# Disk size by process
dtrace -n 'io:::start { printf("%d %s %d",pid,execname,args[0]->b_bcount); }'

# Pages paged in by process
dtrace -n 'vminfo:::pgpgin { @pg[execname] = sum(arg0); }'

Scripts can also be written which can reach hundreds of lines in length, although typically only tens of
lines are needed for advanced troubleshooting and analysis.

DTrace Programming
When you use the dtrace command, you invoke the compiler for the D language. Once DTrace has
compiled your program, it sends it to the operating system kernel for execution, where it activates the
probes that your program uses.

DTrace enables probes only when you are using them. No instrumented code is present for inactive
probes, so your system does not experience performance degradation when you are not using
DTrace. Once your D program exits, all of the probes it used are automatically disabled and their
instrumentation is removed, returning your system to its original state. No effective difference exists
between a system where DTrace is not active and one where the DTrace software is not installed.
DTrace implements the instrumentation for each probe dynamically on the live, running operating
system. DTrace neither quiesces nor pauses the system in any way, and it adds instrumentation code
only for the probes that you enable. As a result, the effect of using DTrace probes is limited to exactly
what you ask DTrace to do. DTrace instrumentation is designed to be as efficient as possible, and
enables you to use it in production to solve real problems in real time.

The DTrace framework provides support for an arbitrary number of virtual clients. You can run as
many simultaneous D programs as you like, limited only by your system's memory capacity, and all
the programs operate independently using the same underlying instrumentation. This same capability
also permits any number of distinct users on the system to take advantage of DTrace simultaneously
on the same system without interfering with one another.

Unlike a C or C++ program, but similar to a Java program, DTrace compiles your D program into a
safe intermediate form that it executes when a probe fires. DTrace validates whether this intermediate
form can run safely, reporting any run-time errors that might occur during the execution of your D
program, such as dividing by zero or dereferencing invalid memory. As a result, you cannot construct
an unsafe D program. You can use DTrace in a production environment without worrying about
crashing or corrupting your system. If you make a programming mistake, DTrace disables the
instrumentation and reports the error to you.

The figure below illustrates the different components of the DTrace architecture, including probe providers, the DTrace driver, the DTrace library, and the dtrace command.

Components of the DTrace Architecture


DTrace Windows architecture

Users interact with DTrace through the DTrace command, which serves as a front-end to the DTrace
engine. D scripts get compiled to an intermediate format (DIF) in user space and sent to the DTrace kernel component for execution, sometimes called the DIF Virtual Machine. This runs in the
dtrace.sys driver.

Traceext.sys (trace extension) is a Windows kernel extension driver, which allows Windows to expose
functionality that DTrace relies on to provide tracing. The Windows kernel provides callouts during
stackwalk or memory accesses which are then implemented by the trace extension.

Installing DTrace under Windows

1. Check that you are running a supported version of Windows. The current download of DTrace
is supported in the Insider builds of 20H1 Windows after version 18980 and Windows Server
Insider Preview Build 18975. Installing this version of DTrace on older versions of Windows can
lead to system instability and is not recommended.

The archived version of DTrace for 19H1 is available at Archived Download DTrace on
Windows. Note that this version of DTrace is no longer supported.

2. Download the MSI installation file (Download DTrace on Windows) from the Microsoft
Download Center.
3. Select the Complete install.

Important

Before using bcdedit to change boot information you may need to temporarily suspend Windows security features such as Patchguard, BitLocker and Secure Boot on the test PC. Re-enable these security features when testing is complete, and manage the test PC appropriately while the security features are disabled.
Internals
A simple illustration of how DTrace internals work.

Consider this probe of the kernel's "worker" function for the getpid() syscall:

dtrace -n 'fbt::getpid:entry {printf("%s[%d] kthread: %p proc: %p\n", execname, curthread->t_procp->p_pidp->pid_id, curthread, curthread->t_procp)}'

This probe shows how to chase pointer chains from the kthread_t * pointer to the current thread's context to the process the thread belongs to, and then to the integer PID of that process.

Consider the disassembly of the getpid function:

root@openindiana:/home/sergey# mdb -k
> getpid::dis
getpid: pushq %rbp
getpid+1: movq %rsp,%rbp
getpid+4: subq $0x10,%rsp
getpid+8: movq %gs:0x18,%rax
getpid+0x11: movq 0x190(%rax),%r9
getpid+0x18: movq 0xb0(%r9),%r8
getpid+0x1f: movl 0x4(%r8),%eax
getpid+0x23: movl %eax,-0x8(%rbp)
getpid+0x26: testl $0x400,0xcc(%r9)
getpid+0x31: jne +0x9 <getpid+0x3c>
getpid+0x33: movl 0x34(%r9),%eax
getpid+0x37: movl %eax,-0x4(%rbp)
getpid+0x3a: jmp +0x2c <getpid+0x68>
getpid+0x3c: movq %gs:0x18,%rax
getpid+0x45: movq 0x190(%rax),%r8
getpid+0x4c: movq 0x620(%r8),%r8
getpid+0x53: movq 0x150(%r8),%r8
getpid+0x5a: movq 0xb0(%r8),%r8
getpid+0x61: movl 0x4(%r8),%eax
getpid+0x65: movl %eax,-0x4(%rbp)
getpid+0x68: movq -0x8(%rbp),%rax
getpid+0x6c: leave
getpid+0x6d: ret

Note the "mov* <offset-in-struct>(<register-with-pointer-to-struct>), <register>" instructions. They do


the actual pointer chasing
behind the C '->' operator.

Kprobes and Systrace


Kernel Dynamic Probes (Kprobes) provides a lightweight interface for kernel modules to implant
probes and register corresponding probe handlers. A probe is an automated breakpoint that is
implanted dynamically in executing (kernel-space) modules without the need to modify their
underlying source. Probes are intended to be used as an ad hoc service aid where minimal disruption
to the system is required. They are particularly advocated in production environments where the use
of interactive debuggers is undesirable. Kprobes also has substantial applicability in test and
development environments. During test, faults may be injected or simulated by the probing module. In
development, debugging code (for example a printk) may be easily inserted without having to recompile the module under test.
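
A minimal, hedged sketch of a kernel module using the Kprobes API; the probed symbol name ("do_sys_open") varies across kernel versions and is only illustrative here:

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/sched.h>
#include <linux/kprobes.h>

/* Probe a kernel function by symbol name (illustrative; may differ per kernel). */
static struct kprobe kp = {
    .symbol_name = "do_sys_open",
};

/* Runs just before the probed instruction executes. */
static int handler_pre(struct kprobe *p, struct pt_regs *regs)
{
    pr_info("kprobe hit: %s (pid %d)\n", p->symbol_name, current->pid);
    return 0;
}

static int __init kp_init(void)
{
    kp.pre_handler = handler_pre;
    return register_kprobe(&kp);      /* returns 0 on success */
}

static void __exit kp_exit(void)
{
    unregister_kprobe(&kp);
}

module_init(kp_init);
module_exit(kp_exit);
MODULE_LICENSE("GPL");
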
Systrace is a computer security utility which limits an application's access to the system by enforcing
access policies for system calls. This can mitigate the effects of buffer overflows and other security
vulnerabilities. It was developed by Niels Provos and runs on various Unix-like operating systems.
Systrace is particularly useful when running untrusted or binary-only applications and provides
facilities for privilege elevation on a system call basis, helping to eliminate the need for potentially
dangerous setuid programs. It also includes interactive and automatic policy generation features, to
assist in the creation of a base policy for an application.



Linux catching up

Linux distributions have been around for decades now, for both servers (business-oriented applications) and home use. While for specialized IT applications, particular kinds of hardware, or data science Linux may be the system of choice, for home users or general business it has never seemed to be a viable option.

Why hasn’t Linux managed to catch on, in the past?

There are three main reasons for this, at least from my point of view, as a regular home user.

The first reason, which applies especially to older versions, is that Linux was not ready to be used by non-technical users. Choosing the right distribution meant spending a lot of time on the internet checking the differences between everything that existed, looking at the main features and trying to figure out which distribution would be the easiest for you to use. With even older versions of Linux, internet search was not available, so the only information you got was from specialized IT magazines, which did not always include the data you were looking for. When you managed to find what you thought was the right distribution, you needed to install it on your PC. This meant looking for guides explaining each step, as most distributions had non-graphical and hard-to-use installers.

The second reason is related to available software. Even if you had put in the time to install a Linux distribution on your PC, you could not install regular Windows software on it, which meant that most commonly used software was not available on Linux. Many of these applications had open-source versions that could be used on Linux, but you would have to deal with compatibility issues and with not being able to exchange data easily between Linux and Windows users.

The third reason was that, even if you found the right software and did not need to worry about compatibility, you first needed to install it. This installation process was, in most cases, painful. In most Linux distributions (e.g. Red Hat or, later, Fedora) you were required to download the sources of the software you wanted to use and run a set of commands in the shell in order to get that software compiled and installed.

The above steps might have been doable and relatively easy for someone with an interest in IT, but for most home users, who just want a usable out-of-the-box system and a one-click way of installing the applications they need, this was a major deterrent.

Loading and Linking

Linking and loading are utility operations that play an important role in the execution of a program. Linking takes the object code generated by the assembler and combines it to generate the executable module. Loading then brings this executable module into main memory for execution.
Loading:
Bringing the program from secondary memory to main memory is called Loading.
Linking:
Establishing the linking between all the modules or all the functions of the program in order to
continue the program execution is called linking.
Differences between Linking and Loading:

1. The key difference between linking and loading is that linking generates the executable file of a program, whereas loading loads the executable file obtained from linking into main memory for execution.
2. Linking takes as input the object modules of a program generated by the assembler, whereas loading takes the executable module generated by linking.
3. Linking combines all object modules of a program to generate the executable module; it also links the library functions referenced in the object modules to the built-in libraries of the high-level programming language. On the other hand, loading allocates space for the executable module in main memory.

Loading and Linking are further categorized into 2 types:

STATIC:  Loading the entire program into main memory before the start of program execution is called static loading.
DYNAMIC: Loading the program into main memory on demand is called dynamic loading.

STATIC:  Inefficient utilization of memory, because the entire program is brought into main memory whether it is required or not.
DYNAMIC: Efficient utilization of memory.

STATIC:  Program execution is faster.
DYNAMIC: Program execution is slower.

STATIC:  A statically linked program takes a constant load time every time it is loaded into memory for execution.
DYNAMIC: Dynamic linking is performed at run time by the operating system; load time may be reduced if the shared library code is already present in memory.

STATIC:  If static loading is used then static linking is applied.
DYNAMIC: If dynamic loading is used then dynamic linking is applied.

STATIC:  Static linking is performed by programs called linkers (also known as link editors) as the last step in compiling a program. If any of the external modules change, they have to be recompiled and re-linked, otherwise the changes will not be reflected in the existing executable file.
DYNAMIC: Individual shared modules can be updated and recompiled independently; this is one of the greatest advantages dynamic linking offers.
Executable and linking format (ELF)
ELF is the standard binary format on operating systems such as Linux. Some of the capabilities of
ELF are dynamic linking, dynamic loading, imposing run-time control on a program, and an improved
method for creating shared libraries. The ELF representation of control data in an object file is
platform independent, which is an additional improvement over previous binary formats.

The ELF representation permits object files to be identified, parsed, and interpreted similarly, making the ELF object files compatible across multiple platforms and architectures of different sizes.
The three main types of ELF files are:

 Executable
 Relocatable
 Shared object

These file types hold the code, data, and information about the program that the operating system
and linkage editor need to perform the appropriate actions on these files.
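
As an illustration, a small hedged sketch in C that reads the ELF identification and type fields of a file using the system's <elf.h> definitions (a 64-bit ELF file is assumed and error handling is kept minimal):

#include <elf.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    Elf64_Ehdr eh;
    int fd = open(argc > 1 ? argv[1] : "/bin/ls", O_RDONLY);
    if (fd < 0 || read(fd, &eh, sizeof eh) != (ssize_t)sizeof eh) {
        perror("read ELF header");
        return 1;
    }
    /* e_ident starts with the magic bytes 0x7f 'E' 'L' 'F'. */
    if (eh.e_ident[EI_MAG0] != ELFMAG0 || eh.e_ident[EI_MAG1] != ELFMAG1) {
        fprintf(stderr, "not an ELF file\n");
        return 1;
    }
    /* e_type distinguishes the three main kinds of ELF files. */
    switch (eh.e_type) {
    case ET_REL:  puts("relocatable object");     break;
    case ET_EXEC: puts("executable");             break;
    case ET_DYN:  puts("shared object (or PIE)"); break;
    default:      puts("other ELF type");         break;
    }
    close(fd);
    return 0;
}
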


Internals of linking and Dynamic Linking

Static Linking:
When we click the .exe (executable) file of the program and it starts running, all the necessary
contents of the binary file have been loaded into the process’s virtual address space. However, most
programs also need to run functions from the system libraries, and these library functions also need to
be loaded.
In the simplest case, the necessary library functions are embedded directly in the program’s
executable binary file. Such a program is statically linked to its libraries, and statically linked
executable codes can commence running as soon as they are loaded.
Disadvantage:
Every program generated must contain copies of exactly the same common system library functions.
In terms of both physical memory and disk-space usage, it is much more efficient to load the system
libraries into memory only once. Dynamic linking allows this single loading to happen.

Dynamic Linking:
Every dynamically linked program contains a small, statically linked function that is called when the program starts. This static function only maps the link library into memory and runs the code that it contains. The link library then determines which dynamic libraries the program requires, along with the names of the variables and functions needed from those libraries, by reading the information contained in sections of the binary.
After that, it maps the libraries into the middle of virtual memory and resolves the references to the symbols contained in those libraries. We do not know where in memory these shared libraries are actually mapped: they are compiled into position-independent code (PIC), which can run at any address in memory.
Advantage:
Memory requirements of the program are reduced. A DLL is loaded into memory only once, and more than one application may use the same DLL at the same time, thus saving memory space. Application support and maintenance costs are also lowered.
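
A minimal, hedged sketch of explicit run-time (dynamic) loading on Linux with the dlopen/dlsym API; the library and symbol chosen (libm, cos) are just convenient examples:

#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    /* Map the shared library into the process at run time. */
    void *handle = dlopen("libm.so.6", RTLD_LAZY);
    if (!handle) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return 1;
    }

    /* Resolve a symbol in the freshly mapped library. */
    double (*cosine)(double) = (double (*)(double))dlsym(handle, "cos");
    if (!cosine) {
        fprintf(stderr, "dlsym: %s\n", dlerror());
        return 1;
    }

    printf("cos(0) = %f\n", cosine(0.0));
    dlclose(handle);
    return 0;
}

Built with something like cc demo.c -ldl (the -ldl flag is needed on older glibc versions).
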

Internals of effective spinlock implementations on x86


In software engineering, a spinlock is a lock that causes a thread trying to acquire it to simply wait in
a loop ("spin") while repeatedly checking whether the lock is available. Since the thread remains
active but is not performing a useful task, the use of such a lock is a kind of busy waiting. Once
acquired, spinlocks will usually be held until they are explicitly released, although in some
implementations they may be automatically released if the thread being waited on (the one that holds
the lock) blocks or "goes to sleep".
Because they avoid overhead from operating system process rescheduling or context switching,
spinlocks are efficient if threads are likely to be blocked for only short periods. For this
reason, operating-system kernels often use spinlocks. However, spinlocks become wasteful if held for
longer durations, as they may prevent other threads from running and require rescheduling. The
longer a thread holds a lock, the greater the risk that the thread will be interrupted by the OS
scheduler while holding the lock. If this happens, other threads will be left "spinning" (repeatedly trying
to acquire the lock), while the thread holding the lock is not making progress towards releasing it. The
result is an indefinite postponement until the thread holding the lock can finish and release it. This is
especially true on a single-processor system, where each waiting thread of the same priority is likely
to waste its quantum (allocated time where a thread can run) spinning until the thread that holds the
lock is finally finished.
Implementing spinlocks correctly is challenging because programmers must take into account the
possibility of simultaneous access to the lock, which could cause race conditions. Generally, such an
implementation is possible only with special assembly-language instructions, such as atomic test-and-
set operations and cannot be easily implemented in programming languages not supporting truly
atomic operations. On architectures without such operations, or if high-level language implementation
is required, a non-atomic locking algorithm may be used, e.g. Peterson's algorithm. However, such an
implementation may require more memory than a spinlock, be slower to allow progress after
unlocking, and may not be implementable in a high-level language if out-of-order execution is allowed.

Example implementation

The following example uses x86 assembly language to implement a spinlock. It will work on
any Intel 80386 compatible processor.

; Intel syntax

locked:                      ; The lock variable. 1 = locked, 0 = unlocked.
     dd      0

spin_lock:
     mov     eax, 1          ; Set the EAX register to 1.
     xchg    eax, [locked]   ; Atomically swap the EAX register with
                             ;  the lock variable.
                             ; This will always store 1 to the lock, leaving
                             ;  the previous value in the EAX register.
     test    eax, eax        ; Test EAX with itself. Among other things, this will
                             ;  set the processor's Zero Flag if EAX is 0.
                             ; If EAX is 0, then the lock was unlocked and
                             ;  we just locked it.
                             ; Otherwise, EAX is 1 and we didn't acquire the lock.
     jnz     spin_lock       ; Jump back to the MOV instruction if the Zero Flag is
                             ;  not set; the lock was previously locked, and so
                             ;  we need to spin until it becomes unlocked.
     ret                     ; The lock has been acquired, return to the calling
                             ;  function.

spin_unlock:
     xor     eax, eax        ; Set the EAX register to 0.
     xchg    eax, [locked]   ; Atomically swap the EAX register with
                             ;  the lock variable.
     ret                     ; The lock has been released.
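
For comparison, a hedged C sketch of the same idea using C11 atomics, with the test-and-test-and-set refinement and the x86 pause hint (a GCC/Clang builtin) that practical implementations add to reduce bus traffic while waiting:

#include <stdatomic.h>

typedef struct { atomic_int locked; } spinlock_t;   /* 0 = free, 1 = held */

static void spin_lock(spinlock_t *l)
{
    for (;;) {
        /* Attempt the atomic swap only when the lock looks free
         * (test-and-test-and-set): keeps the cache line shared while waiting. */
        if (!atomic_load_explicit(&l->locked, memory_order_relaxed) &&
            !atomic_exchange_explicit(&l->locked, 1, memory_order_acquire))
            return;
        __builtin_ia32_pause();      /* x86 PAUSE: be polite to the sibling hyperthread */
    }
}

static void spin_unlock(spinlock_t *l)
{
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
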

Opensolaris adaptive mutexes


Solaris implements a variety of locks to support multitasking, multithreading and multiprocessing. It uses adaptive mutexes, condition variables, semaphores, reader-writer locks and turnstiles to control access to critical sections.
An adaptive mutex is used to protect every critical data item that is accessed only by short code segments.
On a multiprocessor system it starts as a standard spinlock. If the lock is held by a thread that is currently running on another CPU, the waiting thread spins. If the lock is held by a thread that is not currently in the run state, the waiting thread blocks, going to sleep until it is awakened by a signal when the lock is released.
The spin-waiting method is exceedingly inefficient for longer code segments, so condition variables and semaphores are used for those.
Solaris provides reader-writer locks to protect data that are frequently accessed by long sections of code, usually in a read-only manner.
It uses turnstiles to order the list of threads waiting to acquire either an adaptive mutex or a reader-writer lock. A turnstile is a queue structure containing the threads blocked on a lock. Turnstiles are per lock-holding thread, not per object. They are organized according to priority inheritance, which gives the lock-holding thread the highest of the priorities of the threads in its turnstile, in order to prevent priority inversion.

The locking mechanisms used by the kernel are also used by user-level threads, so that the same kinds of locks are available both inside and outside the kernel. The only difference is that priority inheritance is used only in the kernel; user-level locking does not provide this functionality.
To optimize Solaris performance, developers continually refine the locking methods; since locks are used frequently and typically for crucial kernel functions, tuning their implementation and use yields large performance gains.

Rationale and implementation optimization.

Contrary to popular belief, the Windows operating system is not inferior in performance to contemporaries such as macOS and Linux. In this article, we will look at various ways you can optimize the performance of your Windows 10 machine.
Here are several ways you can optimize the performance of your Windows 10 computer.
1. Upgrading Windows
Windows updates contain new features, better security, bug fixes, and performance enhancements. To update Windows, follow these steps –
1. Open Settings.
2. Click on Update & Security.
3. Click on Windows Update.
4. Click the Check for updates button and download the updates.

2. Updating Device drivers


A device driver is a computer program that acts as an intermediary between the operating system and a particular device so that they can communicate. So, it is necessary to keep device drivers updated, using these steps:
1. Open Start.
2. Search for Device Manager.
3. Expand the branch for the device you want to update.
4. Right-click the device, and select the Update Driver option.

3. Optimize your Hard Drive


Fragmentation occurs when files are stored in non-contiguous pieces scattered across the disk. Running the disk defragmenter rearranges these fragments so that drives work more efficiently. To do so, follow the below steps:
1. Select the search bar on the taskbar and enter defrag.
2. Select Defragment and Optimize Drives.
3. Select the disk drive you want to optimize.
4. Select the Optimize button.
4. Disk Clean-up
Disk Clean-up is a computer maintenance utility tool used to remove temporary files, files in the
recycle bin, and other items that you no longer need. To do so follow the below steps:
1. Type Disk clean-up in the search bar on the taskbar.
2. Select the drives you want to clean.
3. Select the folder or file types to get rid of.
4. Select OK.
To claim more space, you can also delete system files as follows:
1. In Disk Clean-up, select Clean up system files.
2. Select the file types to get rid of. To get a description of the file type, just click on it.
3. Select OK.
5. Adjust for Best Performance
Lowering or customizing the settings of Windows visual effects will increase the performance of your
devices. Follow the below steps to achieve so:
1. Search “System” in the search bar on the taskbar.
2. Click Advanced system settings and it will open the System Properties menu.
3. On the Advanced tab under Performance, click Settings and it will open Performance Options.
4. Click on the Visual Effects tab, either choose to adjust for best performance or go for Custom.
6. Turn off Transparency Effects
Transparency effects will make your Windows device slow as it consumes resources. You can disable
transparency to reclaim that resources by the following steps:
1. Open the Settings by right-clicking on Start from the taskbar.
2. Go to Personalization.
3. Go to Colors.
4. Turn off Transparency effects.

7. Uninstall Programs you never use


Reclaim the hard disk space by uninstalling the programs which are not in use for a long time. To
reclaim your memory on the hard disk, follow the below steps:
1. In the search box on the taskbar, type Control Panel.
2. After opening the Control Panel, select Programs and Features.
3. Select the program you want to remove and select Uninstall or Uninstall/Change.
4. Then follow the directions on the screen.

8. Disable Background Apps


There may be some bloatware (pre-installed applications) on your Windows device which runs in the background even if you are not using it. You can disable these background applications by following the below steps:
1. Search the “Background applications” on the search bar.
2. Disable the application which is not required.

9. Limit Start-up programs


Start-up programs are the programs that are configured to start when you log in. In most cases, apps
will start minimized or may only start in background tasks. To disable start-up programs, follow the
below steps:
1. Right-click on Start and open Settings.
2. Go to Apps.
3. Go to Startup and disable the applications that have the most impact on background tasks.
10. Check for Viruses and Spyware
Viruses and spyware can not only slow down your device but can also corrupt your data and lead to identity theft. So it is better to scan for viruses and spyware and remove them using Windows Defender/Security or another anti-spyware program. To use Windows Security follow the below steps:
1. Select the search bar on the taskbar and enter Windows Security.
2. Select Virus and threat protection and click on the Scan option.
3. Select one of the scan options like Quick Scan, Full Scan, and Custom Scan.
4. Press Scan now button.

11. Change the size of Virtual Memory


A paging file is an area on the hard disk that Windows uses as if it were RAM. We can adjust this
paging file size to improve Windows performance. To change the size of virtual memory, follow the
below steps:
1. Select the search bar on the taskbar and open View advanced system settings, which will
open System Properties.
2. In System Properties, select the Advanced tab.
3. Then select Settings under Performance.
4. In Performance Options, select the Advanced tab.
5. Then select Change under the Virtual memory.
6. By default, the Automatically manage paging file size for all drives check box is checked; uncheck it
to make changes.
7. Under Paging file size for each drive, click on the drive that contains the paging file to change the
paging file size.
8. Click Custom size, type a new size in megabytes in the Initial size (MB) or Maximum size (MB)
box, click Set, and then click OK.
9. Restart your device.
12. Close running applications that are not in use
Every running application consumes resources like CPU time and memory. So, it is advisable to close
all applications that are not in use, to reclaim the allocated resources.
13. Turn Off Tips and Notifications
We can also stop some background services that provide tips and suggestions to use Windows. To
do that follow these steps:
1. Search the “Notifications and actions” in the search bar on the taskbar.
2. Uncheck Get tips, tricks, and suggestions as you use Windows.

14. Improve Hardware


To improve the performance of a Windows PC we can also upgrade the hardware. Adding faster
hardware will noticeably improve the read and write speed of your device.
1. Add more RAM.
2. Use solid-state disk.
Difference between Preemptive and Non-Preemptive Kernel

1. Preemptive Kernel :
A preemptive kernel, as the name suggests, is a type of kernel that always executes the highest priority
task that is ready to run. It cannot use non-reentrant functions unless access to those functions is made
mutually exclusive.
Example: Linux 2.6

2. Non-Preemptive Kernel :
A non-preemptive kernel, as the name suggests, is a type of kernel that is free from race conditions on
kernel data structures because only one process is active in the kernel at a time. This is a serious
drawback for real-time applications, as it does not allow preemption of a process running in kernel mode.
Example: Linux 2.4

Difference between Preemptive and Non-Preemptive Kernel in OS :

1. Preemptive: a running task might be replaced immediately. Non-preemptive: a task continues to run
   until it finishes its current work or voluntarily relinquishes the CPU.

2. Preemptive: more suitable for real-time programming. Non-preemptive: less suitable for real-time
   programming.

3. Preemptive: the highest priority task that is ready to run is given CPU control. Non-preemptive: each
   task must explicitly give up CPU control.

4. Preemptive: generally allows preemption even in kernel mode. Non-preemptive: generally does not
   allow preemption of a process running in kernel mode.

5. Preemptive: response time is deterministic and the system is more responsive. Non-preemptive:
   response time is nondeterministic and the system is less responsive.

6. Preemptive: when a higher priority task becomes ready, the currently running task is suspended and
   moved to the ready queue. Non-preemptive: a higher priority task might have to wait for a long time.

7. Preemptive: shared kernel data generally requires protection with semaphores or locks.
   Non-preemptive: kernel data does not require semaphores, since only one process is active in the
   kernel at a time.

8. Preemptive: cannot safely use non-reentrant code. Non-preemptive: can use non-reentrant code.

9. Preemptive: kernels are more difficult to design. Non-preemptive: kernels are less difficult to design.

10. Preemptive: more secure and more useful in real-world scenarios. Non-preemptive: less secure and
    less useful in real-world scenarios.
Effects of modern Memory hierarchies and related optimizations
In computer architecture, the memory hierarchy separates computer storage into a hierarchy based
on response time. Since response time, complexity, and capacity are related, the levels may also be
distinguished by their performance and controlling technologies. The memory hierarchy affects
performance in computer architectural design, algorithm predictions, and lower-level
programming constructs involving locality of reference.
Designing for high performance requires considering the restrictions of the memory hierarchy, i.e. the
size and capabilities of each component. Each of the various components can be viewed as part of a
hierarchy of memories (m1, m2, ..., mn) in which each member mi is typically smaller and faster than
the next member mi+1 of the hierarchy. To limit waiting by higher levels, a lower level responds by
filling a buffer and then signaling that the transfer can start.
There are four major storage levels.

 Internal – Processor registers and cache.


 Main – the system RAM and controller cards.
 On-line mass storage – Secondary storage.
 Off-line bulk storage – Tertiary and Off-line storage.
This is a general memory hierarchy structuring. Many other structures are useful. For example, a
paging algorithm may be considered as a level for virtual memory when designing a computer
architecture, and one can include a level of nearline storage between online and offline storage.
Unit III
Process and thread Kernel data structures

1. Process:
Process is an activity of executing a program. Process is of two types – User process and System
process. Process control block controls the operation of the process.

2. Kernel Thread:
Kernel thread is a type of thread in which threads of a process are managed at kernel level. Kernel
threads are scheduled by operating system (kernel mode).

Difference between Process and Kernel Thread:


1. A process is a program being executed. A kernel thread is a thread managed at the kernel level.

2. A process has high overhead. A kernel thread has medium overhead.

3. There is no sharing between processes. Kernel threads share the address space.

4. A process is scheduled by the operating system using the process table. A kernel thread is scheduled
   by the operating system using the thread table.

5. A process is a heavy-weight activity. A kernel thread is light-weight as compared to a process.

6. A process can be suspended. A kernel thread cannot be suspended individually.

7. Suspension of a process does not affect other processes. Suspension of a kernel thread leads to all
   the threads stopping.

8. Process types are user process and system process. Kernel thread types are kernel-level single
   thread and kernel-level multi thread.
Process Table and Process Control Block (PCB)
While creating a process the operating system performs several operations. To identify the
processes, it assigns a process identification number (PID) to each process. As the operating system
supports multi-programming, it needs to keep track of all the processes. For this task, the process
control block (PCB) is used to track the process’s execution status. Each block of memory contains
information about the process state, program counter, stack pointer, status of opened files, scheduling
algorithms, etc. All this information is required and must be saved when the process is switched
from one state to another. When the process makes a transition from one state to another, the
operating system must update the information in the process’s PCB.
A process control block (PCB) contains information about the process, i.e. registers, quantum, priority,
etc. The process table is an array of PCBs; logically, it contains a PCB for each of the current
processes in the system.

 Pointer – It is a stack pointer which is required to be saved when the process is switched from
one state to another to retain the current position of the process.
 Process state – It stores the respective state of the process.
 Process number – Every process is assigned with a unique id known as process ID or PID which
stores the process identifier.
 Program counter – It stores the counter which contains the address of the next instruction that is
to be executed for the process.
 Register – These are the CPU registers, which include the accumulator, base and index registers, and
general purpose registers.
 Memory limits – This field contains the information about memory management system used by
operating system. This may include the page tables, segment tables etc.
 Open files list – This information includes the list of files opened for a process.

Miscellaneous accounting and status data – This field includes information about the amount of
CPU used, time constraints, jobs or process number, etc.
The process control block stores the register content, also known as the execution context of the
processor, saved when the process was blocked from running. This execution context enables the
operating system to restore a process’s execution context when the process returns to the running
state. When the process makes a transition from one state to another, the operating system updates
its information in the process’s PCB. The operating system maintains pointers to each process’s PCB
in a process table so that it can access the PCB quickly.
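To make these fields concrete, the following minimal C sketch shows how such a structure might be
declared. The structure layout, field names and sizes here are illustrative assumptions for this course,
not the layout used by any particular kernel.

/* Illustrative sketch of a process control block (not a real kernel's layout). */
#include <stdint.h>

#define MAX_OPEN_FILES 16

enum proc_state { NEW, READY, RUNNING, WAITING, TERMINATED };

struct pcb {
    int              pid;                       /* process identifier                */
    enum proc_state  state;                     /* current scheduling state          */
    uint64_t         program_counter;           /* address of the next instruction   */
    uint64_t         stack_pointer;             /* saved stack pointer               */
    uint64_t         registers[16];             /* saved general purpose registers   */
    uint64_t         mem_base, mem_limit;       /* memory limits for this process    */
    int              open_files[MAX_OPEN_FILES];/* open file descriptors             */
    uint64_t         cpu_time_used;             /* accounting information            */
    struct pcb      *next;                      /* link used by ready/wait queues    */
};

/* The process table is logically an array of PCBs (or of pointers to them). */
struct pcb *process_table[1024];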
Lookup
You can configure the stage to complete a lookup operation on the database for each input record
(referred to as the key record) and return rows that match the criteria that are specified by that record.

The lookup operation is performed by running a parameterized SELECT statement which contains
a WHERE clause with parameters that are associated with the columns marked as Key columns in
the records that represent key records for the lookup.

On server canvas the job configuration for the database lookup requires that the Transformer stage is
used in combination with the database stage. The Transformer stage has an input link on which the
key records arrive to be used as input for the lookup query. It also has one or more reference links
coming from the database stage. The database stage is provided with input key records on this link.
For each input key record, the database stage runs the parameterized SELECT statement with the
key record values used in the WHERE clause and provides the corresponding matching records to
the Transformer stage. Those records are then processed and routed by the Transformer stage to
one or more of its output links to be further processed by the downstream stages in the job.

In some cases the SELECT lookup statement may return multiple record matches. The user can
specify whether the stage should log a message when this happens.

Allocation and management of new structures

Memory allocation:

To gain proper memory utilization, memory must be allocated in an efficient manner. One of the
simplest methods for allocating memory is to divide memory into several fixed-sized partitions and
each partition contains exactly one process. Thus, the degree of multiprogramming is obtained by
the number of partitions.
Multiple partition allocation: In this method, a process is selected from the input queue and loaded
into the free partition. When the process terminates, the partition becomes available for other
processes.
Fixed partition allocation: In this method, the operating system maintains a table that indicates
which parts of memory are available and which are occupied by processes. Initially, all memory is
available for user processes and is considered one large block of available memory. This available
memory is known as “Hole”. When the process arrives and needs memory, we search for a hole that
is large enough to store this process. If the requirement fulfills then we allocate memory to process,
otherwise keeping the rest available to satisfy future requests. While allocating a memory sometimes
dynamic storage allocation problems occur, which concerns how to satisfy a request of size n from a
list of free holes. There are some solutions to this problem:
First fit:-
In the first fit, the first available free hole fulfills the requirement of the process allocated.

Here, in this diagram 40 KB memory block is the first available free hole that can store process A
(size of 25 KB), because the first two blocks did not have sufficient memory space.
Best fit:-
In the best fit, allocate the smallest hole that is big enough to process requirements. For this, we
search the entire list, unless the list is ordered by size.

Here in this example, first, we traverse the complete list and find the last hole 25KB is the best
suitable hole for Process A(size 25KB).
In this method memory utilization is maximum as compared to other memory allocation techniques.
Worst fit:-In the worst fit, allocate the largest available hole to process. This method produces the
largest leftover hole.
Here in this example, Process A (Size 25 KB) is allocated to the largest available memory block
which is 60KB. Inefficient memory utilization is a major issue in the worst fit.
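As a rough illustration of the first-fit strategy described above, the sketch below scans a list of free
holes and returns the first one large enough for the request. The hole list layout and the helper name
first_fit are assumptions made for the example, not part of any real allocator.

/* Minimal first-fit search over a list of free holes (illustrative only). */
#include <stddef.h>

struct hole {
    size_t       start;   /* starting address of the free block */
    size_t       size;    /* size of the free block in bytes    */
    struct hole *next;    /* next free hole in the list         */
};

/* Return the first hole that can satisfy a request of 'request' bytes,
 * or NULL if no hole is large enough. */
struct hole *first_fit(struct hole *free_list, size_t request)
{
    for (struct hole *h = free_list; h != NULL; h = h->next) {
        if (h->size >= request)
            return h;       /* first hole big enough: allocate here */
    }
    return NULL;            /* request cannot be satisfied          */
}

Best fit would instead keep scanning and remember the smallest adequate hole, and worst fit the
largest one; only the selection rule in the loop changes.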

/proc internals (proc file system in Linux)

The proc file system (procfs) is a virtual file system created on the fly when the system boots and
dissolved at the time of system shutdown.
It contains useful information about the processes that are currently running and is regarded as the
control and information center for the kernel.
The proc file system also provides a communication medium between kernel space and user space.
Below is a snapshot of /proc from a PC.

ls -l /proc

total 0
dr-xr-xr-x 9 root root 0 Mar 31 21:34 1
dr-xr-xr-x 9 root root 0 Mar 31 21:34 10
dr-xr-xr-x 9 avahi avahi 0 Mar 31 21:34 1034
dr-xr-xr-x 9 root root 0 Mar 31 21:34 1036
dr-xr-xr-x 9 root root 0 Mar 31 21:34 1039
dr-xr-xr-x 9 root root 0 Mar 31 21:34 1041
dr-xr-xr-x 9 root root 0 Mar 31 21:34 1043
dr-xr-xr-x 9 root root 0 Mar 31 21:34 1044
dr-xr-xr-x 9 root root 0 Mar 31 21:34 1048
dr-xr-xr-x 9 root root 0 Mar 31 21:34 105
dr-xr-xr-x 9 root root 0 Mar 31 21:34 1078
dr-xr-xr-x 9 root root 0 Mar 31 21:34 11
dr-xr-xr-x 9 root root 0 Mar 31 21:34 1121
dr-xr-xr-x 9 lp lp 0 Mar 31 21:34 1146
dr-xr-xr-x 9 postgres postgres 0 Mar 31 21:34 1149
dr-xr-xr-x 9 mysql mysql 0 Mar 31 21:34 1169
dr-xr-xr-x 9 postgres postgres 0 Mar 31 21:34 1180
dr-xr-xr-x 9 postgres postgres 0 Mar 31 21:34 1181
dr-xr-xr-x 9 postgres postgres 0 Mar 31 21:34 1182
dr-xr-xr-x 9 postgres postgres 0 Mar 31 21:34 1183
dr-xr-xr-x 9 postgres postgres 0 Mar 31 21:34 1184
dr-xr-xr-x 9 root root 0 Mar 31 21:34 1186
dr-xr-xr-x 9 root root 0 Mar 31 21:34 12

...
If you list the directories, you will find that for each PID of a running process there is a
dedicated directory.
You can check directories only on terminal using
ls -l /proc | grep '^d'
Now let's check a particular process by its PID. You can get the PID of any running process from the
ps command:
ps -aux
The output of ps -aux lists every running process along with its PID.
Taking the process with PID 7494 as an example, you can verify that there is an entry
for this process in the /proc file system.
ls -ltr /proc/7494
Output:
total 0
-rw-r--r-- 1 mandeep mandeep 0 Apr 1 01:14 oom_score_adj
dr-xr-xr-x 13 mandeep mandeep 0 Apr 1 01:14 task
-r--r--r-- 1 mandeep mandeep 0 Apr 1 01:16 status
-r--r--r-- 1 mandeep mandeep 0 Apr 1 01:16 stat
-r--r--r-- 1 mandeep mandeep 0 Apr 1 01:16 cmdline
-r--r--r-- 1 mandeep mandeep 0 Apr 1 01:17 wchan
-rw-r--r-- 1 mandeep mandeep 0 Apr 1 01:17 uid_map
-rw-rw-rw- 1 mandeep mandeep 0 Apr 1 01:17 timerslack_ns
-r--r--r-- 1 mandeep mandeep 0 Apr 1 01:17 timers
-r-------- 1 mandeep mandeep 0 Apr 1 01:17 syscall
-r--r--r-- 1 mandeep mandeep 0 Apr 1 01:17 statm
-r-------- 1 mandeep mandeep 0 Apr 1 01:17 stack
-r--r--r-- 1 mandeep mandeep 0 Apr 1 01:17 smaps
-rw-r--r-- 1 mandeep mandeep 0 Apr 1 01:17 setgroups
-r--r--r-- 1 mandeep mandeep 0 Apr 1 01:17 sessionid
-r--r--r-- 1 mandeep mandeep 0 Apr 1 01:17 schedstat
-rw-r--r-- 1 mandeep mandeep 0 Apr 1 01:17 sched
lrwxrwxrwx 1 mandeep mandeep 0 Apr 1 01:17 root ->
/proc/2341/fdinfo
-rw-r--r-- 1 mandeep mandeep 0 Apr 1 01:17 projid_map
-r-------- 1 mandeep mandeep 0 Apr 1 01:17 personality

...
In Linux, /proc includes a directory for each running process (including kernel
processes), named /proc/PID. These are the entries present in each such directory:
directory description
/proc/PID/cmdline Command line arguments.
/proc/PID/cpu Current and last cpu in which it was executed.
/proc/PID/cwd Link to the current working directory.
/proc/PID/environ Values of environment variables.
/proc/PID/exe Link to the executable of this process.
/proc/PID/fd Directory, which contains all file descriptors.
/proc/PID/maps Memory maps to executables and library files.
/proc/PID/mem Memory held by this process.
/proc/PID/root Link to the root directory of this process.
/proc/PID/stat Process status.
/proc/PID/statm Process memory status information.
/proc/PID/status Process status in human readable form.
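These entries can also be read programmatically like ordinary files. The short C sketch below prints
the command line of a process whose PID is passed as an argument; the program itself is a
hypothetical example written for these notes, not part of procfs.

/* Read /proc/<pid>/cmdline and print it (arguments are NUL-separated). */
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[])
{
    char path[64], buf[4096];

    if (argc < 2) {
        fprintf(stderr, "usage: %s <pid>\n", argv[0]);
        return 1;
    }
    snprintf(path, sizeof(path), "/proc/%s/cmdline", argv[1]);

    FILE *f = fopen(path, "r");
    if (!f) {
        perror("fopen");
        return 1;
    }
    size_t n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);

    /* cmdline separates arguments with '\0'; replace them with spaces for printing. */
    for (size_t i = 0; i + 1 < n; i++)
        if (buf[i] == '\0')
            buf[i] = ' ';
    buf[n] = '\0';
    printf("%s\n", buf);
    return 0;
}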

Some other files in /proc file system are:


file description
/proc/crypto list of available cryptographic modules
/proc/diskstats information (including device numbers) for each of the logical disk
devices
/proc/filesystems list of the file systems supported by the kernel at the time of
listing
/proc/kmsg holding messages output by the kernel
/proc/meminfo summary of how the kernel is managing its memory.
/proc/scsi information about any devices connected via a SCSI or RAID
controller
/proc/tty information about the current terminals
/proc/version containing the Linux kernel version, distribution number, gcc
version number (used to build the kernel) and any other pertinent
information relating to the version of the kernel currently running
For example, the contents of /proc/crypto are
less /proc/crypto

name : ccm(aes)
driver : ccm_base(ctr(aes-aesni), cbcmac(aes-aesni))
module : ccm
priority : 300
refcnt : 2
selftest : passed
internal : no
type : aead
async : no
blocksize : 1
ivsize : 16
maxauthsize : 16
geniv :

name : ctr(aes)
driver : ctr(aes-aesni)
module : kernel
priority : 300
refcnt : 3
selftest : passed
internal : no
type : blkcipher
blocksize : 1
min keysize : 16
max keysize : 32
ivsize : 16
geniv : chainiv

Optimizations
Optimization and observability go hand in hand in the sense that optimizing performance first requires
that you have visibility. When a system is observable, you’re able to know the current state/behavior
of the system and where performance bottlenecks exist. If a team lacks this insight into their system,
they will resort to guessing, so observability plays a key role in managing and optimizing
performance.

Virtual file system and the layering of a file system call from API to
driver

A virtual file system (VFS) is programming that forms an interface between an operating
system's kernel and a more concrete file system.
The VFS serves as an abstraction layer that gives applications access to different types of file
systems and local and network storage devices. For that reason, a VFS may also be known as
a virtual file system switch. It also manages the data storage and retrieval between the operating
system and the storage sub-system. The VFS maintains a cache of directory lookups to enable easy
location of frequently accessed directories.

Sun Microsystems introduced one of the first VFSes on Unix-like systems. The VMware Virtual
Machine File System (VMFS), NTFS, Linux's Global File System (GFS) and the Oracle Clustered File
System (OCFS) are all examples of virtual file systems.
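The essence of the VFS layering is that a generic file system call is routed through a table of function
pointers down to the concrete file system or driver. The sketch below is a highly simplified model of
this idea, loosely inspired by structures such as Linux's file_operations; the structure names, fields and
the vfs_read helper are assumptions for illustration, not the real kernel API.

#include <stddef.h>
#include <sys/types.h>

struct file;   /* opaque handle for an open file */

/* Per-file-system operations table: the "switch" the VFS dispatches through. */
struct file_ops {
    ssize_t (*read)(struct file *f, char *buf, size_t len, off_t *pos);
    ssize_t (*write)(struct file *f, const char *buf, size_t len, off_t *pos);
    int     (*open)(struct file *f);
    int     (*release)(struct file *f);
};

struct file {
    const struct file_ops *ops;   /* filled in by the concrete file system */
    off_t                  pos;
    void                  *private_data;
};

/* Generic layer: the system-call side only knows about struct file_ops,
 * not about any particular file system or device driver. */
ssize_t vfs_read(struct file *f, char *buf, size_t len)
{
    if (f->ops && f->ops->read)
        return f->ops->read(f, buf, len, &f->pos);
    return -1;   /* operation not supported by this file system */
}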

API Layering requires that binaries in Windows Driver packages call only those APIs and DDIs that
are included in UWP-based editions of Windows 10 or are from a curated set of Win32 APIs. API
Layering is an extension of the previous "U" requirement that was a part of DCHU design principles.

To see which platform an API supports, visit the documentation page for the API and examine
the Target Platform entry of the Requirements section. Windows Drivers must only use APIs or DDIs
that have a Target Platform listed as Universal, meaning the subset of functionality that is available
on all Windows offerings.

The Windows API Sets page describes a set of best practices and tools for determining whether an
API is available on a particular platform.

Validating API Layering

Api Validator is the main tool used to validate API Layering compliance for Windows Drivers. Api
Validator ships as part of the Windows Driver Kit (WDK).

See Validating Windows Drivers for more details on using Api Validator to verify that a Windows
Driver meets the API Layering requirement.

Object-orientation patterns in kernel code


Despite the fact that the Linux Kernel is mostly written in C, it makes broad use of some techniques
from the field of object-oriented programming. Developers wanting to use these object-oriented
techniques receive little support or guidance from the language and so are left to fend for themselves.
As is often the case, this is a double-edged sword. The developer has enough flexibility to do really
cool things, and equally the flexibility to do really stupid things, and it isn't always clear at first glance
which is which, or more accurately: where on the spectrum a particular approach sits.

Instead of looking to the language to provide guidance, a software engineer must look to established
practice to find out what works well and what is best avoided. Interpreting established practice is not
always as easy as one might like and the effort, once made, is worth preserving. To preserve that
effort on your author's part, this article brings another installment in an occasional series on Linux
Kernel Design Patterns and attempts to set out - with examples - the design patterns in the Linux
Kernel which effect an object-oriented style of programming.

Rather than providing a brief introduction to the object-oriented style, tempting though that is, we will
assume the reader has a basic knowledge of objects, classes, methods, inheritance, and similar
terms. For those as yet unfamiliar with these, there are plenty of resources to be found elsewhere on
the web.

Method Dispatch

The large variety of styles of inheritance and rules for its usage in languages today seems to suggest
that there is no uniform understanding of what "object-oriented" really means. The term is a bit like
"love": everyone thinks they know what it means but when you get down to details people can find
they have very different ideas. While what it means to be "oriented" might not be clear, what we mean
by an "object" does seem to be uniformly agreed upon. It is simply an abstraction comprising both
state and behavior. An object is like a record (Pascal) or struct (C), except that some of the names of
members refer to functions which act on the other fields in the object. These function members are
sometimes referred to as "methods".

NULL function pointers

The first observation is that some function pointers in some vtables are allowed to be NULL. Clearly
trying to call such a function would be futile, so the code that calls into these methods generally
contains an explicit test for the pointer being NULL. There are a few different reasons for these NULL
pointers. Probably easiest to justify is the incremental development reason. Because of the way
vtable structures are initialized, adding a new function pointer to the structure definition causes all
existing table declarations to initialise that pointer to NULL. Thus it is possible to add a caller of the
new method before any instance supports that method, and have it check for NULL and perform a
default behavior. Then as incremental development continues those vtable instances which need it
can get non-default methods.
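The pattern just described can be sketched in plain C as follows: the caller tests the vtable slot for
NULL and falls back to a default behavior. The structure and function names below are invented for the
example and are not taken from the kernel source.

#include <stdio.h>

struct device;

/* A vtable where some slots may legitimately be NULL. */
struct device_ops {
    void (*start)(struct device *dev);
    void (*suspend)(struct device *dev);   /* optional: may be NULL */
};

struct device {
    const struct device_ops *ops;
    const char *name;
};

void device_suspend(struct device *dev)
{
    /* The caller explicitly tests for NULL and performs a default action. */
    if (dev->ops->suspend)
        dev->ops->suspend(dev);
    else
        printf("%s: no suspend method, using default (no-op)\n", dev->name);
}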

A review of OO implementations generics (C++ vtables, etc)

Generics is the idea of allowing a type (Integer, String, etc., or a user-defined type) to be a parameter to
methods, classes and interfaces. For example, classes like array and map can be used very efficiently
with generics. We can use them for any type.
The method of Generic Programming is implemented to increase the efficiency of the code. Generic
Programming enables the programmer to write a general algorithm which will work with all data types.
It eliminates the need to create different algorithms if the data type is an integer, string or a character.
The advantages of Generic Programming are
1. Code Reusability
2. Avoid Function Overloading
3. Once written it can be used for multiple times and cases.
Generics can be implemented in C++ using Templates. Template is a simple and yet very powerful
tool in C++. The simple idea is to pass data type as a parameter so that we don’t need to write the
same code for different data types. For example, a software company may need sort() for different
data types. Rather than writing and maintaining the multiple codes, we can write one sort() and pass
data type as a parameter.

Generic Functions using Template:

We write a generic function that can be used for different data types. Examples of function templates
are sort(), max(), min(), printArray()

#include <iostream>
using namespace std;

// One function works for all data types.
// This would work even for user-defined types
// if operator '>' is overloaded
template <typename T>
T myMax(T x, T y)
{
    return (x > y) ? x : y;
}

int main()
{
    // Call myMax for int
    cout << myMax<int>(3, 7) << endl;

    // Call myMax for double
    cout << myMax<double>(3.0, 7.0) << endl;

    // Call myMax for char
    cout << myMax<char>('g', 'e') << endl;

    return 0;
}
Output:
7
7
g

Virtual method table


A virtual method table (VMT), virtual function table, virtual call table, dispatch table, vtable,
or vftable is a mechanism used in a programming language to support dynamic dispatch (or run-
time method binding).
Whenever a class defines a virtual function (or method), most compilers add a hidden member
variable to the class that points to an array of pointers to (virtual) functions called the virtual method
table. These pointers are used at runtime to invoke the appropriate function implementations,
because at compile time it may not yet be known if the base function is to be called or a derived one
implemented by a class that inherits from the base class.
There are many different ways to implement such dynamic dispatch, but use of virtual method tables
is especially common among C++ and related languages (such as D and C#). Languages that
separate the programmatic interface of objects from the implementation, like Visual Basic and Delphi,
also tend to use this approach, because it allows objects to use a different implementation simply by
using a different set of method pointers.
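Conceptually, a virtual call is just an indirect call through that hidden table. The C sketch below
emulates what a compiler roughly generates for single inheritance; the exact layout varies between
compilers and ABIs, so treat the struct names and layout as assumptions for illustration only.

#include <stdio.h>

struct shape;   /* "base class" */

struct shape_vtbl {
    double (*area)(const struct shape *self);
};

struct shape {
    const struct shape_vtbl *vptr;   /* hidden vtable pointer added by the compiler */
};

/* "Derived class": circle */
struct circle {
    struct shape base;
    double radius;
};

static double circle_area(const struct shape *self)
{
    const struct circle *c = (const struct circle *)self;
    return 3.14159265 * c->radius * c->radius;
}

static const struct shape_vtbl circle_vtbl = { circle_area };

int main(void)
{
    struct circle c = { { &circle_vtbl }, 2.0 };
    struct shape *s = &c.base;
    /* Dynamic dispatch: which function runs depends on the vtable the object carries. */
    printf("area = %f\n", s->vptr->area(s));
    return 0;
}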
UNIT IV

Opensolaris and linux virtual memory and address space


structures
Linux supports virtual memory, that is, using a disk as an extension of RAM so that the effective
size of usable memory grows correspondingly. The kernel will write the contents of a currently
unused block of memory to the hard disk so that the memory can be used for another purpose.

Address space is the amount of memory allocated for all possible addresses for a
computational entity -- for example, a device, a file, a server or a networked computer. The system
provides each device and process with an address space that holds a specific portion of the
processor's address space.
Tying top-down and bottom-up object and memory page lookups
with the actual x86 page translation and segmentation
Paging:
Paging is a technique used for non-contiguous memory allocation. It is a fixed-size partitioning
scheme: both main memory and secondary memory are divided into equal fixed-size partitions. The
partitions of secondary memory are called pages and the partitions of main memory are called frames.
Paging is a memory management method used to fetch processes from secondary memory into main
memory in the form of pages. In paging, each process is split into parts where the size of each part is
the same as the page size. The size of the last part may be less than the page size. The pages of a
process are stored in the frames of main memory depending upon their availability.

Segmentation:
Segmentation is another non-contiguous memory allocation scheme, like paging. Unlike paging, in
segmentation a process is not divided arbitrarily into fixed-size pages; it is a variable-size partitioning
scheme. Unlike paging, in segmentation secondary and main memory are not divided into partitions of
equal size. The partitions of secondary memory are called segments. The details of every segment are
held in a table called the segment table.
The segment table contains two main pieces of data about a segment: the Base, which is the base
address of the segment, and the Limit, which is the length of the segment.
In segmentation, the CPU generates a logical address that contains a segment number and a segment
offset. If the segment offset is less than the limit then the address is valid, otherwise an error is raised
because the address is invalid.
The segment table is used to translate the logical address into the physical address.

1. In paging, a program is divided into fixed size pages. In segmentation, a program is divided into
   variable size sections (segments).
2. For paging, the operating system is accountable. For segmentation, the compiler is accountable.
3. Page size is determined by the hardware. Segment size is given by the user.
4. Paging is faster in comparison to segmentation. Segmentation is slower.
5. Paging can result in internal fragmentation. Segmentation can result in external fragmentation.
6. In paging, the logical address is split into a page number and a page offset. In segmentation, the
   logical address is split into a segment number and a segment offset.
7. Paging uses a page table which holds the base address of every page. Segmentation uses a
   segment table which holds the base and limit of every segment.
8. The page table is used to keep the page data. The segment table maintains the segment data.
9. In paging, the operating system must maintain a free frame list. In segmentation, the operating
   system maintains a list of holes in main memory.
10. Paging is invisible to the user. Segmentation is visible to the user.
11. In paging, the processor needs the page number and offset to calculate the absolute address. In
    segmentation, the processor uses the segment number and offset to calculate the full address.
12. Paging makes it hard to allow sharing of procedures between processes. Segmentation facilitates
    sharing of procedures between processes.
13. In paging, a programmer cannot efficiently handle data structures. Segmentation can efficiently
    handle data structures.
14. In paging, protection is hard to apply. In segmentation, protection is easy to apply.
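To tie this back to the actual translation, the sketch below splits a logical address into a page number
and an offset for an assumed 4 KB page size, which is how the x86 MMU indexes its page tables. The
constants and the example address are assumptions made for the illustration.

#include <stdio.h>
#include <stdint.h>

#define PAGE_SIZE   4096u                 /* assumed 4 KB pages           */
#define PAGE_SHIFT  12                    /* log2(PAGE_SIZE)              */
#define PAGE_MASK   (PAGE_SIZE - 1)

int main(void)
{
    uint32_t logical = 0x0001A2B4;        /* example logical address      */
    uint32_t page    = logical >> PAGE_SHIFT;   /* page number: high bits */
    uint32_t offset  = logical & PAGE_MASK;     /* offset: low 12 bits    */

    /* A (hypothetical) page table would map 'page' to a frame number;
     * the physical address is then (frame << PAGE_SHIFT) | offset.       */
    printf("page number = %u, offset = %u\n", page, offset);
    return 0;
}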

How file operations


As you know, files are used to store the required information for later use.

There are many file operations that can be perform by the computer system.

Here are the list of some common file operations:

 File Create operation


 File Delete operation
 File Open operation
 File Close operation
 File Read operation
 File Write operation
 File Append operation
 File Seek operation
 File Get attribute operation
 File Set attribute operation
 File Rename operation

Now let's describe briefly about all the above most common operations that can be performed with
files.

File Create Operation

The file is created with no data.

The file create operation is the first step of the file.

Without creating a file, no other operation can be performed.

File Delete Operation

A file has to be deleted when it is no longer needed, just to free up the disk space.

The file delete operation is the last step of the file.


After deleting the file, it doesn't exist.

File Open Operation

The process must open the file before using it.

File Close Operation

The file must be closed to free up the internal table space, when all the accesses are finished and the
attributes and the disk addresses are no longer needed.

File Read Operation

The file read operation is performed just to read the data that are stored in the required file.

File Write Operation

The file write operation is used to write the data to the file, again, generally at the current position.

File Append Operation

The file append operation is the same as the file write operation, except that the file append operation
only adds data at the end of the file.

File Seek Operation

For random access files, a method is needed just to specify from where to take the data. Therefore,
the file seek operation performs this task.

File Get Attribute Operation

The file get attribute operation is performed by processes when they need to read the file
attributes to do their required work.

File Set Attribute Operation

The file set attribute operation is used to set some of the attributes (user settable attributes) after the file
has been created.

File Rename Operation

The file rename operation is used to change the name of the existing file.
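Most of these operations map directly onto system calls. The short C sketch below exercises create/open,
write, seek, read, close, rename and delete on an ordinary file; error handling is kept minimal for brevity,
and the file name demo.txt is just an example.

#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

int main(void)
{
    char buf[16];

    /* Create (or truncate) a file and open it for reading and writing. */
    int fd = open("demo.txt", O_CREAT | O_RDWR | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    write(fd, "hello world\n", 12);              /* file write operation         */
    lseek(fd, 0, SEEK_SET);                      /* file seek: back to the start */
    ssize_t n = read(fd, buf, sizeof(buf) - 1);  /* file read operation          */
    if (n > 0) { buf[n] = '\0'; printf("%s", buf); }

    close(fd);                                   /* file close operation         */
    rename("demo.txt", "demo2.txt");             /* file rename operation        */
    unlink("demo2.txt");                         /* file delete operation        */
    return 0;
}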

I/O buffering

A buffer is a memory area that stores data being transferred between two devices or between a
device and an application.

Uses of I/O Buffering :


 Buffering is done to deal effectively with a speed mismatch between the producer and consumer
of the data stream.
 A buffer is produced in main memory to accumulate the bytes received from a modem.
 After receiving the data in the buffer, the data get transferred to disk from buffer in a single
operation.
 This process of data transfer is not instantaneous, therefore the modem needs another buffer in
order to store additional incoming data.
 When the first buffer got filled, then it is requested to transfer the data to disk.
 The modem then starts filling the additional incoming data in the second buffer while the data in
the first buffer getting transferred to disk.
 When both the buffers completed their tasks, then the modem switches back to the first buffer
while the data from the second buffer get transferred to the disk.
 The use of two buffers decouples the producer and the consumer of the data, thus minimizing
the timing requirements between them.
 Buffering also provides variations for devices that have different data transfer sizes.
Types of various I/O buffering techniques :

1. Single buffer :
A buffer is provided by the operating system to the system portion of the main memory.
Block oriented device –
 System buffer takes the input.
 After taking the input, the block gets transferred to the user space by the process and then the
process requests for another block.
 Two blocks works simultaneously, when one block of data is processed by the user process, the
next block is being read in.
 OS can swap the processes.
 OS can record the data of system buffer to user processes.
Stream oriented device –

 Line-at-a-time operation is used for scroll-mode terminals. The user inputs one line at a time, with a
carriage return signaling the end of a line.
 Byte-at-a-time operation is used on forms-mode terminals, where each keystroke is significant.

2. Double buffer :
Block oriented –
 There are two buffers in the system.
 One buffer is used by the driver or controller to store data while waiting for it to be taken by higher
level of the hierarchy.
 Other buffer is used to store data from the lower level module.
 Double buffering is also known as buffer swapping.
 A major disadvantage of double buffering is that the complexity of the process gets increased.
 If the process performs rapid bursts of I/O, then even double buffering may be insufficient.
Stream oriented –
 For line-at-a-time I/O, the user process need not be suspended for input or output, unless the process
runs ahead of the double buffer.
 For byte-at-a-time operations, the double buffer offers no advantage over a single buffer of twice the
length.
3. Circular buffer :

 When more than two buffers are used, the collection of buffers is itself referred to as a circular
buffer.
 In this, data is not passed directly from the producer to the consumer, because otherwise buffers
could be overwritten before they had been consumed.
 The producer can only fill up to buffer i-1 while data in buffer i is waiting to be consumed, as
illustrated in the sketch below.
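A minimal circular (ring) buffer can be sketched in C as follows; the producer refuses to fill the slot
that the consumer has not yet drained, which is the "fill up to buffer i-1" rule above. The slot count,
slot size and function names are assumptions made for the example.

#include <stdbool.h>
#include <stddef.h>

#define NBUF 8                           /* number of buffer slots (assumed) */

struct ring {
    char   slots[NBUF][512];             /* the buffers themselves            */
    size_t head;                         /* next slot the producer will fill  */
    size_t tail;                         /* next slot the consumer will drain */
};

/* Producer side: returns false if the ring is full (producer must wait). */
bool ring_put(struct ring *r, const char *data, size_t len)
{
    size_t next = (r->head + 1) % NBUF;
    if (next == r->tail)                 /* may only fill up to slot tail-1   */
        return false;
    if (len > sizeof(r->slots[0])) len = sizeof(r->slots[0]);
    for (size_t i = 0; i < len; i++) r->slots[r->head][i] = data[i];
    r->head = next;
    return true;
}

/* Consumer side: returns false if the ring is empty. */
bool ring_get(struct ring *r, char *out, size_t len)
{
    if (r->tail == r->head)
        return false;
    if (len > sizeof(r->slots[0])) len = sizeof(r->slots[0]);
    for (size_t i = 0; i < len; i++) out[i] = r->slots[r->tail][i];
    r->tail = (r->tail + 1) % NBUF;
    return true;
}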

Swapping all converged to using the same mechanism


Swapping is a memory management scheme in which any process can be temporarily swapped from
main memory to secondary memory so that the main memory can be made available for other
processes. It is used to improve main memory utilization. In secondary memory, the place where the
swapped-out process is stored is called swap space.

The purpose of swapping in an operating system is to access data present on the hard disk and bring
it into RAM so that application programs can use it. The thing to remember is that swapping is used
only when data is not present in RAM.

Although the process of swapping affects the performance of the system, it helps to run larger
processes and more than one process. This is the reason why swapping is also referred to as memory
compaction.

The concept of swapping has divided into two more concepts: Swap-in and Swap-out.

o Swap-out is a method of removing a process from RAM and adding it to the hard disk.

o Swap-in is a method of removing a program from a hard disk and putting it back into the main
memory or RAM.
Example: Suppose the user process's size is 2048KB and is a standard hard disk where swapping
has a data transfer rate of 1Mbps. Now we will calculate how long it will take to transfer from main
memory to secondary memory.

1. User process size is 2048Kb


2. Data transfer rate is 1Mbps = 1024 kbps
3. Time = process size / transfer rate
4. = 2048 / 1024
5. = 2 seconds
6. = 2000 milliseconds
7. Now taking swap-in and swap-out time, the process will take 4000 milliseconds.

Advantages of Swapping
1. It helps the CPU to manage multiple processes within a single main memory.

2. It helps to create and use virtual memory.

3. Swapping allows the CPU to perform multiple tasks simultaneously. Therefore, processes do
not have to wait very long before they are executed.

4. It improves the main memory utilization.

Disadvantages of Swapping
1. If the computer system loses power, the user may lose all information related to the program in
case of substantial swapping activity.

2. If the swapping algorithm is not good, the composite method can increase the number of page
faults and decrease the overall processing performance.

Note:

o In a single tasking operating system, only one process occupies the user program area of
memory and stays in memory until the process is complete.

o In a multitasking operating system, a situation arises when all the active processes cannot
coordinate in the main memory, then a process is swap out from the main memory so that other
processes can enter it.

Kmem and Vmem allocators

The Solaris kernel memory (kmem) allocator provides a powerful set of debugging features that can
facilitate analysis of a kernel crash dump. This chapter discusses these debugging features, and the
MDB dcmds and walkers designed specifically for the allocator. Bonwick provides an overview of the
principles of the allocator itself. Refer to the header file <sys/kmem_impl.h> for the definitions of
allocator data structures. The kmem debugging features can be enabled on a production system to
enhance problem analysis, or on development systems to aid in debugging kernel software and device
drivers.

Note –
This guide reflects Solaris 9 implementation; this information might not be relevant, correct, or
applicable to past or future releases, since it reflects the current kernel implementation. It does not
define a public interface of any kind. All of the information provided about the kernel memory allocator
is subject to change in future Solaris releases.

Vmem Allocator

The kmem allocator relies on two lower-level system services to create slabs: a virtual address
allocator to provide kernel virtual addresses, and VM routines to back those addresses with physical
pages and establish virtual-to-physical translations. The scalability of large systems was limited by the
old virtual address allocator (the resource map allocator). It tended to fragment the address space
badly over time, its latency was linear in the number of fragments, and the whole thing was single-
threaded.

Virtual address allocation is, however, just one example of the more general problem of resource
allocation. For our purposes, a resource is anything that can be described by a set of integers. For
example: virtual addresses are subsets of the 64-bit integers; process IDs are subsets of the integers
[0, 30000]; and minor device numbers are subsets of the 32-bit integers.

In this section we describe the new general-purpose resource allocator, vmem, which provides
guaranteed constant-time performance with low fragmentation. Vmem appears to be the first resource
allocator that can do this.

We begin by providing background on the current state of the art. We then lay out the objectives of
vmem, describe the vmem interfaces, explain the implementation in detail, and discuss vmem's
performance (fragmentation, latency, and scalability) under both benchmarks and real-world
conditions.
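The object-caching idea behind the kmem allocator can be illustrated with a much simplified sketch: a
cache keeps constructed objects of one type on a free list so that allocation can often skip repeated
construction. The interfaces below (obj_cache, cache_alloc, cache_free) are invented for the
illustration and are not the Solaris kmem_cache API.

#include <stdlib.h>

/* A tiny object cache in the spirit of slab/kmem caching (illustrative). */
struct obj_cache {
    size_t  obj_size;
    void  (*ctor)(void *obj);    /* run once when a brand-new object is created */
    void  **free_list;           /* caller-provided array used as a stack        */
    size_t  nfree, cap;          /* current and maximum free-list entries        */
};

void *cache_alloc(struct obj_cache *c)
{
    if (c->nfree > 0)
        return c->free_list[--c->nfree];   /* reuse an already-constructed object */
    void *obj = malloc(c->obj_size);
    if (obj && c->ctor)
        c->ctor(obj);                      /* construct only on first use          */
    return obj;
}

void cache_free(struct obj_cache *c, void *obj)
{
    if (c->nfree < c->cap)
        c->free_list[c->nfree++] = obj;    /* keep it constructed for reuse        */
    else
        free(obj);                         /* cache full: really release it        */
}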

OO approach to memory allocations

To gain proper memory utilization, memory must be allocated in an efficient manner. One of the
simplest methods for allocating memory is to divide memory into several fixed-sized partitions and
each partition contains exactly one process. Thus, the degree of multiprogramming is obtained by
the number of partitions.
Multiple partition allocation: In this method, a process is selected from the input queue and loaded
into the free partition. When the process terminates, the partition becomes available for other
processes.
Fixed partition allocation: In this method, the operating system maintains a table that indicates
which parts of memory are available and which are occupied by processes. Initially, all memory is
available for user processes and is considered one large block of available memory. This available
memory is known as “Hole”. When the process arrives and needs memory, we search for a hole that
is large enough to store this process. If the requirement fulfills then we allocate memory to process,
otherwise keeping the rest available to satisfy future requests. While allocating a memory sometimes
dynamic storage allocation problems occur, which concerns how to satisfy a request of size n from a
list of free holes. There are some solutions to this problem:
First fit:-
In the first fit, the first available free hole fulfills the requirement of the process allocated.
Here, in this diagram 40 KB memory block is the first available free hole that can store process A
(size of 25 KB), because the first two blocks did not have sufficient memory space.
Best fit:-
In the best fit, allocate the smallest hole that is big enough to process requirements. For this, we
search the entire list, unless the list is ordered by size.

Here in this example, first, we traverse the complete list and find the last hole 25KB is the best
suitable hole for Process A(size 25KB).
In this method memory utilization is maximum as compared to other memory allocation techniques.
Worst fit:-In the worst fit, allocate the largest available hole to process. This method produces the
largest leftover hole.

Here in this example, Process A (Size 25 KB) is allocated to the largest available memory block
which is 60KB. Inefficient memory utilization is a major issue in the worst fit.
Challenges of multiple CPUs and Memory Hierarchy

In the Computer System Design, Memory Hierarchy is an enhancement to organize the memory such
that it can minimize the access time. The Memory Hierarchy was developed based on a program
behavior known as locality of reference. The figure below demonstrates the different levels of the
memory hierarchy:

This Memory Hierarchy Design is divided into 2 main types:


1. External Memory or Secondary Memory –
Comprising of Magnetic Disk, Optical Disk, Magnetic Tape i.e. peripheral storage devices which
are accessible by the processor via I/O Module.
2. Internal Memory or Primary Memory –
Comprising of Main Memory, Cache Memory & CPU registers. This is directly accessible by the
processor.
We can infer the following characteristics of Memory Hierarchy Design from above figure:
1. Capacity:
It is the global volume of information the memory can store. As we move from top to bottom in the
Hierarchy, the capacity increases.
2. Access Time:
It is the time interval between the read/write request and the availability of the data. As we move
from top to bottom in the Hierarchy, the access time increases.
3. Performance:
Earlier when the computer system was designed without Memory Hierarchy design, the speed
gap increases between the CPU registers and Main Memory due to large difference in access
time. This results in lower performance of the system and thus, enhancement was required. This
enhancement was made in the form of Memory Hierarchy Design because of which the
performance of the system increases. One of the most significant ways to increase system
performance is minimizing how far down the memory hierarchy one has to go to manipulate data.
4. Cost per bit:
As we move from bottom to top in the Hierarchy, the cost per bit increases i.e. Internal Memory is
costlier than External Memory.
Security
Security refers to providing a protection system to computer system resources such as CPU, memory,
disk, software programs and most importantly data/information stored in the computer system. If a
computer program is run by an unauthorized user, then he/she may cause severe damage to computer
or data stored in it. So a computer system must be protected against unauthorized access, malicious
access to system memory, viruses, worms etc. We're going to discuss following topics in this chapter.

 Authentication
 One Time passwords
 Program Threats
 System Threats
 Computer Security Classifications

Integrity
An integrity checking and recovery (ICAR) system is presented here, which protects file system
integrity and automatically restores modified files. The system enables generation and verification of
cryptographic hashes of files, as well as configuration of security constraints. All of the crucial data,
including ICAR system binaries, file backups and hashes database are stored in a physically write-
protected storage to eliminate the threat of unauthorised modification. A buffering mechanism was
designed and implemented in the system to increase operation performance. Additionally, the system
supplies user tools for cryptographic hash generation and security database management. The
system is implemented as a kernel extension, compliant with the Linux security model. Experimental
evaluation of the system was performed and showed an approximate 10% performance degradation
in secured file access compared to regular access.

Isolation
Process isolation in computer programming is the segregation of different software processes to
prevent them from accessing memory space they do not own. The concept of process isolation helps
to improve operating system security by providing different privilege levels to certain programs and
restricting the memory those programs can use. Although there are many implementations of process
isolation, it is frequently used in web browsers to separate multiple tabs and to protect the core
browser itself should a process fail. It can be hardware based or software based, but both serve the
same purpose of limiting access to system resources and keeping programs isolated to their
own virtual address space.

Mediation
In the context of operating system security, mediation means that every access by a subject (a process
or user) to an object (a file, device or other resource) is checked by the system against the security
policy before it is allowed. The principle of complete mediation requires that no access can bypass this
check; this is the role of the kernel's reference monitor.

Auditing
Auditing systems in modern operating systems collect detailed information about security-related
events. The audit or security logs generated by an auditing system facilitate identification of
attempted attacks, security policy improvement, security incident investigation, and review by
auditors.
MULTICS and MLS to modern UNIX
Multics ("Multiplexed Information and Computing Service") is an influential early time-
sharing operating system based on the concept of a single-level memory.It has been said that Multics
"has influenced all modern operating systems since, from microcomputers to mainframes."
Initial planning and development for Multics started in 1964, in Cambridge, Massachusetts. Originally
it was a cooperative project led by MIT (Project MAC with Fernando Corbató) along with General
Electric and Bell Labs. It was developed on the GE 645 computer, which was specially designed for it;
the first one was delivered to MIT in January, 1967.
Multics was conceived as a commercial product for General Electric, and became one for Honeywell,
albeit not very successfully. Due to its many novel and valuable ideas, Multics has had a significant
influence on computer science despite its faults.
Multics has numerous features intended to ensure high availability so that it would support
a computing utility similar to the telephone and electricity utilities. Modular hardware structure and
software architecture are used to achieve this. The system can grow in size by simply adding more of
the appropriate resource, be it computing power, main memory, or disk storage. Separate access
control lists on every file provide flexible information sharing, but complete privacy when needed.
Multics has a number of standard mechanisms to allow engineers to analyze the performance of the
system, as well as a number of adaptive performance optimization mechanisms.

SELinux type enforcement


Security-Enhanced Linux (SELinux) is a security architecture for Linux® systems that allows
administrators to have more control over who can access the system. It was originally developed by
the United States National Security Agency (NSA) as a series of patches to the Linux kernel using
Linux Security Modules (LSM).

SELinux was released to the open source community in 2000, and was integrated into the upstream
Linux kernel in 2003.

How does SELinux work?

SELinux defines access controls for the applications, processes, and files on a system. It uses
security policies, which are a set of rules that tell SELinux what can or can’t be accessed, to enforce
the access allowed by a policy.
When an application or process, known as a subject, makes a request to access an object, like a file,
SELinux checks with an access vector cache (AVC), where permissions are cached for subjects and
objects.

If SELinux is unable to make a decision about access based on the cached permissions, it sends the
request to the security server. The security server checks for the security context of the app or
process and the file. Security context is applied from the SELinux policy database. Permission is then
granted or denied.

If permission is denied, an "avc: denied" message will be available in /var/log/messages.

Implementing SELinux

SELinux is set up to default-deny, which means that every single access for which it has a hook in the
kernel must be explicitly allowed by policy. This means a policy file is comprised of a large amount of
information regarding rules, types, classes, permissions, and more. A full consideration of SELinux is
out of the scope of this document, but an understanding of how to write policy rules is now essential
when bringing up new Android devices. There is a great deal of information available regarding
SELinux already. See Supporting documentation for suggested resources.

Key files

To enable SELinux, integrate the latest Android kernel and then incorporate the files found in
the system/sepolicy directory. When compiled, those files comprise the SELinux kernel security policy
and cover the upstream Android operating system.

In general, you should not modify the system/sepolicy files directly. Instead, add or edit your own
device-specific policy files in the /device/manufacturer/device-name/sepolicy directory. In Android 8.0
and higher, the changes you make to these files should only affect policy in your vendor directory. For
more details on separation of public sepolicy in Android 8.0 and higher, see Customizing SEPolicy in
Android 8.0+. Regardless of Android version, you're still modifying these files:
Policy files

Files that end with *.te are SELinux policy source files, which define domains and their labels. You
may need to create new policy files in /device/manufacturer/device-name/sepolicy, but you should try
to update existing files where possible.
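For example, a device-specific *.te file typically declares a domain and grants it a narrow set of
permissions with allow rules. The snippet below is only a generic illustration of the rule syntax; the
type names mydaemon and mydaemon_conf_file are made up, and any real policy would differ.

# Hypothetical domain for an illustrative daemon (type names are made up).
type mydaemon, domain;
type mydaemon_conf_file, file_type;

# Allow the daemon to open and read its own configuration files.
allow mydaemon mydaemon_conf_file:file { open read getattr };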

Context files

Context files are where you specify labels for your objects.

 file_contexts assigns labels to files and is used by various userspace components. As you create new
policies, create or update this file to assign new labels to files. To apply new file_contexts, rebuild the
filesystem image or run restorecon on the file to be relabeled. On upgrades, changes
to file_contexts are automatically applied to the system and userdata partitions as part of the upgrade.
Changes can also be automatically applied on upgrade to other partitions by
adding restorecon_recursive calls to your init.board.rc file after the partition has been mounted read-
write.

 genfs_contexts assigns labels to filesystems, such as proc or vfat that do not support extended
attributes. This configuration is loaded as part of the kernel policy but changes may not take effect for
in-core inodes, requiring a reboot or unmounting and re-mounting the filesystem to fully apply the
change. Specific labels may also be assigned to specific mounts, such as vfat using
the context=mount option.

 property_contexts assigns labels to Android system properties to control what processes can set
them. This configuration is read by the init process during startup.

 service_contexts assigns labels to Android binder services to control what processes can add
(register) and find (lookup) a binder reference for the service. This configuration is read by
the servicemanager process during startup.

 seapp_contexts assigns labels to app processes and /data/data directories. This configuration is read
by the zygote process on each app launch and by installd during startup.

 mac_permissions.xml assigns a seinfo tag to apps based on their signature and optionally their
package name. The seinfo tag can then be used as a key in the seapp_contexts file to assign a
specific label to all apps with that seinfo tag. This configuration is read by system_server during
startup.

 keystore2_key_contexts assigns labels to Keystore 2.0 namespaces. These namespaces are enforced
by the keystore2 daemon. Keystore has always provided UID/AID based namespaces. Keystore 2.0
additionally enforces sepolicy-defined namespaces. A detailed description of the format and
conventions of this file can be found in the Android documentation.

Kernel Hook systems and policies they enable


The term hooking covers a range of techniques used to alter or augment the behavior of an
operating system, an application or other software components by intercepting function calls,
messages and events passed between the different software components. The code that performs the
interception of function calls, events or messages is called a hook. Typically hooks are inserted while
software is already running, but hooking is a tactic that can also be employed prior to the application
being started.

The two main methods of hooking are:

 Physical modification: Achieved by physically modifying an executable or library before an


application is running. Through techniques of reverse engineering, you can also achieve hooking.
This is typically used to intercept function calls to either monitor them or replace them entirely. For
example, by using a disassembler, the entry point of a function within a module can be found. It
can then be altered to instead dynamically load some other library module and then have it execute
desired methods within that loaded library. If applicable, another related approach by which
hooking can be achieved is by altering the import table of an executable. This table can be
modified to load any additional library modules as well as changing what external code is invoked
when a function is called by the application. An alternative method for achieving function hooking is
by intercepting function calls through a wrapper library. When creating a wrapper, you make your
own version of a library that an application loads with all the same functionality of the original library
that it will replace. That is, all the functions that are accessible are essentially the same between
the original and the replacement. This wrapper library can be designed to call any of the
functionality from the original library, or replace it with an entirely new set of logic.
 Runtime modification: Operating systems and software may provide the means to easily insert
event hooks at runtime. Microsoft Windows for example, allows you to insert hooks that can be
used to process or modify system events and application events for dialogs, scrollbars, and menus
as well as other items. It also allows a hook to insert, remove, process or modify keyboard and
mouse events. Linux provides another example where hooks can be used in a similar manner to
process network events within the kernel through NetFilter.

Hooking and Malware


When it comes to malware and malware analysis, three main categories emerge:

 Malware that is non-intrusive, which means that the malware does not perform any modification to
the OS or processes within the OS in any way.
 Malware that is intrusive by modifying things which should never be modified (For example
kernel code, BIOS which has its hash stored in the TPM, MSR registers, and so on). This type of
malware in general is easier to spot.
 Malware that is intrusive by modifying things which are designed to be modified (DATA sections),
this malware category in general is much harder to spot.

Hooking Techniques
There are five main hooking techniques:

 IAT Hooks
 Inline Hooks
 SSDT Hooks
 SYSENTER_EIP Hook
 IRP Major Function Hook
IAT Hooks
The Import Address Table (IAT) is really a lookup table used when an application calls a function in a different module. Imports can be either by ordinal or by name. Because a
compiled program cannot know the memory location of the libraries it depends upon, an indirect jump
is required whenever an API call is made. The general format looks like the following: "jmp dword ptr
ds:[address]". Because functions in DLLs change address, instead of calling a DLL function directly,
the application will make a call to the relevant jmp in its own jump table. When the application is
executed the loader will place a pointer to each required DLL function at the appropriate address in
the IAT.

The following diagram visualizes an Import Address Table:


Based on the explanation above, if a rootkit, for example, were to inject itself into the application and modify the addresses in the IAT, it would gain control every time a target function is called.
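
To make the mechanism concrete, the following is a minimal user-mode sketch of IAT patching (assuming the Windows PE structures from windows.h; the helper name hook_iat is hypothetical and error handling is omitted). It walks a module's import descriptors and, when it finds the IAT slot holding the target pointer, overwrites it with a replacement:

#include <windows.h>

/* Hypothetical helper: redirect the IAT slot of `module` that currently
 * points to `target` so that it points to `replacement` instead.
 * Returns the original pointer so calls can still be forwarded. */
FARPROC hook_iat(HMODULE module, FARPROC target, FARPROC replacement)
{
    BYTE *base = (BYTE *)module;
    IMAGE_DOS_HEADER *dos = (IMAGE_DOS_HEADER *)base;
    IMAGE_NT_HEADERS *nt  = (IMAGE_NT_HEADERS *)(base + dos->e_lfanew);
    IMAGE_DATA_DIRECTORY dir =
        nt->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT];
    IMAGE_IMPORT_DESCRIPTOR *imp =
        (IMAGE_IMPORT_DESCRIPTOR *)(base + dir.VirtualAddress);

    for (; imp->Name != 0; imp++) {                       /* each imported DLL */
        IMAGE_THUNK_DATA *iat = (IMAGE_THUNK_DATA *)(base + imp->FirstThunk);
        for (; iat->u1.Function != 0; iat++) {            /* each IAT slot */
            if ((FARPROC)iat->u1.Function == target) {
                DWORD old;
                /* The IAT is normally read-only once the loader has run. */
                VirtualProtect(&iat->u1.Function, sizeof(iat->u1.Function),
                               PAGE_READWRITE, &old);
                FARPROC previous = (FARPROC)iat->u1.Function;
                iat->u1.Function = (ULONG_PTR)replacement;
                VirtualProtect(&iat->u1.Function, sizeof(iat->u1.Function),
                               old, &old);
                return previous;
            }
        }
    }
    return NULL;
}

A rootkit typically performs exactly this walk from injected code, saving the returned original pointer so that its replacement can still forward calls to the real function.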

Bypass - Detection
Since the Export Address Table (EAT) of each DLL remains intact, an application could easily bypass
IAT hooks by just calling ‘GetProcAddress’ in order to get the real address of each DLL function. In
order to prevent this type of bypass, a rootkit would likely hook ‘GetProcAddress’/’LdrGetProcedureAddress’ and use that hook to return fake addresses. To bypass hooks of that kind, an application can manually implement a local ‘GetProcAddress’ and use that local function to obtain the real function address.

Inline Hooks
Inline hooks, also called trampoline or detour hooks, are a method of receiving control when a function is called, before the function has done its job. The flow of execution is redirected by modifying the first few (usually five) bytes of the target function. A standard way to achieve this is to overwrite the first 5 bytes of the function with a jump to a malicious block of code; the malicious code can then read the function arguments and do whatever it needs. If the malicious code requires results from the original function (the one it hooked), it may call that function by executing the five bytes that were overwritten and then jumping five bytes into the original function, so that the malicious jump/call is skipped and infinite recursion or redirection is avoided.
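
A minimal sketch of the patching step itself (user mode; it assumes a 32-bit process or a detour located within ±2 GB so that a 5-byte relative jmp can reach it, and the function name install_inline_hook is hypothetical; building the trampoline that re-executes the saved bytes is left out):

#include <windows.h>
#include <stdint.h>
#include <string.h>

/* Overwrite the first five bytes of `target` with "jmp rel32" to `detour`.
 * The clobbered bytes are copied into `saved` so a trampoline can be built. */
int install_inline_hook(void *target, void *detour, unsigned char saved[5])
{
    unsigned char patch[5];
    DWORD old;

    /* E9 <rel32>: the displacement is relative to the end of the 5-byte jmp. */
    int32_t rel = (int32_t)((uint8_t *)detour - ((uint8_t *)target + 5));
    patch[0] = 0xE9;
    memcpy(&patch[1], &rel, sizeof(rel));

    if (!VirtualProtect(target, 5, PAGE_EXECUTE_READWRITE, &old))
        return 0;
    memcpy(saved, target, 5);        /* keep the original prologue bytes */
    memcpy(target, patch, 5);        /* redirect execution to the detour */
    VirtualProtect(target, 5, old, &old);
    FlushInstructionCache(GetCurrentProcess(), target, 5);
    return 1;
}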

The following diagram illustrates the concept of inline hooks:


Bypass – Detection on Ring 3
In user mode, inline hooks are usually placed inside functions that are exported by a DLL. The most accurate way to detect and bypass these hooks is to compare each DLL against the original code. First, the program needs to get a list of each DLL that is loaded, find the original file and read it, align and load the sections into memory, and then perform base relocation. Once the new copy of the DLL is loaded into memory, the application can walk the export address table and compare each function against its counterpart in the original DLL. To bypass the hooks, the application can then either replace the overwritten code using the code from the newly loaded DLL, or resolve imports in the newly loaded DLL and use it instead. Be aware that some DLLs will not work if more than one instance is loaded.

The above method of bypassing DLL hooks practically involves writing your own implementation of LoadLibrary. Another method is to use manual DLL loading to detect or even fix EAT hooks; EAT hooks are, however, very uncommon.

Bypass – Detecting on Ring 0


Inter-modular jumps occur rarely in Kernel mode. Hooks in ntoskrnl can usually be detected by
disassembling each instruction in each function, then searching for jumps or calls that point outside
of ntoskrnl. It is also possible to use the same method explained for user mode hook detection by
using a driver that would be able to read each ntoskrnl module from disk, load it into memory, and
then compare the instructions against the original.

For inline hooks within drivers, scanning for jmp/call instructions that point outside of the driver body
is much more likely to result in false positives, however, non-standard drivers that are the target of
jumps/calls inside standard kernel drivers should raise a red flag. It is also possible to read drivers
from disk. As drivers generally do not export many functions, and IRP major function pointers are only
initialized at runtime, it would probably be required to compare the entire code section of the original
and new driver. It is important to note that relative calls/jumps are susceptible to changes during relocation; this means that there will be some differences between the original and the new driver, but both relative calls/jumps should point to the same place.
SSDT Hooks
The System Service Dispatch Table (SSDT) is a table of pointers to the various Zw/Nt [6] functions that
are callable from user mode. A malicious application can replace pointers in the SSDT with pointers to
its own code.

Detection
All pointers in the SSDT should point to code within ntoskrnl; if any pointer points outside of ntoskrnl, it is likely hooked. It is possible that a rootkit could modify ntoskrnl.exe or one of the related modules in memory and slip some code into an empty space, in which case the pointer would still point to within ntoskrnl. As far as I'm aware, functions starting with "Zw" are intercepted by SSDT hooks, but those beginning with "Nt" are not; therefore an application should be able to detect SSDT hooks by comparing Nt* function addresses with the equivalent pointer in the SSDT.

Bypass
A simple way to bypass SSDT hooks is to call only the Nt* functions instead of their Zw* equivalents. It is also possible to find the original SSDT by loading ntoskrnl.exe (this can be done with LoadLibraryEx in user mode), then finding the export "KeServiceDescriptorTable" and using it to calculate the offset of KiServiceTable within the disk image (user-mode applications can use NtQuerySystemInformation to get the kernel base address); a kernel driver would be required to actually replace the SSDT.

SYSENTER_EIP Hook
SYSENTER_EIP points to the code to be executed when the SYSENTER instruction is used. User-mode applications use SYSENTER to transition into kernel mode and call a kernel function (those beginning with Nt/Zw). Usually it points to KiFastCallEntry, but it can be replaced in order to hook all user-mode calls to kernel functions.

Detection / Bypass

SYSENTER_EIP hooking does not affect kernel-mode drivers and cannot be bypassed from user mode. In order to allow user-mode applications to bypass this hook, a kernel driver must set SYSENTER_EIP back to its original value (KiFastCallEntry). This can be done using the WRMSR instruction; however, since KiFastCallEntry is not exported by ntoskrnl, getting its address is not simple.
IRP Major Function Hook

The driver object of each driver contains a table of twenty-eight function pointers. These pointers are called by other drivers via IoCallDriver and correspond to operations such as read/write (IRP_MJ_READ/IRP_MJ_WRITE). The pointers can easily be replaced by another driver.
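
For context, here is a hedged sketch (WDM style, with a hypothetical MyDispatch routine) of how a driver normally fills in this table in its DriverEntry routine; an IRP-hooking driver simply overwrites the same slots in another driver's DRIVER_OBJECT with pointers to its own code:

#include <ntddk.h>

/* Hypothetical dispatch routine used here for both read and write requests. */
static NTSTATUS MyDispatch(PDEVICE_OBJECT DeviceObject, PIRP Irp)
{
    UNREFERENCED_PARAMETER(DeviceObject);

    /* A hook placed in this slot sees every IRP_MJ_READ/IRP_MJ_WRITE request
     * that other drivers send to this driver object via IoCallDriver(). */
    Irp->IoStatus.Status = STATUS_SUCCESS;
    Irp->IoStatus.Information = 0;
    IoCompleteRequest(Irp, IO_NO_INCREMENT);
    return STATUS_SUCCESS;
}

NTSTATUS DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath)
{
    UNREFERENCED_PARAMETER(RegistryPath);

    /* The MajorFunction table holds one pointer per IRP major code
     * (IRP_MJ_MAXIMUM_FUNCTION + 1 = 28 slots). A hooking driver overwrites
     * the same slots in some other driver's DRIVER_OBJECT at runtime. */
    DriverObject->MajorFunction[IRP_MJ_READ]  = MyDispatch;
    DriverObject->MajorFunction[IRP_MJ_WRITE] = MyDispatch;

    return STATUS_SUCCESS;
}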

Detection

In general, all IRP major function pointers for a driver should point to code within the driver's address
space. This is not always the case, but is a good start to identifying malicious drivers which have
redirected the IRP major functions of legitimate drivers to their own code.

Bypass

Because IRP major function pointers are initialized from within the driver's entry point at runtime, it is not really possible to get the original values by reading the original driver from disk; there are also issues with loading a new copy of the driver due to collisions. The only way to bypass these types of hooks is to call the lower driver (drivers are generally stacked, and the top driver passes the data to the driver below it, and so on; if the lowest driver isn't hooked, an application can just send the request directly to the lowest driver).

Trap systems and policies they enable


Traps are raised by a user program to invoke functionality of the OS. For example, if a user application needs something printed on the screen, it sets off a trap, and the operating system writes the data to the screen.

A trap is a software-produced interrupt that can be caused by various factors, including an error in an instruction, such as division by zero or an illegal memory access. A trap may also be generated when a user program makes a specific service request of the OS.

Traps are called synchronous events because they are caused by the execution of the current instruction. System calls are another type of trap, in which the program asks the operating system to provide a certain service, and the operating system then generates an interrupt to allow the program to access the service.


Traps are used more actively than ordinary interrupts because program code relies heavily on them to interact with the OS; a program therefore issues traps repeatedly, whenever it needs a system service.

Mechanism of Trap in the Operating System

The user program running on the CPU usually makes system calls through library calls. The library routine's job is to validate the program's parameters, build a data structure to transfer the arguments from the application to the operating system's kernel, and then execute a special instruction known as a trap or software interrupt.
These special trap instructions have operands that help determine which kernel service the application requires. As a result, when the process executes the trap, the interrupt saves the user code's state, switches to supervisor mode, and then dispatches to the kernel procedure that offers the requested service.
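
On Linux this sequence can be seen from ordinary C code: the write() library wrapper (or the generic syscall() wrapper) marshals its arguments and executes the architecture's trap instruction (int 0x80, sysenter or syscall), after which the kernel dispatches the corresponding service routine. A minimal sketch:

#define _GNU_SOURCE
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
    const char msg[] = "hello via a trap\n";

    /* The libc wrapper validates arguments, places them where the ABI
     * expects them, and executes the trap; the kernel runs sys_write(). */
    write(STDOUT_FILENO, msg, strlen(msg));

    /* The generic wrapper does the same thing with an explicit call number. */
    syscall(SYS_write, STDOUT_FILENO, msg, strlen(msg));

    return 0;
}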

Tagged architectures and multi-level UNIX


In computer science, a tagged architecture is a particular type of computer architecture where
every word of memory constitutes a tagged union, being divided into a number of bits of data, and
a tag section that describes the type of the data: how it is to be interpreted, and, if it is a reference, the
type of the object that it points to. In contrast, program and data memory are indistinguishable in
the von Neumann architecture, making the way the memory is referenced critical to interpret the
correct meaning.
Notable examples of American tagged architectures were the Lisp machines, which had tagged
pointer support at the hardware and opcode level, the Burroughs large systems, which had a data-
driven tagged and descriptor-based architecture, and the non-commercial Rice Computer. Both the
Burroughs and Lisp machine were examples of high-level language computer architectures, where
the tagging was used to support types from a high-level language at the hardware level.
In addition to this, the original Xerox Smalltalk implementation used the least-significant bit of each
16-bit word as a tag bit: if it was clear then the hardware would accept it as an aligned memory
address while if it was set it was treated as a (shifted) 15-bit integer. Current Intel documentation
mentions that the lower bits of a memory address might be similarly used by some interpreter-based
systems.
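
The same low-bit tagging idea can be sketched in software. The following is an illustrative C fragment (not any particular system's encoding; it assumes pointers are at least 2-byte aligned, so the least-significant bit is free to use as a tag, and that signed right shift is arithmetic, as on common compilers):

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* A "word" is either an aligned pointer (tag bit 0) or a small integer
 * stored shifted left by one (tag bit 1), in the spirit of Smalltalk-80. */
typedef uintptr_t word_t;

static word_t   tag_int(intptr_t n) { return ((uintptr_t)n << 1) | 1u; }
static intptr_t untag_int(word_t w) { return (intptr_t)w >> 1; }  /* arithmetic shift assumed */
static word_t   tag_ptr(void *p)    { assert(((uintptr_t)p & 1u) == 0); return (uintptr_t)p; }
static void    *untag_ptr(word_t w) { return (void *)w; }
static int      is_int(word_t w)    { return (int)(w & 1u); }

int main(void)
{
    int object = 42;
    word_t a = tag_int(-7);
    word_t b = tag_ptr(&object);

    printf("a is %s: %ld\n", is_int(a) ? "an integer" : "a pointer", (long)untag_int(a));
    printf("b is %s: %d\n",  is_int(b) ? "an integer" : "a pointer", *(int *)untag_ptr(b));
    return 0;
}

A hardware tagged architecture performs the equivalent of is_int()/untag_*() implicitly on every memory reference, which is what allows the machine to enforce type rules at that level.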

Unix File System

Unix file system is a logical method of organizing and storing large amounts of information in a way
that makes it easy to manage. A file is the smallest unit in which information is stored. The Unix file
system has several important features. All data in Unix is organized into files. All files are organized
into directories. These directories are organized into a tree-like structure called the file system.
Files in Unix System are organized into multi-level hierarchy structure known as a directory tree. At
the very top of the file system is a directory called “root” which is represented by a “/”. All other files
are “descendants” of root.

Directories or Files and their description –

 / : The slash / character alone denotes the root of the filesystem tree.
 /bin : Stands for “binaries” and contains certain fundamental utilities, such as ls or cp, which are
generally needed by all users.
 /boot : Contains all the files that are required for successful booting process.
 /dev : Stands for “devices”. Contains file representations of peripheral devices and pseudo-
devices.
 /etc : Contains system-wide configuration files and system databases. Originally also contained
“dangerous maintenance utilities” such as init, but these have typically been moved to /sbin or
elsewhere.
 /home : Contains the home directories for the users.
 /lib : Contains system libraries, and some critical files such as kernel modules or device drivers.
 /media : Default mount point for removable devices, such as USB sticks, media players, etc.
 /mnt : Stands for “mount”. Contains filesystem mount points. These are used, for example, if the
system uses multiple hard disks or hard disk partitions. It is also often used for remote (network)
filesystems, CD-ROM/DVD drives, and so on.
 /proc : procfs virtual filesystem showing information about processes as files.
 /root : The home directory for the superuser “root” – that is, the system administrator. This
account’s home directory is usually on the initial filesystem, and hence not in /home (which may
be a mount point for another filesystem) in case specific maintenance needs to be performed,
during which other filesystems are not available. Such a case could occur, for example, if a hard
disk drive suffers physical failures and cannot be properly mounted.
 /tmp : A place for temporary files. Many systems clear this directory upon startup; it might have
tmpfs mounted atop it, in which case its contents do not survive a reboot, or it might be explicitly
cleared by a startup script at boot time.
 /usr : Originally the directory holding user home directories,its use has changed. It now holds
executables, libraries, and shared resources that are not system critical, like the X Window
System, KDE, Perl, etc. However, on some Unix systems, some user accounts may still have a
home directory that is a direct subdirectory of /usr, such as the default as in Minix. (on modern
systems, these user accounts are often related to server or system use, and not directly used by
a person).
 /usr/bin : This directory stores all binary programs distributed with the operating system not
residing in /bin, /sbin or (rarely) /etc.
 /usr/include : Stores the development headers used throughout the system. Header files are
mostly used by the #include directive in C/C++ programming language.
 /usr/lib : Stores the required libraries and data files for programs stored within /usr or elsewhere.
 /var : A short for “variable.” A place for files that may change often – especially in size, for
example e-mail sent to users on the system, or process-ID lock files.
 /var/log : Contains system log files.
 /var/mail : The place where all the incoming mails are stored. Users (other than root) can access
their own mail only. Often, this directory is a symbolic link to /var/spool/mail.
 /var/spool : Spool directory. Contains print jobs, mail spools and other queued tasks.
 /var/tmp : A place for temporary files which should be preserved between system reboots.
Types of Unix files – The UNIX file system contains several different types of files:

1. Ordinary files – An ordinary file is a file on the system that contains data, text, or program
instructions.
 Used to store your information, such as some text you have written or an image you have drawn.
This is the type of file that you usually work with.
 Always located within/under a directory file.
 Do not contain other files.
 In long-format output of ls -l, this type of file is specified by the “-” symbol.

2. Directories – Directories store both special and ordinary files. For users familiar with Windows
or Mac OS, UNIX directories are equivalent to folders. A directory file contains an entry for every file
and subdirectory that it houses. If you have 10 files in a directory, there will be 10 entries in the
directory. Each entry has two components.
(1) The Filename
(2) A unique identification number for the file or directory (called the inode number)

Directories do not contain “real” information which you would work with (such as text); they are basically just used for organizing files.
In long-format output of ls –l , this type of file is specified by the “d” symbol.

3. Special Files – Used to represent a real physical device such as a printer, tape drive or
terminal, used for Input/Output (I/O) operations. Device or special files are used for device
Input/Output(I/O) on UNIX and Linux systems. They appear in a file system just like an ordinary file or
a directory.
On UNIX systems there are two flavors of special files for each device, character special files and
block special files :
 When a character special file is used for device Input/Output(I/O), data is transferred one
character at a time. This type of access is called raw device access.
 When a block special file is used for device Input/Output(I/O), data is transferred in large fixed-
size blocks. This type of access is called block device access.
For terminal devices, it’s one character at a time. For disk devices though, raw access means reading
or writing in whole chunks of data – blocks, which are native to your disk.
 In long-format output of ls -l, character special files are marked by the “c” symbol.
 In long-format output of ls -l, block special files are marked by the “b” symbol.

4. Pipes – UNIX allows you to link commands together using a pipe. The pipe acts as a temporary file which only exists to hold data from one command until it is read by another. A Unix pipe provides a one-way flow of data: the output of the first command sequence is used as the input to the second command sequence. To make a pipe, put a vertical bar (|) on the command line between two commands. For example: who | wc -l
In long-format output of ls –l , named pipes are marked by the “p” symbol.

5. Sockets – A Unix socket (or Inter-process communication socket) is a special file which allows
for advanced inter-process communication. A Unix Socket is used in a client-server application
framework. In essence, it is a stream of data, very similar to network stream (and network sockets),
but all the transactions are local to the filesystem.
In long-format output of ls -l, Unix sockets are marked by “s” symbol.

6. Symbolic Link – A symbolic link is used for referencing some other file of the file system. A symbolic link is also known as a soft link. It contains a text form of the path to the file it references. To an end user, a symbolic link appears to have its own name, but when you try reading or writing data to this file, it instead redirects those operations to the file it points to. If we delete the soft link itself, the data file is still there. If we delete the source file or move it to a different location, the symbolic link no longer functions properly.
In long-format output of ls –l, symbolic links are marked by the “l” symbol (that’s a lowercase L).
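
These type distinctions are visible programmatically as well; below is a small C sketch of what ls -l does when it prints the leading -, d, c, b, p, s or l character (the path to examine is taken from the command line):

#include <stdio.h>
#include <sys/stat.h>

int main(int argc, char *argv[])
{
    struct stat sb;

    if (argc < 2 || lstat(argv[1], &sb) == -1) {  /* lstat: do not follow symlinks */
        perror("lstat");
        return 1;
    }

    switch (sb.st_mode & S_IFMT) {
    case S_IFREG:  puts("- ordinary file");     break;
    case S_IFDIR:  puts("d directory");         break;
    case S_IFCHR:  puts("c character special"); break;
    case S_IFBLK:  puts("b block special");     break;
    case S_IFIFO:  puts("p named pipe (FIFO)"); break;
    case S_IFSOCK: puts("s socket");            break;
    case S_IFLNK:  puts("l symbolic link");     break;
    default:       puts("? unknown type");      break;
    }
    return 0;
}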
UNIT V
Z File System (ZFS)
The Z File System (ZFS) is an open-source logical volume manager and file system created by Sun
Microsystems, originally for its Solaris operating system. It is now used in many operating systems
including FreeBSD, NetBSD, Mac OS X Server 10.5 and various Linux distributions through ZFS-
FUSE. The most distinguishing feature of ZFS is pooled storage, where multiple storage devices are
treated as one big pool rather than as separate devices and logical drives. Storage can be taken from
the pool and allocated to other file systems, and the pool can be increased by adding new storage
devices to the pool. This is the same method of resource allocation used in a multitenant cloud
environment.

Techopedia Explains Z File System (ZFS)


ZFS is an advanced file system designed by Sun Microsystems to overcome many of the problems
that previous file system designs had in areas such as error prevention and volume management. ZFS
includes data corruption protection, support for multiple storage devices and high storage capacities
without degrading performance, and uses concepts like volume management, copy-on-write clones,
snapshots, continuous checking of integrity and automatic repair when errors are found. It also uses a
data replication model similar to RAID-5, which is called RAID-Z, and eliminates a fatal flaw in RAID-5
called the "write hole," which causes a problem when a data block is written to a stripe but a power
failure or interruption happens just before the parity block can be written, resulting in the data being
inconsistent.

Major design goals of ZFS:

 Data integrity — Checksum is always written with data and is calculated again when those
data are read back. If there is a mismatch in the checksum, which indicates an error, then
ZFS attempts to automatically correct the error if data redundancy is available (backups).
 Pooled storage — All storage devices are added to a pool, which can be allocated to other file
systems or returned. This makes it easier to manage since a single pool is simpler than
multiple physical and logical drives. To increase the pool, new storage devices can be added.
 Performance — Performance is increased by employing multiple caching mechanisms. ZFS
uses an adaptive replacement cache (ARC), which is an advanced memory-based read
cache, along with a second L2ARC, which can be added when needed, and a disk-based
synchronous write cache, which is available through ZIL (ZFS intent log).

OpenSolaris Boot Environment and snapshots


A BE is a bootable instance of an Oracle Solaris 11 operating system plus any other application
software packages installed into that image. System administrators can maintain multiple BEs on their
systems, and each BE can have different software versions installed.

With multiple BEs, the process of updating software becomes a low-risk operation because system
administrators can create backup BEs before making any software updates to their system. If needed,
they have the option of booting a backup BE.

You do not have to create a backup BE as a separate step if you are updating IPS packages. When you use the pkg install or pkg update command, use the --require-backup-be, --backup-be-name, --be-name, or --require-new-be option to make the changes in a new boot environment, not in the current boot environment.

How to Create a New Boot Environment

1. Become the root role.


2. Create a boot environment by using the beadm create command.

# beadm create beName

where beName is a variable for the name of the new boot environment. This new boot
environment is inactive.

Note - The beadm create command does not create a partial boot environment. Either a new, full
boot environment is successfully created, or the command fails.

3. (Optional) Mount the new boot environment.

# beadm mount beName mountpoint

If the directory for the mount point does not exist, the beadm command creates the directory,
then mounts the boot environment on that directory. If the boot environment is already mounted,
the beadm mount command fails and does not remount the boot environment at the new location.

The boot environment is mounted, but remains inactive. Note that you can upgrade a mounted,
inactive boot environment. Also, remember to unmount the boot environment before rebooting
your system.

4. (Optional) To boot from the new boot environment, first activate the boot environment.

# beadm activate beName

where beName is a variable for the name of the boot environment to be activated. Upon reboot,
the newly active boot environment becomes the default boot entry that is listed in the GRUB
menu.

Snapshots
Snapshots of file systems are accessible in the .zfs/snapshot directory within the root of the file system. For example, if tank/home/ahrens is mounted on /home/ahrens, then the tank/home/ahrens@thursday snapshot data is accessible in the /home/ahrens/.zfs/snapshot/thursday directory.

Displaying and Accessing ZFS Snapshots

You can enable or disable the display of snapshot listings in the zfs list output by using
the listsnapshots pool property. This property is enabled by default.

If you disable this property, you can use the zfs list -t snapshot command to display snapshot
information. Or, enable the listsnapshots pool property. For example:

# zpool get listsnapshots tank


NAME PROPERTY VALUE SOURCE
tank listsnapshots on default
# zpool set listsnapshots=off tank
# zpool get listsnapshots tank
NAME PROPERTY VALUE SOURCE
tank listsnapshots off local

OpenSolaris and UNIX System V system administration pragmatics

Unix System V (pronounced: "System Five") is one of the first commercial versions of
the Unix operating system. It was originally developed by AT&T and first released in 1983. Four major
versions of System V were released, numbered 1, 2, 3, and 4. System V Release 4 (SVR4) was
commercially the most successful version, being the result of an effort, marketed as Unix System
Unification, which solicited the collaboration of the major Unix vendors. It was the source of several
common commercial Unix features. System V is sometimes abbreviated to SysV.
As of 2021, the AT&T-derived Unix market is divided between four System V
variants: IBM's AIX, Hewlett Packard Enterprise's HP-UX and Oracle's Solaris, plus the free-
software illumos forked from OpenSolaris.
Introduction

System V was the successor to 1982's UNIX System III. While AT&T developed and sold hardware
that ran System V, most customers ran a version from a reseller, based on AT&T's reference
implementation. A standards document called the System V Interface Definition outlined the default
features and behavior of implementations.
AT&T support
During the formative years of AT&T's computer business, the division went through several phases of
System V software groups, beginning with the Unix Support Group (USG), followed by Unix System
Development Laboratory (USDL), followed by AT&T Information Systems (ATTIS), and finally Unix
System Laboratories (USL).
Rivalry with BSD
In the 1980s and early-1990s, UNIX System V and the Berkeley Software Distribution (BSD) were the
two major versions of UNIX. Historically, BSD was also commonly called "BSD Unix" or "Berkeley
Unix".[2] Eric S. Raymond summarizes the longstanding relationship and rivalry between System V
and BSD during the early period:[3]
In fact, for years after divestiture the Unix community was preoccupied with the first phase of the Unix
wars – an internal dispute, the rivalry between System V Unix and BSD Unix. The dispute had several
levels, some technical (sockets vs. streams, BSD tty vs. System V termio) and some cultural. The
divide was roughly between longhairs and shorthairs; programmers and technical people tended to
line up with Berkeley and BSD, more business-oriented types with AT&T and System V.
While HP, IBM and others chose System V as the basis for their Unix offerings, other vendors such
as Sun Microsystems and DEC extended BSD. Throughout its development, though, System V was
infused with features from BSD, while BSD variants such as DEC's Ultrix received System V features.
AT&T and Sun Microsystems worked together to merge System V with BSD-based SunOS to
produce Solaris, one of the primary System V descendants still in use today. Since the early
1990s, due to standardization efforts such as POSIX and the success of Linux, the division between
System V and BSD has become less important.

Service startup
The Advanced Boot Options screen lets you start Windows in advanced troubleshooting modes. You
can access the menu by turning on your computer and pressing the F8 key before Windows starts.

Some options, such as safe mode, start Windows in a limited state, where only the bare essentials
are started. If a problem doesn't reappear when you start in safe mode, you can eliminate the default
settings and basic device drivers and services as possible causes. Other options start Windows with
advanced features intended for use by system administrators and IT professionals. For more
information, go to the Microsoft website for IT professionals.

Repair Your Computer

Shows a list of system recovery tools you can use to repair startup problems, run diagnostics, or
restore your system. This option is available only if the tools are installed on your computer's hard
disk. If you have a Windows installation disc, the system recovery tools are located on that disc.

Safe Mode

Starts Windows with a minimal set of drivers and services.

To start in safe mode:

1. Remove all floppy disks, CDs, and DVDs from your computer, and then restart your computer. Click
the Start button , click the arrow next to the Shut Down button (or the arrow next to the Lock button),
and then click Restart.
2. Do one of the following:
 If your computer has a single operating system installed, press and hold the F8 key as your computer
restarts. You need to press F8 before the Windows logo appears. If the Windows logo appears, you'll
need to try again by waiting until the Windows logon prompt appears, and then shutting down and
restarting your computer.
 If your computer has more than one operating system, use the arrow keys to highlight the operating
system you want to start in safe mode, and then press F8.
3. On the Advanced Boot Options screen, use the arrow keys to highlight the safe mode option you
want, and then press Enter.
4. Log on to your computer with a user account that has administrator rights.

 Safe Mode with Networking. Starts Windows in safe mode and includes the network drivers and
services needed to access the Internet or other computers on your network.
 Safe Mode with Command Prompt. Starts Windows in safe mode with a command prompt window
instead of the usual Windows interface. This option is intended for IT professionals and
administrators.
 Enable Boot Logging. Creates a file, ntbtlog.txt, that lists all the drivers that are installed during
startup and that might be useful for advanced troubleshooting.
 Enable low-resolution video (640×480). Starts Windows using your current video driver and using
low resolution and refresh rate settings. You can use this mode to reset your display settings. For
more information, see Change your screen resolution.
 Last Known Good Configuration (advanced). Starts Windows with the last registry and driver
configuration that worked successfully.
 Directory Services Restore Mode. Starts Windows domain controller running Active Directory so
that the directory service can be restored. This option is intended for IT professionals and
administrators.
 Debugging Mode. Starts Windows in an advanced troubleshooting mode intended for IT
professionals and system administrators.
 Disable automatic restart on system failure. Prevents Windows from automatically restarting if an
error causes Windows to fail. Choose this option only if Windows is stuck in a loop where Windows
fails, attempts to restart, and fails again repeatedly.
 Disable Driver Signature Enforcement. Allows drivers containing improper signatures to be
installed.
 Start Windows Normally. Starts Windows in its normal mode.

Dependencies
A dependency is a broad software engineering term used to refer to the situation where software relies on other software in order to be functional. For R package dependencies, the installation of the synapser package will take care of installing other R packages that synapser depends on, using dependencies specified in the DESCRIPTION file. However, lower-level system dependencies are not automatically installed.
Most Windows and Mac machines have the required system dependencies. Linux machines,
including most Amazon Web Services EC2 machines, will need to be configured before
installing synapser.

Required System Dependencies

 libssl-dev
 libcurl-dev
 libffi-dev
 zlib-dev

Ubuntu Installation
To install these system dependencies on Ubuntu machines:

apt-get update -y
apt-get install -y dpkg-dev zlib1g-dev libssl-dev libffi-dev
apt-get install -y curl libcurl4-openssl-dev

Another option is to use the provided Dockerfile. For more information on installing synapser with
Docker, please see our Docker vignettes.

Redhat Installation
To install these system dependencies on Redhat machines:

yum update -y
yum install -y openssl-devel curl-devel libffi-devel zlib-devel
yum install -y R

Management

A management operating system (MOS) is the set of tools, meetings and behaviours used to manage
your people and processes to deliver results. A Management Operating System (MOS) follows the
Plan, Do, Check, Act improvement cycle to get control and steadily improve process performance.
The tools and work practices of a Management Operating System enable Front Line Leaders to
detect and correct mistakes before they become serious, and help to minimize wastage and loss.
Awareness of the performance of the process at short intervals enables corrective actions to be taken
and ensures efficiency by monitoring the allocation and use of resources.
Effective management operating systems improve operational performance, however most systems
are built using clunky spreadsheets and whiteboards that are hard to implement and sustain. Fewzion
changes this by putting an easy to use, visual system in the hands of planners and front line leaders
so that they can get control of their processes and improve performance one shift at a time.
Plan the Work
Fewzion helps you build an effective plan by connecting the work you want to do and the targets you
want to meet with the resources and equipment you need to get the plan done. Fewzion has roster
and equipment planning tools in our shiftly planning system to ensure plans can be resource balanced
easily.
Work (Do) the Plan
Shift managers and supervisors should adapt their plan for changing conditions, then take it with them
on a tablet or print it (including attachments such as work orders and tool box talks) to communicate
and execute with their crew during the shift.
Check Performance for Accountability
Throughout and at the end of shift, supervisors complete their “Actuals” touch screens with KPI
results, tasks they have completed and answer any critical questions. Shift Managers should discuss
performance with their supervisors using these screens to close the “leadership loop”.
Report and Improve
Each day the gathered data is used to provide management with daily and weekly operating
reports about production and KPI performance for use in review meetings. We also send reports on
system usage and time and attendance to help identify issues with the behaviours you should expect
from your frontline leaders.
Fewzion is a web based Management Operating System (MOS) that replaces the cluster of
paperwork, spreadsheets and whiteboards used in shift planning, rosters, leave and equipment
scheduling and performance reporting to simplify your operational management. Fewzion is fast to get
going and simple to use making it easier for your planners, managers and front line leaders to safely
improve production and performance.
From weekly and daily planning to handovers, short interval control to daily review meetings Fewzion
covers off the critical elements of your Management System so you can sustainably Plan, Do, Check
& Act your way to improved results with a better Management Operating System.
System updates
Operating system updates contain new software that helps keep your computer current.

Examples of updates include service packs, version upgrades, security updates, drivers, or other
types of updates.

Important and high-priority updates are critical to the security and reliability of your computer. They
offer the latest protection against malicious online activities.

You need to update all of your programs, including Windows, Internet Explorer, Microsoft Office, and
others. Visit Microsoft Update to scan your computer and see a list of updates, which you can then
decide whether to download and install.

NOTE: Microsoft offers security updates on the second Tuesday of the month.

It's important to install new security updates as soon as they become available.
The easiest way to do this is to turn on automatic updating and use the recommended setting, which
downloads recommended updates and installs them on a schedule you set.

In Windows Vista, you control the automatic updating settings through the Windows Update Control
Panel. For more information, see Turn automatic updating on or off.

Overview of the kernel network stack implementation

Scope of Module

This module introduces the students to the concepts involved in implementing network stacks (the software and firmware that implement a computer networking protocol suite such as TCP/IP over Ethernet).

Goals and Learning Outcomes

The aim of the module is to introduce students to the software embedded in network devices such as
routers to implement network protocols. Where possible, open source implementations of protocols
used in live networks will be studied. Both the data plane and the control plane will be studied,
including data-link layer protocols, network layer protocols and transport layer protocols. Optimisation
techniques, hardware acceleration and other approaches to achieving “wire speed” operation will be
investigated. Protocols appropriate to the Internet of Things, to data centres, and to the future Internet
will be considered. Having successfully completed this course, the students will be able to:

 classify network functionality as belonging to the control plane and the data plane respectively
 explain how a typical operating system processes packets from arrival from an interface card
to forwarding to user space
 describe the principles involved in implementing a network stack in software
 decompose the software of “middleboxes” such as network routers into a software
architecture
 evaluate the trade-offs involved in hardware versus software implementation of packet
processing functions
 demonstrate advanced theoretical knowledge of networking
 add functionality to an open-source network stack
 adapt existing software to meet new networking requirements

Note about software

The module involves exploring the lower layers (the transport layer and below) and also the control
plane (which straddles the layers) of protocol stacks. Conventional courses on network programming
treat the network as something neatly wrapped inside an Applications Programming Interface such as
the BSD sockets API. In this module, we look “under the hood” at the software that lies below this
interface.

We’ll be using the Linux implementation of the network stack as our reference for how an Operating
System implements network protocols. Ideally we’d study a real-time OS or one used in commercial
routers, but these are, unfortunately, not open-source. For the same reason, we will not be looking at
how the lower layers are implemented in the market leader OS.

Disclaimer
This is not a training course on Linux kernel software development; it is an academic module about
the principles involved in implementing network stacks, which is illustrated by examples drawn largely
from the Linux kernel. The Linux kernel is updated frequently, and many optimisations and bug fixes
obscure the principles involved. If a concept is clearly illustrated in vX.Y of the kernel, and obfuscated
in later versions, the concept will continue to be illustrated using the vX.Y source code.

The flow of the packet through the linux network stack is quite intriguing and has been a topic for
research, with an eye for performance enhancement in end systems. This document is based on the
TCP/IP protocol suite in the linux kernel version 2.6.11 - the kernel core prevalent at the time of
writing this document. The sole purpose of this document is to take the reader through the path of a
network packet in the kernel with pointers to LXR targets where one can have a look at the functions
in the kernel which do the actual magic.

This document can serve as a ready look up for understanding the network stack, and its discussion
includes KURT DSKI instrumentation points, which are highly useful in monitoring the packet behavior
in the kernel.

We base our discussion on the scenario where data is written to a socket and the path of the resulting
packet is traced in a code walk through sense

TCP/IP - Overview
TCP/IP is the most ubiquitous network protocol one can find in today’s networks. The protocol has its roots in the 70’s, even before the formulation of the ISO OSI standards. There are four well-defined layers in the TCP/IP protocol suite, which encapsulate the popular seven-layered architecture within them.

Relating TCP/IP to the OSI model - The application layer in the TCP/IP protocol suite
comprises the application, presentation and session layers of the ISO OSI model.

The socket layer acts as the interface to and from the application layer to the transport layer. This
layer is also called the Transport Layer Interface. It is worth mentioning that there are two kinds
of sockets which operate in this layer, namely the connection oriented (streaming sockets) and the
connectionless (datagram sockets).

The next layer that exists in the stack is the Transport Layer, which encapsulates the TCP and UDP functionality within it. This forms Layer 4 of the TCP/IP protocol stack in the kernel. The Network
Layer in the TCP/IP protocol suite is called IP layer as this layer contains the information about the
Network topology, and this forms Layer 3 of the TCP/IP protocol stack. This layer also understands
the addressing schemes and the routing protocols.

Link Layer forms Layer 2 of the stack and takes care of the error correction routines which are
required for error free and reliable data transfer.

The last layer is the Physical Layer which is responsible for the various modulation and electrical
details of data communication.

When Data is sent through socket


Let us examine the packet flow through a TCP socket as a model, to visualize the Network Stack
operations in the linux kernel.

NOTE: All bold faced text are LXR search strings and the corresponding files are mostly mentioned
alongside each LXR target. In the event of file names not being mentioned, an identifier search with
the LXR targets will lead to the correct location which is in context.
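
To have something concrete in mind, here is a hedged user-space sketch of the scenario being traced: data written to a connected TCP (streaming) socket. The destination 127.0.0.1:8080 and the payload are only examples; it is this write() that enters the kernel through sys_write()/sock_sendmsg() and is eventually handled by tcp_sendmsg() and the layers below it:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    struct sockaddr_in dst = { 0 };
    const char payload[] = "GET / HTTP/1.0\r\n\r\n";

    int fd = socket(AF_INET, SOCK_STREAM, 0);        /* streaming (TCP) socket */
    if (fd < 0) { perror("socket"); return 1; }

    dst.sin_family = AF_INET;
    dst.sin_port = htons(8080);                      /* example destination */
    inet_pton(AF_INET, "127.0.0.1", &dst.sin_addr);

    if (connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0) {
        perror("connect");
        return 1;
    }

    /* This is the write whose journey the rest of this section follows:
     * socket layer -> TCP -> IP -> link layer -> driver -> wire. */
    write(fd, payload, sizeof(payload) - 1);

    close(fd);
    return 0;
}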

Berkeley Packet Filter architecture


The Berkeley Packet Filter (BPF) is a technology used in certain computer operating systems for programs that need to, among other things, analyze network traffic (eBPF is an extended BPF JIT virtual machine in the Linux kernel). It provides a raw interface to data link layers, permitting raw link-layer packets to be sent and received. BPF is available on most Unix-like operating systems, and eBPF is available for Linux and for Microsoft Windows. In addition, if the driver for the network interface supports promiscuous mode, it allows the interface to be put into that mode so that all packets on the network can be received, even those destined for other hosts.
BPF supports filtering packets, allowing a userspace process to supply a filter program that specifies
which packets it wants to receive. For example, a tcpdump process may want to receive only packets
that initiate a TCP connection. BPF returns only packets that pass the filter that the process supplies.
This avoids copying unwanted packets from the operating system kernel to the process, greatly
improving performance.
BPF is sometimes used to refer to just the filtering mechanism, rather than to the entire interface.
Some systems, such as Linux and Tru64 UNIX, provide a raw interface to the data link layer other
than the BPF raw interface but use the BPF filtering mechanisms for that raw interface.
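
A minimal sketch of the classic BPF interface as exposed on Linux follows (it requires root or CAP_NET_RAW; the four-instruction filter accepts only IPv4 frames, and everything else is dropped inside the kernel before it ever reaches the process):

#define _GNU_SOURCE
#include <arpa/inet.h>
#include <linux/filter.h>
#include <linux/if_ether.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    /* Classic BPF program: load the Ethernet type field (offset 12) and
     * accept the whole packet only if it is IPv4; otherwise return 0 (drop). */
    struct sock_filter code[] = {
        BPF_STMT(BPF_LD  | BPF_H   | BPF_ABS, 12),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K,   ETH_P_IP, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, 0xFFFFFFFF),   /* accept */
        BPF_STMT(BPF_RET | BPF_K, 0),            /* drop   */
    };
    struct sock_fprog prog = {
        .len    = sizeof(code) / sizeof(code[0]),
        .filter = code,
    };

    /* A packet socket delivers raw link-layer frames, as described above. */
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (fd < 0) { perror("socket (needs CAP_NET_RAW)"); return 1; }

    if (setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER, &prog, sizeof(prog)) < 0) {
        perror("setsockopt(SO_ATTACH_FILTER)");
        return 1;
    }

    unsigned char buf[2048];
    ssize_t n = recv(fd, buf, sizeof(buf), 0);   /* only IPv4 frames get here */
    printf("received a %zd-byte filtered frame\n", n);
    close(fd);
    return 0;
}

This is essentially what tcpdump does via libpcap: the filter expression is compiled into such a sock_filter program and attached to the socket, so unwanted traffic never crosses the kernel/user-space boundary.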
Linux Netfilter Architecture

I would like to briefly explain the structure of the Linux Netfilter architecture, how it works, and how a packet flows through a Linux machine.

What is a Firewall?

A firewall is a device, software or hardware, which is used to filter the packets going through the network on the basis of some rules and policies.

The firewall has two components: one is packet filtering and the second is an application-level gateway. Both of these technologies are used to filter out packets depending on packet header and payload information. A packet filter works up to layer 4 (the Transport Layer) in the TCP/IP model. If we want to filter packets on the basis of payload or data, an application-level gateway is used instead. I won't go into the details of both; this article is just about understanding the packet filter architecture and how it works. Please look at the IPv4 header, TCP header, UDP header and ARP header on Wikipedia: https://en.wikipedia.org/wiki/IPv4
How Does Packet Filter work?

A packet filter is a component of a firewall that is used to filter out packets based on specified rules. The packet filter takes the packet and matches it against the rules specified in iptables (the program provided by the Linux kernel firewall); the header information of the packet is compared with the features specified in each rule. If the header properties of the packet match a rule's features, the corresponding actions for that rule are triggered. Remember that each rule has two sub-components: the features, which hold all the information that is compared against the packet header, and the target, in which the actions are specified, e.g. drop the packet, send the packet to another rule chain, or accept the packet. All rules are evaluated in linear order, so keep in mind that order is crucial. The default rule works in such a way that if no earlier rule accepts the packet then the default accepts it, or the other way around; what to accept or deny must be specified manually.

Packet filters with rules (1.0 figure)

what is a Packet?

A packet is a chunk of information that flows through the internet. A packet contains all the information that intermediate stations need to deliver it to the destination point, e.g. the packet header.

For example, a TCP/IP packet may have a sender IP address, receiver IP address, sender port number, receiver port number and the protocol to which, on the receiving side, the packet will be handed over.

TCP/IP Packet (figure 1.1)

Netfilter Architecture and how does it work?

The Netfilter architecture is an indispensable component of the firewall. I will briefly explain here what chains are and how they work.
There are five chains, but we are only concerned with three of them, which are crucial for getting into this topic. Whenever a packet comes to a machine or PC, there is a NIC card through which network traffic goes in or out.

As depicted above in figure 1.3, when a TCP/IP packet comes to the network interface card it is sent to the PREROUTING chain, where the decision is made whether the packet is destined for a local process, for another router, or for another interface; based on the packet header information, the routing decision is made.

1. INPUT CHAIN: If the packet is destined for a local process (a process meaning the execution of code at run time), it goes through the INPUT chain. Remember that the local process could be any application interacting with the network, for example an application running on port 80.

2. OUTPUT CHAIN: If the packet is generated by a local process and intended to go to another machine, network or router, that packet flows through the OUTPUT chain and then the POSTROUTING chain, and is then handed over to the network interface card.

3. FORWARD CHAIN: If the packet comes in through a network interface and the decision is made that it is not intended for the local machine but for another network interface (in other words, the packet is for another machine or router), then the packet goes through the FORWARD chain, then POSTROUTING, and lastly to the NIC.
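
For the kernel side of this picture, here is a hedged sketch of a tiny module that registers a hook at the PREROUTING point; it is written against a reasonably recent kernel's nf_register_net_hook() API (older kernels used nf_register_hook() without the network-namespace argument). The iptables chains described above are built on exactly these hook points:

#include <linux/module.h>
#include <linux/netfilter.h>
#include <linux/netfilter_ipv4.h>
#include <linux/ip.h>
#include <linux/in.h>
#include <net/net_namespace.h>

/* Runs for every IPv4 packet at PREROUTING, before the routing decision
 * that sends it on to the INPUT or FORWARD chain. */
static unsigned int my_hook(void *priv, struct sk_buff *skb,
                            const struct nf_hook_state *state)
{
    struct iphdr *iph = ip_hdr(skb);

    if (iph && iph->protocol == IPPROTO_UDP)
        pr_info("UDP packet seen at PREROUTING\n");

    return NF_ACCEPT;            /* NF_DROP would silently discard it */
}

static struct nf_hook_ops my_ops = {
    .hook     = my_hook,
    .pf       = NFPROTO_IPV4,
    .hooknum  = NF_INET_PRE_ROUTING,
    .priority = NF_IP_PRI_FIRST,
};

static int __init my_init(void)
{
    return nf_register_net_hook(&init_net, &my_ops);
}

static void __exit my_exit(void)
{
    nf_unregister_net_hook(&init_net, &my_ops);
}

module_init(my_init);
module_exit(my_exit);
MODULE_LICENSE("GPL");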

I hope this is useful for getting to know the Netfilter architecture and how it works. For practical purposes, run “sudo iptables -v -L” on Linux; you will be able to see all three chains, and you can play around by creating client-server machines.
