CS 423 – Operating Systems Design
Lecture 21 – Disk Devices and Timers Klara Nahrstedt Fall 2011 Based on slides by YY Zhou and Andrew S. Tanenbaum
CS 423 - Fall 2011
Administrative announcements ◦ Midterm – Wednesday, October 12 in class!!
Disk Devices ◦ Performance Improvements
CS 423 - Fall 2011
Disk Performance Improvements
Getting first byte from disk read is slow ◦ high latency
Improve disk performance ◦ Parallelize access (RAID) ◦ Schedule requests to speed up disk access Schedule requests to shorten seeks
◦ Move some disk data into disk controller (ondisk caching) or main memory – file system caching 10/9/2011
RAID • • •
Use parallel processing to speed up CPU performance Use parallel I/O to improve disk performance, reliability (1988, Patterson) Design new class of I/O devices called RAID – Redundant Array of Inexpensive Disks (also Redundant Array of Independent Disks) Use the RAID in OS as a SLED (Single Large Expensive Disk), but with better performance and reliability 10/9/2011
RAID (cont.) RAID consists of RAID SCSI controller plus a box of SCSI disks Data are divided into strips and distributed over disks for parallel operation RAID 0 … RAID 5 levels RAID 0 organization writes consecutive strips over the drives in round-robin fashion – operation is called striping RAID 1 organization uses striping and duplicates all disks RAID 2 uses words, even bytes and stripes across multiple disks; uses error codes, hence very robust scheme RAID 3, 4, 5 alterations of the previous ones
Kernel Components Affected by Block Device Op. VFS (Virtual File System) Disk Caches Disk Filesystem
Block Device Driver
Layer Block Device Driver
Which disk request is serviced first? ◦ ◦ ◦ ◦ ◦ ◦
FCFS Shortest seek time first Elevator (SCAN) LOOK C-SCAN (Circular SCAN) C-LOOK
Looks familiar? 10/9/2011
FIFO (FCFS) order
Method ◦ First come first serve
◦ Fairness among requests ◦ In the order applications expect
Cons ◦ Arrival may be on random spots on the disk (long seeks) ◦ Wild swing can happen
Analogy: ◦ Can elevator scheduling use FCFS?
98, 183, 37, 122, 14, 124, 65, 67
SSTF (Shortest Seek Time First)
Method ◦ Pick the one closest on disk ◦ Rotational delay is in calculation
Pros ◦ Try to minimize seek time
Cons ◦ Starvation
Question ◦ Is SSTF optimal? ◦ Can we avoid starvation?
Analogy: elevator 98, 183, 37, 122, 14, 124, 65, 67 (65, 67, 37, 14, 98, 122, 124, 183)
Method ◦ Take the closest request in the direction of travel ◦ Real implementations do not go to the end (called LOOK)
Pros ◦ Bounded time for each request
Cons ◦ Request at the other end will take a while
98, 183, 37, 122, 14, 124, 65, 67 (37, 14, 0, 65, 67, 98, 122, 124, 183) 10/9/2011
C-SCAN (Circular SCAN)
Method ◦ Like SCAN ◦ But, wrap around ◦ Real implementation doesn’t go to the end (C-LOOK)
Pros ◦ Uniform service time
Cons ◦ Do nothing on the return
98, 183, 37, 122, 14, 124, 65, 67 (65, 67, 98, 122, 124, 183, 199, 0, 14, 37) 10/9/2011
LOOK and C-LOOK SCAN and C-SCAN move the disk arm across the full width of the disk • In practice, neither algorithm is implemented this way • More commonly, the arm goes only as far as the final request in each direction. Then, it reverses direction immediately, without first going all the way to the end of the disk. • These versions of SCAN and C-SCAN are called LOOK and C-LOOK •
Group Discussion Questions
The disk scheduling algorithm that may cause starvation is: ◦ FCFS or SSTF or C-SCAN or LOOK ??
From the list of disk-scheduling algorithms (FCFS, SSTF, SCAN, CSCAN, LOOK, C-LOOK), SSTF will always give the least head movement for any set of cylinder-number requests to the disk scheduler: ◦ True or False ??
The cylinder numbers on a disk are 0,1,…10. Currently, there are five cylinder requests on the disk scheduler queue in the following order: 1,5,4,8,7 and the head is located at position 2 and moving in the direction of increasing block numbers. The time to serve a request is proportional to the distance from the head to the cylinder number requested. If T(X) is the time it takes to service the requests currently in the queue using scheduling algorithm X, then: ◦ ◦ ◦ ◦
T(SSTF) < T(SCAN) < T(FCFS) or T(FCFS) < T(SSTF) < T(SCAN) or T(SSTF) < T(FCFS) < T(SCAN) or None of the above??? 10/9/2011
History of Disk-related Concerns
When memory was expensive ◦ Do as little bookkeeping as possible
When disks were expensive ◦ Get every last sector of usable space
When disks became more common ◦ Make them much more reliable
When processor got much faster ◦ Make them appear faster
Method ◦ Put RAM on disk controller to cache blocks Seagate ATA disk has .5MB, IBM Ultra160 SCSI has 16MB Some of the RAM space stores “firmware” (an OS)
◦ Blocks are replaced usually in LRU order –
What are the pros and cons of on-disk caching?
Disk Block Caches • •
Main memory rather than disk may hold disk blocks 85% or more of all I/O requests by file system and applications can be satisfied by disk block cache BSD UNIX provides a disk block cache as part of the block-oriented device software layer It consists of between 100 and 1000 individual buffers. 10/9/2011
Disk I/O Summary • • •
Disk is an important I/O device Disk must be fast and reliable Error Handling is important; check checksum, called ECC (Error Correcting Code) – Track a bad sector – Substitute a spare for the bad sector – Shift all sectors to bypass the bad one
RAID protects against few bad sectors, but does not protect against write errors laying down bad data Stable Storage may be needed 10/9/2011
Clocks and Timers •
Hardware clocks (also called timers) – give the current time – give the elapsed time – set a timer to trigger operation X at time T
Programmable interval timer - hardware to measure elapsed time and trigger operations • This mechanism is used by the scheduler, disk I/O subsystem, network subsystem • Give examples where/when/how CPU scheduler uses timers. •
Clock Hardware •
Older/simpler clocks – tied to the 110 or 220 volt power line and cause interrupt on every voltage cycle at 50-60 Hz.
Current clocks consist of three components: – a crystal oscillator – a counter – a holding register. – Feature: • The quartz crystal generates a periodic signal of very high accuracy (several hundred MHz). This signal is fed into the counter which counts down to zero. When the counter is zero, it causes CPU interrupt. The holding register is used to load the counter. 10/9/2011
Modes of programmable clocks: ◦ one-shot mode: when the clock starts (started explicitly by the software), it copies the register value into the counter, decrements the counter, causes an interrupts when counter is zero, and stops ; ◦ square-wave mode: when the clock starts, it copies the register value into the counter, decrements the counter, causes an interrupts when counter is zero, and then automatically the holding register value is copied into the counter, and the whole process repeats (i.e., there is no explicit start of the clock by the software) The periodic interrupts are called clock ticks. Programmable clock chips usually contain two or three independently programmable clocks. 10/9/2011
Clock Issues Battery-powered backup clocks Some clocks drift and need to be resynchronized. ◦ manually resynchronize the clocks. ◦ get the current time from a remote host which carries the UTC (Universal Coordinated Time ), formally known as Greenwich Mean Time). NTP (Network Time Protocol) is a standard protocol to re-synchronize clocks between two networked systems (regularly the system administrators re-synchronize all clocks in our machines using NTP).
Clock Software Clock hardware generates interrupts at known intervals. The clock hardware is served by the clock driver software. Functions of the clock driver: ◦ Maintaining the time of day ◦ Preventing processes from running longer than they are allowed to ◦ Accounting for CPU usage ◦ Handling alarm system call made by the user process ◦ Providing watchdog timers for parts of the system itself ◦ Doing profiling, monitoring and statistics gathering
Clock Overflow Time of the day(real clock): requires incrementing a counter at each clock tick. One difficulty is the overflow of the counter (e.g., a 32-bit counter overflows in two years). What are the possible solutions to the overflow problem?
Preventing processes from running too long ◦ whenever a process is started, the scheduler initializes a counter to the value of that process' quantum in clock ticks. As the clock ticks, the counter is decremented, and when counter is zero, the clock driver calls the scheduler to set up another process.
CPU accounting ◦ The system should start a second timer (different from the main timer). Whenever the process starts, the second timer starts counting. When a process stops, the timer can be read out to tell how long the process has run. When a process is interrupted, the timer value should be stored and when the process restarts, the timer value should be restored.
Simpler way to do CPU accounting ◦ currently running process keeps a field in the process table entry. At every clock tick, this field in the current process is incremented, ie. every clock tick is ``charged'' to the process running at the time (problems might arise if many interrupts occur).
Timer for Individual Process
Providing timers (alarm) to requests from processes in order to give a warning after a certain time interval ◦ If the clock driver had enough clocks, it could set a separate clock for each request. This is not the case.
Simulate virtual clocks with a single physical clock. ◦ maintain (a) table in which the signal time for all pending times is kept, and (b) a variable giving the time of the next one. When time of day is updated, the driver checks for the closest signal to occur. Once it has, the table is searched for the next one to occur.
Watchdog Timer and Profiling A timer is set, and when the timer goes off, instead of causing signal as it is the case for user signals, the clock driver calls a procedure supplied by the caller. Use it for profiling
◦ Build histogram of the program counter and see where it is spending its time. ◦ Debugging
Software-based timer, called soft timer, is a timer which ◦ sets to the requested frequency as needed ◦ is checked by the kernel when entries are made for other reasons such as system calls, TLB misses, page faults, I/O interrupts, CPU going idle. ◦ If the soft timer has expired, the scheduled event is performed (e.g., packet transmission) with no need to switch to kernel mode. ◦ The kernel resets the soft timer again to go off. The kernel copies the current clock value to the timer and adds the timeout interval to it.
Soft timer can avoid interrupts ◦ the frequency of kernel entries is so frequent that the kernel can check on the soft timer and schedule the events without sending an interrupt. 10/9/2011
Linux Timekeeping Architecture
Linux carries several time-related activities: ◦ Updates the time elapsed since system startup ◦ Updates the time and data ◦ Determines for each CPU quanta ◦ Updates resource usage statistics ◦ Checks whether the interval of time associated with each software timer elapsed. 10/9/2011
Timing Measurements •
Real Time Clock (RTC) – Independent of CPU and other chips – Continues to tick on small battery – Capable to issues periodic interrupts on IRQ8 at frequency 2Hz to 8,192 Hz
Time Stamp Counter
– Interface register that allows access to clock signal counter
Programmable Interval Time (PIT)
– Similar to alarm clock – it is used to issue timer interrupts on IRQ0 at frequency 1000 Hz – Time interval is called tick 10/9/2011
Timing Measurements •
CPU Local Timer – similar to PIT – can issue one shot or periodic interrupts
High Precision Event Timer (HPET) – New timer chip (Intel&Microsoft) – Includes up to 8 32-bit or 64-bit independent counters – HPET is expected to replace PIT
Power Management Timer 10/9/2011
Data Structures of Timekeeping Architecture • •
Examples of some of the 80x86 architecture Timer object – Identifies timer source and Supports four standard methods • Method that records exact time (invoked by timer interrupt handler) • Method that returns time elapsed since last tick • Method that returns number of nsec since kernel init • Method that waits for a given number of “loops”
Jiffies variable – Counter that stores number of ticks since system started
Xtime variable – Stores current time and date 10/9/2011
Software Timers •
Timer – software facility that allows functions to be invoked at some future moment, after a given time interval elapsed – Time-out facility
Linux has two types of timers – Dynamic timers – Interval timers
Dynamic timers may be dynamically created and destroyed Interval timers – these timers cause UNIX signals to be sent periodically to the process 10/9/2011
System Calls Related to Timing Measurements •
time() – Returns number of elapsed seconds since midnight of January 1, 1970
gettimeofday() – Returns structure ‘timeval’ – number of elapsed microseconds in the last second and seconds since midnight January 1, 1970
Sleep(), usleep() – Puts process to sleep for certain amount of seconds, microseconds, respectively
Other timers (alarm, setitimer…) • Group activity: •
– Write a pseudo-code where you calculate in a loop multiplication of a number x with itself and you return the result after 100 microseconds
Timer Summary •
Always check what is the timer resolution on your platform since too many components depend on it Linux kernel implements POSIX timers by means of dynamic timers Linux also supports Delay Functions – if software timers are useless (in case kernel must wait for a short time period – few ms, the kernel makes use of udelay() or ndelay()) 10/9/2011