12. A bit about Disks

We will talk more about disks when we discuss devices later. However, most file systems were designed to work on hard disk drives (often abbreviated as hard drives or HDDs), and understanding a bit about them is important for understanding how file systems work. Hard drives record data magnetically on one or more spinning platters, in concentric circular tracks (Fig. 12.1). Hard disks are block devices, meaning that data can only be read from, or written to, the disk in relatively large chunks. The basic unit of access for a hard disk is called a sector. Historically, the most common sector size was 512 B, but modern hard disks typically use 4 KB sectors. Each track on the disk is divided into some number of sectors.

../_images/disk-chs.png

Fig. 12.1 A disk drive contains one or more platters; the set of tracks at the same position on every platter surface is called a cylinder.

Data on a drive can be identified by the platter surface it is on, the track on that surface, and finally the sector on that track. To simplify the task of specifying surface, track, and sector, hard disks support Logical Block Addresses (LBAs), in which the sectors are numbered consecutively from 0 up to one less than the total number of sectors on the disk. Software layers request data transfers using LBAs, and the drive controller translates these into the required hardware-level information, freeing software from the need to understand the disk geometry.
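To make the idea of a linear sector numbering concrete, here is a minimal sketch of the classic cylinder/head/sector (CHS) to LBA conversion, \(\text{LBA} = (C \times H_{pc} + H) \times S_{pt} + (S - 1)\). The geometry constants and function names are hypothetical, chosen only for illustration. Real modern drives place different numbers of sectors on different tracks and keep the actual mapping internal to the controller, so this should be read as a model of the concept rather than as what any particular drive does.

```python
# Hypothetical, fixed geometry used only for illustration.
HEADS_PER_CYLINDER = 16   # number of platter surfaces (one head each)
SECTORS_PER_TRACK = 63    # sectors on every track (assumed constant)


def chs_to_lba(cylinder, head, sector):
    """Map (cylinder, head, sector) to an LBA.

    Cylinders and heads are numbered from 0; sectors are numbered from 1,
    following the traditional CHS convention.
    """
    return (cylinder * HEADS_PER_CYLINDER + head) * SECTORS_PER_TRACK + (sector - 1)


def lba_to_chs(lba):
    """Inverse mapping: recover (cylinder, head, sector) from an LBA."""
    cylinder, rest = divmod(lba, HEADS_PER_CYLINDER * SECTORS_PER_TRACK)
    head, sector_index = divmod(rest, SECTORS_PER_TRACK)
    return cylinder, head, sector_index + 1


if __name__ == "__main__":
    # The first sector of the disk is LBA 0.
    print(chs_to_lba(0, 0, 1))
    # Consecutive LBAs fill a track, then the next surface, then the next cylinder.
    print(lba_to_chs(63), lba_to_chs(1008))
```

Under this numbering, consecutive LBAs stay within one cylinder for as long as possible, which is why software that reads consecutive LBAs tends to avoid seeks.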

Reading data from a disk (or writing to it) requires the disk drive to perform the following steps:

  1. Switching the electronics to communicate with the appropriate head; in terms of performance, one can generally ignore this step, because it is fast.

  2. Performing a seek that moves the head assembly until the head is positioned over the target track; the time spent to do this is called the seek time.

  3. Waiting for the platter to rotate until the first bit of the target data is passing under the head; this time is called rotational latency.

  4. Reading or writing until the last bit has passed under the head; this is called the transfer time.

../_images/disk-latency.png

Fig. 12.2 Operations to read a disk sector.

The seek time and rotational latency have a major effect on disk performance. To give an example, consider randomly accessing 4 KB data blocks on a 7200 RPM disk (i.e., a disk that rotates 7200 times per minute). One full rotation takes \(60{,}000/7200 \approx 8.3\) ms, so the average rotational latency, which is half a rotation, is about 4 ms. A typical transfer rate might be around 200 MB/s, so transferring a 4 KB block would take \((4/1000)/200 = 0.00002\) seconds = 0.02 ms. A common seek time for a drive might be around 8 ms. We can see that the transfer time for a small 4 KB block is more than two orders of magnitude smaller than the positioning delays (seek and rotational latency).
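The two computed quantities in this example can be sketched in a few lines of Python. The function names are hypothetical, and the code follows the text in treating 1 MB as 1000 KB.

```python
def rotational_latency_ms(rpm):
    """Average rotational latency in ms: half of one full rotation."""
    full_rotation_ms = 60_000 / rpm   # 60,000 ms per minute / rotations per minute
    return full_rotation_ms / 2


def transfer_time_ms(block_kb, rate_mb_per_s):
    """Time in ms to transfer block_kb at rate_mb_per_s (1 MB = 1000 KB)."""
    seconds = (block_kb / 1000) / rate_mb_per_s
    return seconds * 1000


if __name__ == "__main__":
    print(f"rotational latency: {rotational_latency_ms(7200):.2f} ms")
    print(f"4 KB transfer time: {transfer_time_ms(4, 200):.3f} ms")
```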

Therefore, a file system that resulted in random reads (or writes) across the whole disk would end up taking \(8 + 4 + 0.02 \approx 12\) ms for each block, for an average throughput of about 333 KB/s (i.e., \(4/(12/1000)\)). If you used the minimum sector size of many disks, 512 B, you would get only around 42 KB/s, while if you used a 16 KB block, your throughput would go up to around 1.3 MB/s.

If, on the other hand, the file system always transferred a 5 MB block, each access would require \(8 + 4 + 25 = 37\) ms, for an average throughput of about \(135\) MB/s (i.e., \(5/(37/1000)\)).
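The effect of block size on random-access throughput can be explored with a short sketch. It assumes the same illustrative numbers as the text (8 ms seek, 4 ms rotational latency, 200 MB/s transfer rate, 1 MB = 1000 KB); the constant and function names are hypothetical.

```python
# Illustrative drive parameters taken from the example in the text.
SEEK_MS = 8.0
ROTATIONAL_LATENCY_MS = 4.0
TRANSFER_RATE_MB_PER_S = 200.0


def random_access_throughput_kb_per_s(block_kb):
    """Throughput in KB/s when every block costs a seek plus rotational latency."""
    # KB / (MB/s) comes out in milliseconds because 1 MB = 1000 KB.
    transfer_ms = block_kb / TRANSFER_RATE_MB_PER_S
    total_ms = SEEK_MS + ROTATIONAL_LATENCY_MS + transfer_ms
    return block_kb / (total_ms / 1000)


if __name__ == "__main__":
    for block_kb in (0.5, 4, 16, 5000):
        kb_per_s = random_access_throughput_kb_per_s(block_kb)
        print(f"{block_kb:>7} KB block: {kb_per_s:>10.1f} KB/s")
```

Because the positioning cost is fixed at about 12 ms per access, throughput grows almost linearly with block size until the transfer time becomes comparable to the positioning time.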

So, when designing a file system for performance, you want to (1) read relatively big disk blocks, and (2) keep data that is frequently accessed together on the same or nearby tracks/cylinders, since seeking to nearby tracks is much faster.

Studies of file systems have shown that most files are 4 KB or smaller, but most of the storage space is used by a small number of very big files [ABDL07].

../_images/diskhw_3_1.png
../_images/diskhw_4_1.png