Storage
The problems:
- When we turn off the computer, all contents of memory (RAM) are lost.
- Need a way to save information when the computer is restarted.
- We need some kind of secondary or external storage.
- Lots of different kinds of secondary storage, for example:
- Main factors distinguishing slow/fast storage (disk/memory): Speed and Cost
Date | RAM | Magnetic disk (Mechanical) | NAND SSD | Largest disk | Largest SSD |
11/22/2016 | about $4.69/GB | about $0.04/GB | about $0.30/GB | 10 TB ($500) | 4 TB ($1475) |
07/07/2016 | about $4.75/GB | about $0.04/GB | about $0.27/GB | 10 TB ($580) | 3.8 TB ($2650) |
04/06/2016 | about $4.00/GB | about $0.04/GB | about $0.30/GB | 8 TB ($220) | 2 TB ($500) |
11/24/2015 | about $5.00/GB | about $0.035/GB | about $0.32/GB | 8 TB ($240) | 2 TB ($700) |
7/7/2015 | about $5.00/GB | about $0.029/GB | about $0.30/GB | 8 TB ($275*) | 1.6 TB ($1500) |
4/9/2015 | about $7.50/GB | about $0.035/GB | about $0.38/GB | 8 TB ($300) | 1 TB ($380) |
11/18/2014 | about $10.00/GB | about $0.040/GB | about $0.41/GB | 6 TB ($300) | 960 GB ($390) |
7/7/2014 | about $10.25/GB | about $0.040/GB | about $0.44/GB | 6 TB ($300) | 1 TB ($440) |
4/3/2014 | about $10.00/GB | about $0.045/GB | about $0.50/GB | 4 TB ($165) | 1 TB ($500) |
11/12/2013 | about $10.00/GB | about $0.050/GB | about $0.73/GB | 4 TB ($190) | 1 TB ($600) |
7/9/2013 | about $8.50/GB | about $0.050/GB | about $0.78/GB | 4 TB ($200) | 1 TB ($2500*) |
4/2/2013 | about $6.25/GB | about $0.055/GB | about $0.55/GB | 4 TB ($300) | 1 TB ($2099) |
11/20/2012 | about $3.50/GB | about $0.055/GB | about $0.72/GB | 4 TB ($300) | 1 TB ($2249) |
7/5/2012 | about $5/GB | about $0.060/GB | about $0.72/GB | 4 TB ($330) | 1 TB ($2499) |
7/13/2011 | about $5/GB | about $0.040/GB | about $1.66/GB | 3 TB ($160) | 1 TB ($2864) |
4/11/2011 | about $15/GB | about $0.045/GB | about $2.00/GB | N/A | N/A |
12/7/2010 | about $15/GB | about $0.060/GB | about $1.40/GB | N/A | N/A |
7/20/2010 | about $15/GB | about $0.060/GB | N/A | N/A | N/A |
4/20/2010 | about $30/GB | about $0.090/GB | N/A | N/A | N/A |
12/3/2009 | about $22/GB | about $0.100/GB | N/A | N/A | N/A |
11/25/2008 | about $16/GB | about $0.120/GB | N/A | N/A | N/A |
7/9/2008 | about $21/GB | about $0.160/GB | N/A | N/A | N/A |
4/3/2008 | about $20/GB | about $0.200/GB | N/A | N/A | N/A |
11/29/2007 | about $70/GB | about $0.200/GB | N/A | N/A | N/A |
All prices are from Newegg.com except *Amazon.com.
- For historical comparisons (not adjusted for inflation):
- In 1956, IBM's first hard disk, the RAMAC, held 5 MB ($50,000), a cost of $10,000,000 per GB.
- In 1980, a typical 40 MB hard drive ($1,200) had a cost of $30,000 per GB.
- In 1990, I (Mead) bought my first "big" hard drive, 200 MB. It cost me about $1000. That's $5,000 per GB, thank you very much.
- In early 2000, drives had a cost of about $20 per GB.
- Today, hard drives cost a few pennies per GB.
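Each per-GB figure above is just price divided by capacity. Here is a quick sketch to check them, assuming decimal units (1 GB = 1000 MB, the way drive makers count). Using 1024 MB per GB would make each figure about 2.4% higher.

```python
def cost_per_gb(price_dollars: float, capacity_mb: float) -> float:
    """Dollars per GB, assuming 1 GB = 1000 MB (decimal units)."""
    return price_dollars / capacity_mb * 1000


# (year, price in dollars, capacity in MB) for the historical drives listed above
drives = [
    (1956, 50_000, 5),
    (1980, 1_200, 40),
    (1990, 1_000, 200),
]

for year, price, mb in drives:
    print(f"{year}: ${cost_per_gb(price, mb):,.0f} per GB")
```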
For large amounts of storage, the magnetic hard drive is still the most popular.

- physical size comparison of a 5.25" full-height drive and a laptop drive.
- The hard disk drive consists of several disks (platters) stacked one atop the other, rotating in sync at high speed (e.g. 3600, 5400, 5900, 7200, 10000, 15000 RPM).
- Each platter is coated with a thin magnetic film (about 1/10,000 the thickness of a piece of paper)
- For each platter surface (usually both sides of each platter) there is a head that is used for reading and writing.
- Each head is attached to a disk arm (in a comb-like fashion) that positions the heads over the disks.
- The heads move in unison and not independently.
- Each disk is divided into concentric circles, called tracks.
- Each disk has the same number of tracks.
- The collection of tracks, one from each disk, lying at the same radius is called a cylinder.
- Each track is divided into arcs, called sectors.
- Sectors are defined by the device (hardware).
- Each sector contains the same fixed number of bytes (e.g. legacy 512 bytes or newer advanced 4096 bytes).
- Why 4,096 bytes? Sound familiar? (It is also the default cluster size of NTFS, Ext3, and HFS+, used by Windows, Linux, and Mac, respectively. Nice article here.)
- Small sectors are inefficient for very large files; large sectors are wasteful for very small files.
- Large sectors can cause internal fragmentation with many small files. (External fragmentation is different.)
- An entire sector is read/written in a single disk operation (the smallest unit of transfer).
- Modern drives use Zone Bit Recording, dividing tracks into zones so that the outer tracks hold more sectors than the inner tracks.
- A position on the hard disk was originally specified by the disk's geometry, as a triple: (Cylinder #, Head #, Sector #), or CHS.
- Nowadays, CHS is used only by a few devices (and utility programs).
- Operating systems use LBA (Logical Block Addressing) as the interface (LBA 0, LBA 1, etc.)
- Sectors are addressed from 0 up to the total number of sectors on the drive minus 1.
- LBA can be used with other storage devices (e.g. tapes) that don't have cylinders/heads/sectors.
- LBA is also simpler when using zone bit recording.
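When every track has the same number of sectors, converting between the two schemes is simple arithmetic. The sketch below assumes such a fixed geometry. The 16 heads and 63 sectors per track are only an illustrative choice. Zone bit recording breaks this assumption, which is one reason LBA is simpler. Note that CHS sector numbers start at 1, while LBAs start at 0.

```python
from typing import NamedTuple


class CHS(NamedTuple):
    cylinder: int
    head: int
    sector: int  # CHS sectors are numbered from 1, not 0


def chs_to_lba(addr: CHS, heads_per_cylinder: int, sectors_per_track: int) -> int:
    """Convert a (cylinder, head, sector) triple to a logical block address."""
    return (addr.cylinder * heads_per_cylinder + addr.head) * sectors_per_track + (addr.sector - 1)


def lba_to_chs(lba: int, heads_per_cylinder: int, sectors_per_track: int) -> CHS:
    """Convert a logical block address back to a (cylinder, head, sector) triple."""
    cylinder, rest = divmod(lba, heads_per_cylinder * sectors_per_track)
    head, sector_index = divmod(rest, sectors_per_track)
    return CHS(cylinder, head, sector_index + 1)


HEADS, SECTORS = 16, 63  # illustrative geometry only
print(chs_to_lba(CHS(2, 3, 4), HEADS, SECTORS))  # 2208
print(lba_to_chs(2208, HEADS, SECTORS))          # CHS(cylinder=2, head=3, sector=4)
```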
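- Two factors affect the positioning time, that is, the amount of time it takes to access a position on the disk: seek time (moving the heads to the right cylinder) and rotational latency (waiting for the right sector to rotate under the head).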
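This sketch estimates the second factor. It assumes that, on average, the target sector is half a revolution away from the head, so the average rotational latency is half the time of one revolution.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: the time for half a revolution, in milliseconds."""
    ms_per_revolution = 60_000 / rpm  # 60,000 ms in a minute
    return ms_per_revolution / 2


for rpm in (5400, 7200, 15000):
    print(f"{rpm:>6} RPM: {avg_rotational_latency_ms(rpm):.2f} ms")
```

For a 7200 RPM drive this comes to about 4.17 ms, and for 15000 RPM exactly 2 ms. This is one reason the faster-spinning drives cost more.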
- Linear velocity vs. Angular velocity
- All sectors in all tracks experience the same angular velocity.
- Tracks on the outside of the disks move faster than the inner tracks (linear velocity).
- Audio Compact Disks (CDs) use constant linear velocity to keep the data rate constant.
- This is fine for audio, because it is all sequential. The gradual speed changes are not noticeable.
- When burning high-speed CDs (and DVDs), there is a very noticeable speed change because the drive varies the disc's speed during accesses. (The jet engine syndrome.)
- Now, data CDs use constant angular velocity, which means that outer tracks read/write faster than inner tracks.
- Better for random access reading (Don't have to change speed for each random sector read)
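With constant angular velocity, a track's linear speed grows in proportion to its radius: v = 2πr × RPM / 60. The small sketch below shows this. The 20 mm and 46 mm radii are made-up values for illustration, not the dimensions of any particular drive.

```python
import math


def linear_speed_m_per_s(radius_mm: float, rpm: float) -> float:
    """Speed of the track passing under the head at a given radius (CAV disk)."""
    circumference_m = 2 * math.pi * radius_mm / 1000
    return circumference_m * rpm / 60


inner = linear_speed_m_per_s(20, 7200)  # illustrative inner-track radius
outer = linear_speed_m_per_s(46, 7200)  # illustrative outer-track radius
print(f"inner: {inner:.1f} m/s, outer: {outer:.1f} m/s, ratio: {outer / inner:.2f}")
```

The ratio depends only on the radii (46/20 = 2.3), not on the RPM. At a constant bit density, the outer track therefore delivers data about 2.3 times as fast.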
- A hard disk is connected to the computer via an I/O bus.
- Two controllers are used to transfer data between the computer and the disk.
- Disk controller on the disk end of the bus, used to read/write data to and from the disks.
- Host controller on the computer end of the bus
- There are also caches on the drives and/or controllers to speed up access.
Disk formatting (low-level, hardware)
- Prior to usage, a disk must be (physically, or low-level) formatted.
- Each sector contains a data structure with:
- Sector number within the track
- Data area, which stores data.
- Error correcting code (ECC), a value that is computed from values in the data area.
- Stored when the data area is written
- Recomputed when the data is read, and compared with the written ECC
- Error occurs if the two values do not match
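The store, recompute, and compare steps above fit in a few lines. This sketch uses CRC-32 (Python's `zlib.crc32`) purely as a stand-in. A CRC can only *detect* errors, while real drive ECC (e.g. Reed-Solomon or LDPC codes) can also *correct* some of them. The function names are hypothetical.

```python
import zlib

SECTOR_SIZE = 512


def write_sector(data: bytes) -> tuple[bytes, int]:
    """'Write' a sector: store the data together with a check value computed from it."""
    if len(data) != SECTOR_SIZE:
        raise ValueError("sector data must be exactly SECTOR_SIZE bytes")
    return data, zlib.crc32(data)


def sector_is_valid(stored: tuple[bytes, int]) -> bool:
    """'Read' a sector: recompute the check value and compare it with the stored one."""
    data, stored_check = stored
    return zlib.crc32(data) == stored_check


sector = write_sector(bytes(SECTOR_SIZE))
print(sector_is_valid(sector))  # True

# Flip one bit in the data area, as a failing disk surface might.
data, check = sector
damaged = (bytes([data[0] ^ 0x01]) + data[1:], check)
print(sector_is_valid(damaged))  # False: the values don't match, so a read error is reported
```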
- A disk may have bad sectors (unusable or defective sectors).
- A new disk may come with bad sectors. (bad sector map)
- A disk may develop bad sectors over its lifetime.
- Bad sectors are typically kept track of by the disk itself, and a bad sector is replaced by a spare sector (set aside by the disk, not visible to the OS).
- Sector sparing may reserve a spare sector on each track for use if another sector on that track becomes unusable.
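The OS keeps using the same LBA, and only the drive knows that the sector has moved. The toy model of that remapping below is hypothetical, including the class and its methods. Real firmware decides where spares live (per track, per zone, or at the end of the disk). In this sketch they simply sit past the last visible LBA.

```python
class SparingDisk:
    """Hypothetical model of a drive that hides bad sectors behind spare sectors."""

    def __init__(self, visible_sectors: int, spare_sectors: int):
        self.visible_sectors = visible_sectors
        # Spares sit past the last LBA, so the OS can never address them directly.
        self.free_spares = list(range(visible_sectors, visible_sectors + spare_sectors))
        self.remap: dict[int, int] = {}  # bad LBA -> physical spare sector

    def mark_bad(self, lba: int) -> None:
        """Retire a bad sector by assigning it the next free spare."""
        if not self.free_spares:
            raise OSError("out of spare sectors")
        self.remap[lba] = self.free_spares.pop(0)

    def physical_sector(self, lba: int) -> int:
        """Where a read/write of this LBA actually goes on the platter."""
        if not 0 <= lba < self.visible_sectors:
            raise ValueError(f"LBA {lba} out of range")
        return self.remap.get(lba, lba)


disk = SparingDisk(visible_sectors=1000, spare_sectors=8)
disk.mark_bad(42)
print(disk.physical_sector(41), disk.physical_sector(42))  # 41 1000
```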
Disk partitioning and high-level formatting
- After a low-level format, the disk must be partitioned (divided up into groups of cylinders).
- Some operating systems refer to these as logical drives.
- Each partition is treated by the OS as a separate logical hard disk.
- Once the drive is partitioned (even with only 1 partition), it is ready to have a filesystem placed on it.
- There is one filesystem per partition.
- Windows XP with 3 hard disks:
- Windows 7 with 1 hard disk:
- Partition info from olga, athena, sabrina and maya
- To see the output shown in the links above, issue this command: (That's a lowercase L)
sudo fdisk -l
This will list all of the partitions from all of the disks on the system.
- Graphical view of storage on olga.
- Graphical view of storage on maya.
- Next, these partitions need a logical (high-level) format, which creates the file system.
- This is what the OS deals with
- Groups sectors into clusters (or blocks) for better efficiency, at the cost of internal fragmentation.
- Clusters/blocks are defined by the operating system. (Sectors are defined by the device.)
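To see the trade-off, count the slack (the unused bytes in each file's last cluster) for a few file sizes. The file sizes here are made up for illustration.

```python
def clusters_needed(file_size: int, cluster_size: int) -> int:
    """Number of whole clusters a file occupies (integer ceiling division)."""
    return (file_size + cluster_size - 1) // cluster_size


def slack_bytes(file_size: int, cluster_size: int) -> int:
    """Bytes allocated to the file but not used by it (internal fragmentation)."""
    return clusters_needed(file_size, cluster_size) * cluster_size - file_size


file_sizes = [100, 5_000, 4_096, 1_000_000]  # hypothetical files, in bytes

for cluster in (4_096, 65_536):
    wasted = sum(slack_bytes(size, cluster) for size in file_sizes)
    print(f"{cluster:>6}-byte clusters: {wasted:,} bytes wasted")
```

With 4 KB clusters, these four files waste 10,708 bytes. With 64 KB clusters, they waste 235,988 bytes. Larger clusters mean fewer clusters to keep track of, but more waste when there are many small files.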
Windows cluster sizes (allocation units)
- Some applications (e.g. databases, the pagefile) do their own file I/O and read sectors directly.
- Typical uses for partitions: swap space, separate system and data partitions, different file systems, different operating systems, etc.
- Multi-boot systems: each OS is on its own partition.
- Each partition can use a different file system.
- Can be mounted with different attributes, e.g. read-only
- Different areas of the hard disk have better performance characteristics (e.g. the outer tracks are faster than the inner tracks)
- Swap files may reside on their own partition (different format).
- Log files or files that grow quickly. (On Unix-like systems this is the /var directory)
- Runaway programs or malicious programs only affect a single partition.
- The system partition is protected from running out of space.
- Isolates files - if one partition gets corrupted, the others may be fine.
- There are some drawbacks:
- There is overhead that will be duplicated on each partition.
- You can't move a file between partitions; you have to copy it and then delete the original.
- Sometimes you can't link to a file on a different partition (e.g. hard links cannot span file systems, though symbolic links can).
- Can mount and unmount partitions as needed.
- Boot process from before.
- When several disk I/O requests are made, the OS must choose the order in which to service the requests.
- Disk access involves moving the disk heads.
- The farther the heads have to move, the longer the seek time.
- To maximize disk bandwidth (the data transfer rate), we want to minimize the total seek time.
- Hardware schedulers move scheduling from the OS to the disk/controller: the OS still schedules I/O requests, but the hardware orders the requests it receives.
Scheduling algorithms
- First-Come First-Serve (FCFS) - schedule I/O requests in the order in which they are received.
- Simple to implement with a first-in, first-out (FIFO) queue.
- Gives poor bandwidth, in general.
- Shortest Seek Time First (SSTF) - process the request that minimizes the distance the head must travel from its current position.
- May cause starvation (like SJF processor scheduling).
- SCAN algorithm (elevator algorithm)
- order the requests so that the head moves in the same direction until the end or beginning of the disk is reached
- then the movement direction is reversed.
- Circular SCAN (C-SCAN) algorithm
- service the requests so that the head always moves from disk beginning to disk end
- when the end is reached, the head is moved back to the disk beginning.
- Provides a more uniform wait time than SCAN. (With SCAN, the middle tracks will be visited twice as much as the inner/outer tracks)
- LOOK and C-LOOK are the same as the SCAN and C-SCAN algorithms, except that the head is not moved to the actual beginning or end, but rather to the closest extreme among the scheduled requests.
Scheduling examples assumptions:
- Disk has 200 cylinders (0-199)
- Head currently at cylinder 53, and moving towards the inside of the disk (increasing cylinder #)
- Cylinder requests:
98, 183, 37, 122, 14, 124, 65, 67
FCFS example
- Cylinder requests: 98, 183, 37, 122, 14, 124, 65, 67
- Total head motion:
|98-53| + |183-98| + |37-183| + |122-37| + |14-122| + |124-14| + |65-124| + |67-65|
= 45 + 85 + 146 + 85 + 108 + 110 + 59 + 2
= 640 cylinders
SSTF example
- Cylinder requests: 98, 183, 37, 122, 14, 124, 65, 67
- The order of processing is
(53), 65, 67, 37, 14, 98, 122, 124, 183
Total head motion:
|65-53| + |67-65| + |37-67| + |14-37| + |98-14| + |122-98| + |124-122| + |183-124|
= 12 + 2 + 30 + 23 + 84 + 24 + 2 + 59
= 236 cylinders
SCAN example
- Cylinder requests: 98, 183, 37, 122, 14, 124, 65, 67
- Processing order
(53), 65, 67, 98, 122, 124, 183, (199), 37, 14
Total head motion:
|65-53| + |67-65| + |98-67| + |122-98| + |124-122| + |183-124| + |199-183| + |37-199| + |14-37|
= 12 + 2 + 31 + 24 + 2 + 59 + 16 + 162 + 23
= 331 cylinders
C-SCAN example
- Cylinder requests: 98, 183, 37, 122, 14, 124, 65, 67
- Processing order
(53), 65, 67, 98, 122, 124, 183, (199), (0), 14, 37
Total head motion:
|65-53| + |67-65| + |98-67| + |122-98| + |124-122| + |183-124| + |199-183| + |0-199| + |14-0| + |37-14|
= 12 + 2 + 31 + 24 + 2 + 59 + 16 + 199 + 14 + 23
= 382 cylinders
LOOK example
- Cylinder requests: 98, 183, 37, 122, 14, 124, 65, 67
- Processing order (compare with Scan example)
(53), 65, 67, 98, 122, 124, 183, 37, 14
Total head motion:
|65-53| + |67-65| + |98-67| + |122-98| + |124-122| + |183-124| + |37-183| + |14-37|
= 12 + 2 + 31 + 24 + 2 + 59 + 146 + 23
= 299 cylinders
C-LOOK example
- Cylinder requests: 98, 183, 37, 122, 14, 124, 65, 67
- Order of processing (compare with C-Scan)
(53), 65, 67, 98, 122, 124, 183, 14, 37
Total head motion:
|65-53| + |67-65| + |98-67| + |122-98| + |124-122| + |183-124| + |14-183| + |37-14|
= 12 + 2 + 31 + 24 + 2 + 59 + 169 + 23
= 322 cylinders
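A short simulation reproduces the six totals above. This is a sketch, not a production scheduler. It follows the examples in assuming that the head starts out moving toward higher cylinder numbers, that SCAN and C-SCAN only travel to the end when requests remain behind the head, and that the C-SCAN return trip counts as head movement. It also assumes that SSTF breaks ties in arrival order.

```python
def total_head_movement(start: int, path: list[int]) -> int:
    """Sum of the distances between consecutive head positions."""
    total, pos = 0, start
    for cyl in path:
        total += abs(cyl - pos)
        pos = cyl
    return total


def fcfs(head: int, requests: list[int], last_cyl: int) -> list[int]:
    return list(requests)  # serve in arrival order


def sstf(head: int, requests: list[int], last_cyl: int) -> list[int]:
    pending, path = list(requests), []
    while pending:
        nearest = min(pending, key=lambda c: abs(c - head))  # ties: earliest arrival
        pending.remove(nearest)
        path.append(nearest)
        head = nearest
    return path


def split(head: int, requests: list[int]) -> tuple[list[int], list[int]]:
    """Requests at or above the head (ascending) and below it (ascending)."""
    above = sorted(c for c in requests if c >= head)
    below = sorted(c for c in requests if c < head)
    return above, below


def scan(head: int, requests: list[int], last_cyl: int) -> list[int]:
    above, below = split(head, requests)
    # Sweep up to the last cylinder, then reverse and sweep back down.
    return above + ([last_cyl] if below else []) + below[::-1]


def c_scan(head: int, requests: list[int], last_cyl: int) -> list[int]:
    above, below = split(head, requests)
    # Sweep up to the last cylinder, jump back to cylinder 0, sweep up again.
    return above + ([last_cyl, 0] if below else []) + below


def look(head: int, requests: list[int], last_cyl: int) -> list[int]:
    above, below = split(head, requests)
    return above + below[::-1]  # reverse at the last request, not the disk edge


def c_look(head: int, requests: list[int], last_cyl: int) -> list[int]:
    above, below = split(head, requests)
    return above + below  # jump from the highest request to the lowest one


REQUESTS = [98, 183, 37, 122, 14, 124, 65, 67]
for name, algo in [("FCFS", fcfs), ("SSTF", sstf), ("SCAN", scan),
                   ("C-SCAN", c_scan), ("LOOK", look), ("C-LOOK", c_look)]:
    path = algo(53, REQUESTS, 199)
    print(f"{name:<7}{total_head_movement(53, path):>4} cylinders  {path}")
```

The C-SCAN total includes the 199-cylinder return trip. Some treatments don't count that jump, which would give a smaller total.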
Points:
- The examples only account for seek distance (seeks are the major factor in access times).
- Other factors such as rotational latency might need to be considered.
- The OS usually doesn't know about rotational latency or the current positions of the heads, so the hardware should do the scheduling.
- But the OS may want to prioritize disk reads (e.g. paging has higher priority than user I/O).
- To find out which scheduling algorithm your device is using (assume sda in this example):
cat /sys/block/sda/queue/scheduler
The command lists the available schedulers with the currently active one in brackets.
On all of my Linux Mint 13 systems (SSD and hard drives) I get this:
noop deadline [cfq]
However, on my Raspberry Pi, I run this:
cat /sys/block/mmcblk0/queue/scheduler
And I get this:
noop [deadline] cfq
My ODroid-U3 is the same as the Raspberry Pi. It seems that the default scheduler on Linux Mint 17 is also the deadline scheduler.
- Links for noop, deadline, and cfq.
- There is an older one named anticipatory, which has now been replaced by cfq.
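The active scheduler is the one in brackets, so picking it out programmatically is just a matter of splitting the line. This is a sketch with one assumption: the device name `sda` (use whatever appears under /sys/block on your machine). Newer kernels list different scheduler names (e.g. mq-deadline, none) but use the same bracket convention.

```python
from pathlib import Path
from typing import List, Optional


def available_schedulers(text: str) -> List[str]:
    """All scheduler names listed in the file, brackets removed."""
    return [name.strip("[]") for name in text.split()]


def active_scheduler(text: str) -> Optional[str]:
    """The scheduler shown in [brackets], or None if none is marked."""
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            return name[1:-1]
    return None


if __name__ == "__main__":
    sched_file = Path("/sys/block/sda/queue/scheduler")  # assumed device name
    if sched_file.exists():
        text = sched_file.read_text()
        print(available_schedulers(text), "active:", active_scheduler(text))
    print(active_scheduler("noop deadline [cfq]"))  # cfq
```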
- Most scheduling algorithms that re-order requests are variations of the Shortest-seek-time first, SCAN, and LOOK algorithms.
- RAID (Redundant Array of Inexpensive Disks) is sometimes referred to as Redundant Array of Independent Disks.
- Generally uses JBOD (Just a Bunch Of Disks).
- The concept is that you make two or more physical drives (an array) appear as a single drive.
- There are several strategies; the two primary goals are performance and reliability.
- Striping: This is for performance. Files (data) are split among the drives.
- Mirroring: This is for reliability. Files are duplicated (mirrored) on multiple drives.
- Parity: This is another form of reliability where one or more drives store parity information.
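Striping and parity are both simple to express. In this toy sketch, blocks are dealt round-robin across the drives (as in striping). The parity block is the byte-wise XOR of the data blocks, so any one lost block can be rebuilt by XOR-ing the survivors. Real RAID levels add details that this ignores, such as chunk sizes and parity rotated across drives. The function names are hypothetical.

```python
from functools import reduce


def stripe(blocks: list[bytes], num_disks: int) -> list[list[bytes]]:
    """Round-robin striping: block i goes to disk i % num_disks."""
    disks: list[list[bytes]] = [[] for _ in range(num_disks)]
    for i, block in enumerate(blocks):
        disks[i % num_disks].append(block)
    return disks


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks (the parity block)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


print(stripe([b"B0", b"B1", b"B2", b"B3", b"B4"], 2))
# [[b'B0', b'B2', b'B4'], [b'B1', b'B3']]

data = [b"AAAA", b"BBBB", b"CCCC"]  # one block on each of three data drives
parity = xor_blocks(data)           # stored on a fourth drive

# Drive 1 fails: XOR the surviving data blocks with the parity to rebuild it.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])  # True
```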
- The different schemes are divided into categories called RAID levels.
- There is hardware-based RAID and software-based RAID.
- Hardware:
- Uses fewer CPU resources, as all the work is done on the controller.
- Can protect boot drive.
- OS doesn't have to know about RAID.
- If the OS can't do RAID, that's OK: the hardware does all the work and the OS sees one big drive.
- Can't move drives to different controllers.
- Software:
- Uses CPU, but CPU performance has dramatically improved over the years and hard drives have not.
- Can move drives to another similar OS.
- Software can do advanced disk scheduling (via filesystems, e.g. ZFS and Btrfs).
- Boot drives are problematic.
- There is also fake RAID, which is unfortunately very prevalent.
- Finally, don't ever forget this:
!! RAID IS NOT BACKUP !!
Running a full-sized desktop computer as a simple backup server is overkill. You don't need a lot of
processor power or memory. If all you really want is some extra storage space to use as a backup,
then a NAS (Network Attached Storage) device is the perfect way to go. Even a processor like those used
in cell phones is adequate, and you don't need even 1 GB of memory.
- A NAS is a simple "mini-computer" (not like the mini-computers of yesteryear) that provides everything you need to handle file backups.
- Most NAS devices run Linux or a form of Linux (usually stripped down, which makes it very easy to manage).
- They generally support some type of RAID system (you want reliability the most, so at least RAID 1).
- Simple "home" NAS devices have as little as 256 MB of memory (yeah, MB!) and a low-powered CPU.
- Higher-end "enterprise" NAS devices can have very powerful processors and several GBs of memory.
- Some will allow external drives (USB, eSATA, etc.) to be connected.
- They have minimal connections and don't have video support, so you manage them via a web browser.
- In addition to providing basic file services, most NAS devices (especially home versions) support such things as:
- Additional file services (e.g. CIFS (SMB, Samba), NFS, AFP, FTP, HTTP, HTTPS).
- Streaming services (DLNA, iTunes).
- Discovery services (Bonjour, UPnP).
- Photo server (a complete website for sharing photos).
- Terminal services (e.g. SSH, Telnet).
- Surveillance (web/video cams).
- Mail services (POP3, SMTP, IMAP).
- Connecting printers and sharing them with computers on the network.
- Built-in firewalls and security.
- Apps to access the backup from iPhones, iPads, Androids, etc.
- NAS devices that I've used:
- Synology DS209 - Low cost, low power, nice interface, under-powered CPU.
- Time to backup: about 25 minutes (delta on about 1,000,000 files consuming about 325 GB of space)
- Synology DS212j - Low cost, low power, nice interface, broad support, decently powered CPU.
- Time to backup: about 6 minutes.
- Netgear ReadyNAS Pro 2 - Slightly higher cost, decent interface, more powerful CPU.
- Time to backup: about 4 minutes.
- Hard drives have moving parts that
- are slow as seen from the CPU and memory
- wear out over time
- generate heat and noise
- are susceptible to crashes (shocks)
- A RAM disk is a logical disk created from main memory
- Super fast as there are no moving parts, it's just memory.
- When the power is turned off, all data on the "drive" is lost.
- The memory used is no longer available to the CPU. (Great for DOS-like operating systems).
- Solid state drives (SSD) are drives made from certain types of flash memory
- No moving parts, so they are fast, quiet, and generally cooler.
- They use a different kind of memory that persists when the power is turned off.
- DRAM-based volatile memory (battery required to retain data)
- NAND-based non-volatile memory (no power needed to retain data)
- SLC, MLC, and TLC (single-, multi-, and triple-level cell, storing 1, 2, and 3 bits per cell).
- The more expensive "enterprise" SSDs do over-provisioning to increase performance, e.g. an SSD with 128 GB of storage is advertised as a 100 GB SSD (the classic space-time tradeoff).
Some spinning hard drives do something similar by only using the outer cylinders.
- The memory has a limited number of writes over the lifetime of the drive.
- Some drives implement wear-levelingwear-leveling
(i.e. don't reuse a cell until all others have been used).
- Endurance keeps getting better (e.g. longer warranties).
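The "use the least-worn block next" idea can be sketched with a priority queue keyed on erase count. This is a hypothetical, simplified model of dynamic wear leveling. Real SSD firmware also relocates rarely changing (static) data so that its blocks get worn too.

```python
import heapq


class WearLeveler:
    """Hypothetical allocator that always hands out the free block erased the fewest times."""

    def __init__(self, num_blocks: int):
        self.erase_counts = [0] * num_blocks
        self.free = [(0, block) for block in range(num_blocks)]  # (erase count, block)
        heapq.heapify(self.free)

    def allocate(self) -> int:
        _, block = heapq.heappop(self.free)
        return block

    def release(self, block: int) -> None:
        """The block's data is obsolete: erase it and return it to the free pool."""
        self.erase_counts[block] += 1
        heapq.heappush(self.free, (self.erase_counts[block], block))


# Rewrite the same logical data 100 times on a 4-block device.
wl = WearLeveler(4)
for _ in range(100):
    block = wl.allocate()
    wl.release(block)  # the next rewrite makes this copy obsolete
print(wl.erase_counts)  # [25, 25, 25, 25]
```

Without wear leveling, a naive design could end up erasing the same block all 100 times.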
- Hybrid drives combine a hard drive and SSD in one.
- Can't overwrite memory; it must be erased first. (Many operating systems have a TRIM command to free pages.)
- Typically, SSDs write 4 KB pages but erase 256 - 512 KB blocks.
- This is why write speeds can be dramatically slower than read speeds.
- Unlike magnetic media, this makes it impossible to "undelete" files.
- Originally, only a few operating systems (file systems) and/or SSDs provided this. Now, almost all do.
- See write amplification for more details.
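Write amplification is the ratio of bytes actually written to flash to bytes the host asked to write. This back-of-the-envelope sketch uses the page and block sizes above under a simplified model. In the worst case, a 4 KB update lands in a full block with no erased pages available. The drive must then write the block's other valid pages into a freshly erased block along with the new page.

```python
KB = 1024
PAGE_SIZE = 4 * KB


def write_amplification(host_bytes: int, flash_bytes: int) -> float:
    """Bytes written to flash per byte written by the host."""
    return flash_bytes / host_bytes


for block_size in (256 * KB, 512 * KB):
    pages_per_block = block_size // PAGE_SIZE
    # Worst case (simplified): a whole block's worth of pages is rewritten for one 4 KB update.
    worst = write_amplification(PAGE_SIZE, pages_per_block * PAGE_SIZE)
    # Best case: an already-erased page is available, so only the new page is written.
    best = write_amplification(PAGE_SIZE, PAGE_SIZE)
    print(f"{block_size // KB} KB blocks: best {best:.0f}x, worst {worst:.0f}x")
```

That is a worst case of 64x for 256 KB blocks and 128x for 512 KB blocks. This is why keeping erased blocks available matters so much for write speed, through TRIM, over-provisioning, and erasing in the background.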
- Currently very expensive compared to hard drives (Costs)
- Currently limited storage compared to hard drives (Sizes)
- F2FS - Flash-Friendly File System that is optimized for:
- Random access - No seek-time latency.
- Erasing blocks - Do it in the background when disk is idle.
- Distributing the writes across the disk; don't reuse the same blocks as often.
- As SSD components get cheaper, larger, and more robust, they will start to fill in for smaller hard drives.
- The most likely scenario for the next several years is for hybrid systems where the SSD sits between the hard drive and main memory (cache-like mechanism).
- Microsoft has a technology called ReadyBoost that does something like this.
- Putting often-accessed (especially read-only) files on an SSD is a good balance.
- Hard drives will likely be around for long-term storage and for things that don't need the speed and expense of SSDs.
- Interesting comparison of SSDs with hard drives.
- The SSD Tutorial - A video introduction to SSDs from Newegg TV.
- This is really good for those that want to learn more about storage hardware.
- Another video from Newegg TV showing the very popular Samsung 830 SSD.
For detailed information on any command, use the man pages or Google.
- hdparm - get/set hard disk parameters (hdparm --help) especially -t, -i, -I, and --offset XGBs
- A nice introduction to hdparm can be found here.
- smartctl - part of smartmontools (smartctl --help)
- This displays the S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology) information from a disk.