RAID Storage Redundancy
Overview
RAID — Redundant Array of Inexpensive/Independent Disks
- Either a dedicated hardware controller or implemented entirely in software
- A RAID system will have several storage devices as the bottom layer
- …these will be partitioned (another block device)…
- …and combined into a raid array (yet another block device)
What are the reasons to use RAID?
- Performance
- …enhanced transfer speed
- …enhanced number of transactions per second
- …increased single block device capacity
- Redundancy
- …greater efficiency in recovering from a device failure
- …should never be used as a replacement for reliable backup
Types & Levels
Three possible types of RAID…
- Firmware RAID…onboard integrated RAID controllers
- Hardware RAID…dedicated RAID controller (typically PCIe)
- Software RAID…completely hardware independent
Standard numbering system of raid levels…
RAID without redundancy…
- Linear - Simple grouping of drives…
- …creates a larger virtual drive…
- …called append mode…I/O goes to one device until it is full
- RAID 0 - Striping
- …data is interleaved across all drives in the array
- Bottlenecks associated with I/O to a single device are alleviated
- JBOD (Just a Bunch Of Disks)
- …circumvent RAID firmware…act like a normal disk controller
- configure device without RAID support
RAID with redundancy…
- RAID 1 - Mirroring
- …exact replica of all data on all devices
- Write performance hit…increases with number of devices
- RAID 4 - Dedicated Parity (…obsolete)
- …dedicated drive is used to store parity
- Single device failure can be recovered
- RAID 5 - Distributed Parity (block-level striping)
- At least 3 devices…support one device failure
- …parity information is spread across all devices
- Reduces the bottleneck inherent in writing parity to a single device (see the parity sketch after this list)
- RAID 6 - Double Distributed Parity
- At least 4 devices…supports two device failures
- Write speed is slower because of double parity…the rebuild process is long
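The parity behind RAID 4/5 is, at its core, a bitwise XOR across the data blocks of a stripe; a lost block is recovered by XOR-ing the parity with the surviving blocks (RAID 6 adds a second, differently computed parity). A minimal sketch with made-up byte values:
# parity over three data bytes D1=0xA5, D2=0x3C, D3=0x5F (arbitrary example values)
printf '%x\n' $(( 0xA5 ^ 0x3C ^ 0x5F ))   # parity P = D1 ^ D2 ^ D3 -> c6
# reconstruct the lost block D1 from the parity and the survivors
printf '%x\n' $(( 0xC6 ^ 0x3C ^ 0x5F ))   # D1 = P ^ D2 ^ D3 -> a5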
Nested RAIDs…
- RAID 10 - Striping Mirror
- …hybrid array results from the combination of RAID-0 and RAID-1
- Performance of striping…redundant properties of mirroring
- …most expensive solution…lot of surplus disk hardware
- RAID 50 - Striping Parity
- …combine two RAID-5 arrays into a striped array
- Performance is slightly lower than a RAID 10
- Each RAID-5 can survive a single disk failure
RAID storage capacities…
| RAID level | Realized capacity |
|---|---|
| Linear mode | DiskSize0 + DiskSize1 + … + DiskSizeN |
| RAID-0 (striping) | TotalDisks * DiskSize |
| RAID-1 (mirroring) | DiskSize |
| RAID-4,5 | (TotalDisks-1) * DiskSize |
| RAID-6 | (TotalDisks-2) * DiskSize |
| RAID-10 (striped mirror) | NumberOfMirrors * DiskSize |
| RAID-50 (striped parity) | (TotalDisks-ParityDisks) * DiskSize |
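A worked example, assuming six 4 TB disks:
- RAID-0: 6 * 4 TB = 24 TB
- RAID-1: 4 TB (every disk holds the same data)
- RAID-5: (6 - 1) * 4 TB = 20 TB
- RAID-6: (6 - 2) * 4 TB = 16 TB
- RAID-10 (three 2-disk mirrors): 3 * 4 TB = 12 TB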
Terminology
- Parity algorithms are an alternative to mirroring for redundancy…
- Striping…data is spread across multiple disks
- …improves read and write performance
- …stripes also support redundancy through disk parity
- Degraded…array supporting redundancy…with a failed device
- Rebuild (aka recovery)…
- …reconstructed using the parity information provided by the remaining disks
- Usually puts an additional strain on system resources
- Scrubbing…check for data corruption and errors…
Automation around using a spare device…
- Hot-Spare
- Extra storage devices to act as spares when a drive failure occurs
- …replace a failed drive with a new drive without intervention
- Decreases the chance that a second drive will fail and cause data loss
- Hot-Swap
- Remove a failed drive from a running system…without a reboot
- Needs special hardware that supports it…mostly available today…
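As an illustration with mdadm (covered in the next section) and hypothetical device names: adding a device to an array that is already complete turns it into a hot spare.
# add a spare to a healthy array; it stays idle until a member device fails
mdadm --add /dev/md0 /dev/sdd1
# the device is now listed as a spare
mdadm --detail /dev/md0 | grep -i spare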
Mdadm
Mdadm[1] — Software RAID[2] through the multiple devices (MD) driver in the kernel
# find out if a given device is a RAID array
>>> mdadm --query /dev/md127
/dev/md127: 1788.37GiB raid1 2 devices, 0 spares
Options to check device status:
- --query — Verify that a device is a component of an array
- --detail — Print details of one or more md devices
- --examine — Print metadata stored on device(s)
# show the MD configuration
mdadm --query --detail /dev/md127
# a list of SATA devices
mdadm --examine /dev/sd[bc]1
# a list of NVMe devices
mdadm -E /dev/nvme[01]n1p1
Terminology
Superblock — Contains RAID metadata
- ~1 KB stored at the beginning or end of each member disk
- RAID information…same on all the disks
- Number of devices…block size
- RAID level and layout
- Disk information…unique to each disk
- Superblock number…
- Disk role…column height
- …allows the array to be reliably re-assembled after a shutdown
Assemble — Assembles existing RAID arrays from their component devices
- mdadm --assemble --scan…scan drives for superblocks
- …may leave a partially assembled array in case of issues
- …automatic during boot…create mdadm.conf
- …requires support in initramfs if used for / (root partition)
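A minimal sketch of manual assembly (array and device names are hypothetical):
# assemble a known array by naming its member partitions
mdadm --assemble /dev/md0 /dev/sdb1 /dev/sdc1
# or scan superblocks / mdadm.conf and assemble everything found
mdadm --assemble --scan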
Scrubbing — Identify bad blocks
- …regularly read blocks on devices to catch bad blocks early
- …read-error handling…write back of correct data from other devices
- …blocks that read successfully but are found not to be consistent count as a mismatch
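With the in-kernel MD driver a scrub pass can be requested through sysfs; a sketch assuming the array is /dev/md0:
# start a check pass: read and compare all blocks, count inconsistencies
echo check > /sys/block/md0/md/sync_action
# progress shows up in /proc/mdstat; the mismatch counter lives here
cat /sys/block/md0/md/mismatch_cnt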
MD Driver
RAID solution that is completely hardware independent…
- …implements the various RAID levels in the kernel disk (block device) code
- …dependent on the server CPU performance and load
/dev/md* virtual devices…created from one or more independent underlying devices
mdadm [mode] <raid-device> [options] <component-devices>
Configuration
/etc/mdadm.conf[3] — Default configuration file
- /etc/mdadm.conf.d for additional configuration files
- Files in that directory are read in lexical order…
- …after successful reading of /etc/mdadm.conf
Keywords (an incomplete list):
- DEVICE - List of which block devices to scan
- ARRAY - Identify arrays for assembly
- MAIL{ADDR,FROM} - Configure alert mail notification
- CREATE - Default settings when creating new arrays
- AUTO - Controls which arrays are automatically assembled
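A minimal /etc/mdadm.conf using these keywords might look like this (values are placeholders; ARRAY lines are usually generated as shown below):
DEVICE partitions
MAILADDR notify@example.org
ARRAY /dev/md0 metadata=1.2 UUID=42650a5c:7eb06556:6db9f264:03ec67e8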
# add a configuration to the end
mdadm --detail --scan >> /etc/mdadm.conf
ARRAY example configuration…
ARRAY /dev/md0 metadata=1.2 name=node.fqdn:0 UUID=42650a5c:7eb06556:6db9f264:03ec67e8
- …second word is the /dev/md* device to be assembled…or ignore
- otherwise…use various heuristics to determine an appropriate name
- …subsequent words identify the array
- name…identifier, typically the node name with the device number as suffix
- uuid…128 bit id stored in the superblock
- devices…comma separated list of device names
Create
Create a RAID configuration…mode --create
- …writes the per-device superblocks…initialisation…
- …making sure disks of a mirror are identical
- or…parity array the parities are correct
- --level= RAID level…--raid-devices= number of active devices in the array
# mirror
mdadm --create /dev/md0 --level=raid1 --raid-devices=2 /dev/nvme[0-1]n1
# parity raid
mdadm --create /dev/md0 --level=raid5 --raid-devices=3 /dev/sd[abc]3
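After creation the initial sync runs in the background; a quick way to check on it (assuming the /dev/md0 device from the examples above):
# watch the initial resync progress
cat /proc/mdstat
# detailed state of the new array
mdadm --detail /dev/md0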
Monitor
mdmonitor.service — Systemd unit to run mdadm --monitor
>>> systemctl cat mdmonitor | grep -i ^exec
ExecStart=/sbin/mdadm --monitor --scan --syslog -f --pid-file=/run/mdadm/mdadm.pid
- --monitor[4] to run in monitor mode
- …periodically poll arrays & report events
- …system wide mode
- --scan…all devices that appear in /proc/mdstat
Events are passed to a separate program…or mailed to an e-mail address
cat > /etc/mdadm.conf.d/mail.conf <<EOF
MAILADDR notify@example.org
EOF
Testing the configuration to ensure that e-mails are sent:
# temporarily stop the mdadm daemon
systemctl stop mdmonitor
# run in monitor mode and send test mails for all devices
mdadm --monitor --scan --test
Manage
Remove a device from an array…
- --fail if not already in a failed state (due to a defect)
- --remove a device from an array
mdadm /dev/md0 --fail /dev/sda1 --remove /dev/sda1
- --add a new device to the array (typically to replace a failed one)
mdadm --add /dev/md0 /dev/sdb1
Stop and delete array…
# unmount the file-systems...
umount /dev/md0
# stop the device...
mdadm --stop /dev/md0
# remove the device
mdadm --remove /dev/md0
# ...in case it errors with "No such file or directory"
mdadm --zero-superblock /dev/sda1 /dev/sdb1
dm-raid
Device-mapper RAID (dm-raid) target provides a bridge from DM to MD…
- …allows the mdraid drivers to be accessed using a device-mapper interface
- Supports…
- …RAID device discovery
- …RAID set activation, creation, removal, rebuild
- …display of properties for ATARAID/DDF1 metadata
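On the userspace side, discovery and activation of such firmware RAID sets is typically done with the dmraid tool; a sketch assuming it is installed:
# report block devices carrying ATARAID/DDF metadata
dmraid -r
# show the discovered RAID sets
dmraid -s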
lvm RAID
LVM supports RAID…
- …created and managed by LVM using the mdraid kernel drivers
- …levels 0, 1, 4, 5, 6, and 10
- Supports snapshots…
Create a RAID logical volume using lvcreate
lvcreate --type raid1 -m 1 -L 1G -n my_lv my_vg
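The resulting RAID LV is backed by hidden data and metadata sub-LVs; they can be inspected with lvs (names follow the lvcreate example above):
# show the RAID1 LV and its _rimage/_rmeta sub-LVs with their backing devices
lvs -a -o name,segtype,devices my_vg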
ZFS
Implements RAID-Z…
- …variation on standard RAID-5 that offers better distribution of parity
- Eliminates RAID-5 write hole…inconsistency in case of power loss
- Three levels…
- …based on the number of parity devices
- …number of disks that can fail while the pool remains operational
Pool type in order of performance…
- mirror – More disks, more reliability, same capacity. (RAID 1)
- raidz1 – Single parity, minimum 3 disks. Two disk failures result in data loss.
- raidz2 – Dual parity, minimum 4 disks. Allows for two disk failures.
- raidz3 – Triple parity, minimum 5 disks. Allows for three disk failures.
zfs list [<name>] # show file-systems
zfs set mountpoint=<path> <name> # set target mount point for file-system
zfs mount <name> # mount a file-system
zfs umount <path> # unmount a file-system
grep -i mount= /etc/default/zfs # boot persistence
findmnt -t zfs # list mounted file-systems
zfs list -o quota <name> # show quota for file-system
zfs set quota=<size> <name> # set quota for file-system
zpool status [<name>] # show storage pools
zpool scrub <name>
zpool create <name> <type> <device> [<device>,...] # create a new storage pool
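As a concrete illustration of the raidz levels (pool and device names are hypothetical):
# create a double-parity pool over four whole disks
zpool create tank raidz2 /dev/sdb /dev/sdc /dev/sdd /dev/sde
# verify the layout and health of the pool
zpool status tank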
Btrfs
…ability to combine and manage several disks as one filesystem
Supports the following RAID profiles…
- RAID 1…
- RAID 1c3…stores 3 copies on separate disks
- RAID 1c4…stores 4 copies on separate disks
- RAID 10…RAID1+RAID0 modes for increased performance and redundancy
- RAID 5…striped mode with 1 disk as redundancy
- RAID 6…striped mode with 2 disks as redundancy
- …RAID 5/6 are not yet stable or suitable for production use
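A sketch of creating a two-disk Btrfs RAID 1 filesystem (device names are hypothetical); -d selects the data profile and -m the metadata profile:
# mirror both data and metadata across two devices
mkfs.btrfs -d raid1 -m raid1 /dev/sdb /dev/sdc
# after mounting, show how space is allocated per profile
btrfs filesystem usage /mnt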
storcli
lspci | grep -i raid # find hardware RAID controllers
MegaRAID SAS is the current high-end RAID controller series by LSI
- StorCLI Reference Manual
- Search for the latest version on the Broadcom web-page.
- Download will require accepting EULA…
- unzip the archive
- Add the noarch package to the repository
# list the included packages...
>>> find Unified_storcli_all_os -name '*.rpm'
Unified_storcli_all_os/ARM/Linux/storcli-007.2007.0000.0000-1.aarch64.rpm
Unified_storcli_all_os/Linux/storcli-007.2007.0000.0000-1.noarch.rpm
# ...install the command-line tools
>>> dnf install -y Unified_storcli_all_os/Linux/storcli-007.2007.0000.0000-1.noarch.rpm
# for simplicity...
>>> ln -s /opt/MegaRAID/storcli/storcli64 /sbin/storcli
The storcli command interfaces with the RAID controller…following general format…
<[object identifier]> <verb> <[adverb | attributes | properties]> <[key=value]>
- Object identifiers…
- /cx controller x
- /cx/vx virtual drive x on controller x
- Verbs…add, del, set, show, etc.
- <[adverb | attributes | properties]>…specifies what the verb modifies or displays
- <[key=value]>…if a value is required by the command
# summary of the drive and controller status
storcli show
# number of controllers detected
storcli show ctrlcount
# show controller specifics...
storcli /c0 show | grep -e ^Product -e ^Serial -e Version
# ...controller & virtual devices configuration
storcli /c0/v0 show all
Devices
Show the list of devices on a specific controller…
- EID:Slt…enclosure number and slot number
# ...first controller in this example
>>> storcli /c0 show
...
---------------------------------------------------------------------------------
EID:Slt DID State DG Size Intf Med SED PI SeSz Model Sp Type
---------------------------------------------------------------------------------
8:0 9 Onln 0 446.625 GB SATA SSD N N 512B INTEL SSDSC2KG480G8 U -
8:1 10 Onln 0 446.625 GB SATA SSD N N 512B INTEL SSDSC2KG480G8 U -
8:2 16 Onln 1 5.457 TB SAS HDD N N 512B HUS726060AL5210 U -
8:3 11 Onln 1 5.457 TB SAS HDD N N 512B HUS726060AL5210 U -
8:4 12 Onln 1 5.457 TB SAS HDD N N 512B HUS726060AL5210 U -
8:5 13 Onln 1 5.457 TB SAS HDD N N 512B HUS726060AL5210 U -
8:6 14 Onln 1 5.457 TB SAS HDD N N 512B HUS726060AL5210 U -
8:7 15 Onln 1 5.457 TB SAS HDD N N 512B HUS726060AL5210 U -
---------------------------------------------------------------------------------
...
Configuration
Create a RAID configuration (add vd), aka virtual drive…
- r6 aka type=raid6
- drives=e:s|e:s-x|e:s-x,y
- e specifies the enclosure ID, s represents the slot
- e:s-x range convention…slots s to x in enclosure e
- AWB (always write back) ignores the non-existence of a cache module
- ra read ahead should be helpful if the read access is fairly regular
- Strip=64 is believed to be a potentially important parameter…
- …in reality no one knows about the I/O, so take some middle value here
# create a RAID array
storcli /c0 add vd r6 drives=8:2-7 AWB ra cached Strip=64
Sanity Checks
Basic sanity checks are enabled for the RAID controller:
- Patrol read checks all disks for bad blocks…
- …if the system load is small
- …should not have a noticeable performance impact
- Consistency check for parity…time interval
- delay=2016 (roughly 3 months)
# run both consistency checks and patrol reads
storcli /c0 set cc=conc delay=2016 starttime=2022/04/28 11
Footnotes
1. Mdadm, GitHub — https://github.com/md-raid-utilities/mdadm/
2. RAID arrays, Linux Documentation — https://docs.kernel.org/admin-guide/md.html
3. mdadm.conf, Linux manual page — https://www.man7.org/linux/man-pages/man5/mdadm.conf.5.html
4. mdadm --monitor, Linux manual page — https://www.man7.org/linux/man-pages/man8/mdadm.8.html#MONITOR_MODE