RAID Storage Redundancy
Overview
RAID (Redundant Array of Inexpensive/Independent Disks)…
- …dedicated hardware controller or entirely in software
- RAID system will have several storage devices as the bottom layer
- …these will be partitioned (another block device)…
- …and combined into a raid array (yet another block device)
Reasons to use RAID…
- Performance…
- …enhanced transfer speed
- …enhanced number of transactions per second
- …increased single block device capacity
- Redundancy…
- …greater efficiency in recovering from a device failure
- Should never be used as a replacement for reliable backup
Types & Levels
Three possible types of RAID…
- Firmware RAID…onboard integrated RAID controllers
- Hardware RAID…dedicated RAID controller (typically PCIe)
- Software RAID…completely hardware independent
Standard numbering system of raid levels…
RAID without redundancy…
- Linear - Simple grouping of drives…
- …creates a larger virtual drive…
- …called append mode…I/O to one device until full
- RAID 0 - Striping
- …data is interleaved across all drives in the array
- Bottlenecks associated with I/O to a single device are alleviated
- JBOD (Just a Bunch Of Disks)
- …circumvent RAID firmware…act like a normal disk controller
- configure device without RAID support
RAID with redundancy…
- RAID 1 - Mirroring
- …exact replica of all data on all devices
- Write performance hit…increases with number of devices
- RAID 4 - Dedicated Parity (…obsolete)
- …dedicated drive is used to store parity
- Single device failure can be recovered
- RAID 5 - Distributed Parity (block-level striping)
- At least 3 devices…supports one device failure
- …parity information is spread across all devices
- Reduces the bottleneck inherent in writing parity
- RAID 6 - Double Distributed Parity
- At least 4 devices…supports two device failures
- Write speed is slow because of double parity…restoring process is long
Nested RAIDs…
- RAID 10 - Striping Mirror
- …hybrid array results from the combination of RAID-0 and RAID-1
- Performance of striping…redundant properties of mirroring
- …most expensive solution…lot of surplus disk hardware
- RAID 50 - Striping Parity
- …combine two RAID-5 arrays into a striped array
- Performance is slightly lower than a RAID 10
- Each RAID-5 can survive a single disk failure
RAID storage capacities…
RAID level | Realized capacity |
---|---|
Linear mode | DiskSize0 + DiskSize1 + … + DiskSizeN |
RAID-0 (striping) | TotalDisks * DiskSize |
RAID-1 (mirroring) | DiskSize |
RAID-4,5 (single parity) | (TotalDisks-1) * DiskSize |
RAID-6 (double parity) | (TotalDisks-2) * DiskSize |
RAID-10 (striped mirror) | NumberOfMirrors * DiskSize |
RAID-50 (striped parity) | (TotalDisks-ParityDisks) * DiskSize |
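Worked example (a hypothetical array of six 4 TB disks):
- Linear / RAID-0: 6 * 4 TB = 24 TB
- RAID-1: 4 TB (every disk holds a full copy)
- RAID-5: (6-1) * 4 TB = 20 TB
- RAID-6: (6-2) * 4 TB = 16 TB
- RAID-10: 3 mirror pairs * 4 TB = 12 TB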
Terminology
- Parity algorithms are an alternative to mirroring for redundancy…
- Striping…data is spread across multiple disks
- …improves read and write performance
- …stripes also support redundancy through disk parity
- Degraded…array supporting redundancy…with a failed device
- Rebuild (aka recovery)…
- …reconstructed using the parity information provided by the remaining disks
- Usually puts an additional strain on system resources
- Scrubbing…check for data corruption and errors…
Automating the use of a spare device…
- Hot-Spare
- Extra storage devices to act as spares when a drive failure occurs
- …replace a failed drive with a new drive without intervention
- Decreases the chance that a second drive will fail and cause data loss
- Hot-Swap
- Remove a failed drive from a running system…without reboot
- Need special hardware that supports it…mostly available today…
mdadm
Linux Software RAID…
- `mdraid` subsystem was designed as a software RAID solution for Linux
- Package `mdadm*.{deb,rpm}`…configured with the `mdadm` utility (install sketch below)
- Can consist of…
- …physical devices
- …partitions
- …any Linux block device
- Package `dmraid` (deprecated) used on a wide variety of firmware RAID implementations
- Hardware RAID controllers have no specific RAID subsystem…
- …come with their own drivers…
- …allow the system to detect the RAID sets as regular disks
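A minimal install sketch, assuming the stock distribution packages:
# Debian/Ubuntu
apt install mdadm
# RHEL/Fedora
dnf install mdadm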
Terminology…
- Superblock contains RAID metadata…
- ~1KB stored at the beginning or at the end of each member disk
- RAID information…same on all the disks
- Number of devices…block size
- RAID level and layout
- Disk information…unique to each disk
- Superblock number…
- Disk role…column height
- …allows the array to be reliably re-assembled after a shutdown
- Assemble…rebuilds all RAID arrays
- `mdadm --assemble --scan`…scan drives for superblocks
- partially assembled array…in case of issues
- …automatic during boot…create `mdadm.conf`
- …requires support in initramfs if used for `/` (root-partition)
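If `/` sits on the array, the initramfs must be regenerated after `mdadm.conf` changes; a sketch assuming the usual distribution tooling:
# RHEL/Fedora
dracut --force
# Debian/Ubuntu
update-initramfs -u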
- Scrubbing…
- …regularly read blocks on devices to catch bad blocks early
- …read-error handling…write back of correct data from other devices
- …blocks read successfully…found to not be consistent…mismatch
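A sketch of triggering a manual scrub through sysfs (the array name `md0` is an assumption):
# request a check pass; "repair" would also rewrite inconsistent blocks
echo check > /sys/block/md0/md/sync_action
# mismatches counted so far; progress shows up in /proc/mdstat
cat /sys/block/md0/md/mismatch_cnt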
/dev/md*
- `md` multiple device driver…`man 4 md`
- RAID solution that is completely hardware independent…
- …implements the various RAID levels in the kernel disk (block device) code
- …dependent on the server CPU performance and load
- `/dev/md*` virtual devices…created from one or more independent underlying devices
Basic command syntax…
mdadm [mode] <raid-device> [options] <component-devices>
mdadm.conf
Configuration file…
- …collection of words separated by white space…`#` for comments
- Keywords…
- `DEVICE` list devices/partitions to scan
- `ARRAY` identify arrays for assembly
- `MAIL{ADDR,FROM}` configure alert mail notification
- `CREATE` defaults for array creation
- …more cf. `man 5 mdadm.conf`
# add a configuration to the end
mdadm --detail --scan >> /etc/mdadm.conf
`ARRAY` example configuration…
ARRAY /dev/md0 metadata=1.2 name=node.fqdn:0 UUID=42650a5c:7eb06556:6db9f264:03ec67e8
- …second word `/dev/md*` device to be assembled…or `<ignore>`
- otherwise…use various heuristics to determine an appropriate name
- …subsequent words identify the array,
- `name` identifier…typically the node name with device number as suffix
- `uuid` 128 bit…stored in the superblock
- `devices`…comma separated list of device names
Create
Create a RAID configuration…mode --create
- …writes the per-device superblocks…initialisation…
- …making sure disks of a mirror are identical
- or…parity array the parities are correct
- `--level=` RAID level…`--raid-devices=` number of active devices in the array
# mirror
mdadm --create /dev/md0 --level=raid1 --raid-devices=2 /dev/nvme[0-1]n1
# parity raid
mdadm --create /dev/md0 --level=raid5 --raid-devices=3 /dev/sd[abc]3
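The new array is just another block device; a follow-up sketch (filesystem type and mount point are arbitrary choices here):
# put a filesystem on the array and mount it
mkfs.ext4 /dev/md0
mount /dev/md0 /mnt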
Monitor
# inspect the configuration in detail
mdadm --detail /dev/md0
# examine the RAID superblock on each member device
mdadm --examine /dev/sd[bc]1
`/proc/mdstat` snapshot of the kernel’s RAID/md state…
watch -n .1 cat /proc/mdstat
# ...more details...
mdadm --misc --detail /dev/md[012]
Manage
Remove device from array…
- `--fail` if not already in a failed state (due to a defect)
- `--remove` a device from an array
mdadm /dev/md0 --fail /dev/sda1 --remove /dev/sda1
- `--add` new device to array (typically replacing a failed one)
mdadm --add /dev/md0 /dev/sdb1
Stop and delete array…
# unmount the file-systems...
umount /dev/md0
# stop the device...
mdadm --stop /dev/md0
# remove the device
mdadm --remove /dev/md0
# ...in case it errors with "No such file or directory"
mdadm --zero-superblock /dev/sda1 /dev/sdb1
dm-raid
Device-mapper RAID (dm-raid) target provides a bridge from DM to MD…
- …allows the `mdraid` drivers to be accessed using a device-mapper interface
- Supports…
- …RAID device discovery
- …RAID set activation, creation, removal, rebuild
- …display of properties for ATARAID/DDF1 metadata
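For reference, two common `dmraid` invocations (the tool itself is deprecated, as noted above):
# list discovered firmware RAID sets
dmraid -r
# activate all discovered RAID sets
dmraid -ay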
LVM RAID
LVM supports RAID…
- …created and managed by LVM using the `mdraid` kernel drivers
- …levels 0, 1, 4, 5, 6, and 10
- Supports snapshots…
Create a RAID logical volume using lvcreate
lvcreate --type raid1 -m 1 -L 1G -n my_lv my_vg
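To inspect the result, the segment type and the sub-LVs backing the mirror can be listed; a sketch reusing the names from the example above:
# show segment type and the devices behind each (sub-)LV
lvs -a -o name,segtype,devices my_vg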
ZFS
Implements RAID-Z…
- …variation on standard RAID-5 that offers better distribution of parity
- Eliminates RAID-5 write hole…inconsistency in case of power loss
- Three levels…
- …based on the number of parity devices
- …number of disks that can fail while the pool remains operational
Pool type in order of performance…
- `mirror` – More disks, more reliability, same capacity. (RAID 1)
- `raidz1` – Single parity, minimum 3 disks. Two disk failures result in data loss.
- `raidz2` – Dual parity, minimum 4 disks. Allows for two disk failures.
- `raidz3` – Triple parity, minimum 5 disks. Allows for three disk failures.
zfs list [<name>] # show file-systems
zfs set mountpoint=<path> <name> # set target mount point for file-system
zfs mount <name> # mount a file-system
zfs umount <path> # unmount a file-system
grep -i mount= /etc/default/zfs # boot persistence
findmnt -t zfs # list mounted file-systems
zfs list -o quota <name> # show quota for file-system
zfs set quota=<size> <name> # set quota for file-system
zpool status [<name>] # show storage pools
zpool scrub <name> # verify checksums and repair from redundancy
zpool create <name> <type> <device> [<device>,...] # create a new storage pool
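A hedged end-to-end sketch (pool name `tank` and the device names are assumptions):
# create a double-parity pool from six disks
zpool create tank raidz2 /dev/sd[b-g]
# verify layout and health
zpool status tank
# periodically verify checksums and repair from parity
zpool scrub tank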
Btrfs
…ability to combine and manage several disks as one filesystem
Supports the following RAID profiles…
- RAID 1…stores 2 copies on separate disks
- RAID 1c3…stores 3 copies on separate disks
- RAID 1c4…stores 4 copies on separate disks
- RAID 10…RAID1+RAID0 modes for increased performance and redundancy
- RAID 5…striped mode with 1 disk as redundancy
- RAID 6…striped mode with 2 disks as redundancy
- …RAID 5/6 are not yet considered stable or suitable for production use
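A minimal creation sketch (device names and the mount point are assumptions):
# mirror both data (-d) and metadata (-m) across two disks
mkfs.btrfs -d raid1 -m raid1 /dev/sdb /dev/sdc
# profiles can later be changed online, e.g. after adding a third disk
btrfs device add /dev/sdd /mnt
btrfs balance start -dconvert=raid1c3 -mconvert=raid1c3 /mnt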
storcli
lspci | grep -i raid # find hardware RAID controllers
MegaRAID SAS is the current high-end RAID controller series from LSI (Broadcom)
- StorCLI Reference Manual
- Search for the latest version on the Broadcom web-page.
- Download will require accepting EULA…
- `unzip` the archive
- Add the `noarch` package to the repository
# list the included packages...
>>> find Unified_storcli_all_os -name '*.rpm'
Unified_storcli_all_os/ARM/Linux/storcli-007.2007.0000.0000-1.aarch64.rpm
Unified_storcli_all_os/Linux/storcli-007.2007.0000.0000-1.noarch.rpm
# ...install the command-line tools
>>> dnf install -y Unified_storcli_all_os/Linux/storcli-007.2007.0000.0000-1.noarch.rpm
# for simplicity...
>>> ln -s /opt/MegaRAID/storcli/storcli64 /sbin/storcli
`storcli` command interfaces with the RAID controller…following general format…
<[object identifier]> <verb> <[adverb | attributes | properties]> <[key=value]>
- Object identifiers…
- `/cx` controller x
- `/cx/vx` virtual drive x on controller x
- Verbs…`add`, `del`, `set`, `show`, etc.
- `<[adverb | attributes | properties]>`…
- …specifies what the verb modifies or displays
- `<[key=value]>`…if a value is required by the command
# summary of the drive and controller status
storcli show
# number of controllers detected
storcli show ctrlcount
# show controller specifics...
storcli /c0 show | grep -e ^Product -e ^Serial -e Version
# ...controller & virtual devices configuration
storcli /c0/v0 show all
Devices
Show the list of devices on a specific controller…
- `EID:Slt` enclosure and slot numbers
# ...first controller in this example
>>> storcli /c0 show
...
---------------------------------------------------------------------------------
EID:Slt DID State DG Size Intf Med SED PI SeSz Model Sp Type
---------------------------------------------------------------------------------
8:0 9 Onln 0 446.625 GB SATA SSD N N 512B INTEL SSDSC2KG480G8 U -
8:1 10 Onln 0 446.625 GB SATA SSD N N 512B INTEL SSDSC2KG480G8 U -
8:2 16 Onln 1 5.457 TB SAS HDD N N 512B HUS726060AL5210 U -
8:3 11 Onln 1 5.457 TB SAS HDD N N 512B HUS726060AL5210 U -
8:4 12 Onln 1 5.457 TB SAS HDD N N 512B HUS726060AL5210 U -
8:5 13 Onln 1 5.457 TB SAS HDD N N 512B HUS726060AL5210 U -
8:6 14 Onln 1 5.457 TB SAS HDD N N 512B HUS726060AL5210 U -
8:7 15 Onln 1 5.457 TB SAS HDD N N 512B HUS726060AL5210 U -
---------------------------------------------------------------------------------
...
Configuration
Create (`add vd`) a RAID configuration, aka virtual drive…
- `r6` aka `type=raid6`
- `drives=e:s|e:s-x|e:s-x,y`
- `e` specifies the enclosure ID, `s` represents the slot
- `e:s-x` range convention…slots `s` to `x` in the enclosure `e`
- `AWB` (always write back) ignores the non-existence of a cache module
- `ra` read ahead should be helpful if the read access is fairly regular
- `Strip=64` is believed to be a potentially important parameter…
- …amount of data written to one physical disk before moving to the next one…
- …in reality no one knows about the I/O, so take some middle value here
# create a RAID array
storcli /c0 add vd r6 drives=8:2-7 AWB ra cached Strip=64
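To confirm the result, the new virtual drive can be inspected (the `v1` index is an assumption about where the new VD lands):
# show state, size and cache policy of the new virtual drive
storcli /c0/v1 show all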
Sanity Checks
Basic sanity checks to enable for the RAID controller:
- Patrol read checks all disks for bad blocks…
- …if the system load is small
- …should not have a noticeable performance impact
- Consistency check for parity…time interval `delay=2016` (roughly 3 months)
# run both consistency checks and patrol reads
storcli /c0 set cc=conc delay=2016 starttime=2022/04/28 11
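The scheduled checks can be reviewed afterwards; a sketch using the corresponding show sub-commands (assumed to be supported on this controller generation):
# consistency-check schedule and progress
storcli /c0 show cc
# patrol-read state
storcli /c0 show patrolread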