Lustre — HPC Storage System
Client Installation & Configuration
Learn about Lustre from the following resources…
Events in the Lustre community …LUG (Lustre User Group)
Packages
Available versions…
- Long-term support (LTS) …stable release recommended for production environments
- Feature release …ongoing development
- Native Linux Kernel …support built into the Linux kernel (considered unstable)
Packages available¹…
- …from downloads.whamcloud.com …details at build.whamcloud.com
- …list of releases …roadmap …kernel support matrix
Long-Term Support
| Version | Date | Platform | 
|---|---|---|
| 2.15.0 | 2022/06 | EL 8.5 | 
| 2.15.1 | 2022/08 | EL 8.6 | 
| 2.15.2 | 2023/01 | EL 8.7, EL 9.0 | 
| 2.15.3 | 2023/06 | EL 8.8, EL 9.2 | 
| 2.15.4 | 2023/12 | EL 8.9, EL 9.3 | 
| 2.15.5 | 2024/06 | EL 8.10, EL 9.4 | 
| 2.15.6 | 2024/12 | EL 8.10, EL 9.5 | 
| 2.15.7 | 2025/06 | EL 8.10, EL 9.6 | 
Previous LTS…
| Version | Date | Platform | 
|---|---|---|
| 2.12.5 | 2020/06 | EL 7.8, EL 8.2 | 
| 2.12.6 | 2020/12 | EL 7.9, EL 8.3 | 
| 2.12.7 | 2021/07 | EL 7.9, EL 8.4 | 
| 2.12.8 | 2021/12 | EL 7.9, EL 8.5 | 
| 2.12.9 | 2022/06 | EL 7.9, EL 8.6 | 
Feature Releases
| Version | Date | Platform | 
|---|---|---|
| 2.13 | 2019/12 | EL 7.7, EL 8.0 | 
| 2.14 | 2021/02 | EL 8.3 | 
| 2.16 | 2024/11 | EL 9.4 | 
| 2.17 | 2025/Q4 | - | 
Each version has an additional build designated with an -ib suffix, which includes the lib{ib,rdma} libraries as well as the client kernel modules built against the LTS version of {ref}mlnx_ofed. Lustre client packages used in production are added to the local repository in a sub-directory called lustre/.
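A minimal sketch of such a local repository definition …the base URL is a placeholder and must point to the site-specific package server:
# hypothetical repository definition for the lustre/ sub-directory
cat > /etc/yum.repos.d/lustre.repo <<EOF
[lustre]
name=Lustre client packages
baseurl=https://repo.example.org/lustre/
enabled=1
gpgcheck=0
EOF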
Quick Start
The following exemplifies how to install and configure the Lustre client manually:
# install a specific version of the client package
dnf install -y lustre-client
# LNet configuration
cat <<EOF > /etc/modprobe.d/lustre.conf
options lnet networks="o2ib0(ib0)"
EOF
# load the kernel modules
modprobe lustre
modinfo lustre
# create the mount point
mkdir -p /lustre/alice
# mount the file-system
mount -t lustre 10.20.1.10@o2ib0:10.20.1.11@o2ib0:/alice /lustre/alice
# remove the mount
umount /lustre/alice
# check the kernel message buffer
dmesg | grep -i -e lustre -e lnet
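Assuming the configuration above, the following commands help to verify that LNet came up with the expected NID …the server NID is the example used above:
# show the local LNet NIDs of this client
lctl list_nids
# ping the MGS NID over LNet
lctl ping 10.20.1.10@o2ib0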
Source Code
Build Lustre from source…
- wiki.lustre.org …source code at whamcloud.com/public/lustre
- https://wiki.whamcloud.com/display/PUB/Rebuilding+the+Lustre-client+rpms+for+a+new+kernel
Build a new Lustre client on CentOS:
# install build dependencies
sudo yum install "kernel-devel-uname-r == $(uname -r)"
sudo yum install -y \
        asciidoc audit-libs-devel automake \
        bc binutils-devel bison \
        device-mapper-devel \
        elfutils-devel elfutils-libelf-devel expect \
        flex \
        gcc gcc-c++ git glib2 glib2-devel \
        hmaccalc \
        kernel-devel keyutils-libs-devel krb5-devel \
        ksh \
        libattr-devel libblkid-devel libselinux-devel libtool libuuid-devel libyaml-devel lsscsi \
        make \
        ncurses-devel net-snmp-devel net-tools newt-devel numactl-devel \
        openmpi-devel openssl-devel \
        parted patchutils pciutils-devel perl-ExtUtils-Embed pesign python-devel \
        redhat-rpm-config rpm-build \
        systemd-devel \
        tcl tcl-devel tk tk-devel \
        wget \
        xmlto \
        yum-utils \
        zlib-devel
# Download the Lustre source code
wget https://downloads.whamcloud.com/public/lustre/lustre-2.13.0/el7.7.1908/client/SRPMS/lustre-2.13.0-1.src.rpm 
# build the client, and create RPM packages
rpmbuild --rebuild --without servers lustre-2.13.0-1.src.rpm
Compilation with Mellanox OFED distribution…
- …described in the Mellanox documentation in section Feature Overview and Configuration - Storage Protocols
- …for example in MLNX_OFED 5.4
./configure --with-o2ib=/usr/src/ofa_kernel/default/
make rpms
Kernel Modules
Determine the version of the lustre-client package installed on a node:
>>> dnf list installed | grep -e kernel-core -e lustre-client
kernel-core.x86_64         4.18.0-348.12.2.el8_5   @anaconda           
kmod-lustre-client.x86_64  2.12.8_6_g5457c37-1.el8 @gsi-packages       
lustre-client.x86_64       2.12.8_6_g5457c37-1.el8 @gsi-packages    
# if multiple versions are installed
>>> dnf --showduplicates list kmod-lustre-client | tail -n+4 | sort -k2
The Lustre client kernel module package kmod-lustre-client specifies the target Linux kernel in the package description, for example:
# show package metadata
>>> dnf info kmod-lustre-client
...
Version      : 2.12.8_6_g5457c37
...
Description  : This package provides the lustre-client kernel modules built for
             : the Linux kernel 4.18.0-348.2.1.el8_5.x86_64 for the x86_64
             : family of processors.
In case no matching kernel module package is available…
- The lustre-client-dkms package builds modules against the kernel source package
- Note that it is not recommended to use kernel modules built by DKMS.
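If DKMS is used nevertheless, a minimal sketch …assuming a lustre-client-dkms package is provided by a configured repository:
# build the client modules against the installed kernel sources via DKMS
dnf install -y dkms kernel-devel lustre-client-dkms
# verify that modules have been built for the running kernel
dkms status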
# use Clustershell to identify lustre modules on nodes...
>>> date ; clush -b -- 'modinfo lustre | grep ^version'
...
lxbk[0264-0265,0267-0276,0279-0280].gsi.de (14)
---------------
version:        2.12.7
---------------
lxbk[0261-0262].gsi.de (2)
---------------
version:        2.12.8_6_g5457c37
Versionlock
Why version-lock the Linux kernel and Lustre client?
- Kernels and Lustre kernel modules need to be upgraded together
- Typically there is a delay until a lustre-client package is available for a particular kernel…
- …make sure the kernel is not upgraded until a Lustre client is available
Install the required DNF versionlock plugin…
dnf install -y python3-dnf-plugin-versionlock
The lock file would look similar to:
>>> cat >> /etc/dnf/plugins/versionlock.list <<EOF
kernel-0:4.18.0-513.*
kernel-core-0:4.18.0-513.*
kernel-modules-0:4.18.0-513.*
kernel-tools-0:4.18.0-513.*
kernel-tools-libs-0:4.18.0-513.*
kernel-headers-0:4.18.0-513.*
kernel-devel-0:4.18.0-513.*
lustre-client-0:2.15.4*
kmod-lustre-client-0:2.15.4*
EOF
Mount
I/O happens via a service called the Lustre client…
- …responsible for providing a POSIX-compliant interface
- …creates a coherent presentation of the metadata and object data
- …file system IO is transacted over a network protocol
Lustre networking configuration…
- …clients must have valid LNet configuration
- …low-level device layer called a Lustre Network Driver (LND)
- …abstraction between the upper level LNet protocol and the kernel device driver
- …ko2iblnd.ko module for RDMA networks …uses OFED …referred to as the o2ib LND
- …continue to read about static LNet configuration
# ...example configuration for RDMA verbs
>>> cat /etc/modprobe.d/lustre.conf
options lnet networks="o2ib0(ib0)"
Read the mount.lustre manual page …use mount to start the Lustre client…
mount -t lustre [-o options] <mgsname>:/<fsname> <client_mountpoint>
- <mgsname>:=<mgsnode>[:<mgsnode>]
  - …colon-separated list of mgsnode names where the MGS service may run
- <mgsnode>:=<address>@<lnd_protocol><lnd#>
  - …address followed by the LND protocol identifier and network number
  - …called an LNet Network Identifier (NID)
  - …uniquely defines an interface for a host on an LNet communications fabric
- <fsname>…name of the file-system
# ...example mount...
mount -t lustre \
      -o rw,nosuid,nodev,relatime,seclabel,flock,lazystatfs \
      10.20.1.10@o2ib0:10.20.1.11@o2ib0:/alice /lustre/alice
Systemd Units
Systemd units to manage the Lustre mount point:
| Unit | Description | 
|---|---|
| lustre-*.mount | Mounts a file-system to /lustre | 
| unload-lustre.service | Forces unmount of Lustre and remove kernel modules when stopped | 
| lustre-params.service | Uses lctl to configure Lustre client options | 
| lustre-jobstats.service | Uses lctl to configure Slurm job statistics | 
# list all units
>>> systemctl list-units *lustre*
UNIT                    LOAD   ACTIVE SUB     DESCRIPTION
lustre-alice.mount      loaded active mounted Mount Lustre
lustre-jobstats.service loaded active exited  Enable Lustre Jobstats for SLURM Compute Node
lustre-params.service   loaded active exited  Configure Lustre Parameters
unload-lustre.service   loaded active exited  Unload lustre modules on shutdown
Unmount Lustre storage and remove the kernel modules:
systemctl stop lustre-alice.mount unload-lustre.service
# the following count should be zero if all modules have been removed...
lsmod | grep lustre | wc -l
# ...otherwise run...
lustre_rmmod
lustre-*.mount
Following a Systemd mount unit for a Lustre file-system…
>>> systemctl cat lustre-alice.mount
# /etc/systemd/system/lustre-alice.mount
[Unit]
Description=Mount Lustre
Requires=network-online.target
Wants=systemd-networkd-wait-online.service
After=network-online.target
[Install]
WantedBy=remote-fs.target
[Mount]
What=10.20.1.10@o2ib0:10.20.1.11@o2ib0:/alice
Where=/lustre/alice
Type=lustre
Options=rw,flock,relatime,_netdev,nodev,nosuid
LazyUnmount=true
ForceUnmount=true
…note that the Systemd naming conventions for mount units apply (the unit name must match the escaped mount path, e.g. /lustre/alice → lustre-alice.mount)
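The unit name can be derived from the mount path with systemd-escape:
# prints lustre-alice.mount
systemd-escape -p --suffix=mount /lustre/alice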
Common Mount Options
Lustre mount options are described in man mount.lustre…
- flock…coherent userspace file locking across multiple client nodes
  - …imposes communication overhead in order to maintain locking
  - …default is noflock…applications get an ENOSYS error
General mount options are described in man mount …following may be relevant in context…
- _netdev…signal that file-system requires network access
- relatime…clever update of access times …reduces RPC load on Lustre
- nodev…ignore character or block special devices
- nosuid…ignore set-user-ID and set-group-ID
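As an alternative to the Systemd mount unit shown below, the same options can be used in an /etc/fstab entry, for example:
# example /etc/fstab entry ...mounted once the network is available
10.20.1.10@o2ib0:10.20.1.11@o2ib0:/alice /lustre/alice lustre rw,flock,relatime,_netdev,nodev,nosuid 0 0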
seclabel Option
SELinux enabled (including permissive mode) may interfere with I/O on Lustre…
# ...check for the seclabel mount option
findmnt /lustre/alice
TARGET        SOURCE                                   FSTYPE OPTIONS
/lustre/alice 10.20.1.10@o2ib0:10.20.1.11@o2ib0:/alice lustre rw,nosuid,nodev,relatime,seclabel,flock,lazystatfs
seclabel is added by SELinux automatically …disable SELinux to prevent this
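Check and permanently disable SELinux …the change takes effect after the next reboot:
# show the current SELinux mode
getenforce
# disable SELinux permanently
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config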
unload-lustre.service
For various reasons a clean unmount of a Lustre file-system may not work…
- …this could stop a node from properly rebooting (forcing a reset)
- Force umount -f to overcome this issue…
  - …-a -t lustre…applies to all Lustre file-systems
  - …-l (lazy) option ignores remaining references to the file-system (does not matter since we reboot anyway)
[Unit]
Description=Unload lustre modules on shutdown
DefaultDependencies=no
Requires=remote-fs.target
Before=remote-fs.target shutdown.target
Conflicts=shutdown.target
[Install]
WantedBy=multi-user.target
[Service]
ExecStart=/bin/echo
RemainAfterExit=yes
ExecStop=/usr/bin/umount -f -l -a -t lustre
ExecStop=/usr/sbin/lustre_rmmod
Type=oneshot
lustre_rmmod is the recommended method for unloading the Lustre and LNet kernel modules…
lustre-params.service
lctl is used to configure Lustre directly …after the file-system is mounted
Use a oneshot Systemd service unit to set Lustre configuration parameters:
[Install]
WantedBy=multi-user.target
[Unit]
Description=Configure Lustre Parameters
Documentation=man:lctl(8)
Requires=lustre.mount
After=lustre.mount
[Service]
ExecStart=/usr/sbin/lctl set_param osc.*.max_rpcs_in_flight=64
ExecStart=/usr/sbin/lctl set_param osc.*.max_dirty_mb=32
ExecStart=/usr/sbin/lctl set_param llite.*.statahead_max=128
ExecStart=/usr/sbin/lctl set_param llite.*.statahead_agl=1
ExecStart=/usr/sbin/lctl set_param llite.*.max_read_ahead_mb=128
ExecStart=/usr/sbin/lctl set_param llite.*.max_read_ahead_whole_mb=64
ExecStart=/usr/sbin/lctl set_param llite.*.max_read_ahead_per_file_mb=128
# ....
RemainAfterExit=yes
Type=oneshot
lustre-jobstats.service
Lustre can collect I/O statistics correlated to Slurm user…
- …creates overhead …use a dedicated service unit to enable/disable on demand
- Required parameters for lctl…
  - jobid_var=<name>…environment variable set by the scheduler …typically SLURM_JOB_ID
  - jobid_var=disable…disable job stats
[Install]
WantedBy=multi-user.target
[Unit]
Description=Enable Lustre Jobstats for SLURM Compute Node
Documentation=man:lctl(8)
Requires=lustre.mount
After=lustre.mount
[Service]
ExecStart=/usr/sbin/lctl set_param jobid_var=SLURM_JOB_ID
ExecStop=/usr/sbin/lctl set_param jobid_var=disable
RemainAfterExit=yes
Type=oneshot
lustre-jobstats-proc.service
Track Slurm statistics per process name and user ID…
- …relevant to node where user work interactively (i.e. submit nodes)
- Required parameters for lctl…jobid_var=procname_uid
[Install]
WantedBy=multi-user.target
[Unit]
Description=Enable Lustre Jobstats from /proc
Documentation=man:lctl(8)
Requires=lustre.mount
After=lustre.mount
[Service]
ExecStart=/usr/sbin/lctl set_param jobid_var=procname_uid
ExecStop=/usr/sbin/lctl set_param jobid_var=disable
RemainAfterExit=yes
Type=oneshot
Configuration
lfs monitoring and configuration:
findmnt -t lustre --df                 # list Lustre file-systems with mount point
lfs help                               # list available options
lfs help <option>                      # show option specific information
lfs osts                               # list available OSTs
lfs osts | tail -n1 | cut -d: -f1      # number of OSTs
lfs df -h [<path>]                     # storage space per OST
lfs quota -h -u $USER [<path>]         # storage quota for a user
lfs find -print -type f <path>         # find files in a directory
Identify storage topology:
# get a list of all storage servers
>>> lctl get_param osc.*.ost_conn_uuid | ip2host | cut -d= -f2 | cut -d@ -f1 | cut -d. -f1 | sort | uniq | nodeset -f NS
lxfs[415-419]
# list OSTs per storage server
>>> nodeset-loop "echo -n '{} ' ; lctl get_param osc.*.ost_conn_uuid | ip2host | grep {} | cut -d'-' -f2 | tr '\n' ' '"
lxfs415 OST001c OST001d OST001e OST001f OST0020 OST0021 OST0022
lxfs416 OST0015 OST0016 OST0017 OST0018 OST0019 OST001a OST001b
lxfs417 OST000e OST000f OST0010 OST0011 OST0012 OST0013 OST0014
lxfs418 OST0007 OST0008 OST0009 OST000a OST000b OST000c OST000d
lxfs419 OST0000 OST0001 OST0002 OST0003 OST0004 OST0005 OST0006
Striping
Split a file into small sections (stripes) and distribute these for concurrent access to multiple OSTs.
- Advantages:
- The file size can be bigger than the storage capacity of a single OST.
- Enables utilizing the I/O bandwidth of multiple OSTs while accessing a single file.
 
- Disadvantages:
- Placing stripes of a file across multiple OSTs requires management overhead (hence small files should not be striped).
- A higher number of OSTs holding stripes of a file increases the risk of losing access as soon as a single OST is unreachable.
 
lfs getstripe <file|dir>                    # show striping information
lfs setstripe -c <stripe_count> <file|dir>  # configure the stripe count  
lfs setstripe -i 0x<idx> <file|dir>         # target a specific OST
- Files inherit the striping configuration of their parent directory.
- Stripe Count (default 1)
  - By default a single file is stored on a single OST.
  - A count of -1 stripes across all available OSTs (used for very big files).
 
- Stripe Size (default 1MB)
  - Maximum size of the individual stripes.
  - Lustre sends data in 1MB chunks → stripe sizes are recommended to range between 1MB and 4MB.
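For illustration, a directory for large files striped over four OSTs with a 4MB stripe size …values are examples, not general recommendations:
# stripe new files in this directory over 4 OSTs with 4MB stripes
lfs setstripe -c 4 -S 4M /lustre/alice/large-files
# verify the layout
lfs getstripe /lustre/alice/large-files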
 
Alignment
Application I/O performance is influenced by choosing the right file size and stripe count.
Correct I/O alignment mitigates the effects of:
- Resource contention on the OST block device.
- Request contention on the OSS hosting multiple OSTs.
General recommendations for stripe alignment:
- Minimize the number of OSTs a process/task must communicate with.
- Ensure that a process/task accesses a file at offsets corresponding to stripe boundaries.
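For example, a shared file written by eight tasks could use a stripe count of eight so that each task communicates with a single OST, provided every task writes at offsets that are multiples of the stripe size …the numbers are illustrative:
# one stripe per writing task ...each task writes at 4MB-aligned offsets
lfs setstripe -c 8 -S 4M /lustre/alice/shared_output.dat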
Quotas
Lustre enforces quotas for Linux groups and users:
- Maximum consumable storage per group (0k equals unlimited)
- Maximum number of files per user
Check the quota configuration using the lfs command as root on a node with mounted Lustre:
lfs quota -q -h -g $group /lustre/alice
lfs quota -h -u $user /lustre/alice
I/O
Quantitative description of application IO from the perspective of the file-system:
- The size of data generated
- The number of files generated
- The distribution of file sizes
- The distributions of file IOs (requests sizes, frequency)
- The number of simultaneous IO accesses (level of concurrency)
IO request sizes:
# enable (reset) client IO statistics
>>> lctl set_param llite.*.extents_stats=1
# ... execute application ...
>>> dd if=/dev/zero of=io1.sink count=1024 bs=1M
>>> dd if=/dev/zero of=io2.sink count=1024 bs=128k
>>> dd if=/dev/zero of=io3.sink count=1024 bs=32k
# read the stats for the client
>>> lctl get_param llite.*.extents_stats
                               read       |                write
      extents            calls    % cum%  |          calls    % cum%
  32K -   64K :              0    0    0  |           1024   33   33
 128K -  256K :              0    0    0  |           1024   33   66
   1M -    2M :              0    0    0  |           1024   33  100
# read stats by process ID
>>> lctl get_param llite.*.extents_stats_per_process
                               read       |                write
      extents            calls    % cum%  |          calls    % cum%
PID: 27280
   1M -    2M :              0    0    0  |           1024  100  100
PID: 27344
 128K -  256K :              0    0    0  |           1024  100  100
PID: 27348
  32K -   64K :              0    0    0  |           1024  100  100
RPC statistics:
>>> lctl set_param osc.*.rpc_stats=0 # reset the RPC counters
# monitor IO aggregation by Lustre
>>> lctl get_param osc.*.rpc_stats
                        read                    write
pages per rpc         rpcs   % cum % |       rpcs   % cum %
1024:                    0   0   0   |       1276  99 100
Features
DNE (Distributed Namespace)
Distribute file/directory metadata across multiple MDTs…
- …circumvent bottleneck of a single MDT
- …scale metadata load across multiple MDT servers
- …load-balances file/directory metadata operations
- Benefits…
- …improves metadata performance
- …expands the maximum number of files per system
 
Creating directories that point to different DNE targets (Metadata Targets)…
# create a directory targeting MDT index 1
lfs mkdir -i 1 alice/
# similar for MDT index 2
lfs mkdir -i 2 bob/
…sub-directories and files inherit the MDT target.
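The MDT index of a directory can be verified with lfs getdirstripe:
# show MDT index and stripe information for the directory
lfs getdirstripe alice/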
DOM (Data on MDT)
Store data of smaller files directly on an MDT (see the layout example below)…
- …improve small file performance
- …eliminate RPC overhead to OSTs
- …utilizes MDT high-IOPS storage optimized for small IO
- …used in conjunction with the Distributed Namespace (DNE)
- …improve efficiency without sacrificing horizontal scale
- References…
- Data on MDT Solution Architecture, Lustre Wiki
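A minimal sketch of a DoM layout, configured as the first component of a composite (PFL) layout …the directory path and the 1MB threshold are examples:
# store the first 1MB of new files on the MDT, the remainder on OSTs
lfs setstripe -E 1M -L mdt -E -1 -c 1 /lustre/alice/small-files
# show the resulting composite layout
lfs getstripe /lustre/alice/small-files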
 
PCC (Persistent Client Cache)
- …clients deliver additional performance…
- …using a local storage device (SSD/NVMe) as cache
- …reduce visible overhead for applications
- …for read and write intensive applications (node-local I/O patterns)
- …latencies and lock conflicts can be significantly reduced
 
- …I/O stack is much simpler (no interference I/Os from other clients)
 
- …caching reduces the pressure on the OSTs…
- …small or random I/Os are regularized to big sequential I/Os directed to OSTs
- …temporary files do not need to be flushed to OSTs
 
- Mechanism based on..
- …combined HSM and layout lock mechanisms
- …single global namespace in two physical tiers…
- …migration of individual files between local and shared storage
- …local file system (such as ext4) used to manage the data on local caches
 
Synchronization between PCC and Lustre is not tightly coupled…
- …PCC is not transparent to the user
- …the lfs pcc {attach,detach} mechanism needs to be used properly
  - …an rm command without lfs pcc detach loses the data in PCC
- …disk space in PCC is independent of Lustre quotas
- …file size of PCC cached files is not visible on Lustre
Command line interface (lctl admins, lfs for users):
# ...add a PCC backend to the Lustre client
lctl pcc add $mount_point $local_path_to_pcc [-p $params]
- $mount_point…specified Lustre file-system instance or Lustre mount point
- $local_path_to_pcc…directory path on local file-system for PCC cache
- $params…name-value pairs to configure the PCC back-end
# ...attach the given files onto PCC
lfs pcc attach -i $num $file ...
# ...detach the file from PCC permanently and remove the PCC copy after detach
lfs pcc detach $file
# ...keep the PCC copy in cache
lfs pcc detach -k $file
# ...display the PCC state for given files
lfs pcc state $file
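A worked example with hypothetical paths, project ID and archive number …the exact -p rule syntax is version-specific, see the lctl-pcc and lfs-pcc manual pages:
# register a local NVMe file-system as PCC backend (example rule: project 500, archive ID 2)
lctl pcc add /lustre/alice /mnt/nvme/pcc -p "projid={500} rwid=2"
# copy a file into the cache and check its state
lfs pcc attach -i 2 /lustre/alice/input.dat
lfs pcc state /lustre/alice/input.dat
# detach but keep the cached copy
lfs pcc detach -k /lustre/alice/input.dat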
Modes
Two modes…
PCC-RW read/write cache on local storage for a single client
- …uses HSM mechanism for data synchronization
- …cache entire files on their local file-systems
- …node is an HSM agent
- …copy tool instance …with unique archive number
- …restore a file from the local cache to the OSTs
- …triggered by access from another client
 
- …if PCC client goes offline …cached data becomes inaccessible (temporarily)
- Locks ensure that cache is consistent with the global file system state
- …includes a rule-based, configurable caching infrastructure
- …customizing I/O caching
- …provides performance isolation
- …QoS guarantees
 
PCC-RO read-only cache on local storage of multiple clients
- …LDLM lock to protect file data
- …grouplock prevents modification by any client
- …multiple replicates on different clients
- …data read from local cache
- …metadata read from MDT (with the exception of file size)
References
- Lustre Manual - Chapter 27. Persistent Client Cache (PCC)
- Lustre Persistent Client Cache, Whamcloud
- LU-10092, Whamcloud Jira
- A client side cache that speeds up applications with certain I/O patterns, Li Xi DDN Storage
- LUG 2018 Presentation, OpenSFS Administration YouTube
- Slurm burst buffer plugin with Lustre PCC (Persistent Client Cache)
WBC (Writeback Cache)
Client-side metadata writeback cache (instead of server-side)…
- …delayed & grouped metadata flush
- …instead of immediate RPC to MDS
- …no RPC round-trips for modifications of files/directories
 
- …cache in volatile memory (RAM) instead of persistent storage
- …uses bulk RPCs to flush file metadata in batches
- …flush limited to the modified parts of a directory tree
 
- …can be integrated with Persistent Client Cache (PCC)
Metadata flush happens…
- …when accessed from remote clients
- …to relieve memory pressure on local host
- …periodically to reduce risk of data loss
Footnotes
- Amazon FSx Lustre Client Compatibility
  https://docs.aws.amazon.com/fsx/latest/LustreGuide/lustre-client-matrix.html