FSx for Lustre Flashcards
What is the latency it provides
sub millisecond
How much throughput
Hundreds of GB/s
How much IOPS
up to millions
What operating system does it support
Linux. The file system is POSIX-compliant, meaning it follows the standard interface that underlies UNIX-like systems such as Linux and FreeBSD.
What deployment options does it support? What are the differences between them?
Scratch and Persistent.
Scratch file systems are ideal for temporary storage and shorter-term processing of data. Data is not replicated and does not persist if a file server fails.
Persistent file systems are ideal for longer-term storage and throughput-focused workloads. In persistent file systems, data is replicated, and file servers are replaced if they fail.
If you want to minimize storage costs for a shorter-term workload (hours or days) AND you are OK with re-running the job if a file server fails, then Scratch is a popular and cost-effective option.
What hardware based storage options exist and what are the differences between them?
SSD storage options – For low-latency, IOPS-intensive workloads that typically have small, random file operations, choose one of the SSD storage options.
HDD storage options – For throughput-intensive workloads that typically have large, sequential file operations, choose one of the HDD storage options.
How fast does FSx perform its metadata operations, and how is it able to be so fast?
You can optionally provision a read-only SSD cache that is sized to 20 percent of your HDD storage capacity. This provides sub-millisecond latencies and higher IOPS for frequently accessed files. Both SSD-based and HDD-based file systems are provisioned with SSD-based metadata servers. As a result, all metadata operations, which represent the majority of file system operations, are delivered with sub-millisecond latencies.
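The 20 percent cache sizing can be worked out directly. A minimal sketch, assuming a hypothetical 6,000 GiB HDD file system; the function name is illustrative and not part of any AWS API.

```python
def ssd_cache_gib(hdd_capacity_gib: int, cache_percent: int = 20) -> float:
    """Size of the optional read-only SSD cache for a given HDD capacity."""
    return hdd_capacity_gib * cache_percent / 100


# Example capacity is an assumption chosen for illustration.
print(ssd_cache_gib(6000))  # 1200.0
```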
What AWS compute options are compatible with it?
EC2, EKS, ECS. The Lustre client is included with Amazon Linux 2 and Amazon Linux. For RHEL, CentOS, and Ubuntu, an AWS Lustre client repository provides clients that are compatible with these operating systems.
What encryption does FSx support?
Encryption at rest and in transit. Amazon FSx automatically encrypts file system data at rest using keys managed in AWS Key Management Service (AWS KMS). Data in transit is also automatically encrypted on file systems in certain AWS Regions when accessed from supported Amazon EC2 instances. Encryption of data at rest is automatically enabled when you create an Amazon FSx for Lustre file system, regardless of the deployment type you use.
Explain how an object from S3 is loaded into FSx
When you create your FSx for Lustre file system, you link it to your S3 data repository. At this point, the objects in your S3 bucket are listed as files and directories on your FSx file system. Amazon FSx then automatically copies the file contents from S3 to your Lustre file system when a file is accessed for the first time on the Amazon FSx file system. After your compute workload runs, or at any time, you can use a data repository task to export changes back to S3.
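The load-on-first-access behavior described above can be pictured with a toy model. This is only an illustration of the idea, not how Amazon FSx is implemented; the class and the sample keys are hypothetical.

```python
class LazyLoadedFileSystem:
    """Toy model: names are visible at link time, contents load on first read."""

    def __init__(self, bucket):
        self._bucket = bucket          # stands in for the linked S3 bucket
        self._loaded = {}              # file contents copied so far
        self.listing = sorted(bucket)  # files and directories are listed immediately

    def read(self, path):
        if path not in self._loaded:   # first access triggers the copy
            self._loaded[path] = self._bucket[path]
        return self._loaded[path]

    def is_loaded(self, path):
        return path in self._loaded


fs = LazyLoadedFileSystem({"data/a.bin": b"alpha", "data/b.bin": b"beta"})
print(fs.listing)                  # both files are visible right away
print(fs.is_loaded("data/a.bin"))  # False until the first read
fs.read("data/a.bin")
print(fs.is_loaded("data/a.bin"))  # True after the first read
```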
What happens when a scratch file system fails
Files stored on other servers are still accessible. If clients try to access data that is on the unavailable server or disk, clients experience an immediate I/O error.
What is the availability/durability of a scratch file system
Around 99%. Because larger file systems have more file servers and more disks, the probabilities of failure are increased.
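Why more servers means lower availability can be shown with simple probability. A rough sketch, assuming each file server fails independently; the 0.999 per-server figure is a made-up value for illustration, not an AWS number.

```python
def probability_all_up(per_server_availability: float, server_count: int) -> float:
    """Chance that every server is up, assuming independent failures."""
    return per_server_availability ** server_count


small = probability_all_up(0.999, 2)   # hypothetical small file system
large = probability_all_up(0.999, 20)  # hypothetical large file system
print(small > large)  # True
```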
Where is data replicated for persistent file systems
Data is automatically replicated within the same Availability Zone in which the file system is located.
What happens when a persistent file system fails
If a file server becomes unavailable, it is replaced automatically within minutes of failure. During that time, client requests for data on that server transparently retry and eventually succeed after the file server is replaced.
What is the persistent 1 deployment type
well-suited for use cases that require longer-term storage, and have throughput-focused workloads that aren’t latency-sensitive.
What is the persistent 2 deployment type
best-suited for use cases that require longer-term storage, and have latency-sensitive workloads that require the highest levels of IOPS and throughput. This option is NOT available in GovCloud.
Is data from S3 automatically imported into my FSx file system
You must turn on automatic import or do it manually each time. When you turn on automatic import for a data repository association, your file system automatically imports file metadata as files are created, modified, and/or deleted in the S3 data repository. Alternatively, you can import metadata for new or changed files and directories using an import data repository task.
Can any S3 object be imported
No. FSx for Lustre imports only S3 objects that have POSIX-compliant object keys.
When we say FSx stores metadata, what does that mean
FSx for Lustre stores POSIX metadata, including ownership, permissions, and timestamps for files, directories, and symbolic links, in S3 objects
Can I create custom metadata
No. FSx for Lustre doesn’t retain any user-defined custom metadata on S3 objects.
Can I create a link between FSx and S3 after I create my file system
Yes. You can create the link when creating the file system or at any time after the file system has been created.
What is a data repository association
A link between a directory on the file system and an S3 bucket or prefix
How many S3 buckets can I create a DRA with
You can configure a maximum of 8 data repository associations on an FSx for Lustre file system
Can I import/export from all S3 buckets at the same time from one FSx
No. A maximum of 8 DRA requests can be queued, but only one request can be worked on at a time for the file system
Can I link an S3 bucket that has encryption enabled
FSx for Lustre supports Amazon S3 buckets that use server-side encryption with S3-managed keys (SSE-S3), and with AWS KMS keys stored in AWS Key Management Service (SSE-KMS)
Do my FSx file system and S3 bucket have to be in the same Region and account
Automatic export supports cross-Region configurations. The Amazon FSx file system and the linked S3 bucket can be located in the same AWS Region or in different AWS Regions.
Automatic import does not support cross-Region configurations. Both the Amazon FSx file system and the linked S3 bucket must be located in the same AWS Region.
Both automatic export and automatic import support cross-Account configurations. The Amazon FSx file system and the linked S3 bucket can belong to the same AWS account or to different AWS accounts.
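The three rules above reduce to a small lookup. A sketch only; the function and feature names are hypothetical labels, not AWS identifiers.

```python
def dra_feature_supported(feature: str, same_region: bool, same_account: bool) -> bool:
    """Apply the cross-Region and cross-account rules stated above."""
    # same_account never changes the answer: both features work across accounts.
    if feature == "auto_export":
        return True           # works in the same Region or across Regions
    if feature == "auto_import":
        return same_region    # requires bucket and file system in one Region
    raise ValueError(f"unknown feature: {feature}")


print(dra_feature_supported("auto_export", same_region=False, same_account=False))  # True
print(dra_feature_supported("auto_import", same_region=False, same_account=True))   # False
```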
How long does it take to deploy FSx for Lustre
Depends on file system capacity, deployment type, number of storage disks, and availability zone. Typically around 10-20 minutes.
What type of import policies can I select?
New, Changed, Deleted, Any Combination, No Policy. The first 3 are recommended
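The policy choices can be expressed as a list of event names. A sketch, assuming the `AutoImportPolicy` / `Events` shape that the AWS SDK uses for the S3 settings of a data repository association; check the key names against the current SDK reference before relying on them. The helper itself is hypothetical, and an empty list stands in for "No Policy" here.

```python
VALID_EVENTS = ("NEW", "CHANGED", "DELETED")


def auto_import_policy(*events: str) -> dict:
    """Build an import-policy fragment from any combination of events."""
    unknown = [e for e in events if e not in VALID_EVENTS]
    if unknown:
        raise ValueError(f"unsupported events: {unknown}")
    # Keep a stable order and drop duplicates.
    return {"AutoImportPolicy": {"Events": [e for e in VALID_EVENTS if e in events]}}


print(auto_import_policy("NEW", "CHANGED", "DELETED"))
print(auto_import_policy())  # no events selected
```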
What happens to the local file when a change has been detected in S3?
The local file is overwritten, even if it is write-locked (assuming an import policy is set).
Where can I debug FSx
CloudWatch Logs
Can I fully synchronize data from S3 to FSx if my DRA entered a MISCONFIGURED state
No. Import data repository tasks don’t synchronize deletes in your S3 bucket with your FSx for Lustre file system. If you want to fully synchronize S3 with your file system (including deletes), you must re-create your file system.
Can I make the first read and/or write from a file in S3 faster
Yes, you can preload. If you request the preloading of multiple files simultaneously, Amazon FSx loads your files from your Amazon S3 data repository in parallel
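Preloading is typically done with the Lustre `lfs hsm_restore` command on a client that has the file system mounted. The sketch below only builds the commands and does not run them; the file paths are hypothetical.

```python
def hsm_restore_commands(paths):
    """Return one `lfs hsm_restore` argument list per file to preload."""
    return [["lfs", "hsm_restore", path] for path in paths]


# Hypothetical paths under a mount point chosen for illustration.
for command in hsm_restore_commands(["/fsx/data/a.bin", "/fsx/data/b.bin"]):
    print(" ".join(command))
```

Each argument list could be passed to `subprocess.run` on a mounted client, possibly with elevated privileges; issuing several at once is what lets the loads proceed in parallel.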
Explain a data repository task
Data repository tasks optimize data and metadata transfers between your FSx for Lustre file system and a data repository on S3. One way that they do this is by tracking changes between your Amazon FSx file system and its linked data repository. They also do this by using parallel transfer techniques to transfer data at speeds up to hundreds of GB/s.
There are two types, import and export
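The two task types map onto a request like the one below. A sketch, assuming the parameter names of the AWS SDK's `create_data_repository_task` call as I understand them (`FileSystemId`, `Type`, `Paths`, `Report`); verify against the SDK reference. The file system ID and path are placeholders, and nothing is sent to AWS.

```python
TASK_TYPES = {
    "import": "IMPORT_METADATA_FROM_REPOSITORY",
    "export": "EXPORT_TO_REPOSITORY",
}


def task_request(file_system_id: str, direction: str, paths) -> dict:
    """Build keyword arguments for a data repository task (not submitted here)."""
    return {
        "FileSystemId": file_system_id,
        "Type": TASK_TYPES[direction],
        "Paths": list(paths),
        "Report": {"Enabled": False},  # completion report turned off in this sketch
    }


request = task_request("fs-EXAMPLE", "export", ["results/"])  # placeholder ID
print(request["Type"])  # EXPORT_TO_REPOSITORY
```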
How does caching improve performance
Each file server employs a fast, in-memory cache to enhance performance for the most frequently accessed data. When a client accesses data that’s stored in the in-memory or SSD cache, the file server doesn’t need to read it from disk, which reduces latency and increases the total amount of throughput you can drive.
What variables determine performance based on caching or having to read from disk
When you read data that is stored on the file server’s in-memory or SSD cache, file system performance is determined by the network throughput. When you write data to your file system, or when you read data that isn’t stored on the in-memory cache, file system performance is determined by the lower of the network throughput and disk throughput.
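The rule above is a "lower of the two" calculation. A minimal sketch; the throughput figures are arbitrary example values in MB/s.

```python
def read_throughput(network: float, disk: float, served_from_cache: bool) -> float:
    """Cached reads are bounded by the network; other reads by the slower path."""
    return network if served_from_cache else min(network, disk)


def write_throughput(network: float, disk: float) -> float:
    """Writes are bounded by the lower of network and disk throughput."""
    return min(network, disk)


print(read_throughput(1250, 400, served_from_cache=True))   # 1250
print(read_throughput(1250, 400, served_from_cache=False))  # 400
print(write_throughput(1250, 400))                          # 400
```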
What is a fundamental performance consideration for FSx
The throughput that an FSx for Lustre file system supports is proportional to its storage capacity. Regardless of file system size, Amazon FSx for Lustre provides consistent, sub-millisecond latencies for file operations.
How are file data and metadata stored
All file data in Lustre is stored on storage volumes called object storage targets (OSTs). All file metadata (including file names, timestamps, permissions, and more) is stored on storage volumes called metadata targets (MDTs). Amazon FSx for Lustre file systems are composed of a single MDT and multiple OSTs. Each OST is approximately 1 to 2 TiB in size, depending on the file system’s deployment type. Amazon FSx for Lustre spreads your file data across the OSTs that make up your file system to balance storage capacity with throughput and IOPS load.
How can I improve file throughput performance
You can stripe individual files across multiple OSTs.
How many OSTs are there by default
5
When does stripe count become important
Striped layout matters most for large files, especially for use cases where files are routinely hundreds of megabytes or more in size.
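Striping spreads consecutive chunks of a file round-robin across OSTs, which is why large files benefit most. A sketch of that mapping, assuming a 1 MiB stripe size (the usual Lustre default) and a stripe count of 4 chosen for illustration; the index is the position within the file's layout, not a global OST number.

```python
MIB = 1 << 20


def stripe_index(offset: int, stripe_count: int, stripe_size: int = MIB) -> int:
    """Which stripe (0-based, within the file's layout) holds this byte offset."""
    return (offset // stripe_size) % stripe_count


# The first six 1 MiB chunks of a file striped across four OSTs.
print([stripe_index(i * MIB, stripe_count=4) for i in range(6)])  # [0, 1, 2, 3, 0, 1]
```

A file smaller than one stripe lands on a single OST, so a higher stripe count gives it no extra throughput.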
What is the throughput rate
200 MB/s per TiB of storage
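Because throughput scales with capacity, the total is a simple product. A sketch using the 200 MB/s per TiB figure above; the capacities are example values.

```python
def throughput_mb_per_s(storage_tib: int, mb_per_s_per_tib: int = 200) -> int:
    """Aggregate throughput implied by a per-TiB rate."""
    return storage_tib * mb_per_s_per_tib


print(throughput_mb_per_s(10))  # 2000
print(throughput_mb_per_s(20))  # 4000
```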
Do I have to modify my application to support FSx encryption
No. In an encrypted file system, data and metadata are automatically encrypted before being written to the file system. Similarly, as data and metadata are read, they are automatically decrypted before being presented to the application.
What type of encryption is used for data at rest
AES-256
What is the difference between Persistent 1 and 2
The main difference between “Persistent 1” and “Persistent 2” lies in their performance and storage capacity. “Persistent 1” is designed for workloads that require moderate throughput and storage capacity. It provides up to 200 MB/s of throughput per TiB of storage and can scale up to 64 TiB per file system.
In contrast, “Persistent 2” is designed for workloads that require higher throughput and storage capacity. It provides up to 1 GB/s of throughput per TiB of storage and can scale up to 120 TiB per file system. “Persistent 2” file systems also offer enhanced durability and data protection features compared to “Persistent 1.”
In short, “Persistent 2” offers higher throughput and storage capacity than “Persistent 1,” plus additional durability and data protection features.
What happens to the KMS keys for scratch file systems
The keys used to encrypt scratch file systems at-rest are unique per file system and destroyed after the file system is deleted. These keys are managed by AWS.
What are the port ranges
988
1018-1023
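These ports are what a security group has to allow for Lustre traffic. A sketch that encodes the two ranges; the rule dictionaries follow the general shape of EC2 security group permissions, and TCP is assumed as the protocol since the source lists only port numbers.

```python
LUSTRE_PORT_RANGES = [(988, 988), (1018, 1023)]


def is_lustre_port(port: int) -> bool:
    """True if the port falls in one of the ranges listed above."""
    return any(low <= port <= high for low, high in LUSTRE_PORT_RANGES)


def ingress_rules(protocol: str = "tcp"):
    """One rule per port range (protocol is an assumption of this sketch)."""
    return [
        {"IpProtocol": protocol, "FromPort": low, "ToPort": high}
        for low, high in LUSTRE_PORT_RANGES
    ]


print(is_lustre_port(1020))  # True
print(is_lustre_port(443))   # False
print(ingress_rules())
```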
How is FSx accessed from my VPC
Through an ENI. You access your Amazon FSx file system through its DNS name, which maps to the file system’s network interface. Only resources within the associated VPC, or a peered VPC, can access your file system’s network interface.
How many file systems can I create? Is it unlimited?
No. The default limit is 100 file systems, but it can be increased.
How can I backup my file system
You can use AWS Backup. Backups created using the AWS Backup console have the same level of file system consistency and performance, and the same number of restore options, as backups created through the Amazon FSx console.
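What limits client throughput to FSx
FSx for Lustre traffic counts as network traffic, not EBS traffic, so it is bounded by the instance's network bandwidth (for example, 25 Gbps) rather than by its EBS bandwidth.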
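Network bandwidth is quoted in gigabits per second, while file system throughput is usually quoted in bytes. A quick conversion of the 25 Gbps figure above:

```python
def gbps_to_gigabytes_per_second(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbps / 8


print(gbps_to_gigabytes_per_second(25))  # 3.125
```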
How many AZs does it run in
One, but there are DR strategies
Does FSx support compression when hydrated from S3
Yes, if data compression is already enabled on the file system. However, it will not automatically compress uncompressed data that already exists on the file system.
How are backups taken and how long does it take
FSx for Lustre file system backups are block-based, incremental backups, whether they are generated using the automatic daily backup or the user-initiated backup feature. This means that when you take a backup, Amazon FSx compares the data on your file system to your previous backup at the block level.
The initial backup of a brand new file system with very little data takes minutes to complete.
The initial backup of a brand new file system taken after loading TBs of data takes hours to complete.
A second backup taken of the file system with TBs of data with minimal changes to the block-level data (relatively few creates/modifications) takes seconds to complete.
A third backup of the same file system after a large amount of data has been added and modified takes hours to complete.
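The timing pattern above follows from copying only the blocks that changed since the previous backup. A toy illustration of block-level comparison using hashes; this is not how Amazon FSx implements backups, the tiny block size is for readability, and the sketch ignores blocks that were removed.

```python
import hashlib


def block_digests(data: bytes, block_size: int):
    """Hash each fixed-size block of the data."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(previous: bytes, current: bytes, block_size: int = 4):
    """Indexes of blocks that are new or differ from the previous backup."""
    old = block_digests(previous, block_size)
    new = block_digests(current, block_size)
    return [i for i, digest in enumerate(new) if i >= len(old) or old[i] != digest]


first = b"AAAABBBBCCCC"
second = b"AAAAXXXXCCCCDDDD"
print(changed_blocks(b"", first))       # [0, 1, 2]  initial backup copies everything
print(changed_blocks(first, second))    # [1, 3]     only modified and added blocks
print(changed_blocks(second, second))   # []         nothing changed, nothing to copy
```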
What file types might not be conducive for FSx
Workloads with many small files. Each file access takes two hops, one to the metadata server and one to the object storage server, so per-file overhead dominates when files are small.