Selecting the Storage Service
Service | Description |
Simple Storage Service (S3) | One of the first services offered by AWS; object-level storage; distributed across at least three (3) Availability Zones (AZs) by default. |
Glacier | Archival data storage; meant for infrequent access; the cheapest storage option in AWS. |
CloudFront | Content delivery network that gets data closer to users by caching it at edge locations near the customer. |
Elastic Block Store (EBS) | Block storage for EC2 instances (best performance on EBS-optimized instances); high-speed, persistent block-level access. |
Elastic File System (EFS) | Shareable by multiple instances; NAS-like; accessed through the NFSv4 protocol; not supported on Windows instances. |
Amazon FSx | Managed file systems for the cloud; originally offered two file system types, Windows File Server and Lustre (high-performance file system), with NetApp ONTAP and OpenZFS added later. |
Storage Gateway | A software appliance (VM) that connects on-premises environments to AWS cloud storage such as S3 and Glacier. |
Snow Family | Migration related; three products depending on the required capacity. |
Databases | Another category of services; Relational, Non-relational, Warehouse options. |
Block Storage | Object Storage |
– Used on local networks all the time, with iSCSI or Fibre Channel – AWS provides block storage to virtual machines/ EC2 instances through Elastic Block Store (EBS) | – Similar to file storage; an object is a file or chunk of data – Used with NAS devices locally – AWS provides object storage through Simple Storage Service (S3) |
iSCSI – Internet Small Computer Systems Interface
NAS – Network-attached storage
Storage Selection Factors
- Size/ Capacity – how big are the objects; how much total space
- Performance – how quickly can I get objects
- Cost – services and their subclasses vary in price
S3 Storage Overview
- Object storage, where the object is a file or any chunk of data
- Objects are stored in buckets
- Distributed across at least three availability zones by default
- Except for S3 One Zone-IA, which stores data in a single AZ; the less expensive option
- Support for encryption and automatic storage-class tiering
- Lifecycle rules can automatically move inactive objects to cheaper storage classes, such as Glacier
- Big data analytics can run directly against stored data
- Many other services can store data in S3
Getting Data into S3
- Application Programming Interface (API)/ custom applications
- AWS Direct Connect (dedicated private network connection to AWS)
- Storage Gateway – data stored locally + replicated
- Kinesis Data Firehose – large amounts of analytical data
- S3 Transfer Acceleration – uses CloudFront edge locations to route data to S3 buckets over an optimized path; increased cost
- Snow Family
- Snowball – 50/80TB; petabyte-scale
- Snowball Edge – 100TB of local storage, plus compute, at the edge
- Snowmobile – a large trailer with a cap of 100PB per truck; exabyte-scale
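Why ship hardware at all? A rough transfer-time estimate makes the capacity tiers above concrete. The sketch below assumes a link sustains a fixed fraction (`efficiency`, an illustrative 80% by default) of its nominal bandwidth; `transfer_days` is a hypothetical helper, not an AWS tool.

```python
def transfer_days(data_bytes: float, link_bps: float, efficiency: float = 0.8) -> float:
    """Days needed to send data_bytes over a link of link_bps bits per second.

    efficiency is an assumed fraction of the nominal bandwidth actually achieved.
    """
    if link_bps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_bps must be positive and efficiency in (0, 1]")
    seconds = data_bytes * 8 / (link_bps * efficiency)  # bytes -> bits
    return seconds / 86_400  # seconds per day


TB = 10**12
PB = 10**15

print(f"80 TB over 1 Gbps:   {transfer_days(80 * TB, 1e9):.1f} days")
print(f"100 PB over 10 Gbps: {transfer_days(100 * PB, 10e9) / 365:.1f} years")
```

Over a fully utilized 1 Gbps link, 80 TB takes about 7.4 days; at petabyte scale the estimate grows to years, which is the gap the Snow Family fills.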
S3 Concepts
Buckets | – The containers that hold objects – In theory, storage is unlimited – 100 buckets per account by default (adjustable) |
Regions | – Each bucket is created in a specific region; choosing a region near users reduces latency |
Objects | – A file/ a chunk of data |
Keys | – Logical names of the objects |
Object URLs | – Every object is addressable on the Internet by a URL (Uniform Resource Locator) built from the bucket name and key |
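Consistency | – Historically, S3 offered only eventual consistency for overwrites and deletes, because of replication lag across AZs – Since December 2020, S3 provides strong read-after-write consistency for object writes, overwrites, and deletes – Data on EBS volumes is consistent |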
S3 works well as a static web server/ static website host; you may want a DNS redirect for a friendly URL.
Common S3 Operations
- Creating and deleting buckets
- Managing: writing, reading, and deleting objects
- Listing keys in buckets
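S3 has a flat namespace, so "listing a folder" really means listing keys that share a prefix, with a delimiter (usually `/`) rolling deeper keys up into common prefixes. The following is a local, in-memory sketch of that grouping. `list_keys` is a hypothetical helper that only approximates the behavior of S3's list API (no pagination, no request to AWS).

```python
from typing import Iterable, List, Tuple


def list_keys(keys: Iterable[str], prefix: str = "",
              delimiter: str = "/") -> Tuple[List[str], List[str]]:
    """Return (keys directly under prefix, common prefixes acting as "folders")."""
    contents: List[str] = []
    common_prefixes = set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        cut = rest.find(delimiter) if delimiter else -1
        if cut >= 0:
            # Everything up to and including the first delimiter is a "folder".
            common_prefixes.add(prefix + rest[: cut + len(delimiter)])
        else:
            contents.append(key)
    return contents, sorted(common_prefixes)


keys = ["photos/2023/a.jpg", "photos/2024/b.jpg", "photos/c.jpg", "readme.txt"]
print(list_keys(keys))
print(list_keys(keys, prefix="photos/"))
```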
Representational State Transfer (REST)
- How application code communicates with AWS
- S3 API is RESTful
- It maps HTTP methods to CRUD operations
- Create (C) uses PUT or POST
- Read (R) uses GET
- Update (U) uses POST or PUT
- Delete (D) uses DELETE
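The mapping above can be made concrete with Python's standard `urllib.request.Request`, which lets you set the HTTP method explicitly. This sketch only builds requests: it assumes a hypothetical bucket `example-bucket` and skips AWS Signature Version 4 signing, which real S3 calls require (SDKs such as boto3 handle it for you).

```python
from typing import Optional
from urllib.parse import quote
from urllib.request import Request

# CRUD -> HTTP method, following the mapping listed above.
# S3 also accepts POST for browser-form uploads; PUT is used here for create and update.
CRUD_TO_METHOD = {"create": "PUT", "read": "GET", "update": "PUT", "delete": "DELETE"}


def build_s3_request(bucket: str, region: str, key: str, operation: str,
                     body: Optional[bytes] = None) -> Request:
    """Build (but do not send) an unsigned S3 REST request for one object."""
    method = CRUD_TO_METHOD[operation]
    # Virtual-hosted-style URL: https://<bucket>.s3.<region>.amazonaws.com/<key>
    url = f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"
    return Request(url, data=body, method=method)


req = build_s3_request("example-bucket", "us-east-1", "docs/readme.txt", "create", b"hello")
print(req.get_method(), req.full_url)
```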
S3 Features
Prefixes and Delimiters | – Since there are no folders, prefixes and delimiters are used to mimic the hierarchical (folder/ tree-like) structure |
Storage Classes | – S3 Standard: $$$$ – S3 Standard-Infrequent Access (IA): $$$ – S3 Reduced Redundancy Storage (RRS): $$ (legacy; AWS no longer recommends it) – Glacier: $ |
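Encryption | – Server-Side Encryption (SSE): S3 encrypts data at rest after upload – Protects storage, not transfer/ transit (use HTTPS/TLS in transit) – Client-Side Encryption: the client encrypts data before uploading it – SSE-S3 with AES-256 (default), SSE-KMS (keys managed in AWS KMS, including customer-managed keys) |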
Versioning | – Keeps multiple versions of an object |
Multi-Factor Authentication (MFA) | – MFA Delete: permanently deleting an object version or changing the bucket's versioning state requires MFA |
Multi-part Upload | – Uploads a large object as separate parts over multiple parallel streams; S3 assembles the parts when the upload completes |
Range GET | – Retrieves a byte range of an object, e.g., only the first 50KB of a large file |
Cross-Region Replication | – Copies objects to a bucket in another region; only applies to objects uploaded after it is turned on |
Logging | – Server access logs record requests made to the bucket, for auditing |
Event Notifications | – Admins can receive event notifications based on activities performed against S3 objects |
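Range GET is an ordinary HTTP `Range` header. The sketch below builds one for the first 50KB (taken here as 50,000 bytes) and splits an object into byte ranges, which is how a client can download pieces in parallel. The URL is hypothetical, and the request is unsigned and never sent.

```python
from urllib.request import Request


def range_header(start: int, end: int) -> str:
    """Value for an HTTP Range header covering bytes start..end (both inclusive)."""
    if start < 0 or end < start:
        raise ValueError("invalid byte range")
    return f"bytes={start}-{end}"


def split_ranges(size: int, chunk: int):
    """Split an object of `size` bytes into inclusive (start, end) ranges of at most `chunk` bytes."""
    return [(start, min(start + chunk, size) - 1) for start in range(0, size, chunk)]


# Hypothetical object URL; a real request would also need SigV4 signing.
url = "https://example-bucket.s3.us-east-1.amazonaws.com/big-file.bin"
first_50kb = Request(url, headers={"Range": range_header(0, 49_999)}, method="GET")
print(first_50kb.get_header("Range"))
print(split_ranges(10, 4))
```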
Object Lifecycle Management helps define storage class transitions, e.g., moving data into Glacier based on its age; S3 Intelligent-Tiering, by contrast, moves objects between access tiers automatically based on access patterns. Lifecycle rules can be scoped by tags and/or prefixes.
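A lifecycle rule is a small JSON document. This sketch shows one plausible rule as a Python dict in the shape boto3's `put_bucket_lifecycle_configuration` accepts, plus a local helper that simulates which class an object would end up in. The rule ID, prefix, and day counts are illustrative assumptions, and the simulation assumes objects start in S3 Standard and ignores details such as minimum storage durations.

```python
# Hypothetical rule: objects under "logs/" move to Standard-IA after 30 days
# and to Glacier after 90 days. The dict follows the shape boto3 expects for
# s3.put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=...).
LIFECYCLE_CONFIGURATION = {
    "Rules": [
        {
            "ID": "archive-old-logs",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
        }
    ]
}


def storage_class_for_age(rule: dict, key: str, age_days: int) -> str:
    """Local simulation: which storage class the rule would leave an object in."""
    if rule["Status"] != "Enabled" or not key.startswith(rule["Filter"].get("Prefix", "")):
        return "STANDARD"
    current = "STANDARD"
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


rule = LIFECYCLE_CONFIGURATION["Rules"][0]
for age in (10, 45, 120):
    print(age, storage_class_for_age(rule, "logs/app.log", age))
```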
Objects can be configured as WORM (Write Once, Read Many) with S3 Object Lock. S3 Batch Operations jobs can be configured to process large numbers of S3 objects automatically.
Bucket names must be globally unique, because the name becomes part of the bucket's FQDN. Buckets can be created via the API or the Management Console. Access to buckets and objects is controlled with security settings such as bucket policies.
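AWS enforces global uniqueness when a bucket is created, but names must also follow syntax rules. The checker below implements a subset of the documented naming rules (3-63 characters; lowercase letters, digits, dots, and hyphens; starts and ends with a letter or digit; no `..`; not an IPv4 address; no `xn--` prefix or `-s3alias` suffix). It is a hypothetical helper, not an AWS API, and it cannot check uniqueness.

```python
import ipaddress
import re

# 3-63 chars: first and last must be a lowercase letter or digit,
# middle characters may also be dots or hyphens.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name: str) -> bool:
    """Check a subset of the S3 general-purpose bucket naming rules."""
    if not _BUCKET_NAME.fullmatch(name):
        return False
    if ".." in name:
        return False
    if name.startswith("xn--") or name.endswith("-s3alias"):
        return False
    try:
        ipaddress.IPv4Address(name)  # names like "192.168.5.4" are not allowed
    except ValueError:
        return True
    return False


for candidate in ("my-company-logs-2024", "My_Bucket", "192.168.5.4"):
    print(candidate, is_valid_bucket_name(candidate))
```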
Object properties include storage class, encryption, metadata, and tags. Metadata describes the object (e.g., its content type), while tags are used to search, organize, and manage access. The minimum object size in S3 is 0 bytes (an empty file). The Infrequent Access storage classes (Standard-IA and One Zone-IA) bill a minimum object size of 128KB. You can add AES-256 encryption to any object. An object can have up to 10 tags.
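A quick way to see how the minimum billable size plays out: the 128KB minimum applies to the Infrequent Access classes (and Glacier Instant Retrieval), not to S3 Standard. The sketch assumes 1KB = 1,024 bytes and uses boto3-style storage class names; both helpers are hypothetical.

```python
KB = 1024  # assumption: the 128KB minimum is counted in 1,024-byte units
MIN_BILLABLE_BYTES = 128 * KB
# Storage classes assumed here to carry the 128KB minimum billable object size.
MIN_SIZE_CLASSES = {"STANDARD_IA", "ONEZONE_IA", "GLACIER_IR"}
MAX_TAGS_PER_OBJECT = 10


def billable_bytes(size: int, storage_class: str) -> int:
    """Bytes billed for one object, applying the minimum where it is assumed to apply."""
    if size < 0:
        raise ValueError("size cannot be negative")
    if storage_class in MIN_SIZE_CLASSES:
        return max(size, MIN_BILLABLE_BYTES)
    return size


def tags_within_limit(tags: dict) -> bool:
    """True if the object's tag set respects the 10-tag limit."""
    return len(tags) <= MAX_TAGS_PER_OBJECT


print(billable_bytes(1_000, "STANDARD"), billable_bytes(1_000, "STANDARD_IA"))
```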
JSON Policies – bucket policies written in JSON that grant permissions on a bucket and its objects (object ACLs are the older mechanism); the AWS Policy Generator can help build them.
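A bucket policy is plain JSON. The sketch below generates a common example, public read access for a static website, for a hypothetical bucket; it illustrates the policy structure (Version, Statement, Effect, Principal, Action, Resource) rather than recommending public access. S3 Block Public Access would also have to permit it before such a policy takes effect.

```python
import json


def public_read_policy(bucket: str) -> str:
    """Return a JSON bucket policy allowing anyone to GET objects in `bucket`."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadGetObject",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                # ARN of every object in the bucket (bucket ARNs have no region/account).
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }
    return json.dumps(policy, indent=2)


print(public_read_policy("example-bucket"))
```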
CORS – Cross-Origin Resource Sharing;
FQDN – Fully Qualified Domain Name; e.g., www[host].portal[subdomain].mycompany.com[domain]
Managing Objects Within the Bucket
- Bucket Level
- All objects in the bucket
- Object Level
- Individual objects
Glacier Overview
- Archival data storage
- Fractions of a penny per GB/month
- Three retrieval options
- Expedited – 1-5min
- Standard – 3-5h
- Bulk – 5-12h
- You define the region for the data storage
- Data is stored with AES-256 encryption
- Glacier can be integrated with S3
- S3 data can be automatically moved into Glacier using lifecycle rules
- Snow devices can be used to import data
- Storage Gateway can connect to Glacier
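Choosing a retrieval option is a trade-off between speed and cost (Bulk is the cheapest, Expedited the most expensive). This hypothetical helper picks the cheapest option whose worst-case time, taken from the ranges above, fits a deadline; actual prices and times should be checked against current AWS documentation.

```python
from typing import Optional

# (option, worst-case retrieval time in hours), cheapest option first.
RETRIEVAL_TIERS = [
    ("Bulk", 12.0),
    ("Standard", 5.0),
    ("Expedited", 5 / 60),
]


def cheapest_tier(deadline_hours: float) -> Optional[str]:
    """Cheapest retrieval option guaranteed to finish within deadline_hours, or None."""
    for tier, worst_case_hours in RETRIEVAL_TIERS:
        if worst_case_hours <= deadline_hours:
            return tier
    return None


for deadline in (24, 6, 1):
    print(deadline, cheapest_tier(deadline))
```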
Glacier Concepts
Archives | – Objects in vaults |
Vaults | – Archive containers/ buckets |
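Vault Locks | – Enforce compliance controls, such as WORM, on a vault through a lockable policy |
Data Retrieval | – Up to 5% of stored data can be retrieved per month at no charge, with no rollover – A data retrieval policy can be configured to limit cost |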
A single AWS account can create up to 1,000 vaults per region. Only empty vaults can be deleted. Glacier supports multi-part uploads of archives, so a large archive does not have to be uploaded in a single request.
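A multi-part upload needs a part size. This sketch assumes Glacier's documented constraints: each part is 1 MiB multiplied by a power of two (up to 4 GiB), with at most 10,000 parts per upload. Under those assumptions, the hypothetical helper picks the smallest valid part size for an archive.

```python
MIB = 1024 * 1024
MAX_PARTS = 10_000
MAX_PART_SIZE = 4 * 1024 * MIB  # 4 GiB


def glacier_part_size(archive_bytes: int) -> int:
    """Smallest power-of-two multiple of 1 MiB that keeps the part count <= 10,000."""
    part = MIB
    while part <= MAX_PART_SIZE:
        parts_needed = -(-archive_bytes // part)  # ceiling division
        if parts_needed <= MAX_PARTS:
            return part
        part *= 2
    raise ValueError("archive too large for a multi-part upload")


print(glacier_part_size(20 * 1024 * MIB) // MIB, "MiB parts for a 20 GiB archive")
```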
Tape Gateway is one of the Storage Gateway types and can be used with backup systems that only support tape. There are three (3) types of Storage Gateway: file, volume, and tape. A Tape Gateway endpoint can be public or VPC (private), and archived virtual tapes are stored in Glacier or Glacier Deep Archive.
VTL – Virtual Tape Library; a library of backup "tapes" that are just objects in S3 buckets
Elastic Block Store (EBS) Overview
- Used for persistent/ durable storage in EC2 instance-bound drives
- Similar to the local hard drive; writing at the block level
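- Durable = data persists when the instance stops and is still there when storage is brought back online
- EC2 instances can boot from EBS volumes (EBS-backed AMIs)
- Provides block-level storage from one AWS service (EBS) to another (EC2)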
- Can be created and attached during the creation of EC2 instances
EBS Volume Types
- Magnetic (HDD)
- Lowest cost; the slowest option
- Three main volume classes
- Cold HDD – massive and very slow
- Magnetic Standard – middle/ available in the free tier
- Throughput Optimized – large and fast
- Solid-State (SSD)
- Higher cost; significantly faster than magnetic
- Two main volume classes
- General Purpose
- For <10k IOPS
- Provisioned IOPS
- For 10k+ IOPS
- Use EBS-optimized instances with SSD volumes; otherwise, performance is not guaranteed
- Volume size limit 1GiB-1TiB (1,024GiB) for Magnetic (standard) volumes; current SSD and HDD volume types support 16TiB or more
- An EBS volume must be in the same AZ as the EC2 instance it is attached to
- Can be created from a snapshot
IOPS – Input/ Output Operations per Second
Protecting EBS Data
- Snapshots
- Volume recovery
- Volumes can be detached from one instance and attached to another, only within the same AZ
- Volumes can be encrypted per drive
- Encryption uses AWS KMS keys (AWS managed or customer managed)
Elastic File System (EFS) Overview
- Similar to NAS drives; shareable storage
- Multiple instances accessing at the same time
- Hierarchical in nature; as in folder hierarchy
- Can be accessed through NFSv4
- While EBS is bound to an instance, EFS is free to be accessed by many different instances
- EC2 instances can use EFS shares
- EFS is not supported on Windows instances
- Can be limited to a Virtual Private Cloud (VPC)
- Can be tagged for better management and location
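Mounting EFS from a Linux instance uses ordinary NFSv4.1. This sketch generates an `/etc/fstab` entry from the file system's DNS name (`<file-system-id>.efs.<region>.amazonaws.com`) and the NFS mount options AWS recommends for EFS. The file system ID and mount point are hypothetical, and the options are worth re-checking against the EFS user guide.

```python
# NFS options recommended for EFS, plus _netdev so boot waits for networking.
MOUNT_OPTIONS = ("nfsvers=4.1,rsize=1048576,wsize=1048576,"
                 "hard,timeo=600,retrans=2,noresvport,_netdev")


def efs_fstab_line(fs_id: str, region: str, mount_point: str = "/mnt/efs") -> str:
    """Build an /etc/fstab line that mounts the root of an EFS file system."""
    dns_name = f"{fs_id}.efs.{region}.amazonaws.com"
    return f"{dns_name}:/ {mount_point} nfs4 {MOUNT_OPTIONS} 0 0"


print(efs_fstab_line("fs-12345678", "us-east-1"))
```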
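A PrivateLink (interface VPC) endpoint can be set up for EFS, allowing secure, private connections between VPCs, services, and apps in AWS without traversing the public Internet. Interface endpoints use private IP addresses in your VPC and are billed per hour and per GB of data processed.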
Integrating On-premises Storage
Storage Gateway
- On-premises Software Appliance (VM) that creates the gateway
- Provides three (3) types of storage solutions
- File-based: uses the Network File System (NFS);
- E.g. objects to S3 buckets;
- Volume-based: uses the iSCSI protocol over IP;
- E.g. storage volumes on AWS accessible within VPC
- Tape-based;
- E.g. tape gateway – virtual tapes for backups
The Storage Gateway connects on-premises software appliances with cloud-based storage; storage can be file-based, volume-based, or tape-based. The file gateway provides a file interface (NFS or SMB) to S3 buckets.
ARN – Amazon Resource Name;
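ARNs follow the pattern `arn:partition:service:region:account-id:resource`, and some fields are empty for global resources such as S3 buckets. A minimal, hypothetical parser that splits on the first five colons, so resource IDs containing colons stay intact:

```python
from typing import NamedTuple


class Arn(NamedTuple):
    partition: str
    service: str
    region: str
    account: str
    resource: str


def parse_arn(arn: str) -> Arn:
    """Split an ARN into its fields; the resource part may itself contain ':' or '/'."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return Arn(*parts[1:])


print(parse_arn("arn:aws:s3:::example-bucket/*"))
```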
Storage Access Security
- S3 storage can be configured with an AWS JSON policy: the S3 bucket policy
- Permissions on resources using JSON Policies
- Storage security can be managed in the AWS Management Console and the AWS Command Line Interface (CLI)
- EBS volumes are primarily managed within EC2 instances, just like hard drives in local servers
- JSON can be used to implement storage access policies
Storage Performance
Storage performance management is about selecting the right type and class of storage. A gibibyte is not the same as a gigabyte [2^30 vs 10^9 bytes].
Required IOPS | EBS Volume Type |
> 10,000 | Provisioned IOPS SSD |
<= 10,000 | General Purpose SSD |
Up to 500 | Throughput Optimized HDD |
Up to 250 | Cold HDD |
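The table can be read as a cheapest-first lookup. This hypothetical helper encodes the thresholds exactly as given in these notes (current per-volume limits differ by volume generation) and adds one assumption: HDD IOPS figures only hold for large sequential I/O, so random or small-block workloads are sent to SSD.

```python
def select_ebs_volume(required_iops: int, sequential: bool = False) -> str:
    """Pick the cheapest volume type from the table that covers required_iops."""
    if required_iops < 0:
        raise ValueError("required_iops must be non-negative")
    # HDD options only make sense for large, sequential I/O.
    if sequential and required_iops <= 250:
        return "Cold HDD"
    if sequential and required_iops <= 500:
        return "Throughput Optimized HDD"
    if required_iops <= 10_000:
        return "General Purpose SSD"
    return "Provisioned IOPS SSD"


for iops, seq in ((200, True), (400, True), (3_000, False), (20_000, False)):
    print(iops, seq, select_ebs_volume(iops, seq))
```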
Binary vs. Decimal Measurements
Decimal Name | Decimal Abbr. | Decimal Power | Decimal Value | Binary Name | Binary Abbr. | Binary Power | Binary Value |
Kilobyte | kB | 10^3 | 1,000 | Kibibyte | KiB | 2^10 | 1,024 |
Megabyte | MB | 10^6 | 1,000,000 | Mebibyte | MiB | 2^20 | 1,048,576 |
Gigabyte | GB | 10^9 | 1,000,000,000 | Gibibyte | GiB | 2^30 | 1,073,741,824 |
Terabyte | TB | 10^12 | 1,000,000,000,000 | Tebibyte | TiB | 2^40 | 1,099,511,627,776 |
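Converting between the two systems is a single multiplication and division. A small sketch, with unit names following the table:

```python
DECIMAL = {"kB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}
BINARY = {"KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}
UNITS = {**DECIMAL, **BINARY}


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert a size between any two units in the table (via bytes)."""
    return value * UNITS[from_unit] / UNITS[to_unit]


print(f"100 GB = {convert(100, 'GB', 'GiB'):.2f} GiB")
print(f"1 TiB = {convert(1, 'TiB', 'TB'):.4f} TB")
```

For example, 100 GB is about 93.13 GiB, which is why a "100 GB" disk looks smaller when a tool reports its size in binary units.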