AWS Supported Database Types
- Hosted Services
- Customer Installed
- “Bring Your Own License” (BYOL)
- Self-install/ configuration on EC2s
- “Bring Your Own License” (BYOL)
Hosted Services
- AWS Relational Database Services (RDS)
- Amazon Aurora MySQL
- Amazon Aurora PostgreSQL
- Oracle
- MS SQL Server
- MySQL
- PostgreSQL
- MariaDB
Custom Instances — EC2 Based
- Start the instance with the required Operating System (OS)
- Select the Amazon Machine Image (AMI)
- Install the database engine/ service
- From ISO image
- Create databases
- Manage and maintain the database service
NoSQL Databases
- Not based on standard SQL or relational design theory
- Non-tabular model
- E.g. a key-value, column-family, document, and graph databases
- The design supports very fast transactions
- DynamoDB is the AWS NoSQL Solution
Data Warehouse
- A large central repository of consolidated data
- Data can be aggregated from one or more source systems
- Used for Online Analytical Processing (OLAP)
- Redshift is the AWS OLAP/ warehousing solution
Relational Database Terminology
- Rows = Tuples
- Column = Attributes, Properties
- Tables = Entities, Objects, Relations
- Relationships
- Primary Key
- Foreign Key
- Joins
Normalization
- Process for evaluating and correcting data structures
- Determines the best assignment of attributes to entities
- Works thought a series of stages called normal forms
- 1NF –> 2NF –> 3NF –> 4NF (optional)
- Higher the normal form, (closer to 4NF)
- Slower the reads
- Faster the writes
EC2 Based Hosting
Setup | Benefits & Considerations |
1. Launch an instance 2. Install the database service 3. Config ports in security groups 4. Connect to the database | – Complete control – Manual performance management – Manual updates |
AWS Service-Based
Setup | Benefits & Considerations |
1. Launch the database 2. Connect to the database | – Less control – Automatic performance management – Automatic updates |
High-Availability (HA) Solutions
- Increase availability
- Increase recoverability
Clustering | Standby Instances |
– Multiple servers/ instances – One database with replication – Increases availability – Automatic failover – Increased cost | – Multiple servers/ instances – One database with replication – Increases recoverability – No automatic failover – Reduced cost |
Single AZ Deployment | Multi AZ Deployment |
– One instance in one AZ in one region – Reduced cost | – Multiple instances in multiple AZs in one region – Increased availability (replication) – Increased performance – Increased cost |
Scalability Solutions
- Increase or decrease capacity
- Storage
- Processing
- Network operations e.g. throughput
Scaling the Instance | Read Replica |
– Change the instance type/ class – Auto-scaling is not supported in RDS. However, it can be scripted with CLI commands | – A read-only copy of the database – Offload read-only traffic from the main database – Multiple instances can be in different regions to increase the availability |
Database Security
Encryption | Permissions |
– RDS databases support “at rest” encryption (storage encryption/ physical) – Must be encrypted at creation time – Can be enabled on recovery (manually) | – Admin access is based on IAM – Data access is based on database capabilities e.g. CRUD, DB Admin roles etc. |
Amazon Aurora
- Relational database service
- Optimized for Online Transactional Processing (OLTP)
- Very fast data writes
- MySQL compatible database system
- Increased performance over MySQL (5x)
- Has 6 replicas by default with a max of 15 replicas
Scaling Aurora
- Initially 10 GB and scales in 10 GB increments
- Max 64 TB
- Compute resources
- Max 32 CPUs
- Max 244 BiB RAM
Aurora Availability
- Availability defaults
- Two (2) database copies in each AZ
- Min three (3) AZs
- Write capabilities
- Continues with up to two (2) copies lost
- Read capabilities
- Continues with up to three (3) copies lost
Aurora Replicas
- Up to 15 Aurora replicas
- Automatic failover
- Up to 5 MySQL replicas
- No automatic failover
Amazon Redshift
- Amazon’s data warehouse solution
- Optimized for Online Analytical Processing (OLAP)
- AWS Managed Service
- Pricing
- Entry point of $0.25/h or $1,000 per TB/year
- Single node implementation
- 160 GB
- Multiple nodes
- Leader node
- Connections and Queries
- Compute Node
- Stores data and executes queries and calculations
- Leader node
Redshift Speed
- Column based datastore
- Sequential reads
- Very fast reads
- Data compression
- Massively Parallel Processing (MPP)
Redshift Security
- SSL in transit encryption
- AES-256 storage encryption
- Keys managed through AWS Key Management
Redshift Availability
- Operates in one AZ
- Snapshots can be restored in other AZs
Amazon DynamoDB
- NoSQL database service
- Eventual consistency model
- Provides special features
- Milliseconds latency at any scale
- Very fast reads and writes
- Stored on SSD
- spread across 3 distinct data centers
- Milliseconds latency at any scale
- Read consistency types
- Eventual consistency reads
- Seconds delay
- Strongly consistent reads
- Milliseconds delay
- Eventual consistency reads
- You create tables in DynamoDB now databases
- Partition Key = Primary Key
- Items that have different values
- All about small fast transactions
DynamoDB Pricing
- Storage
- $0.25/GB per month
- Throughput
- Writes: billed per hour for every 10 units
- Reads: billed per hour for every 50 units
- Where 1 unit = 1 write per second
- Use AWS Pricing Calculator for more accurate numbers
Other Notes
Multi-AZ — getting failover nodes
Read Replicas — scaling out for performance
RDS Backups — when retention is 0 backups are disabled
The maximum retention for an RDS database is 35 days! By default, AWS backups the database regularly using snapshots. Snapshots can also be created manually. Restoring a database is creating a new instance.
Database monitoring includes real-time monitoring and log review. You can see performance statistics related to databases. It should be used to improve the performance and availability of any database.