Amazon Athena Overview
- Amazon Athena is a serverless, interactive query service for analyzing data in Amazon S3 using SQL
- Athena uses a Data Catalog (by default the AWS Glue Data Catalog) that stores the table definitions (DDL) for the data in S3
- Athena is based on Presto – an open-source, distributed SQL query engine optimized for low latency and ad-hoc data analysis with support for Parquet, JSON, Avro, CSV, and ORC
- Athena executes queries in parallel, which keeps result latency low
- Athena integrates with AWS Glue (ETL), QuickSight (data visualization), and KMS (encryption)
- Athena uses a pay-per-query model, billed on the amount of data each query scans
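The pay-per-query model can be illustrated with a rough cost estimate. This is a minimal sketch, not an official calculator: the function name `estimate_query_cost` is hypothetical, and the default price of 5.0 USD per TB, the rounding up to whole megabytes, and the 10 MB per-query minimum are assumptions that should be checked against the current Athena pricing page for your region.

```python
import math

BYTES_PER_MB = 1024 ** 2
BYTES_PER_TB = 1024 ** 4


def estimate_query_cost(bytes_scanned, price_per_tb=5.0, min_mb=10):
    """Estimate the charge (USD) for one data-scanning query.

    price_per_tb and min_mb are assumed defaults, not authoritative values.
    """
    # Round up to whole megabytes, then apply the assumed per-query minimum.
    billed_mb = max(math.ceil(bytes_scanned / BYTES_PER_MB), min_mb)
    billed_bytes = billed_mb * BYTES_PER_MB
    return billed_bytes / BYTES_PER_TB * price_per_tb


# A query that scans exactly 1 TB at the assumed rate.
print(estimate_query_cost(1024 ** 4))
```

Because the charge scales with bytes scanned, anything that reduces the scan (columnar formats, compression, partitioning) lowers the bill directly.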
Queries & Security
- Query history is retained for 45 days
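- No support for stored procedures; current Athena versions do support Lambda-based UDFs and writes via INSERT INTO and CTAS, which earlier releases lacked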
- Data access is governed by S3-based security controls: bucket policies, ACLs, and IAM policies
- Athena can query encrypted data in S3 (SSE-S3, SSE-KMS, and CSE-KMS)
- Athena can be accessed via the AWS Management Console, JDBC/ODBC driver or an API
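API access can be sketched with the AWS SDK for Python. The example below is a hedged illustration: `build_query_request` and `run_query` are hypothetical helpers, and the database, bucket, and key names in the usage lines are placeholders. The request shape follows the `StartQueryExecution` parameters (`QueryString`, `QueryExecutionContext`, `ResultConfiguration`); submitting it assumes `boto3` is installed and AWS credentials are configured.

```python
def build_query_request(sql, database, output_location, kms_key=None):
    """Build keyword arguments for the Athena StartQueryExecution call."""
    result_config = {"OutputLocation": output_location}
    if kms_key is not None:
        # Ask Athena to encrypt the query results with the given KMS key.
        result_config["EncryptionConfiguration"] = {
            "EncryptionOption": "SSE_KMS",
            "KmsKey": kms_key,
        }
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": result_config,
    }


def run_query(request):
    """Submit the request and return the query execution ID."""
    import boto3  # imported lazily so the builder above works without it

    client = boto3.client("athena")
    response = client.start_query_execution(**request)
    return response["QueryExecutionId"]


# Placeholder names; replace with a real database and results bucket.
request = build_query_request(
    "SELECT status, count(*) FROM requests GROUP BY status",
    database="weblogs",
    output_location="s3://example-athena-results/",
)
print(request["ResultConfiguration"])
```

The call is asynchronous: the returned execution ID is then polled (for example with `get_query_execution`) until the query finishes.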
S3 Data Considerations
- Partition data for better performance and cost optimization (less data scanned, less memory used, etc.)
- Athena query results are stored in S3, in a configurable output location
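The effect of partitioning on scanned data can be shown with a small simulation. This is only a sketch of the idea, not how Athena is implemented: the object keys, sizes, and helper names are made up, and the layout assumes Hive-style `name=value` path segments.

```python
# Hypothetical objects under a Hive-style partitioned prefix: (key, size in bytes).
OBJECTS = [
    ("logs/year=2023/month=12/part-0.parquet", 400),
    ("logs/year=2024/month=01/part-0.parquet", 300),
    ("logs/year=2024/month=01/part-1.parquet", 200),
    ("logs/year=2024/month=02/part-0.parquet", 100),
]


def partition_values(key):
    """Extract name=value path segments, e.g. {'year': '2024', 'month': '01'}."""
    return dict(seg.split("=", 1) for seg in key.split("/") if "=" in seg)


def bytes_scanned(objects, **filters):
    """Sum the sizes of objects whose partition values match every filter."""
    total = 0
    for key, size in objects:
        values = partition_values(key)
        if all(values.get(name) == want for name, want in filters.items()):
            total += size
    return total


print(bytes_scanned(OBJECTS))                           # no filter: every object is read
print(bytes_scanned(OBJECTS, year="2024", month="01"))  # only one partition is read
```

A query that filters on the partition columns touches only the matching prefixes, which is where the performance and cost savings come from.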
Use Cases
- Great tool for running ad-hoc queries on smaller datasets
- Running ad-hoc queries on complex data types – structs, maps, and arrays
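To make the complex types concrete, the sketch below mimics in plain Python what a query over a struct, a map, and an array does. The row data and the `flatten` helper are hypothetical; the comments name the Presto-style SQL constructs each step roughly corresponds to (dot notation for struct fields, `element_at` for maps, `CROSS JOIN UNNEST` for arrays).

```python
# One hypothetical row: `customer` is a struct, `tags` a map, `items` an array.
ROWS = [
    {
        "customer": {"name": "Ana", "city": "Lisbon"},
        "tags": {"channel": "web"},
        "items": ["book", "pen"],
    },
]


def flatten(rows):
    """Yield (customer name, channel tag, item), one output row per array element."""
    for row in rows:
        name = row["customer"]["name"]        # struct field: customer.name
        channel = row["tags"].get("channel")  # map lookup: element_at(tags, 'channel')
        for item in row["items"]:             # array expansion: CROSS JOIN UNNEST(items)
            yield (name, channel, item)


for record in flatten(ROWS):
    print(record)
```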