Amazon Athena Overview

  • Amazon Athena is a serverless, interactive query service for analyzing data stored in Amazon S3 using standard SQL
  • Athena uses a data catalog (typically the AWS Glue Data Catalog) to store the table definitions (DDL) that describe the data in S3
  • Athena is based on Presto, an open-source distributed SQL query engine optimized for low-latency, ad-hoc analysis; it can query data in formats such as CSV, JSON, Avro, Parquet, and ORC
  • Athena executes queries in parallel, which reduces query latency
  • Athena integrates with AWS Glue (data catalog and ETL), Amazon QuickSight (data visualization), and AWS KMS (encryption)
  • Athena uses a pay-per-query model: you are charged for the amount of data each query scans
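A rough cost estimate follows from the pay-per-query model. This sketch assumes the billing rules as commonly documented (bytes scanned rounded up to the nearest megabyte, with a 10 MB minimum per query), binary units, and 5.00 USD per TB purely as a placeholder price; check current pricing for your region. The function names are hypothetical.

```python
# Assumptions for illustration only: binary units and a placeholder price.
BYTES_PER_MB = 1024 ** 2
MB_PER_TB = 1024 ** 2
MIN_BILLED_MB = 10  # per-query minimum charge


def billed_megabytes(bytes_scanned: int) -> int:
    """Round scanned bytes up to whole megabytes and apply the 10 MB minimum."""
    mb = (bytes_scanned + BYTES_PER_MB - 1) // BYTES_PER_MB  # integer ceiling
    return max(mb, MIN_BILLED_MB)


def estimate_query_cost(bytes_scanned: int, price_per_tb: float = 5.0) -> float:
    """Estimated charge in USD for a single query (price_per_tb is an assumption)."""
    return billed_megabytes(bytes_scanned) / MB_PER_TB * price_per_tb


# A 1 KB query is still billed as 10 MB; a full 1 TB scan costs the assumed 5.00 USD.
print(billed_megabytes(1024))          # 10
print(estimate_query_cost(1024 ** 4))  # 5.0
```

Because the charge scales with bytes scanned, anything that reduces the data a query reads (partitioning, columnar formats, compression) lowers the cost proportionally.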

Queries & Security

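  • Query history is retained for 45 days
  • No support for stored procedures; Athena was originally read-only, but it now supports writes through CREATE TABLE AS SELECT (CTAS) and INSERT INTO, and UDFs implemented as AWS Lambda functions
  • Data security relies on S3 access controls: bucket policies, ACLs, and IAM policies
  • Athena can query data encrypted in S3 (SSE-S3, SSE-KMS, or CSE-KMS) and can encrypt its query results
  • Athena can be accessed through the AWS Management Console, JDBC/ODBC drivers, or the Athena API (including the AWS CLI and SDKs)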
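A minimal sketch of API access, assuming the boto3 SDK's Athena client and its `start_query_execution` and `get_query_execution` calls. The function name `run_athena_query` and the polling parameters are hypothetical. The client is passed in as an argument, so the sketch can be exercised with a stub instead of a live AWS account.

```python
import time
from typing import Any

TERMINAL_STATES = ("SUCCEEDED", "FAILED", "CANCELLED")


def run_athena_query(client: Any, sql: str, database: str, output_location: str,
                     poll_seconds: float = 1.0, timeout_seconds: float = 300.0) -> dict:
    """Start a query, poll until it reaches a terminal state, return the execution record.

    `client` is expected to behave like boto3.client("athena").
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},  # results land in S3
    )
    query_id = started["QueryExecutionId"]

    deadline = time.monotonic() + timeout_seconds
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]
        state = execution["Status"]["State"]  # QUEUED, RUNNING, or a terminal state
        if state in TERMINAL_STATES:
            return execution
        if time.monotonic() > deadline:
            raise TimeoutError(f"query {query_id} still {state} after {timeout_seconds}s")
        time.sleep(poll_seconds)


# Example (requires AWS credentials; database and bucket names are placeholders):
#   import boto3
#   execution = run_athena_query(boto3.client("athena"), "SELECT 1", "my_database",
#                                "s3://my-results-bucket/athena/")
#   print(execution["Status"]["State"], execution["ResultConfiguration"]["OutputLocation"])
```

Queries run asynchronously, which is why the sketch polls; for a successful query the returned record's `ResultConfiguration.OutputLocation` points at the results file in S3.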

S3 Data Considerations

  • Partition data (for example, by date) so queries scan less data, which improves performance and lowers cost (fewer bytes scanned, less memory used)
  • Query results are written to an S3 location that you specify
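The effect of partitioning on data scanned can be illustrated with a simplified offline model. It assumes Hive-style partition paths such as `year=2024/month=01/`, and the object keys and sizes are invented. Real Athena prunes partitions using the partition metadata registered in the data catalog, and the bytes actually scanned also depend on file format and compression.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class S3Object:
    key: str
    size_bytes: int


def partition_values(key: str) -> Dict[str, str]:
    """Parse Hive-style 'name=value' path segments, skipping the file name."""
    values = {}
    for segment in key.split("/")[:-1]:
        if "=" in segment:
            name, value = segment.split("=", 1)
            values[name] = value
    return values


def bytes_scanned(objects: List[S3Object],
                  predicate: Optional[Callable[[Dict[str, str]], bool]] = None) -> int:
    """Bytes read: every object without a partition filter, else only matching partitions."""
    if predicate is None:
        return sum(o.size_bytes for o in objects)
    return sum(o.size_bytes for o in objects if predicate(partition_values(o.key)))


# Hypothetical table layout with made-up sizes.
objects = [
    S3Object("sales/year=2023/month=12/part-0.parquet", 400),
    S3Object("sales/year=2024/month=01/part-0.parquet", 300),
    S3Object("sales/year=2024/month=02/part-0.parquet", 300),
]
print(bytes_scanned(objects))  # 1000: no filter on partition columns
print(bytes_scanned(objects, lambda p: p.get("year") == "2024" and p.get("month") == "01"))  # 300
```

A WHERE clause on partition columns lets the engine skip whole S3 prefixes, so in this toy layout the filtered query reads 30% of the bytes a full scan would.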

Use Cases

  • Great tool for running ad-hoc queries on smaller datasets
  • Running ad-hoc queries on complex data types: structs, maps, and arrays
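To make the complex types concrete, here is a hypothetical helper that derives Athena (Hive-style) DDL type strings such as `array<string>` and `struct<city:string,zip:string>` from a sample JSON record. It is an illustration only: it assumes homogeneous arrays and maps every JSON object to a struct, whereas objects with arbitrary keys are usually better declared as `map<string,...>`.

```python
from typing import Any


def athena_type(value: Any) -> str:
    """Map a sample JSON value to a Hive-style DDL type string (hypothetical helper)."""
    if isinstance(value, bool):  # check before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        # Assumes a homogeneous array; the element type comes from the first item.
        inner = athena_type(value[0]) if value else "string"
        return f"array<{inner}>"
    if isinstance(value, dict):
        # Fixed, known keys become a struct; use map<string,...> when keys vary per record.
        fields = ",".join(f"{k}:{athena_type(v)}" for k, v in value.items())
        return f"struct<{fields}>"
    raise TypeError(f"unsupported value: {value!r}")


record = {"id": 7, "tags": ["a", "b"], "address": {"city": "Seattle", "zip": "98101"}}
columns = ", ".join(f"{name} {athena_type(value)}" for name, value in record.items())
print(columns)  # id bigint, tags array<string>, address struct<city:string,zip:string>
```

In queries, struct fields are read with dot notation (`address.city`), map values with subscripts (`attributes['color']`), and arrays can be flattened with `CROSS JOIN UNNEST(tags) AS t(tag)`.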