Databases & Analytics

Let's learn about the different types of databases in the cloud.

RDS Deployments : Read Replicas, Multi-AZ, Multi-Region :

Read Replicas :

  • Scale the read workload of your DB

  • Can create up to 5 read replicas (newer RDS limits allow up to 15 for some engines)

  • Data is only written to the main DB

=> Works like a read-only copy of the main DB: replicas are kept in sync by asynchronous replication and serve reads only.
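
The read/write split described above can be sketched in a few lines. This is only an illustration, not an AWS API: the `ReplicatedDatabase` class, the endpoint names and the round-robin choice are all hypothetical, and treating every `SELECT` as a read is a simplifying assumption.

```python
from itertools import cycle


class ReplicatedDatabase:
    """Toy router: writes go to the main DB, reads rotate over the replicas."""

    def __init__(self, main, replicas):
        self.main = main
        self._replicas = cycle(replicas)  # simple round-robin over the read replicas

    def endpoint_for(self, statement):
        # Only SELECT statements are treated as reads in this sketch.
        is_read = statement.lstrip().upper().startswith("SELECT")
        return next(self._replicas) if is_read else self.main


db = ReplicatedDatabase("main-db", ["replica-1", "replica-2"])
routes = [
    db.endpoint_for("SELECT * FROM orders"),
    db.endpoint_for("INSERT INTO orders VALUES (1)"),
    db.endpoint_for("SELECT * FROM users"),
]
print(routes)  # ['replica-1', 'main-db', 'replica-2']
```

Adding replicas spreads the read traffic further, while the write path stays on the single main DB.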

Multi-AZ :

  • Failover in case of AZ outage (high availability)

  • Data is only read from and written to the main database

  • Can only have 1 other AZ as failover

=> Works like a standby copy that takes over if the main DB has issues.

Multi-Region (Read Replicas) :

  • Disaster recovery in case of region issue.

  • Local performance for global reads

  • Replication cost.

=> Provides fault tolerance at the region level.

Databases :

RDS - Relational Database Service :

  • SQL relational database service.

  • It allows you to create databases in the cloud that are managed by AWS : PostgreSQL, MySQL, MariaDB, Oracle, Microsoft SQL Server, Aurora (AWS proprietary database, compatible with MySQL and PostgreSQL)

  • Simplifies management, e.g. manages backups, software patching, automatic failure detection and recovery, etc.

Aurora :

  • Proprietary technology by AWS.

  • Fully managed relational database engine built for the cloud.

  • MySQL and PostgreSQL compatible database.

  • AWS claims up to 5x the throughput of standard MySQL and up to 3x that of standard PostgreSQL, with performance on par with commercial databases at 1/10th the cost.

  • Fault-tolerant and self-healing

  • AWS manages hardware provisioning, software patching, setup, configuration, backups, etc.

ElastiCache :

  • In-memory database with very high performance and low latency.

  • Compatible with Redis or Memcached engines.

  • For example, if a database has a read-intensive workload, caching frequent reads in ElastiCache can reduce the load on it

  • AWS takes care of OS maintenance / patching, optimizations, setup, configuration, monitoring, failure recovery and backups
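
One common way to take read load off a database with a cache is the cache-aside pattern, sketched below. This is a minimal illustration under stated assumptions: a plain `dict` stands in for the Redis/Memcached cache, and `SlowDatabase` and `cached_get` are hypothetical names, not part of any AWS SDK.

```python
class SlowDatabase:
    """Stand-in for a read-intensive relational database."""

    def __init__(self, rows):
        self.rows = rows
        self.reads = 0  # counts how often the database is actually hit

    def get(self, key):
        self.reads += 1
        return self.rows[key]


def cached_get(key, cache, database):
    """Cache-aside read: try the cache first, fall back to the database."""
    if key in cache:
        return cache[key]
    value = database.get(key)
    cache[key] = value  # populate the cache for later reads
    return value


database = SlowDatabase({"user:1": "Alice"})
cache = {}  # a plain dict plays the role of Redis/Memcached here

first = cached_get("user:1", cache, database)   # cache miss, reads the database
second = cached_get("user:1", cache, database)  # cache hit, database untouched
print(first, second, database.reads)  # Alice Alice 1
```

A real setup would also need an expiry or invalidation strategy so that cached values do not go stale; that is left out of this sketch.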

DynamoDB :

  • A fully managed, NoSQL key/value database service.

  • Offers encryption for the data.

  • Distributed, serverless database (no servers to manage or maintain) that scales to massive workloads seamlessly.

  • Single-digit millisecond latency and very high performance.

  • Low cost and auto-scaling capabilities.

  • Integrated with IAM for security, authorization and administration

  • It is NON-RELATIONAL!

DynamoDB Accelerator - DAX :

  • Fully managed in-memory cache for DynamoDB

  • Up to 10x performance improvement

  • Secure, Highly scalable & highly available

  • Difference with ElastiCache at the CCP level : DAX is only used for and is integrated with DynamoDB, while ElastiCache can be used for other databases.

DocumentDB :

  • Fully managed, NoSQL Document Database

  • MongoDB compatible

  • Differs from DynamoDB : DynamoDB is a key/value database, whereas DocumentDB is a document database

  • MongoDB is used to store, query and index JSON data

  • Similar "deployment concepts" to Aurora

  • Fully managed, highly available with replication across 3 AZs.

  • DocumentDB storage automatically grows in increments of 10 GB, up to 64 TB.

  • Automatically scales to workloads with millions of requests per second.

Redshift :

  • Fully managed, PETABYTE-scale data warehousing service.

  • It's an OLAP (Online Analytical Processing) database.

  • Uses columnar storage instead of row-based storage.

  • Pay as you go, based on the instances provisioned.

  • Has a SQL interface for performing queries

  • BI tools such as Amazon QuickSight or Tableau integrate with it.
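
Why columnar storage suits analytical queries can be shown with a toy example. The sketch below is not how Redshift stores data internally; it only contrasts the two layouts on made-up sample records, with Python lists and dicts as stand-ins.

```python
# The same three records stored two ways (made-up sample data).
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 50.0},
]

# Columnar layout: one list per column instead of one dict per row.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Row-based: every full record is visited just to read one field.
total_from_rows = sum(row["amount"] for row in rows)

# Columnar: only the "amount" column is touched.
total_from_columns = sum(columns["amount"])

print(total_from_rows, total_from_columns)  # 250.0 250.0
```

An aggregate over one column reads far less data in the columnar layout, which is the typical access pattern for OLAP workloads.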

Neptune :

  • Fully managed graph database

  • Highly available across 3 AZ, with up to 15 read replicas

  • Can store up to billions of relations and query the graph with millisecond latency

  • Fast, reliable, fully managed graph database service

  • Great for knowledge graphs (e.g. Wikipedia) and fraud detection.
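
To give a feel for the kind of question a graph database answers, here is a small traversal sketch for a fraud-detection style query. It is plain Python over an in-memory adjacency list, not Neptune's API (Neptune is queried with graph query languages such as Gremlin or SPARQL); the graph data and the `reachable_within` helper are hypothetical.

```python
from collections import deque

# Hypothetical graph: accounts linked to the devices and cards they used.
edges = {
    "account:alice": ["device:1"],
    "device:1": ["account:alice", "account:bob"],
    "account:bob": ["device:1", "card:42"],
    "card:42": ["account:bob", "account:carol"],
    "account:carol": ["card:42"],
}


def reachable_within(start, max_hops):
    """Breadth-first search returning nodes reachable in at most max_hops edges."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in edges.get(node, []):
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    return {n for n in seen if n != start}


# Accounts that share a device with alice (two hops: account -> device -> account).
linked = sorted(
    n for n in reachable_within("account:alice", 2) if n.startswith("account:")
)
print(linked)  # ['account:bob']
```

Relationship-heavy queries like this one are awkward to express as joins in a relational database, which is where a graph model helps.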

Amazon QLDB - Quantum Ledger Database :

  • A ledger is a book recording financial transactions

  • Fully managed, serverless, highly available, with replication across 3 AZs

  • Used to review history of all the changes made to your application data over time

  • Immutable system => No entry can be removed or modified, cryptographically verifiable

  • 2-3x better performance than common ledger blockchain frameworks; data is manipulated using SQL
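
The idea of an immutable, cryptographically verifiable journal can be illustrated with a simple hash chain. This is a conceptual sketch only and not QLDB's actual implementation; the `append`, `verify` and `entry_hash` helpers and the sample entries are assumptions made for illustration.

```python
import hashlib
import json


def entry_hash(previous_hash, data):
    """Hash of an entry covers its data and the previous entry's hash."""
    payload = json.dumps({"prev": previous_hash, "data": data}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(journal, data):
    """Append-only: entries are added, never edited or deleted."""
    previous_hash = journal[-1]["hash"] if journal else ""
    journal.append({"data": data, "hash": entry_hash(previous_hash, data)})


def verify(journal):
    """Recompute the chain; a modified entry no longer matches its stored hash."""
    previous_hash = ""
    for entry in journal:
        if entry["hash"] != entry_hash(previous_hash, entry["data"]):
            return False
        previous_hash = entry["hash"]
    return True


journal = []
append(journal, {"account": "A", "delta": 100})
append(journal, {"account": "A", "delta": -30})
valid_before = verify(journal)

journal[0]["data"]["delta"] = 1000  # tamper with history
valid_after = verify(journal)
print(valid_before, valid_after)  # True False
```

Because each hash depends on the previous one, changing any past entry is detectable by anyone who recomputes the chain.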

Amazon Managed Blockchain :

  • Blockchain makes it possible to build applications where multiple parties can execute transactions without the need for a trusted central authority

  • AMB is a managed service to join public blockchain networks or create your own scalable private network

  • Compatible with the frameworks Hyperledger Fabric & Ethereum

Analytics :

Amazon EMR - Elastic MapReduce :

  • Helps create Hadoop clusters for data processing, machine learning, big data, etc.

  • Run and scale Apache Spark, Hive, Presto, and other big data frameworks

  • Basically it helps analyze data using Hadoop Clusters

  • Auto-scaling and integrated with spot instances.
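
The "MapReduce" in the service name refers to a processing model that Hadoop runs across a cluster. The single-process word count below is a minimal sketch of that model, not EMR or Hadoop code; the function names and sample lines are chosen for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values of each key into a single result."""
    return {key: sum(values) for key, values in grouped.items()}


lines = ["big data on AWS", "big clusters process big data"]
pairs = [pair for line in lines for pair in map_phase(line)]
word_counts = reduce_phase(shuffle(pairs))
print(word_counts["big"], word_counts["data"])  # 3 2
```

On a real cluster the map and reduce steps run in parallel on many nodes, which is what makes the model scale to large datasets.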

Amazon QuickSight :

  • Serverless machine learning-powered business intelligence service to create interactive dashboards

  • Fast, automatically scalable, embeddable, with per-session pricing

  • Use cases : business analytics, building visualizations, performing ad-hoc analysis, getting business insights using data

  • Data integrated with RDS, Aurora, Athena, Redshift, S3, etc.

Athena :

  • Serverless service used to query data in S3 using SQL.

  • Pay per query (only pay for data scanned).

  • Secured through IAM

  • Use cases : one-time SQL queries, serverless queries on S3, log analytics.
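
Since billing depends on the data scanned, the cost of a query is simple arithmetic. The sketch below assumes a price of 5 USD per TB scanned purely for illustration; the constant and the `query_cost_usd` helper are hypothetical, and the real price should be taken from the current Athena pricing page.

```python
# Assumed price for illustration only; check current regional pricing.
PRICE_PER_TB_SCANNED_USD = 5.0
BYTES_PER_TB = 1024 ** 4


def query_cost_usd(bytes_scanned, price_per_tb=PRICE_PER_TB_SCANNED_USD):
    """Cost of one query when billing depends only on the data scanned."""
    return bytes_scanned / BYTES_PER_TB * price_per_tb


full_scan = query_cost_usd(2 * BYTES_PER_TB)      # scanning a 2 TB dataset
pruned_scan = query_cost_usd(0.1 * BYTES_PER_TB)  # same query reading 5% of it
print(full_scan, round(pruned_scan, 2))  # 10.0 0.5
```

The comparison shows why reducing the amount of data a query has to read directly lowers the bill.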

AWS DMS - Database Migration Service :

  • Quickly and securely migrate databases to AWS; the service is resilient and self-healing

  • The source database remains available during the migration

  • Supports homogeneous migrations (e.g. Oracle to Oracle) and heterogeneous migrations (e.g. Microsoft SQL Server to Aurora).

AWS Glue :

  • Managed extract, transform, and load (ETL) service

  • Useful to prepare and transform data for analytics

  • Fully serverless service

  • Glue Data Catalog : catalog of datasets
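
The extract, transform and load steps can be sketched in plain Python. This is not the Glue API; it is a minimal illustration of an ETL flow on made-up CSV data, and the `extract`, `transform` and `load` functions are hypothetical names.

```python
import csv
import io
import json

# Extract: raw CSV as it might arrive from a source system (made-up sample data).
raw_csv = "order_id,amount,currency\n1, 120.50 ,usd\n2,,usd\n3,80,eur\n"


def extract(text):
    """Extract: parse the raw CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(records):
    """Drop rows without an amount, cast types, normalize the currency code."""
    cleaned = []
    for record in records:
        amount = record["amount"].strip()
        if not amount:
            continue
        cleaned.append({
            "order_id": int(record["order_id"]),
            "amount": float(amount),
            "currency": record["currency"].strip().upper(),
        })
    return cleaned


def load(records):
    """Load: serialize as JSON lines, ready for an analytics store."""
    return "\n".join(json.dumps(record) for record in records)


output = load(transform(extract(raw_csv)))
print(output)
```

In a managed ETL service the same three stages exist, but the scheduling, scaling and infrastructure are handled for you.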

Databases & Analytics Summary in AWS :

  • Relational Databases -> OLTP : RDS & Aurora "SQL"

  • Differences between Multi-AZ, Read Replicas, Multi-Region

  • In-memory Database -> ElastiCache

  • Key/Value Database -> DynamoDB "Serverless" & DAX (Cache for DynamoDB)

  • Warehouse - OLAP (Online Analytical Processing) -> Redshift "SQL"

  • Hadoop Cluster -> EMR - Elastic MapReduce

  • Athena -> Query Data on Amazon S3 "Serverless & SQL"

  • QuickSight -> Dashboards on your data "Serverless"

  • DocumentDB -> "Aurora for MongoDB" (JSON - NoSQL database).

  • Amazon QLDB (Quantum Ledger DB) -> Financial Transactions Ledger (immutable journal, cryptographically verifiable)

  • Amazon Managed Blockchain -> managed Hyperledger Fabric & Ethereum blockchains

  • Glue -> Managed ETL (Extract Transform Load) and Data Catalog Service

  • Database Migration -> DMS - Database Migration Service. Supports homogeneous and heterogeneous migrations.

  • Neptune -> Graph Database.
