Database

    Amazon Redshift

    Fully managed petabyte-scale data warehouse service

    Redshift is like a massive warehouse optimized for analyzing huge amounts of data. While regular databases (like RDS) are great for transactional workloads (insert this order, update that user), Redshift is built for analytical queries, such as 'show me total sales by region for the last 5 years' or 'which products are most popular among customers aged 25-35?' It stores data in columns instead of rows, which makes aggregations blazingly fast. Think of it as the difference between a retail store (RDS) where you handle individual transactions, and a data analysis center (Redshift) where you crunch numbers on millions of records at once.
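    The row-vs-column difference can be sketched in a few lines of Python. This is a toy illustration of the storage layouts, not anything Redshift-specific; the sample data is made up.

```python
# Toy illustration of row-oriented vs column-oriented storage.
# Aggregating one column in the columnar layout touches only that
# column's contiguous values, instead of every full record.

# Row-oriented layout: each record stored together (like RDS/PostgreSQL)
rows = [
    {"region": "EU", "product": "book", "amount": 120},
    {"region": "US", "product": "pen", "amount": 30},
    {"region": "EU", "product": "pen", "amount": 45},
]

# Column-oriented layout: each column stored together (like Redshift)
columns = {
    "region": ["EU", "US", "EU"],
    "product": ["book", "pen", "pen"],
    "amount": [120, 30, 45],
}

# Row layout: every record must be read even though only 'amount' is needed
total_row = sum(r["amount"] for r in rows)

# Column layout: scan one contiguous (and highly compressible) array
total_col = sum(columns["amount"])

assert total_row == total_col == 195
```

    Contiguous columns of a single type also compress far better than mixed-type rows, which is why the paragraph above calls aggregations "blazingly fast".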

    Redshift is a columnar data warehouse that uses massively parallel processing (MPP) to distribute queries across multiple nodes. Data is stored in columns (not rows), compressed, and distributed across compute nodes using distribution keys.
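    A minimal sketch of KEY distribution, assuming a 4-node cluster: rows are assigned to nodes by hashing the distribution key, so rows sharing a key are co-located. CRC32 stands in for Redshift's internal hash function here; the table and key names are made up.

```python
import zlib

NUM_NODES = 4  # assumed cluster size for illustration

def node_for(dist_key: str) -> int:
    # Stable hash of the distribution key, mapped to a node.
    # (CRC32 stands in for Redshift's internal hash function.)
    return zlib.crc32(dist_key.encode()) % NUM_NODES

orders = [("cust-1", 120), ("cust-2", 30), ("cust-1", 45)]

# Rows with the same distribution key land on the same node, so a join
# or GROUP BY on customer_id avoids cross-node data shuffling.
placement: dict[int, list[tuple[str, int]]] = {}
for cust, amount in orders:
    placement.setdefault(node_for(cust), []).append((cust, amount))

# Both cust-1 rows are guaranteed to be on the same node
assert node_for("cust-1") == node_for("cust-1")
```

    This co-location is exactly why the choice of distribution key matters so much for join and aggregation performance.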

    Key Capabilities

    Key features: Redshift Spectrum (query data in S3 without loading it), Concurrency Scaling (automatically add capacity for bursts of concurrent queries), and Materialized Views (pre-compute expensive aggregations). You choose between RA3 node types (compute and storage scale independently) and DC2 node types (compute and storage are coupled).
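    The materialized-view idea can be sketched in plain Python: compute an aggregation once, then answer repeated queries with a cheap lookup instead of re-scanning the fact table. The data and view name below are made up.

```python
# Toy sketch of a materialized view: the aggregation is computed once
# and stored, so repeated queries become lookups rather than full scans.
sales = [("EU", 120), ("US", 30), ("EU", 45), ("US", 75)]

# Conceptually: CREATE MATERIALIZED VIEW sales_by_region AS
#               SELECT region, SUM(amount) FROM sales GROUP BY region;
mv_sales_by_region: dict[str, int] = {}
for region, amount in sales:
    mv_sales_by_region[region] = mv_sales_by_region.get(region, 0) + amount

# Querying the view is a lookup against the pre-computed result
assert mv_sales_by_region["EU"] == 165
```

    In Redshift the view must be refreshed (or auto-refreshed) as base tables change; the trade-off is staleness versus query speed.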

    Gotchas & Constraints

    Gotcha #1: Redshift is optimized for analytical queries, not transactional workloads; don't use it as an OLTP database. Gotcha #2: A poor distribution key causes data skew (some nodes store more data than others), which kills performance; choose high-cardinality keys. Constraints: Multi-AZ is supported for RA3 clusters (99.99% SLA) but DC2 clusters are Single-AZ only, maintenance windows require planning, and VACUUM operations (which reclaim space and re-sort data) can impact performance while they run.
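    The skew gotcha is easy to demonstrate. In this sketch (made-up data, CRC32 standing in for Redshift's hash, 4 assumed nodes), distributing 1,000 order rows on a two-value country column piles everything onto at most two nodes, while a unique order_id spreads them evenly.

```python
import zlib
from collections import Counter

NUM_NODES = 4  # assumed cluster size for illustration

def node_for(key: str) -> int:
    # Stable stand-in for Redshift's internal distribution hash
    return zlib.crc32(key.encode()) % NUM_NODES

# 1,000 synthetic orders: 'country' is low-cardinality (2 values),
# 'order_id' is high-cardinality (unique per row).
orders = [
    {"order_id": f"o-{i}", "country": "US" if i % 10 else "DE"}
    for i in range(1000)
]

# Rows per node under each choice of distribution key
skew_low = Counter(node_for(o["country"]) for o in orders)
skew_high = Counter(node_for(o["order_id"]) for o in orders)

# Low-cardinality key: at most 2 of the 4 nodes ever receive rows,
# so the busiest node is badly overloaded.
assert len(skew_low) <= 2

# High-cardinality key: the busiest node carries far fewer rows.
assert max(skew_high.values()) < max(skew_low.values())
```

    Since the slowest node gates the whole query in an MPP system, the hot node under the low-cardinality key drags every query down with it.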

    A retail company analyzes 10 years of sales data (50TB) to identify trends and optimize inventory. Querying this data in RDS PostgreSQL takes hours and maxes out CPU. They migrate to Redshift, loading data from S3 using COPY commands. Redshift distributes data across 10 nodes, and queries that took 2 hours now complete in 30 seconds. They use Redshift Spectrum to query an additional 200TB of historical data in S3 without loading it, combining warehouse and data lake queries. For real-time analytics, they stream data from Kinesis Data Firehose into Redshift every 5 minutes. Business analysts use QuickSight to visualize data, running complex queries (joins across 5 tables, aggregations over billions of rows) that return in seconds. They enable Concurrency Scaling to handle 50 concurrent users during month-end reporting without performance degradation.
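    The S3 load step in this scenario uses Redshift's COPY command. A minimal sketch of composing that statement follows; the table name, bucket path, file format options, and IAM role ARN are all made-up placeholders, not values from the scenario.

```python
# Sketch of building the COPY statement used to bulk-load from S3.
# All identifiers below are hypothetical placeholders.
def build_copy_sql(table: str, s3_path: str, iam_role: str) -> str:
    # COPY reads files from S3 in parallel across the cluster's slices,
    # which is why it is the recommended bulk-load path (vs. row INSERTs).
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        "FORMAT AS CSV\n"
        "GZIP;"
    )

sql = build_copy_sql(
    "sales_fact",
    "s3://example-sales-archive/2024/",
    "arn:aws:iam::123456789012:role/RedshiftCopyRole",
)
print(sql)
```

    Splitting the source data into multiple compressed files lets COPY parallelize the load across slices, which matters at the 50TB scale described above.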

    The Result

    100x faster queries, $50,000/month cost savings vs. scaling RDS, and self-service analytics for business users.

    Official AWS Documentation