Analytics & Business Intelligence

    Amazon Kinesis

    Real-time streaming data collection and processing

    Kinesis is like a fire hose for data streams. Imagine thousands of sensors, mobile apps, or servers all sending data continuously: clickstreams, logs, IoT telemetry. Kinesis captures this data in real-time, buffers it, and lets you process it immediately. It's like having a conveyor belt that never stops: data flows in one end, you process it in the middle, and results come out the other end, all in real-time. Perfect for scenarios where you need to react to data immediately, not hours later after batch processing.

    Kinesis has four services: Kinesis Data Streams (real-time data ingestion), Kinesis Data Firehose (loads data into S3, Redshift, or Elasticsearch/OpenSearch; since renamed Amazon Data Firehose), Kinesis Data Analytics (SQL queries on streams; since succeeded by Amazon Managed Service for Apache Flink), and Kinesis Video Streams (video ingestion). Data Streams measures throughput in shards: each shard accepts up to 1 MB/sec (or 1,000 records/sec) of input and serves up to 2 MB/sec of output. Producers write records to a stream, and consumers read and process them.
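
    The per-shard limits above translate directly into a sizing calculation. The following is a minimal sketch, not an official AWS formula: the function name estimate_shards and its parameters are made up for illustration, 1 MB is treated as 1,000,000 bytes, and all consumers are assumed to share the 2 MB/sec read limit (no enhanced fan-out).

```python
import math

# Per-shard limits: 1 MB/sec or 1,000 records/sec in, 2 MB/sec out.
# Assumption: 1 MB is treated as 1,000,000 bytes for this estimate.
SHARD_WRITE_BYTES_PER_SEC = 1_000_000
SHARD_WRITE_RECORDS_PER_SEC = 1_000
SHARD_READ_BYTES_PER_SEC = 2_000_000


def estimate_shards(records_per_sec, avg_record_bytes, consumers=1):
    """Return the smallest shard count that satisfies every per-shard limit.

    Hypothetical helper: assumes traffic is spread evenly across shards and
    that all consumers share each shard's 2 MB/sec read throughput.
    """
    write_bytes = records_per_sec * avg_record_bytes
    by_write_bytes = math.ceil(write_bytes / SHARD_WRITE_BYTES_PER_SEC)
    by_write_records = math.ceil(records_per_sec / SHARD_WRITE_RECORDS_PER_SEC)
    by_read_bytes = math.ceil(write_bytes * consumers / SHARD_READ_BYTES_PER_SEC)
    return max(1, by_write_bytes, by_write_records, by_read_bytes)


if __name__ == "__main__":
    # 100,000 small events per second: the record-count limit dominates.
    print(estimate_shards(100_000, 500))
    # Five shared-throughput consumers: the read limit dominates instead.
    print(estimate_shards(100_000, 500, consumers=5))
```

    Real traffic is rarely spread evenly, so a hot partition key can throttle one shard well before the stream-wide estimate is reached; treat the result as a lower bound.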

    Key Capabilities

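    Data Streams retains records for 24 hours by default, extendable up to 365 days. Enhanced fan-out gives each registered consumer its own dedicated read throughput per shard instead of sharing it with other consumers. Server-side encryption protects data at rest using AWS KMS keys.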

    Gotchas & Constraints

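    Gotcha #1: Exceeding a shard's limits causes throttling; monitor the stream's metrics and split shards proactively. Gotcha #2: Consumers must checkpoint their progress; if a consumer crashes, it resumes from the last checkpoint, so records processed after that checkpoint are delivered again and processing should be idempotent. Constraints: maximum 1 MB record size, maximum 1,000 records/second written per shard (whether sent with PutRecord or PutRecords), and resharding (splitting or merging shards) takes time.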
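
    The checkpointing gotcha is easier to see in a small simulation. This sketch does not use the Kinesis API or the Kinesis Client Library: CheckpointStore and consume are hypothetical names, the store is an in-memory dict standing in for a durable one, and list indexes stand in for real sequence numbers.

```python
class CheckpointStore:
    """In-memory stand-in for a durable checkpoint store (hypothetical)."""

    def __init__(self):
        self._last = {}

    def get(self, shard_id):
        # -1 means nothing has been checkpointed for this shard yet.
        return self._last.get(shard_id, -1)

    def save(self, shard_id, seq):
        self._last[shard_id] = seq


def consume(shard_id, records, store, handler, checkpoint_every=3, crash_at=None):
    """Process records after the last checkpoint, checkpointing periodically.

    crash_at simulates a consumer dying just before handling that record.
    """
    start = store.get(shard_id) + 1
    for seq in range(start, len(records)):
        if crash_at is not None and seq == crash_at:
            raise RuntimeError("simulated consumer crash")
        handler(seq, records[seq])
        if (seq + 1) % checkpoint_every == 0:
            store.save(shard_id, seq)
    if records:
        # Reached the end of the batch: checkpoint the final record.
        store.save(shard_id, len(records) - 1)


if __name__ == "__main__":
    store, seen = CheckpointStore(), []
    records = list("abcdefgh")
    try:
        consume("shard-0", records, store, lambda s, r: seen.append(s), crash_at=5)
    except RuntimeError:
        pass
    # Restart: processing resumes after the last saved checkpoint.
    consume("shard-0", records, store, lambda s, r: seen.append(s))
    print(seen)
```

    In this run the crash happens after records 3 and 4 were handled but before they were checkpointed, so the restarted consumer handles them a second time. That at-least-once behaviour is why handlers should tolerate duplicates.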

    A gaming company tracks player actions in real-time: every click, move, and purchase. They send 100,000 events/second to Kinesis Data Streams (100 shards, at the 1,000 records/second per-shard write limit). Lambda functions consume the stream, calculating real-time leaderboards and detecting cheating patterns. Kinesis Data Firehose loads raw events to S3 for long-term analytics. Kinesis Data Analytics runs SQL queries on the stream, such as 'count purchases by game level in 5-minute windows', and sends the results to DynamoDB for dashboards. When they detect a player exploiting a bug (an abnormal score increase), they trigger a Lambda function to flag the account immediately. During a new game launch, traffic spikes 10x; they scale from 100 to 1,000 shards in 30 minutes. Because a single UpdateShardCount call can at most double the shard count, this takes several resharding steps.
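
    The windowed query in this scenario ('count purchases by game level in 5-minute windows') can be sketched in plain Python to show what a tumbling window computes. This is an illustration only, not the SQL that Kinesis Data Analytics would run: the function name and the event fields ts (epoch seconds), type, and level are assumptions.

```python
from collections import Counter

WINDOW_SECONDS = 300  # 5-minute tumbling windows


def purchases_by_level(events, window_seconds=WINDOW_SECONDS):
    """Count purchase events per (window_start, level).

    Each event is a dict with hypothetical fields: ts (epoch seconds),
    type (e.g. "purchase" or "click"), and level (game level).
    """
    counts = Counter()
    for event in events:
        if event["type"] != "purchase":
            continue
        # Align the timestamp to the start of its non-overlapping window.
        window_start = event["ts"] - event["ts"] % window_seconds
        counts[(window_start, event["level"])] += 1
    return counts


if __name__ == "__main__":
    sample = [
        {"ts": 10, "type": "purchase", "level": 1},
        {"ts": 299, "type": "purchase", "level": 1},
        {"ts": 300, "type": "purchase", "level": 1},
        {"ts": 320, "type": "click", "level": 1},
        {"ts": 610, "type": "purchase", "level": 2},
    ]
    for key, count in sorted(purchases_by_level(sample).items()):
        print(key, count)
```

    A streaming engine computes the same aggregate incrementally and emits each window when it closes, rather than scanning a finished list as this sketch does.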

    The Result

    Real-time insights, immediate fraud detection, and scalable data ingestion.

    Official AWS Documentation