Management & Governance

    AWS Auto Scaling

    Automatically adjust capacity to maintain performance at the lowest cost

    Auto Scaling is like a smart thermostat that adjusts heating and cooling based on the weather and your schedule. In AWS, it automatically adds or removes resources (EC2 instances, ECS tasks, DynamoDB capacity) based on demand. When traffic spikes, Auto Scaling launches more instances; when traffic drops, it terminates the surplus to save money. You define the rules, such as 'keep CPU around 50%' or 'scale based on request count', and Auto Scaling handles the rest. It works like an operations team that never sleeps, continuously matching capacity to demand while minimizing cost.
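
    To make a rule like 'keep CPU around 50%' concrete, here is a minimal sketch of the proportional reasoning behind it. It is a simplified model for illustration, not AWS's actual algorithm; the function name and the assumption that a per-instance metric falls in proportion to instance count are hypothetical.

```python
import math


def target_tracking_desired(current_capacity: int, metric_value: float,
                            target_value: float, min_size: int,
                            max_size: int) -> int:
    """Estimate the capacity that would bring an average metric back to target.

    Simplified model (an assumption): the per-instance metric scales
    inversely with the number of instances sharing the load.
    """
    if current_capacity <= 0 or target_value <= 0:
        raise ValueError("capacity and target must be positive")
    raw = math.ceil(current_capacity * metric_value / target_value)
    # Never go outside the group's configured bounds.
    return max(min_size, min(max_size, raw))


# 4 instances at 80% CPU with a 50% target: 4 * 80 / 50 = 6.4, rounded up.
print(target_tracking_desired(4, 80.0, 50.0, min_size=2, max_size=10))
# 4 instances at 20% CPU: 1.6 rounds up to 2, which is also the minimum.
print(target_tracking_desired(4, 20.0, 50.0, min_size=2, max_size=10))
```

    Rounding up errs on the side of having enough capacity, and the clamp keeps the result within the limits you set.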

    Auto Scaling spans several services: EC2 Auto Scaling (instances), Application Auto Scaling (ECS, DynamoDB, Aurora, Lambda), and AWS Auto Scaling (unified scaling across services). For EC2, you create Auto Scaling Groups (ASGs) that define min/max/desired capacity, a launch template, and scaling policies. There are three main policy types: target tracking (hold a metric at a target value), step scaling (adjust capacity in steps sized by how far the alarm is breached), and scheduled scaling (change capacity at specific times).
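
    As a rough illustration of what an ASG and a target tracking policy look like in practice, the sketch below builds the request parameters for the boto3 `create_auto_scaling_group` and `put_scaling_policy` calls. The group name, launch template name, and subnet IDs are placeholders, and the helper functions are hypothetical; nothing is sent to AWS unless you pass a real client to `apply`.

```python
def asg_request(name, launch_template, subnets, min_size, max_size, desired):
    """Parameters for create_auto_scaling_group (all names are placeholders)."""
    if not (min_size <= desired <= max_size):
        raise ValueError("desired capacity must lie between min and max")
    return {
        "AutoScalingGroupName": name,
        "LaunchTemplate": {"LaunchTemplateName": launch_template,
                           "Version": "$Latest"},
        "MinSize": min_size,
        "MaxSize": max_size,
        "DesiredCapacity": desired,
        # Subnets are passed as one comma-separated string.
        "VPCZoneIdentifier": ",".join(subnets),
    }


def cpu_target_tracking_request(asg_name, target_cpu):
    """Parameters for put_scaling_policy: hold average CPU at target_cpu."""
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": f"{asg_name}-cpu-{int(target_cpu)}",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization",
            },
            "TargetValue": float(target_cpu),
        },
    }


def apply(client, asg, policy):
    """Send both requests using an Auto Scaling client supplied by the caller."""
    client.create_auto_scaling_group(**asg)
    client.put_scaling_policy(**policy)


asg = asg_request("web-asg", "web-template", ["subnet-a", "subnet-b"], 2, 10, 2)
policy = cpu_target_tracking_request("web-asg", 50)
print(asg["VPCZoneIdentifier"], policy["PolicyName"])
```

    With credentials configured, `apply(boto3.client("autoscaling"), asg, policy)` would create the group and attach the policy. Keeping the parameter builders separate from the call makes them easy to inspect before anything is provisioned.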

    Key Capabilities

    Health checks detect and replace unhealthy instances. Lifecycle hooks pause an instance during launch or termination so that custom scripts can run. Warm pools keep pre-initialized instances on standby for faster scaling.
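
    The three capabilities above each map to a separate API call. The sketch below only builds the parameters for boto3's `update_auto_scaling_group`, `put_lifecycle_hook`, and `put_warm_pool`; the helper names, group name, hook name, and timeout values are illustrative assumptions rather than recommended settings.

```python
LAUNCHING = "autoscaling:EC2_INSTANCE_LAUNCHING"
TERMINATING = "autoscaling:EC2_INSTANCE_TERMINATING"


def health_check_request(asg_name, grace_seconds=300):
    """For update_auto_scaling_group: also use load balancer health checks."""
    return {
        "AutoScalingGroupName": asg_name,
        "HealthCheckType": "ELB",
        # Time a new instance gets to boot before checks count against it.
        "HealthCheckGracePeriod": grace_seconds,
    }


def lifecycle_hook_request(asg_name, hook_name, transition, timeout_seconds=300):
    """For put_lifecycle_hook: pause instances at launch or termination."""
    if transition not in (LAUNCHING, TERMINATING):
        raise ValueError(f"unsupported transition: {transition}")
    return {
        "AutoScalingGroupName": asg_name,
        "LifecycleHookName": hook_name,
        "LifecycleTransition": transition,
        "HeartbeatTimeout": timeout_seconds,
        # If the hook times out, give up on the instance (assumed choice).
        "DefaultResult": "ABANDON",
    }


def warm_pool_request(asg_name, min_size=1):
    """For put_warm_pool: keep stopped, pre-initialized instances ready."""
    return {
        "AutoScalingGroupName": asg_name,
        "MinSize": min_size,
        "PoolState": "Stopped",
    }


hook = lifecycle_hook_request("web-asg", "install-agent", LAUNCHING)
print(hook["LifecycleTransition"], warm_pool_request("web-asg")["PoolState"])
```

    Each dictionary can be unpacked into the matching client method, for example `client.put_lifecycle_hook(**hook)`.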

    Gotchas & Constraints

    Gotcha #1: Scaling in is deliberately slower than scaling out. New instances launch as soon as a policy triggers, but Auto Scaling waits out cooldown periods before removing capacity to avoid thrashing. Gotcha #2: Auto Scaling doesn't fix application issues; if your app crashes under load, adding instances won't help. Constraints: a default quota of 500 ASGs per region (an increase can be requested), ASG maximum size bounded by your EC2 service quotas, scale-out that typically takes 3-5 minutes to add usable capacity (instance launch and boot time), and cooldown periods that delay subsequent scaling actions.
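
    The effect of a cooldown period can be sketched with a small gate that refuses a new scaling action until enough time has passed since the last one. This is a toy model, not the service's internal logic; the class name is hypothetical and the 300-second value is assumed for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CooldownGate:
    """Allow a scaling action only if the cooldown has elapsed."""
    cooldown_seconds: float = 300.0
    last_action_at: Optional[float] = None

    def try_scale(self, now: float) -> bool:
        in_cooldown = (
            self.last_action_at is not None
            and now - self.last_action_at < self.cooldown_seconds
        )
        if in_cooldown:
            return False  # request ignored; capacity stays as it is
        self.last_action_at = now
        return True


gate = CooldownGate(cooldown_seconds=300)
# Scaling requests arriving at these times (seconds since start).
decisions = [gate.try_scale(t) for t in (0, 120, 299, 300, 650)]
print(decisions)  # [True, False, False, True, True]
```

    Requests at 120 and 299 seconds are dropped because they fall inside the window opened at 0, which is why a burst of alarms does not cause a burst of capacity changes.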

    A news website has unpredictable traffic: normal load is 1,000 requests/second, served by 2 instances, but breaking news can spike it to 50,000 requests/second. The team creates an Auto Scaling Group with min=2, max=100, desired=2 instances behind an ALB and configures target tracking on the ALB request count per target (the ALBRequestCountPerTarget predefined metric), sized so that each instance handles about 1,000 requests/second. Because that metric is reported per target per minute, 1,000 requests/second corresponds to a target value of about 60,000. When a breaking news story hits and traffic reaches 50,000 requests/second, Auto Scaling detects the high request count and launches 48 more instances over about 10 minutes, for 50 in total, with the ALB distributing load evenly across them. When traffic drops back to normal after 2 hours, Auto Scaling gradually terminates the extra instances and returns to 2. The team also configures scheduled scaling: scale to 10 instances at 8 AM on weekdays in anticipation of morning traffic, and scale back down to 2 instances at 10 PM.
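
    The numbers in this scenario can be checked with a few lines of arithmetic, and the two scheduled actions can be written as cron-style recurrences of the kind accepted by boto3's `put_scheduled_update_group_action`. The action names are hypothetical, the nightly schedule is assumed to run every day, and the per-minute target value assumes the ALB metric is counted per target over one minute.

```python
import math


def instances_needed(total_rps, per_instance_rps, min_size, max_size):
    """Instances required for a given load, clamped to the ASG bounds."""
    raw = math.ceil(total_rps / per_instance_rps)
    return max(min_size, min(max_size, raw))


normal = instances_needed(1_000, 1_000, min_size=2, max_size=100)
peak = instances_needed(50_000, 1_000, min_size=2, max_size=100)
extra = peak - normal
# 1,000 requests/second per instance, expressed per minute.
target_value_per_minute = 1_000 * 60
print(normal, peak, extra, target_value_per_minute)  # 2 50 48 60000

# Recurrence fields: minute hour day-of-month month day-of-week.
# Times are interpreted as UTC unless a TimeZone parameter is also supplied.
SCHEDULED_ACTIONS = [
    {"ScheduledActionName": "weekday-morning-scale-up",
     "Recurrence": "0 8 * * 1-5", "DesiredCapacity": 10},
    {"ScheduledActionName": "nightly-scale-down",
     "Recurrence": "0 22 * * *", "DesiredCapacity": 2},
]
```

    Each entry, combined with an `AutoScalingGroupName`, could be passed as keyword arguments to `put_scheduled_update_group_action`. The maximum of 100 leaves headroom of 50 instances beyond the modeled peak.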

    The Result

    The website stays responsive during traffic spikes, costs stay low during quiet periods, and no manual intervention is required.
