📊Management & Governance

AWS Auto Scaling

Automatically adjust capacity to maintain performance at the lowest cost

Auto Scaling is like having a smart thermostat that adjusts heating and cooling based on the weather and your schedule. For AWS, it automatically adds or removes resources (EC2 instances, ECS tasks, DynamoDB capacity) based on demand. When traffic spikes, Auto Scaling launches more instances. When traffic drops, it terminates extras to save money. You define the rules: 'keep CPU around 50%' or 'scale based on request count', and Auto Scaling handles the rest. It's like having an operations team that never sleeps, constantly adjusting capacity to match demand while minimizing costs.

Auto Scaling spans several services: Amazon EC2 Auto Scaling (instances), Application Auto Scaling (ECS services, DynamoDB tables, Aurora replicas, Lambda provisioned concurrency), and AWS Auto Scaling (scaling plans that coordinate scaling across services). For EC2, you create Auto Scaling Groups (ASGs) that define minimum, maximum, and desired capacity, reference a launch template, and attach scaling policies. The main policy types are target tracking (hold a metric at a target value), step scaling (adjust capacity in steps sized by how far an alarm threshold is breached), and scheduled scaling (change capacity at specific times).
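As a rough illustration of target tracking, the sketch below approximates how a proportional policy might size a group: it scales current capacity by the ratio of the observed metric to the target, rounds up, and clamps the result to the group's bounds. The function name and sample numbers are hypothetical, and the real service logic (alarms, instance warm-up, conservative scale-in) is more involved than this.

```python
import math


def target_tracking_desired(current_capacity, metric_value, target_value,
                            min_size, max_size):
    """Approximate the capacity a target tracking policy would aim for.

    Simplified model (an assumption, not the service's exact algorithm):
    desired = ceil(current * metric / target), clamped to [min_size, max_size].
    """
    if current_capacity <= 0 or target_value <= 0:
        raise ValueError("current_capacity and target_value must be positive")
    raw = current_capacity * metric_value / target_value
    return max(min_size, min(max_size, math.ceil(raw)))


# 4 instances at 80% average CPU with a 50% target: 4 * 80 / 50 = 6.4, rounded up.
print(target_tracking_desired(4, 80.0, 50.0, min_size=2, max_size=10))
# Load far below target: the result is clamped to the group's minimum size.
print(target_tracking_desired(4, 10.0, 50.0, min_size=2, max_size=10))
```

Rounding up rather than down reflects the usual preference for availability: a slightly over-provisioned group is cheaper than a degraded one.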

Key Capabilities

Manages EC2 instance counts in Auto Scaling Groups (ASGs) within defined minimum, maximum, and desired capacity bounds
Target Tracking scaling maintains a specific metric value (such as 50% CPU utilization) by automatically adding or removing instances
Step Scaling and Scheduled Scaling handle threshold-based responses and pre-planned capacity changes for known traffic patterns
Predictive Scaling uses ML-based forecasting to provision capacity ahead of anticipated load rather than reacting after the fact
Lifecycle hooks pause instance launch or termination so you can run bootstrap or cleanup scripts before the instance joins or leaves service
Mixed instances policy combines On-Demand and Spot instances in a single ASG; warm pools keep pre-initialized instances ready to reduce scale-out latency
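Step scaling can be pictured as a lookup from breach size to capacity change. The following is a minimal sketch, assuming a hypothetical CPU alarm threshold of 60% and made-up step sizes. Each tuple mirrors the lower bound, upper bound, and adjustment of a step adjustment, with the lower bound treated as inclusive and the upper bound as exclusive.

```python
# Hypothetical steps: (lower_bound, upper_bound, instances_to_add),
# where bounds are offsets above the alarm threshold. None means "no upper bound".
STEPS = [
    (0, 10, 1),     # threshold .. threshold+10   -> add 1 instance
    (10, 20, 2),    # threshold+10 .. threshold+20 -> add 2 instances
    (20, None, 4),  # threshold+20 and above       -> add 4 instances
]


def step_adjustment(metric_value, threshold, steps=STEPS):
    """Return the capacity change for a metric reading, or 0 if no step matches."""
    breach = metric_value - threshold
    for lower, upper, adjustment in steps:
        if breach >= lower and (upper is None or breach < upper):
            return adjustment
    return 0


for cpu in (50, 65, 75, 95):
    print(cpu, "->", step_adjustment(cpu, threshold=60))
```

Larger breaches trigger larger adjustments, which is the main difference from a simple policy that always adds the same amount.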

Gotchas & Constraints

Gotcha #1: Scale-out is limited mainly by instance launch and boot time, while scale-in is deliberately slower: cooldown periods and conservative scale-in behavior hold capacity a little longer to avoid thrashing.
Gotcha #2: Auto Scaling doesn't fix application issues; if your app crashes under load, adding instances won't help.
Constraints: The default quota is 500 ASGs per region (an increase can be requested); ASG maximum size is bounded by your EC2 service quotas; new capacity typically takes a few minutes to start serving traffic (roughly 3-5 minutes, depending on the AMI and bootstrap work); and cooldown periods delay subsequent scaling actions.
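To see why cooldowns delay follow-up actions, here is a simplified sketch that drops any scaling request arriving before the cooldown from the last accepted action has elapsed. The function and the sample timeline are hypothetical; the 300-second value matches the default cooldown for simple scaling, and real policies (target tracking in particular) apply more nuanced rules.

```python
def apply_with_cooldown(requests, cooldown_seconds=300):
    """Filter scaling requests through a cooldown window.

    requests: iterable of (timestamp_seconds, capacity_change) tuples.
    Returns the requests that would be acted on in this simplified model.
    """
    accepted = []
    last_action_time = None
    for timestamp, change in sorted(requests):
        in_cooldown = (
            last_action_time is not None
            and timestamp - last_action_time < cooldown_seconds
        )
        if in_cooldown:
            continue  # ignored: the previous action is still settling
        accepted.append((timestamp, change))
        last_action_time = timestamp
    return accepted


timeline = [(0, +1), (120, +1), (300, +1), (400, -1)]
print(apply_with_cooldown(timeline))
```

In this timeline the requests at 120 and 400 seconds are ignored, which is the thrash protection described above and also the reason a second scale-out can lag behind a fast-growing spike.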

A news website has unpredictable traffic: normal load is 1,000 requests/second, served by 2 instances, but breaking news can spike it to 50,000 requests/second. The team creates an Auto Scaling Group with min=2, max=100, desired=2 instances behind an ALB. They configure target tracking on the ALB request count per target (ALBRequestCountPerTarget), sized so that each instance handles about 1,000 requests/second; because the metric is counted per target per minute, that corresponds to a target value of roughly 60,000. When a breaking news story hits and traffic reaches 50,000 requests/second, Auto Scaling detects the high per-target request count and launches 48 more instances over about 10 minutes (50 in total), with the ALB distributing load evenly across them. When traffic drops back to normal after 2 hours, Auto Scaling gradually terminates the extra instances and returns to 2. They also configure scheduled scaling: scale to 10 instances at 8 AM on weekdays in anticipation of morning traffic, and scale back down to 2 instances at 10 PM.
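The numbers in this scenario can be checked with a short sketch. It recomputes the instance count at peak and builds request parameters for the two scheduled actions. The group name `news-web-asg` and the action names are hypothetical, the 10 PM action is assumed to run every day, and the dictionaries are only shaped like the arguments to boto3's `put_scheduled_update_group_action`; nothing here calls AWS.

```python
import math

PER_INSTANCE_RPS = 1_000  # requests/second one instance handles (from the scenario)
PEAK_RPS = 50_000
NORMAL_RPS = 1_000
MIN_SIZE, MAX_SIZE = 2, 100


def instances_needed(total_rps):
    """Instances required for a given load, clamped to the group's bounds."""
    needed = math.ceil(total_rps / PER_INSTANCE_RPS)
    return max(MIN_SIZE, min(MAX_SIZE, needed))


peak_instances = instances_needed(PEAK_RPS)
extra_instances = peak_instances - MIN_SIZE
# Assumption: the request-count metric is summed per minute, so a per-second
# budget is multiplied by 60 to get the target tracking value.
per_minute_target = PER_INSTANCE_RPS * 60


def scheduled_action(name, recurrence, desired_capacity):
    """Build keyword arguments for a scheduled action (hypothetical names)."""
    return {
        "AutoScalingGroupName": "news-web-asg",  # hypothetical group name
        "ScheduledActionName": name,
        "Recurrence": recurrence,  # cron expression, UTC unless TimeZone is set
        "DesiredCapacity": desired_capacity,
    }


morning = scheduled_action("weekday-morning-scale-out", "0 8 * * 1-5", 10)
evening = scheduled_action("nightly-scale-in", "0 22 * * *", 2)

print(peak_instances, extra_instances, per_minute_target)
print(morning)
print(evening)
```

With 50 instances needed at peak and 2 already running, the 48 additional launches in the scenario follow directly from the per-instance capacity.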

The Result

The website stays responsive during traffic spikes, costs stay low during quiet periods, and no manual intervention is required.
