Compute

    AWS Batch

    Fully managed batch processing at any scale

    Imagine you have 10,000 photos that need to be resized, or a million data files that need processing. You could do them one at a time, but that would take forever. AWS Batch is like having a factory assembly line that automatically figures out how many workers you need, assigns tasks to them, and scales up or down based on the workload. You just submit your jobs, and Batch handles everything: provisioning servers, queuing jobs, retrying failures, and shutting down resources when done. It's perfect for any task that can be broken into independent chunks and processed in parallel.

    AWS Batch orchestrates batch computing workloads by managing job queues, compute environments, and job definitions. You define a job (container image with command), submit it to a queue, and Batch provisions EC2 or Fargate compute resources to run it. Batch supports job dependencies (Job B runs after Job A completes), array jobs (run the same job 1,000 times with different parameters), and multi-node parallel jobs (MPI workloads).
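    The workflow above can be sketched with boto3. This is a minimal sketch, not a complete pipeline: the queue name, job definition name, and job IDs are placeholders, and the actual AWS call is left commented so the helper can be read on its own.

    ```python
    def build_array_job_request(name, queue, definition, size, depends_on=None):
        """Build a submit_job request for an array job of `size` child jobs."""
        req = {
            "jobName": name,
            "jobQueue": queue,              # placeholder queue name
            "jobDefinition": definition,    # placeholder definition, e.g. "resize-job:1"
            # Each child job receives its index via AWS_BATCH_JOB_ARRAY_INDEX
            "arrayProperties": {"size": size},
        }
        if depends_on:
            # Job B runs after Job A completes: pass Job A's jobId here
            req["dependsOn"] = [{"jobId": depends_on}]
        return req

    request = build_array_job_request("resize-photos", "my-queue", "resize-job:1", 1000)

    # import boto3
    # batch = boto3.client("batch")
    # response = batch.submit_job(**request)   # returns the new job's jobId
    ```

    The same request shape handles the single-job, dependency, and array-job cases; only `arrayProperties` and `dependsOn` change.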

    Key Capabilities

    Key configurations: compute environment type (EC2, Spot, Fargate), instance types, and job priorities.
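    A managed Spot compute environment, the first of those configurations, might look like the following sketch. The environment name, instance types, subnet IDs, and role ARN are all placeholder assumptions; with boto3 this dict would be passed to create_compute_environment.

    ```python
    def spot_compute_environment(name, max_vcpus, instance_types, subnets, role_arn):
        """Build a managed Spot compute environment definition (sketch)."""
        return {
            "computeEnvironmentName": name,
            "type": "MANAGED",            # Batch chooses and scales instances itself
            "computeResources": {
                "type": "SPOT",           # alternatives: "EC2", "FARGATE"
                "minvCpus": 0,            # scale to zero when the queue is empty
                "maxvCpus": max_vcpus,    # hard cap on concurrent capacity
                "instanceTypes": instance_types,
                "subnets": subnets,
            },
            "serviceRole": role_arn,
        }

    env = spot_compute_environment(
        "genomics-spot", 256, ["c5.4xlarge", "m5.2xlarge"],
        ["subnet-0abc"], "arn:aws:iam::123456789012:role/BatchServiceRole",
    )
    ```

    Setting `minvCpus` to 0 is what lets the environment cost nothing while no jobs are queued.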

    Gotchas & Constraints

    Gotcha #1: Spot-based compute environments are the common cost-saving choice, but Spot interruptions can cause job failures; use retry strategies and checkpointing for long-running jobs.

    Gotcha #2: Job definitions must specify resource requirements (vCPU, memory); under-specifying causes OOM errors, over-specifying wastes money.

    Constraints: Jobs are stateless; use S3 or EFS for input/output data. Jobs on Fargate resources are limited to 14 days of runtime.
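    Both gotchas are addressed in the job definition. The sketch below shows explicit resource requirements plus a retry strategy that retries Spot host terminations but fails fast on application errors; the image name is a placeholder, and the field names follow the Batch containerProperties/retryStrategy shapes.

    ```python
    container_properties = {
        "image": "my-registry/analysis:latest",   # placeholder image
        "command": ["python", "run.py"],
        "resourceRequirements": [
            {"type": "VCPU", "value": "4"},
            {"type": "MEMORY", "value": "32768"},  # MiB; too low -> OOM kill
        ],
    }

    retry_strategy = {
        "attempts": 3,
        "evaluateOnExit": [
            # Spot reclaims surface as host terminations: retry those
            {"onStatusReason": "Host EC2*", "action": "RETRY"},
            # Anything else (e.g. a bug in the job code) should fail fast
            {"onReason": "*", "action": "EXIT"},
        ],
    }
    ```

    `evaluateOnExit` rules are checked in order, so the catch-all `EXIT` rule must come last.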

    A genomics research lab processes DNA sequencing data; each sample requires 4 hours of compute and 32GB RAM. They have 500 samples per week, requiring 2,000 compute hours. Running this on dedicated EC2 instances would cost $5,000/month. They use AWS Batch with Spot instances, defining a job that reads sample data from S3, runs the analysis, and writes results back to S3. Batch automatically provisions Spot instances (70% cheaper than On-Demand), runs jobs in parallel (up to 50 concurrent jobs), and handles Spot interruptions by retrying failed jobs. When samples arrive sporadically, Batch scales to zero when idle.
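    The 500 samples map naturally onto one array job of size 500. Inside each container, Batch sets AWS_BATCH_JOB_ARRAY_INDEX (0 through size - 1); a minimal sketch of mapping that index to an S3 object follows, where the bucket prefix and file naming are assumptions for illustration.

    ```python
    import os

    def sample_key_for_index(index, prefix="samples/", total=500):
        """Map an array-job child index to its input object key (assumed layout)."""
        if not 0 <= index < total:
            raise ValueError(f"index {index} outside array size {total}")
        return f"{prefix}sample-{index:04d}.fastq"

    # Inside the running container:
    # index = int(os.environ["AWS_BATCH_JOB_ARRAY_INDEX"])
    # key = sample_key_for_index(index)   # e.g. "samples/sample-0042.fastq"
    ```

    Because each child derives its own input from the index, the job definition stays identical across all 500 samples.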

    The Result

    $1,500/month cost, zero infrastructure management, and faster processing through parallelization.
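    The cost figure follows directly from the scenario's numbers: a ~70% Spot discount applied to the $5,000/month dedicated cost.

    ```python
    on_demand_monthly = 5000.0   # dedicated EC2 cost from the scenario
    spot_discount = 0.70         # Spot ~70% cheaper than On-Demand
    spot_monthly = on_demand_monthly * (1 - spot_discount)  # ~= 1500
    ```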

    Official AWS Documentation