Application Integration

    AWS Step Functions

    Orchestration service for coordinating distributed applications

    Step Functions is like a conductor for an orchestra: it coordinates multiple services so they work together in a specific sequence. Imagine a workflow: upload a file to S3, trigger Lambda to process it, store the results in DynamoDB, send a notification via SNS, and if anything fails, retry or send an error alert. Step Functions orchestrates this entire workflow and takes care of retries, error handling, and state management. You define the workflow as a state machine, which the console also renders as a visual diagram, and Step Functions executes it, tracking progress and handling failures. It's a good fit for complex, multi-step processes that involve multiple AWS services.

    Step Functions executes state machines defined in Amazon States Language (ASL), a JSON-based format. States include Task (invoke Lambda, ECS, Batch, etc.), Choice (branching logic), Parallel (execute branches concurrently), Map (iterate over an array), Wait (delay), Pass (pass input through to output, optionally injecting data), and Succeed/Fail (terminal states). Step Functions offers two workflow types: Standard (long-running, exactly-once execution, full execution history) and Express (high-volume, short-lived, limited history; at-least-once execution when started asynchronously, at-most-once when started synchronously).
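
    To make the state types above concrete, here is a minimal sketch of an ASL definition built as a Python dictionary and serialized to JSON. The state names, the `$.status` field, and the function name `ProcessFileFunction` are hypothetical placeholders chosen for illustration; the small checker is only a sanity test for top-level transitions, not a full ASL validator.

```python
import json


def build_workflow(function_name: str) -> dict:
    """Return a small ASL definition using Task, Choice, Wait, Succeed, and Fail."""
    return {
        "Comment": "Illustrative workflow; all names are placeholders",
        "StartAt": "ProcessFile",
        "States": {
            "ProcessFile": {
                "Type": "Task",
                # Optimized Lambda integration; the function name is hypothetical.
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {"FunctionName": function_name, "Payload.$": "$"},
                "OutputPath": "$.Payload",
                "Next": "CheckStatus",
            },
            "CheckStatus": {
                "Type": "Choice",
                "Choices": [
                    {"Variable": "$.status", "StringEquals": "DONE", "Next": "Finished"},
                    {"Variable": "$.status", "StringEquals": "PENDING", "Next": "WaitBeforeRetry"},
                ],
                "Default": "Rejected",
            },
            "WaitBeforeRetry": {"Type": "Wait", "Seconds": 30, "Next": "ProcessFile"},
            "Finished": {"Type": "Succeed"},
            "Rejected": {"Type": "Fail", "Error": "UnexpectedStatus", "Cause": "Status was neither DONE nor PENDING"},
        },
    }


def transition_targets(state: dict) -> list:
    """Collect the state names a single top-level state can transition to."""
    targets = []
    if "Next" in state:
        targets.append(state["Next"])
    if "Default" in state:
        targets.append(state["Default"])
    for rule in state.get("Choices", []):
        targets.append(rule["Next"])
    return targets


def dangling_targets(definition: dict) -> set:
    """Return referenced state names that are not defined (top level only)."""
    names = set(definition["States"])
    missing = set()
    if definition["StartAt"] not in names:
        missing.add(definition["StartAt"])
    for state in definition["States"].values():
        missing.update(t for t in transition_targets(state) if t not in names)
    return missing


if __name__ == "__main__":
    definition = build_workflow("ProcessFileFunction")
    print(json.dumps(definition, indent=2))
```

    The JSON printed by this sketch is the kind of document you would supply as the state machine definition; building it in code simply makes it easier to check for mistakes such as a `Next` that points at a state that does not exist.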

    Key Capabilities

    Error handling (Retry and Catch rules on individual states), input/output processing (filtering and transforming data as it passes between states), and service integrations (200+ AWS services).
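
    As a rough sketch of how these capabilities appear in a definition, the Task state below combines `Retry`, `Catch`, and the input/output fields `Parameters`, `ResultSelector`, and `ResultPath`. The function name `ValidateUpload`, the target states `StoreResults` and `NotifyFailure`, and the `valid` field are hypothetical. The helper `apply_result_path` is a simplified local stand-in for what `"ResultPath": "$.<key>"` does with a top-level key; it is not how the service is implemented.

```python
import json

# A Task state with error handling and input/output processing.
# All names below are placeholders for illustration.
VALIDATE_STATE = {
    "Type": "Task",
    "Resource": "arn:aws:states:::lambda:invoke",
    # Parameters: shape the input sent to the task.
    "Parameters": {"FunctionName": "ValidateUpload", "Payload.$": "$"},
    # ResultSelector: keep only part of the raw task result.
    "ResultSelector": {"valid.$": "$.Payload.valid"},
    # ResultPath: attach the selected result to the original input.
    "ResultPath": "$.validation",
    # Retry: try again on task failure, with growing delays.
    "Retry": [
        {
            "ErrorEquals": ["States.TaskFailed"],
            "IntervalSeconds": 2,
            "BackoffRate": 2.0,
            "MaxAttempts": 3,
        }
    ],
    # Catch: once retries are exhausted, route to a fallback state.
    "Catch": [
        {"ErrorEquals": ["States.ALL"], "ResultPath": "$.error", "Next": "NotifyFailure"}
    ],
    "Next": "StoreResults",
}


def apply_result_path(state_input: dict, result, key: str) -> dict:
    """Simplified emulation of ResultPath "$.<key>": copy the input and add the result."""
    merged = dict(state_input)
    merged[key] = result
    return merged


if __name__ == "__main__":
    state_input = {"bucket": "uploads", "key": "report.csv"}
    print(json.dumps(apply_result_path(state_input, {"valid": True}, "validation")))
```

    The point of `ResultPath` here is that the next state still sees the original `bucket` and `key` alongside the validation result, instead of having the task output replace the input entirely.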

    Gotchas & Constraints

    Gotcha #1: Standard workflows are billed per state transition, so workflows with many states or loop iterations can get expensive; Express workflows are billed by number of requests, duration, and memory instead. Gotcha #2: Standard workflows have a 1-year maximum execution time; Express workflows are limited to 5 minutes. Constraints: maximum 25,000 events in execution history (Standard), maximum 256 KB input/output per state, and maximum 25 parallel branches.
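
    Two of these limits are easy to check before deploying. The sketch below estimates per-transition cost for Standard workflows and approximates whether a payload fits in 256 KB. The helper names are made up for this example, the price is passed in as a parameter rather than assumed, and the size check measures a compact UTF-8 JSON encoding, which is only an approximation of how the service counts bytes.

```python
import json

MAX_PAYLOAD_BYTES = 256 * 1024  # the 256 KB input/output limit mentioned above


def payload_size_bytes(payload) -> int:
    """Approximate payload size as compact, UTF-8 encoded JSON."""
    text = json.dumps(payload, separators=(",", ":"), ensure_ascii=False)
    return len(text.encode("utf-8"))


def fits_payload_limit(payload) -> bool:
    """True if the approximate payload size is within the 256 KB limit."""
    return payload_size_bytes(payload) <= MAX_PAYLOAD_BYTES


def standard_transition_cost(transitions_per_execution: int,
                             executions: int,
                             price_per_1000_transitions: float) -> float:
    """Estimate state-transition charges for Standard workflows."""
    total_transitions = transitions_per_execution * executions
    return total_transitions * price_per_1000_transitions / 1000


if __name__ == "__main__":
    # 0.025 USD per 1,000 transitions is an illustrative figure only;
    # check the current pricing page for your region.
    monthly = standard_transition_cost(10, 1_000_000, 0.025)
    print(f"Estimated transition cost: {monthly:.2f} USD")

    print(fits_payload_limit({"key": "videos/a.mp4"}))
    # A payload over the limit is usually stored in S3 and passed by reference.
    print(fits_payload_limit({"blob": "x" * (300 * 1024)}))
```

    Running the numbers this way makes the first gotcha tangible: cost scales with transitions per execution multiplied by execution count, so trimming a few states from a high-volume workflow can matter more than it first appears.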

    Consider a video processing pipeline: a user uploads a video to S3, and the system transcodes it to multiple resolutions, generates thumbnails, extracts metadata, updates a database, and sends a notification. The team previously chained Lambda functions with SQS, which made error handling complex and progress hard to track, and the 15-minute Lambda timeout was a problem for large videos.

    With Step Functions, the S3 upload triggers a state machine. The first state invokes Lambda to validate the video. The second is a Parallel state with two branches: one invokes MediaConvert for transcoding (about 30 minutes), the other invokes Lambda for thumbnail generation. The Parallel state completes only when both branches finish; the third state then invokes Lambda to extract metadata and update DynamoDB. The final state sends an SNS notification. If transcoding fails, Step Functions retries it 3 times with exponential backoff. If it still fails, the workflow sends an error notification and is marked as failed. The team monitors all executions in the Step Functions console, where they can see which step failed, view the input and output of each state, and rerun failed executions.
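
    The pipeline described above could be sketched as the following ASL definition, again built as a Python dictionary. Every function name, the topic ARN, the input fields (`$.mediaConvertRole`, `$.jobSettings`), and the retry interval of 10 seconds are assumptions made for illustration, and the MediaConvert resource ARN assumes the optimized `createJob.sync` integration is available in your region. The `retry_delays` helper shows the waits implied by `IntervalSeconds` and `BackoffRate`, ignoring optional settings such as a maximum delay or jitter.

```python
import json

# Placeholder ARN; replace with a real topic.
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:video-pipeline"

TRANSCODE_RETRY = {
    "ErrorEquals": ["States.ALL"],
    "IntervalSeconds": 10,
    "BackoffRate": 2.0,
    "MaxAttempts": 3,
}


def retry_delays(interval_seconds: float, backoff_rate: float, max_attempts: int) -> list:
    """Wait before each retry: the interval grows by backoff_rate every attempt."""
    return [interval_seconds * backoff_rate ** n for n in range(max_attempts)]


def lambda_task(function_name: str) -> dict:
    """Task state fragment invoking a (hypothetical) Lambda function."""
    return {
        "Type": "Task",
        "Resource": "arn:aws:states:::lambda:invoke",
        "Parameters": {"FunctionName": function_name, "Payload.$": "$"},
    }


def sns_task(message: str) -> dict:
    """Task state fragment publishing a fixed message to the placeholder topic."""
    return {
        "Type": "Task",
        "Resource": "arn:aws:states:::sns:publish",
        "Parameters": {"TopicArn": TOPIC_ARN, "Message": message},
    }


PIPELINE = {
    "StartAt": "ValidateVideo",
    "States": {
        "ValidateVideo": {**lambda_task("ValidateVideo"), "ResultPath": "$.validation", "Next": "ProcessVideo"},
        "ProcessVideo": {
            "Type": "Parallel",
            "Branches": [
                {
                    "StartAt": "Transcode",
                    "States": {
                        "Transcode": {
                            "Type": "Task",
                            # .sync waits for the job to finish (assumed integration).
                            "Resource": "arn:aws:states:::mediaconvert:createJob.sync",
                            "Parameters": {"Role.$": "$.mediaConvertRole", "Settings.$": "$.jobSettings"},
                            "Retry": [TRANSCODE_RETRY],
                            "End": True,
                        }
                    },
                },
                {
                    "StartAt": "GenerateThumbnails",
                    "States": {"GenerateThumbnails": {**lambda_task("GenerateThumbnails"), "End": True}},
                },
            ],
            # Reached only after retries inside a branch are exhausted.
            "Catch": [{"ErrorEquals": ["States.ALL"], "ResultPath": "$.error", "Next": "NotifyFailure"}],
            "ResultPath": "$.processing",
            "Next": "ExtractMetadata",
        },
        "ExtractMetadata": {**lambda_task("ExtractMetadataAndStore"), "ResultPath": "$.metadata", "Next": "NotifySuccess"},
        "NotifySuccess": {**sns_task("Video processing complete"), "End": True},
        "NotifyFailure": {**sns_task("Video processing failed"), "Next": "MarkFailed"},
        "MarkFailed": {"Type": "Fail", "Error": "VideoProcessingFailed"},
    },
}

if __name__ == "__main__":
    print(json.dumps(PIPELINE, indent=2))
    print(retry_delays(10, 2.0, 3))
```

    Note that there is no explicit "wait for both branches" state: the Parallel state itself only moves on to `ExtractMetadata` once every branch has ended, and a failure in either branch is surfaced through its `Catch` rule.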

    The Result

    Reliable video processing, simpler error handling, and full visibility into workflow execution.
