Agent skill
step-functions
AWS Step Functions workflow orchestration with state machines. Use when designing workflows, implementing error handling, configuring parallel execution, integrating with AWS services, or debugging executions.
Install this agent skill to your Project
npx add-skill https://github.com/itsmostafa/aws-agent-skills/tree/main/skills/step-functions
SKILL.md
AWS Step Functions
AWS Step Functions is a serverless orchestration service that lets you build and run workflows using state machines. Coordinate multiple AWS services into business-critical applications.
Table of Contents
- Core Concepts
- Common Patterns
- CLI Reference
- Best Practices
- Troubleshooting
- References
Core Concepts
Workflow Types
| Type | Description | Pricing |
|---|---|---|
| Standard | Long-running, durable, exactly-once | Per state transition |
| Express | High-volume, short-duration | Per execution (time + memory) |
State Types
| State | Description |
|---|---|
| Task | Execute work (Lambda, API call) |
| Choice | Conditional branching |
| Parallel | Execute branches concurrently |
| Map | Iterate over array |
| Wait | Delay execution |
| Pass | Pass input to output |
| Succeed | End successfully |
| Fail | End with failure |
Amazon States Language (ASL)
JSON-based language for defining state machines.
Common Patterns
Simple Lambda Workflow
{
"Comment": "Process order workflow",
"StartAt": "ValidateOrder",
"States": {
"ValidateOrder": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789012:function:ValidateOrder",
"Next": "ProcessPayment"
},
"ProcessPayment": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789012:function:ProcessPayment",
"Next": "FulfillOrder"
},
"FulfillOrder": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789012:function:FulfillOrder",
"End": true
}
}
}
Create State Machine
AWS CLI:
aws stepfunctions create-state-machine \
--name OrderWorkflow \
--definition file://workflow.json \
--role-arn arn:aws:iam::123456789012:role/StepFunctionsRole \
--type STANDARD
boto3:
import boto3
import json
sfn = boto3.client('stepfunctions')
definition = {
"Comment": "Order workflow",
"StartAt": "ProcessOrder",
"States": {
"ProcessOrder": {
"Type": "Task",
"Resource": "arn:aws:lambda:...",
"End": True
}
}
}
response = sfn.create_state_machine(
name='OrderWorkflow',
definition=json.dumps(definition),
roleArn='arn:aws:iam::123456789012:role/StepFunctionsRole',
type='STANDARD'
)
Start Execution
import boto3
import json
sfn = boto3.client('stepfunctions')
response = sfn.start_execution(
stateMachineArn='arn:aws:states:us-east-1:123456789012:stateMachine:OrderWorkflow',
name='order-12345',
input=json.dumps({
'order_id': '12345',
'customer_id': 'cust-789',
'items': [{'product_id': 'prod-1', 'quantity': 2}]
})
)
execution_arn = response['executionArn']
Choice State (Conditional Logic)
{
"StartAt": "CheckOrderValue",
"States": {
"CheckOrderValue": {
"Type": "Choice",
"Choices": [
{
"Variable": "$.total",
"NumericGreaterThan": 1000,
"Next": "HighValueOrder"
},
{
"Variable": "$.priority",
"StringEquals": "rush",
"Next": "RushOrder"
}
],
"Default": "StandardOrder"
},
"HighValueOrder": {
"Type": "Task",
"Resource": "arn:aws:lambda:...:function:ProcessHighValue",
"End": true
},
"RushOrder": {
"Type": "Task",
"Resource": "arn:aws:lambda:...:function:ProcessRush",
"End": true
},
"StandardOrder": {
"Type": "Task",
"Resource": "arn:aws:lambda:...:function:ProcessStandard",
"End": true
}
}
}
Parallel Execution
{
"StartAt": "ProcessInParallel",
"States": {
"ProcessInParallel": {
"Type": "Parallel",
"Branches": [
{
"StartAt": "UpdateInventory",
"States": {
"UpdateInventory": {
"Type": "Task",
"Resource": "arn:aws:lambda:...:function:UpdateInventory",
"End": true
}
}
},
{
"StartAt": "SendNotification",
"States": {
"SendNotification": {
"Type": "Task",
"Resource": "arn:aws:lambda:...:function:SendNotification",
"End": true
}
}
},
{
"StartAt": "UpdateAnalytics",
"States": {
"UpdateAnalytics": {
"Type": "Task",
"Resource": "arn:aws:lambda:...:function:UpdateAnalytics",
"End": true
}
}
}
],
"Next": "Complete"
},
"Complete": {
"Type": "Succeed"
}
}
}
Map State (Iteration)
{
"StartAt": "ProcessItems",
"States": {
"ProcessItems": {
"Type": "Map",
"ItemsPath": "$.items",
"MaxConcurrency": 10,
"Iterator": {
"StartAt": "ProcessItem",
"States": {
"ProcessItem": {
"Type": "Task",
"Resource": "arn:aws:lambda:...:function:ProcessItem",
"End": true
}
}
},
"ResultPath": "$.processedItems",
"End": true
}
}
}
Error Handling
{
"StartAt": "ProcessWithRetry",
"States": {
"ProcessWithRetry": {
"Type": "Task",
"Resource": "arn:aws:lambda:...:function:Process",
"Retry": [
{
"ErrorEquals": ["Lambda.ServiceException", "Lambda.TooManyRequestsException"],
"IntervalSeconds": 2,
"MaxAttempts": 6,
"BackoffRate": 2
},
{
"ErrorEquals": ["States.Timeout"],
"IntervalSeconds": 5,
"MaxAttempts": 3,
"BackoffRate": 1.5
}
],
"Catch": [
{
"ErrorEquals": ["CustomError"],
"ResultPath": "$.error",
"Next": "HandleCustomError"
},
{
"ErrorEquals": ["States.ALL"],
"ResultPath": "$.error",
"Next": "HandleAllErrors"
}
],
"End": true
},
"HandleCustomError": {
"Type": "Task",
"Resource": "arn:aws:lambda:...:function:HandleCustom",
"End": true
},
"HandleAllErrors": {
"Type": "Fail",
"Error": "ProcessingFailed",
"Cause": "An error occurred during processing"
}
}
}
CLI Reference
State Machine Management
| Command | Description |
|---|---|
aws stepfunctions create-state-machine |
Create state machine |
aws stepfunctions update-state-machine |
Update definition |
aws stepfunctions delete-state-machine |
Delete state machine |
aws stepfunctions list-state-machines |
List state machines |
aws stepfunctions describe-state-machine |
Get details |
Executions
| Command | Description |
|---|---|
aws stepfunctions start-execution |
Start execution |
aws stepfunctions stop-execution |
Stop execution |
aws stepfunctions describe-execution |
Get execution details |
aws stepfunctions list-executions |
List executions |
aws stepfunctions get-execution-history |
Get execution history |
Best Practices
Design
- Keep states focused — one purpose per state
- Use meaningful state names
- Implement comprehensive error handling
- Use Parallel for independent tasks
- Use Map for batch processing
Performance
- Use Express workflows for high-volume, short tasks
- Set appropriate timeouts
- Limit Map concurrency to avoid throttling
- Use SDK integrations when possible (avoid Lambda wrapper)
Reliability
- Retry transient errors
- Catch and handle specific errors
- Use idempotent operations
- Enable X-Ray tracing
Cost Optimization
- Use Express for short workflows (< 5 minutes)
- Combine related operations to reduce transitions
- Use Wait states instead of Lambda delays
Troubleshooting
Execution Failed
# Get execution history
aws stepfunctions get-execution-history \
--execution-arn arn:aws:states:us-east-1:123456789012:execution:MyWorkflow:exec-123 \
--query 'events[?type==`TaskFailed` || type==`ExecutionFailed`]'
Lambda Timeout
Causes:
- Lambda running too long
- Task timeout too short
Fix:
{
"Type": "Task",
"Resource": "arn:aws:lambda:...",
"TimeoutSeconds": 300,
"HeartbeatSeconds": 60
}
State Stuck
Check:
- Task state waiting for callback
- Wait state not yet elapsed
- Activity worker not responding
Invalid State Machine
# Validate definition
aws stepfunctions validate-state-machine-definition \
--definition file://workflow.json
References
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
sqs
AWS SQS message queue service for decoupled architectures. Use when creating queues, configuring dead-letter queues, managing visibility timeouts, implementing FIFO ordering, or integrating with Lambda.
eks
AWS EKS Kubernetes management for clusters, node groups, and workloads. Use when creating clusters, configuring IRSA, managing node groups, deploying applications, or integrating with AWS services.
cloudwatch
AWS CloudWatch monitoring for logs, metrics, alarms, and dashboards. Use when setting up monitoring, creating alarms, querying logs with Insights, configuring metric filters, building dashboards, or troubleshooting application issues.
ec2
AWS EC2 virtual machine management for instances, AMIs, and networking. Use when launching instances, configuring security groups, managing key pairs, troubleshooting connectivity, or automating instance lifecycle.
cloudformation
AWS CloudFormation infrastructure as code for stack management. Use when writing templates, deploying stacks, managing drift, troubleshooting deployments, or organizing infrastructure with nested stacks.
sns
AWS SNS notification service for pub/sub messaging. Use when creating topics, managing subscriptions, configuring message filtering, sending notifications, or setting up mobile push.
Didn't find tool you were looking for?