
Deploying to AWS ECS with GitHub Actions

Managing ECS deployments with colocated task definitions, infrastructure as code, and automated CI/CD pipelines

Related Concepts: Continuous Delivery | Related Implementation: Multi-Stage CI/CD Pipeline

Introduction

Deploying containerized applications to AWS ECS presents a common challenge: task definitions change frequently (resource allocations, environment variables, secrets), but managing them effectively requires careful separation of concerns between infrastructure provisioning and application deployment.

The Problem

ECS task definitions contain critical configuration:

  • Container resource allocations (CPU, memory)
  • Environment variables
  • Secrets references (AWS Secrets Manager ARNs)
  • Logging configuration
  • IAM role assignments
  • Port mappings

These configurations change often during development:

  • Increasing memory for a memory-intensive feature
  • Adding new environment variables
  • Integrating with new AWS services (new IAM permissions)
  • Adjusting logging levels

Common Approaches and Their Issues

Approach 1: Terraform Manages Everything

hcl
resource "aws_ecs_task_definition" "app" {
  family = "my-app"

  container_definitions = jsonencode([{
    name   = "my-app"
    image  = "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest"
    memory = 512
    # ... 50+ lines of configuration
  }])
}

Problems:

  • Configuration drift - developers change task definitions via AWS CLI/Console for quick iteration
  • Slow feedback loop - every config change requires terraform apply
  • State lock contention - multiple developers waiting for Terraform
  • Mixing concerns - application config changes treated like infrastructure changes

Approach 2: Manual AWS Console Management

Problems:

  • No version control for task definition changes
  • No audit trail of who changed what
  • Cannot tie configuration changes to code changes
  • Difficult to replicate across environments

Approach 3: Download from AWS in CI/CD (Legacy Pattern)

yaml
- name: Download task definition from AWS
  run: |
    aws ecs describe-task-definition \
      --task-definition my-app \
      --query taskDefinition > task-def.json

Problems:

  • Not true "infrastructure as code"
  • Task definition lives only in AWS
  • Cannot review configuration changes in pull requests
  • Bootstrapping new environments is manual
  • No version history or rollback capability

Note: You may encounter this pattern in legacy systems, but it's not recommended for new projects.

The Solution: Colocated Task Definitions

The pattern we recommend colocates task definitions with service code while maintaining clear separation between infrastructure (Terraform) and application configuration (CI/CD).

Directory Structure

services/
└── my-api/
    ├── src/
    ├── tests/
    ├── Dockerfile
    ├── package.json
    ├── task-definition.json      ← Task definition colocated with service
    └── appspec.yml               ← CodeDeploy configuration (optional)
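
The task-definition.json file holds the complete ECS task definition that CI/CD renders and registers on every deploy. A minimal sketch of what it might contain is shown below; the role ARNs, container port, and log group name are illustrative placeholders rather than values from any particular setup.

json
{
  "family": "my-api-production",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512",
  "executionRoleArn": "arn:aws:iam::123456789012:role/my-api-execution-role",
  "taskRoleArn": "arn:aws:iam::123456789012:role/my-api-task-role",
  "containerDefinitions": [{
    "name": "my-api",
    "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-api:latest",
    "essential": true,
    "portMappings": [{ "containerPort": 3000, "protocol": "tcp" }],
    "environment": [
      { "name": "NODE_ENV", "value": "production" }
    ],
    "secrets": [
      { "name": "DATABASE_URL", "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:my-api/database-url" }
    ],
    "logConfiguration": {
      "logDriver": "awslogs",
      "options": {
        "awslogs-group": "/ecs/production/my-api",
        "awslogs-region": "us-east-1",
        "awslogs-stream-prefix": "ecs"
      }
    }
  }]
}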

Why Colocate Task Definitions?

1. Version Control: Task definition changes are tracked in Git alongside code changes:

bash
git log task-definition.json
# commit abc123: Increase memory to 1024 for new caching feature
# commit def456: Add REDIS_URL environment variable
# commit ghi789: Add S3 read permissions to task role

2. Atomic Changes: Code and configuration change together in a single commit/PR:

diff
// Code change
+ const cache = new Redis(process.env.REDIS_URL);

// Configuration change (same PR)
+ {
+   "name": "REDIS_URL",
+   "value": "redis://cache.example.com:6379"
+ }

3. Developer Ownership: Application developers can adjust their own resource allocations without involving the infrastructure team or waiting for Terraform applies.

4. Pull Request Reviews: Configuration changes are visible and reviewable:

Files changed:
  src/services/user-service.ts
  task-definition.json           ← Reviewers see memory increased

Review comment: "Do we really need 2GB for this service?"

5. Environment Consistency: The task definition structure is easy to replicate across environments (dev, staging, prod) by maintaining separate files per environment or by templating a single file, as sketched below.
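
For example, a hypothetical per-environment layout (file names are illustrative) keeps one task definition per environment next to the service, and the deploy workflow picks the file that matches its environment input:

services/
└── my-api/
    ├── task-definition.staging.json
    ├── task-definition.production.json
    └── ...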

Terraform's Role: Infrastructure, Not Configuration

Terraform manages the infrastructure foundation but deliberately ignores task definition configuration changes.

What Terraform Manages

Terraform provisions and manages the foundational infrastructure:

  1. ECS Cluster - The cluster that hosts all services
  2. ECS Service - Service configuration including desired count, launch type (Fargate/EC2), network configuration, and load balancer attachments
  3. IAM Roles
    • Task Execution Role - Allows ECS to pull images and fetch secrets
    • Task Role - Permissions for the application (S3, SES, etc.)
    • Policies for Secrets Manager access, CloudWatch Logs, and application-specific permissions
  4. Load Balancers and Target Groups - Application Load Balancer (ALB), target groups with health check configuration, and listeners
  5. Security Groups - Network access rules for ECS tasks and load balancers
  6. CloudWatch Log Groups - Centralized logging with retention policies
  7. Initial Task Definition - A placeholder task definition with lifecycle { ignore_changes = [container_definitions] } to prevent Terraform from managing updates

The Critical Lifecycle Block

hcl
lifecycle {
  ignore_changes = [container_definitions]
}

Why this matters:

Without ignore_changes:

  1. Developer updates task-definition.json in repo
  2. CI/CD renders and registers new task definition revision
  3. ECS service updates to new revision
  4. Next terraform plan shows drift:
    # aws_ecs_task_definition.api will be updated in-place
    ~ container_definitions = jsonencode(...)
  5. terraform apply reverts to old configuration
  6. Service breaks

With ignore_changes:

  1. Developer updates task-definition.json in repo
  2. CI/CD renders and registers new task definition revision
  3. ECS service updates to new revision
  4. terraform plan shows no changes
  5. No drift, no problems

What Terraform doesn't manage:

  • Task definition container configurations
  • Environment variables (except initial setup)
  • Resource allocations (CPU/memory tuning)
  • Docker image tags
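
Putting this together, a minimal Terraform sketch of the placeholder task definition and the service that references it might look like the following. Resource names, sizes, and the omitted networking/load balancer settings are illustrative assumptions; the essential parts are the placeholder container_definitions and the lifecycle blocks.

hcl
# Placeholder task definition: registered once by Terraform, then updated
# only by CI/CD. ignore_changes stops Terraform from reverting revisions
# registered outside of Terraform.
resource "aws_ecs_task_definition" "api" {
  family                   = "my-api-production"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = "256"
  memory                   = "512"
  execution_role_arn       = aws_iam_role.task_execution.arn # illustrative name
  task_role_arn            = aws_iam_role.task.arn           # illustrative name

  container_definitions = jsonencode([{
    name      = "my-api"
    image     = "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-api:latest"
    essential = true
  }])

  lifecycle {
    ignore_changes = [container_definitions]
  }
}

resource "aws_ecs_service" "api" {
  name            = "my-api"
  cluster         = aws_ecs_cluster.main.id # illustrative name
  task_definition = aws_ecs_task_definition.api.arn
  desired_count   = 2
  launch_type     = "FARGATE"
  # network_configuration, load_balancer, etc. omitted for brevity

  # CI/CD points the service at new revisions, so Terraform must also
  # ignore this attribute to avoid rolling the service back.
  lifecycle {
    ignore_changes = [task_definition]
  }
}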

The Render and Deploy Workflow

The GitHub Actions workflow implements the deployment using the "render pattern" from AWS's official ECS actions (amazon-ecs-render-task-definition and amazon-ecs-deploy-task-definition).

Complete Deployment Workflow

yaml
# .github/workflows/deploy_api.yml
name: Deploy API to ECS

on:
  workflow_call:
    inputs:
      image_tag:
        description: 'Docker image tag to deploy'
        required: true
        type: string
      environment:
        description: 'Deployment environment'
        required: false
        type: string
        default: 'production'

jobs:
  deploy:
    name: Deploy to ECS
    runs-on: ubuntu-latest
    environment: ${{ inputs.environment }}

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      # Step 1: Configure AWS credentials
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1
          role-to-assume: arn:aws:iam::123456789012:role/GitHubActionsRole
          role-duration-seconds: 1800

      # Step 2: Login to ECR
      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v2

      # Step 3: Render task definition with new image tag
      - name: Render Amazon ECS task definition
        id: render-task-def
        uses: aws-actions/amazon-ecs-render-task-definition@v1
        with:
          task-definition: services/my-api/task-definition.json
          container-name: my-api
          image: ${{ steps.login-ecr.outputs.registry }}/my-api:${{ inputs.image_tag }}

      # Step 4: Register new task definition revision
      - name: Register task definition
        id: register-task-def
        uses: aws-actions/amazon-ecs-deploy-task-definition@v2
        with:
          task-definition: ${{ steps.render-task-def.outputs.task-definition }}

      # Step 5: Run database migrations
      - name: Run database migrations
        run: |
          chmod +x ./scripts/run-migrations.sh
          ./scripts/run-migrations.sh ${{ steps.register-task-def.outputs.task-definition-arn }}

      # Step 6: Deploy to ECS service
      - name: Deploy to Amazon ECS
        uses: aws-actions/amazon-ecs-deploy-task-definition@v2
        with:
          task-definition: ${{ steps.render-task-def.outputs.task-definition }}
          service: my-api
          cluster: production-cluster
          wait-for-service-stability: true
          wait-for-minutes: 10
          # Optional: CodeDeploy configuration
          codedeploy-appspec: services/my-api/appspec.yml
          codedeploy-application: my-api
          codedeploy-deployment-group: my-api-deployment-group

      # Step 7: Verify deployment
      - name: Verify deployment
        run: |
          CURRENT_TASK_DEF=$(aws ecs describe-services \
            --cluster production-cluster \
            --services my-api \
            --query 'services[0].taskDefinition' \
            --output text)

          echo "Service is now running: $CURRENT_TASK_DEF"
          echo "✅ Deployment completed successfully"

Breaking Down the Workflow

Step 3: Render Task Definition

yaml
- name: Render Amazon ECS task definition
  id: render-task-def
  uses: aws-actions/amazon-ecs-render-task-definition@v1
  with:
    task-definition: services/my-api/task-definition.json
    container-name: my-api
    image: 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-api:abc1234

What happens:

  1. Reads task-definition.json from repo
  2. Finds the container with name: "my-api"
  3. Replaces its image field with the new tag
  4. Outputs rendered task definition as JSON file

Before (in repo):

json
{
  "containerDefinitions": [{
    "name": "my-api",
    "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-api:latest",
    ...
  }]
}

After rendering:

json
{
  "containerDefinitions": [{
    "name": "my-api",
    "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-api:abc1234",
    ...
  }]
}

Step 4: Register Task Definition

yaml
- name: Register task definition
  id: register-task-def
  uses: aws-actions/amazon-ecs-deploy-task-definition@v2
  with:
    task-definition: ${{ steps.render-task-def.outputs.task-definition }}

What happens:

  1. Calls aws ecs register-task-definition with rendered JSON
  2. AWS creates new revision: my-api-production:42
  3. Returns task definition ARN in outputs

AWS Output:

Registered task definition: arn:aws:ecs:us-east-1:123456789012:task-definition/my-api-production:42
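
Registering the rendered file is roughly equivalent to the following AWS CLI call (a simplified sketch; rendered-task-definition.json stands in for the file produced by the render step):

bash
# Register the rendered task definition as a new revision and capture its ARN
TASK_DEF_ARN=$(aws ecs register-task-definition \
  --cli-input-json file://rendered-task-definition.json \
  --query 'taskDefinition.taskDefinitionArn' \
  --output text)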

Step 6: Deploy to Service

yaml
- name: Deploy to Amazon ECS
  uses: aws-actions/amazon-ecs-deploy-task-definition@v2
  with:
    task-definition: ${{ steps.render-task-def.outputs.task-definition }}
    service: my-api
    cluster: production-cluster
    wait-for-service-stability: true

What happens:

  1. Calls aws ecs update-service with new task definition
  2. ECS begins deployment (rolling or blue-green)
  3. Action waits for service to reach steady state
  4. Fails the workflow if the service does not reach a steady state within the configured wait-for-minutes window (10 minutes here)
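
Conceptually, this step performs something close to the following CLI calls (a simplified sketch; the action also handles CodeDeploy-based deployments and richer error reporting):

bash
# Point the service at the new task definition revision
aws ecs update-service \
  --cluster production-cluster \
  --service my-api \
  --task-definition "$TASK_DEF_ARN"

# Block until the rollout reaches a steady state
aws ecs wait services-stable \
  --cluster production-cluster \
  --services my-api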

Database Migrations as One-Off Tasks

Running database migrations before deployment ensures new application code doesn't break due to missing schema changes.

Migration Script

bash
#!/bin/bash
# scripts/run-migrations.sh
set -e

if [ -z "$1" ]; then
  echo "Error: Task definition ARN required"
  echo "Usage: $0 <task-definition-arn>"
  exit 1
fi

TASK_DEF_ARN="$1"
CLUSTER="production-cluster"
SERVICE="my-api"
CONTAINER_NAME="my-api"

echo "Running database migrations..."
echo "Task Definition: $TASK_DEF_ARN"

# Get network configuration from running service
echo "Fetching network configuration..."
NETWORK_CONFIG=$(aws ecs describe-services \
  --cluster "$CLUSTER" \
  --services "$SERVICE" \
  --query 'services[0].networkConfiguration.awsvpcConfiguration' \
  --output json)

if [ -z "$NETWORK_CONFIG" ] || [ "$NETWORK_CONFIG" = "null" ]; then
  echo "Error: Failed to get network configuration"
  exit 1
fi

# Run migration as one-off ECS task
echo "Starting migration task..."
TASK_ARN=$(aws ecs run-task \
  --cluster "$CLUSTER" \
  --task-definition "$TASK_DEF_ARN" \
  --launch-type FARGATE \
  --network-configuration "{\"awsvpcConfiguration\":$NETWORK_CONFIG}" \
  --overrides "{\"containerOverrides\":[{\"name\":\"$CONTAINER_NAME\",\"command\":[\"npm\",\"run\",\"migrate\"]}]}" \
  --query 'tasks[0].taskArn' \
  --output text)

if [ -z "$TASK_ARN" ] || [ "$TASK_ARN" = "None" ]; then
  echo "Error: Failed to start migration task"
  exit 1
fi

echo "Migration task started: $TASK_ARN"

# Wait for task to complete
echo "Waiting for migration to complete..."
aws ecs wait tasks-stopped \
  --cluster "$CLUSTER" \
  --tasks "$TASK_ARN"

# Check exit code
EXIT_CODE=$(aws ecs describe-tasks \
  --cluster "$CLUSTER" \
  --tasks "$TASK_ARN" \
  --query 'tasks[0].containers[0].exitCode' \
  --output text)

# Fetch and display logs
TASK_ID=$(basename "$TASK_ARN")
LOG_GROUP="/ecs/production/my-api"
LOG_STREAM="ecs/$CONTAINER_NAME/$TASK_ID"

echo ""
echo "=== Migration Logs ==="
aws logs get-log-events \
  --log-group-name "$LOG_GROUP" \
  --log-stream-name "$LOG_STREAM" \
  --output json 2>/dev/null | \
  jq -r '.events[].message' || \
  echo "Note: Logs not available yet"
echo "======================"
echo ""

if [ "$EXIT_CODE" != "0" ]; then
  echo "❌ Migration failed with exit code $EXIT_CODE"
  exit 1
fi

echo "✅ Database migrations completed successfully"

Why This Pattern Works

1. Uses the Same Task Definition: The migration task uses the exact same task definition that will be deployed, ensuring:

  • Same Docker image
  • Same environment variables
  • Same secrets (DATABASE_URL)
  • Same IAM permissions
  • Same network configuration

2. Command Override

json
{
  "containerOverrides": [{
    "name": "my-api",
    "command": ["npm", "run", "migrate"]
  }]
}

The normal container command might be ["npm", "start"], but we override it to run migrations instead.

3. Waits for Completion

bash
aws ecs wait tasks-stopped --tasks $TASK_ARN

The deployment doesn't proceed until migrations finish successfully.

4. Logs and Error Handling

  • Fetches CloudWatch logs for debugging
  • Checks exit code
  • Fails deployment if migration fails

Migration Timing

┌─────────────────────────────────────────────────────────────┐
│ Deployment Timeline                                         │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│ 1. Build Docker image (from Build stage)                   │
│    └─> Image: my-api:abc1234                               │
│                                                             │
│ 2. Render task definition                                  │
│    └─> Task definition registered: my-api-production:42    │
│                                                             │
│ 3. Run migrations (one-off task)                           │
│    ├─> Start: ECS task with command override               │
│    ├─> Run: npm run migrate                                │
│    ├─> Wait: Until task stops                              │
│    └─> Check: Exit code = 0                                │
│        │                                                    │
│        ├─> Success: Continue to deployment                 │
│        └─> Failure: Stop deployment                        │
│                                                             │
│ 4. Deploy to ECS service                                   │
│    ├─> Update service with new task definition             │
│    ├─> ECS starts new tasks                                │
│    ├─> Health checks pass                                  │
│    └─> Traffic shifts to new tasks                         │
│                                                             │
│ 5. Verify deployment                                       │
│    └─> Service running task-definition:42                  │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Why migrations run before deployment:

  • New code may depend on schema changes (new columns, tables)
  • Running migrations first ensures database is ready
  • If migrations fail, deployment stops (safe failure)

Deployment Strategies

ECS supports two primary deployment strategies that provide zero-downtime deployments with different trade-offs for infrastructure complexity, rollback speed, and operational control.

Blue-Green Deployment with CodeDeploy

Blue-green deployment uses AWS CodeDeploy to create a complete parallel environment (green) while your existing environment (blue) continues serving traffic. The deployment process first starts new tasks in the green environment, waits for them to pass health checks, then shifts all traffic from blue to green instantly. After a waiting period (typically 5 minutes), the blue environment is terminated.

How It Works

The infrastructure requires two load balancer target groups (blue and green), a CodeDeploy application, and a deployment group that manages the traffic shifting. The ECS service is configured with a CodeDeploy deployment controller type rather than the default ECS controller. When a deployment starts, CodeDeploy orchestrates the entire process without manual intervention.

During deployment, new tasks are registered with the green target group while the blue target group continues serving 100% of production traffic. Once all green tasks pass health checks, you can optionally run Lambda hooks for pre-traffic validation (such as synthetic tests or smoke tests). CodeDeploy then shifts traffic from the blue target group to the green target group atomically. After an optional post-traffic validation period and a configurable wait time, the blue tasks are terminated.

If anything fails during deployment—health checks don't pass, Lambda hooks fail, or CloudWatch alarms trigger—CodeDeploy automatically shifts traffic back to the blue environment and terminates the green tasks. This provides instant rollback with zero user impact.

Configuration Requirements

Terraform manages the ECS service configuration with the CodeDeploy deployment controller type, both target groups (blue and green), the load balancer listener (with lifecycle rules to ignore changes since CodeDeploy updates it), the CodeDeploy application and deployment group, and IAM roles for CodeDeploy. The service and listener both need lifecycle ignore_changes blocks because CodeDeploy modifies these resources during deployments.

The deployment group configuration specifies the ECS cluster and service, both target groups, the production traffic route (load balancer listener), and auto-rollback settings. You can configure how long to wait before terminating blue tasks after a successful deployment.
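
A condensed Terraform sketch of these pieces might look like the following. Resource names and referenced ARNs are illustrative assumptions; the important parts are the CODE_DEPLOY deployment controller on the service, the blue/green target group pair, and the rollback settings.

hcl
resource "aws_ecs_service" "api" {
  # ... cluster, task_definition, desired_count, networking omitted

  deployment_controller {
    type = "CODE_DEPLOY"
  }

  # CodeDeploy updates both of these during deployments
  lifecycle {
    ignore_changes = [task_definition, load_balancer]
  }
}

resource "aws_codedeploy_app" "api" {
  compute_platform = "ECS"
  name             = "my-api"
}

resource "aws_codedeploy_deployment_group" "api" {
  app_name               = aws_codedeploy_app.api.name
  deployment_group_name  = "my-api-deployment-group"
  service_role_arn       = aws_iam_role.codedeploy.arn # illustrative name
  deployment_config_name = "CodeDeployDefault.ECSAllAtOnce"

  deployment_style {
    deployment_option = "WITH_TRAFFIC_CONTROL"
    deployment_type   = "BLUE_GREEN"
  }

  ecs_service {
    cluster_name = aws_ecs_cluster.main.name # illustrative name
    service_name = aws_ecs_service.api.name
  }

  load_balancer_info {
    target_group_pair_info {
      prod_traffic_route {
        listener_arns = [aws_lb_listener.https.arn] # illustrative name
      }
      target_group { name = aws_lb_target_group.blue.name }
      target_group { name = aws_lb_target_group.green.name }
    }
  }

  blue_green_deployment_config {
    deployment_ready_option {
      action_on_timeout = "CONTINUE_DEPLOYMENT"
    }
    terminate_blue_instances_on_deployment_success {
      action                           = "TERMINATE"
      termination_wait_time_in_minutes = 5
    }
  }

  auto_rollback_configuration {
    enabled = true
    events  = ["DEPLOYMENT_FAILURE"]
  }
}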

You also need an AppSpec file (typically services/my-api/appspec.yml) that CodeDeploy uses to understand how to deploy the new task definition. This file specifies the task definition placeholder, load balancer information, and optional lifecycle hooks for pre-traffic and post-traffic validation using Lambda functions.
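
A minimal appspec.yml for an ECS blue-green deployment might look like this; the container name and port are illustrative, and the hooks are optional (the Lambda function names here are hypothetical):

yaml
version: 0.0
Resources:
  - TargetService:
      Type: AWS::ECS::Service
      Properties:
        # Replaced by the deploy action with the newly registered revision
        TaskDefinition: <TASK_DEFINITION>
        LoadBalancerInfo:
          ContainerName: "my-api"
          ContainerPort: 3000
Hooks:
  - BeforeAllowTraffic: "my-api-pre-traffic-check"
  - AfterAllowTraffic: "my-api-post-traffic-check"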

Benefits and Trade-offs

Benefits:

  • Zero-downtime deployments with no mixed versions in production
  • Instant rollback capability (just shift traffic back to blue)
  • Pre-traffic validation with Lambda hooks for running tests before shifting traffic
  • Full parallel environment allows comprehensive testing before exposing to users
  • Automated rollback on CloudWatch alarms (can monitor error rates, latency, etc.)
  • Clear separation between old and new versions during deployment

Trade-offs:

  • More complex infrastructure requiring two target groups and CodeDeploy setup
  • Slower deployments since the full green environment must be healthy before traffic shifts
  • Double resource usage during deployment (running both blue and green)
  • More AWS services to manage and understand (CodeDeploy, Lambda for hooks)
  • Longer GitHub Actions workflow times due to additional validation steps

Rolling Update Deployment (ECS Default)

Rolling update is ECS's default deployment strategy that gradually replaces old tasks with new ones without requiring a parallel environment. This simpler approach uses a single load balancer target group and incrementally updates the service by starting new tasks, waiting for health checks, then terminating old tasks—repeating this cycle until all tasks are running the new version.

How It Works

The deployment process is controlled by two critical parameters: maximum_percent and minimum_healthy_percent. These settings determine how many tasks can run simultaneously during deployment and how many must remain healthy at all times.

For zero-downtime deployments, use maximum_percent of 200 and minimum_healthy_percent of 100. This allows ECS to temporarily run double the desired task count while ensuring full capacity is always maintained. For example, with a desired count of 4 tasks, ECS can run up to 8 tasks during deployment but must keep at least 4 healthy at all times.

During a rolling update, ECS starts a new task and waits for it to pass health checks and register with the load balancer. Once healthy and receiving traffic, ECS terminates one old task. This process repeats until all tasks are running the new version. The load balancer's deregistration delay (typically 30 seconds) ensures in-flight requests complete before tasks are terminated.

Configuration Requirements

Terraform manages the ECS service with the default ECS deployment controller type, the deployment configuration specifying maximum and minimum percentages, a single load balancer target group with health check settings, and a lifecycle ignore_changes block for the task definition. The target group's deregistration_delay setting is critical for graceful shutdowns.

Unlike blue-green deployments, rolling updates require no additional AWS services—just the ECS service, task definition, load balancer, and target group. The simplicity makes this approach easier to understand, troubleshoot, and maintain.
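
Under these assumptions, a minimal Terraform sketch of a rolling-update service might look like this (names and ports are illustrative; in the AWS provider the percentages are exposed as deployment_maximum_percent and deployment_minimum_healthy_percent):

hcl
resource "aws_lb_target_group" "api" {
  name                 = "my-api"
  port                 = 3000
  protocol             = "HTTP"
  target_type          = "ip"
  vpc_id               = var.vpc_id # illustrative variable
  deregistration_delay = 30         # let in-flight requests drain before shutdown

  health_check {
    path    = "/health" # illustrative health check endpoint
    matcher = "200"
  }
}

resource "aws_ecs_service" "api" {
  name            = "my-api"
  cluster         = aws_ecs_cluster.main.id # illustrative name
  task_definition = aws_ecs_task_definition.api.arn
  desired_count   = 4
  launch_type     = "FARGATE"

  # 200/100: temporarily run up to double the desired count, never below it
  deployment_maximum_percent         = 200
  deployment_minimum_healthy_percent = 100

  load_balancer {
    target_group_arn = aws_lb_target_group.api.arn
    container_name   = "my-api"
    container_port   = 3000
  }

  lifecycle {
    ignore_changes = [task_definition]
  }
}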

Deployment Parameter Trade-offs

Different maximum and minimum percentage combinations offer different trade-offs. The recommended configuration (200/100) provides zero downtime with reasonable deployment speed by maintaining full capacity while allowing temporary resource doubling.

An aggressive configuration (150/50) deploys faster but reduces capacity to 50% during deployment, potentially causing user-facing performance issues or downtime under load. This might be acceptable for internal services or development environments.

A conservative configuration caps maximum_percent at 100, forcing ECS to terminate a task before it can start its replacement; this is the slowest option and briefly drops capacity below the desired count. Setting both values to 100 leaves ECS no headroom at all, so the deployment cannot make progress. Neither variant is generally recommended.

Benefits and Trade-offs

Benefits:

  • Zero downtime when properly configured (maximum_percent: 200, minimum_healthy_percent: 100)
  • Simpler infrastructure requiring only one target group
  • Faster deployments without waiting for parallel environment
  • Lower resource usage (only temporary excess during rollout)
  • No CodeDeploy setup or management required
  • Easier to understand and troubleshoot

Trade-offs:

  • Slower rollback requiring a new deployment of the previous version
  • No pre-deployment validation hooks (can't run tests before traffic hits new version)
  • Mixed versions serving traffic during deployment (old and new tasks both active)
  • Less control over traffic shifting (gradual, not instant)
  • No automatic rollback on errors (must manually trigger rollback deployment)

Comparison: Blue-Green vs Rolling Update

| Aspect | Blue-Green (CodeDeploy) | Rolling Update (ECS) |
| --- | --- | --- |
| Downtime | Zero | Zero |
| Rollback Speed | Instant (traffic shift) | Slow (new deployment) |
| Infrastructure | Complex (2 TGs, CodeDeploy) | Simple (1 TG) |
| Resource Usage | High (2x during deploy) | Low (1-2x depending on config) |
| Deployment Time | Slower (parallel env + hooks) | Faster (incremental) |
| Traffic Control | Precise (all-at-once or gradual) | Gradual (task by task) |
| Validation Hooks | Yes (Lambda) | No |
| Auto Rollback | Yes (on alarms) | No (manual) |
| Mixed Versions | Never | During deployment |
| Best For | Critical production services | Internal services, dev/staging |

When to Use Each

Use Blue-Green (CodeDeploy) when:

  • ✅ Zero-downtime is critical (customer-facing services)
  • ✅ You need instant rollback capability
  • ✅ You want pre-deployment validation (Lambda hooks)
  • ✅ You can tolerate infrastructure complexity
  • ✅ Budget allows for temporary 2x resource usage

Use Rolling Update when:

  • ✅ Internal or non-critical services
  • ✅ Lower infrastructure complexity preferred
  • ✅ Resource costs are a concern
  • ✅ Dev/staging environments
  • ✅ Acceptable to deploy previous version for rollback

Summary

Deploying to ECS with GitHub Actions using colocated task definitions provides:

1. Clear Separation of Concerns

  • Terraform manages infrastructure (clusters, IAM, networking)
  • Task definitions manage application configuration
  • CI/CD manages deployments

2. Developer Empowerment

  • Developers own their resource allocation
  • Configuration changes reviewed in pull requests
  • Fast iteration without infrastructure team bottleneck

3. Version Control and Audit Trail

  • All configuration tracked in Git
  • Changes tied to code changes
  • Easy rollback (deploy previous Git commit)

4. Flexible Deployment Strategies

  • Blue-Green for zero-downtime critical services
  • Rolling Update for simpler deployments
  • Easy to switch between strategies

5. Safe Database Migrations

  • Migrations run before deployment
  • Same task definition as application
  • Automatic rollback if migration fails

6. Cross-Account Security

  • Build isolated from production
  • Least-privilege IAM roles
  • Clear audit trail

Key Principles:

  • ✅ Infrastructure in Terraform, configuration in repo
  • ✅ Task definitions colocated with service code
  • ✅ Build Once, Deploy Everywhere (from multi-stage pipeline)
  • ✅ Migrations before deployment
  • ✅ Health checks and stability verification
  • ✅ Cross-account security boundaries

This pattern scales from simple single-service deployments to complex multi-account, multi-region production systems while maintaining clarity and developer productivity.
