Agent skills
container-apps-gpu-2025

Agent skill

container-apps-gpu-2025

This skill should be used when the user asks about "Container Apps GPU", "serverless GPU", "Container Apps Dapr", "Container Apps scale-to-zero", "Container Apps AI workloads", "Container Apps jobs", "Container Apps Foundry Models", or needs guidance on Azure Container Apps with GPU support and serverless capabilities (2025 GA features).

View SKILL.md on GitHub Repository

Stars 22

Forks 3

Install this agent skill to your Project

npx add-skill https://github.com/JosiahSiegel/claude-plugin-marketplace/tree/main/plugins/azure-master/skills/container-apps-gpu-2025

SKILL.md

Azure Container Apps GPU Support - 2025 Features

Complete knowledge base for Azure Container Apps with GPU support, serverless capabilities, and Dapr integration (2025 GA features).

Overview

Azure Container Apps is a serverless container platform with native GPU support, Dapr integration, and scale-to-zero capabilities for cost-efficient AI/ML workloads.

Key 2025 Features (Build Announcements)

1. Serverless GPU (GA)

Automatic scaling: Scale GPU workloads based on demand
Scale-to-zero: Pay only when GPU is actively used
Per-second billing: Granular cost control
Optimized cold start: Fast initialization for AI models
Reduced operational overhead: No infrastructure management

2. Dedicated GPU (GA)

Consistent performance: Dedicated GPU resources
Simplified AI deployment: Easy model hosting
Long-running workloads: Ideal for training and continuous inference
Multiple GPU types: NVIDIA A100, T4, and more

3. Dynamic Sessions with GPU (Early Access)

Sandboxed execution: Run untrusted AI-generated code
Hyper-V isolation: Enhanced security
GPU-powered Python interpreter: Handle compute-intensive AI workloads
Scale at runtime: Dynamic resource allocation

4. Foundry Models Integration

Deploy AI models directly: During container app creation
Ready-to-use models: Pre-configured inference endpoints
Azure AI Foundry: Seamless integration

5. Workflow with Durable Task Scheduler (Preview)

Long-running workflows: Reliable orchestration
State management: Automatic persistence
Event-driven: Trigger workflows from events

6. Native Azure Functions Support

Functions runtime: Run Azure Functions in Container Apps
Consistent development: Same code, serverless execution
Event triggers: All Functions triggers supported

7. Dapr Integration (GA)

Service discovery: Built-in DNS-based discovery
State management: Distributed state stores
Pub/sub messaging: Reliable messaging patterns
Service invocation: Resilient service-to-service calls
Observability: Integrated tracing and metrics

Creating Container Apps with GPU

Basic Container App with Serverless GPU

bash

# Create Container Apps environment
az containerapp env create \
  --name myenv \
  --resource-group MyRG \
  --location eastus \
  --logs-workspace-id <workspace-id> \
  --logs-workspace-key <workspace-key>

# Create Container App with GPU
az containerapp create \
  --name myapp-gpu \
  --resource-group MyRG \
  --environment myenv \
  --image myregistry.azurecr.io/ai-model:latest \
  --cpu 4 \
  --memory 8Gi \
  --gpu-type nvidia-a100 \
  --gpu-count 1 \
  --min-replicas 0 \
  --max-replicas 10 \
  --ingress external \
  --target-port 8080

Production-Ready Container App with GPU

bash

az containerapp create \
  --name myapp-gpu-prod \
  --resource-group MyRG \
  --environment myenv \
  \
  # Container configuration
  --image myregistry.azurecr.io/ai-model:latest \
  --registry-server myregistry.azurecr.io \
  --registry-identity system \
  \
  # Resources
  --cpu 4 \
  --memory 8Gi \
  --gpu-type nvidia-a100 \
  --gpu-count 1 \
  \
  # Scaling
  --min-replicas 0 \
  --max-replicas 20 \
  --scale-rule-name http-scaling \
  --scale-rule-type http \
  --scale-rule-http-concurrency 10 \
  \
  # Networking
  --ingress external \
  --target-port 8080 \
  --transport http2 \
  --exposed-port 8080 \
  \
  # Security
  --registry-identity system \
  --env-vars "AZURE_CLIENT_ID=secretref:client-id" \
  \
  # Monitoring
  --dapr-app-id myapp \
  --dapr-app-port 8080 \
  --dapr-app-protocol http \
  --enable-dapr \
  \
  # Identity
  --system-assigned

Container Apps Environment Configuration

Environment with Zone Redundancy

bash

az containerapp env create \
  --name myenv-prod \
  --resource-group MyRG \
  --location eastus \
  --logs-workspace-id <workspace-id> \
  --logs-workspace-key <workspace-key> \
  --zone-redundant true \
  --enable-workload-profiles true

Workload Profiles (Dedicated GPU)

bash

# Create environment with workload profiles
az containerapp env create \
  --name myenv-gpu \
  --resource-group MyRG \
  --location eastus \
  --enable-workload-profiles true

# Add GPU workload profile
az containerapp env workload-profile add \
  --name myenv-gpu \
  --resource-group MyRG \
  --workload-profile-name gpu-profile \
  --workload-profile-type GPU-A100 \
  --min-nodes 0 \
  --max-nodes 10

# Create container app with GPU profile
az containerapp create \
  --name myapp-dedicated-gpu \
  --resource-group MyRG \
  --environment myenv-gpu \
  --workload-profile-name gpu-profile \
  --image myregistry.azurecr.io/training-job:latest \
  --cpu 8 \
  --memory 16Gi \
  --min-replicas 1 \
  --max-replicas 5

GPU Scaling Rules

Custom Prometheus Scaling

bash

az containerapp create \
  --name myapp-gpu-prometheus \
  --resource-group MyRG \
  --environment myenv \
  --image myregistry.azurecr.io/ai-model:latest \
  --cpu 4 \
  --memory 8Gi \
  --gpu-type nvidia-a100 \
  --gpu-count 1 \
  --min-replicas 0 \
  --max-replicas 10 \
  --scale-rule-name gpu-utilization \
  --scale-rule-type custom \
  --scale-rule-custom-type prometheus \
  --scale-rule-metadata \
    serverAddress=http://prometheus.monitoring.svc.cluster.local:9090 \
    metricName=gpu_utilization \
    threshold=80 \
    query="avg(nvidia_gpu_utilization{app='myapp'})"

Queue-Based Scaling (Azure Service Bus)

bash

az containerapp create \
  --name myapp-queue-processor \
  --resource-group MyRG \
  --environment myenv \
  --image myregistry.azurecr.io/batch-processor:latest \
  --cpu 4 \
  --memory 8Gi \
  --gpu-type nvidia-t4 \
  --gpu-count 1 \
  --min-replicas 0 \
  --max-replicas 50 \
  --scale-rule-name queue-scaling \
  --scale-rule-type azure-servicebus \
  --scale-rule-metadata \
    queueName=ai-jobs \
    namespace=myservicebus \
    messageCount=5 \
  --scale-rule-auth connection=servicebus-connection

Dapr Integration

Enable Dapr on Container App

bash

az containerapp create \
  --name myapp-dapr \
  --resource-group MyRG \
  --environment myenv \
  --image myregistry.azurecr.io/myapp:latest \
  --enable-dapr \
  --dapr-app-id myapp \
  --dapr-app-port 8080 \
  --dapr-app-protocol http \
  --dapr-http-max-request-size 4 \
  --dapr-http-read-buffer-size 4 \
  --dapr-log-level info \
  --dapr-enable-api-logging true

Dapr State Store (Azure Cosmos DB)

yaml

# Create Dapr component for state store
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: statestore
spec:
  type: state.azure.cosmosdb
  version: v1
  metadata:
    - name: url
      value: "https://mycosmosdb.documents.azure.com:443/"
    - name: masterKey
      secretRef: cosmosdb-key
    - name: database
      value: "mydb"
    - name: collection
      value: "state"

bash

# Create the component
az containerapp env dapr-component set \
  --name myenv \
  --resource-group MyRG \
  --dapr-component-name statestore \
  --yaml component.yaml

Dapr Pub/Sub (Azure Service Bus)

yaml

apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: pubsub
spec:
  type: pubsub.azure.servicebus.topics
  version: v1
  metadata:
    - name: connectionString
      secretRef: servicebus-connection
    - name: consumerID
      value: "myapp"

Service-to-Service Invocation

python

# Python example using Dapr SDK
from dapr.clients import DaprClient

with DaprClient() as client:
    # Invoke another service
    response = client.invoke_method(
        app_id='other-service',
        method_name='process',
        data='{"input": "data"}'
    )

    # Save state
    client.save_state(
        store_name='statestore',
        key='mykey',
        value='myvalue'
    )

    # Publish message
    client.publish_event(
        pubsub_name='pubsub',
        topic_name='orders',
        data='{"orderId": "123"}'
    )

AI Model Deployment Patterns

OpenAI-Compatible Endpoint

dockerfile

# Dockerfile for vLLM model serving
FROM vllm/vllm-openai:latest

ENV MODEL_NAME="meta-llama/Llama-3.1-8B-Instruct"
ENV GPU_MEMORY_UTILIZATION=0.9
ENV MAX_MODEL_LEN=4096

CMD ["--model", "${MODEL_NAME}", \
     "--gpu-memory-utilization", "${GPU_MEMORY_UTILIZATION}", \
     "--max-model-len", "${MAX_MODEL_LEN}", \
     "--port", "8080"]

bash

# Deploy vLLM model
az containerapp create \
  --name llama-inference \
  --resource-group MyRG \
  --environment myenv \
  --image vllm/vllm-openai:latest \
  --cpu 8 \
  --memory 32Gi \
  --gpu-type nvidia-a100 \
  --gpu-count 1 \
  --min-replicas 1 \
  --max-replicas 5 \
  --target-port 8080 \
  --ingress external \
  --env-vars \
    MODEL_NAME="meta-llama/Llama-3.1-8B-Instruct" \
    GPU_MEMORY_UTILIZATION="0.9" \
    HF_TOKEN=secretref:huggingface-token

Stable Diffusion Image Generation

bash

az containerapp create \
  --name stable-diffusion \
  --resource-group MyRG \
  --environment myenv \
  --image myregistry.azurecr.io/stable-diffusion:latest \
  --cpu 4 \
  --memory 16Gi \
  --gpu-type nvidia-a100 \
  --gpu-count 1 \
  --min-replicas 0 \
  --max-replicas 10 \
  --target-port 7860 \
  --ingress external \
  --scale-rule-name http-scaling \
  --scale-rule-type http \
  --scale-rule-http-concurrency 1

Batch Processing Job

bash

az containerapp job create \
  --name batch-training-job \
  --resource-group MyRG \
  --environment myenv \
  --trigger-type Manual \
  --image myregistry.azurecr.io/training:latest \
  --cpu 8 \
  --memory 32Gi \
  --gpu-type nvidia-a100 \
  --gpu-count 2 \
  --parallelism 1 \
  --replica-timeout 7200 \
  --replica-retry-limit 3 \
  --env-vars \
    DATASET_URL="https://mystorage.blob.core.windows.net/datasets/train.csv" \
    MODEL_OUTPUT="https://mystorage.blob.core.windows.net/models/" \
    EPOCHS="100"

# Execute job
az containerapp job start \
  --name batch-training-job \
  --resource-group MyRG

Monitoring and Observability

Application Insights Integration

bash

az containerapp create \
  --name myapp-monitored \
  --resource-group MyRG \
  --environment myenv \
  --image myregistry.azurecr.io/myapp:latest \
  --env-vars \
    APPLICATIONINSIGHTS_CONNECTION_STRING=secretref:appinsights-connection

Query Logs

bash

# Stream logs
az containerapp logs show \
  --name myapp-gpu \
  --resource-group MyRG \
  --follow

# Query with Log Analytics
az monitor log-analytics query \
  --workspace <workspace-id> \
  --analytics-query "ContainerAppConsoleLogs_CL | where ContainerAppName_s == 'myapp-gpu' | take 100"

Metrics and Alerts

bash

# Create metric alert for GPU usage
az monitor metrics alert create \
  --name high-gpu-usage \
  --resource-group MyRG \
  --scopes $(az containerapp show -g MyRG -n myapp-gpu --query id -o tsv) \
  --condition "avg Requests > 100" \
  --window-size 5m \
  --evaluation-frequency 1m \
  --action <action-group-id>

Security Best Practices

Managed Identity

bash

# Create with system-assigned identity
az containerapp create \
  --name myapp-identity \
  --resource-group MyRG \
  --environment myenv \
  --system-assigned \
  --image myregistry.azurecr.io/myapp:latest

# Get identity principal ID
IDENTITY_ID=$(az containerapp show -g MyRG -n myapp-identity --query identity.principalId -o tsv)

# Assign role to access Key Vault
az role assignment create \
  --assignee $IDENTITY_ID \
  --role "Key Vault Secrets User" \
  --scope /subscriptions/<sub-id>/resourceGroups/MyRG/providers/Microsoft.KeyVault/vaults/mykeyvault

# Use user-assigned identity
az identity create --name myapp-identity --resource-group MyRG
IDENTITY_RESOURCE_ID=$(az identity show -g MyRG -n myapp-identity --query id -o tsv)

az containerapp create \
  --name myapp-user-identity \
  --resource-group MyRG \
  --environment myenv \
  --user-assigned $IDENTITY_RESOURCE_ID \
  --image myregistry.azurecr.io/myapp:latest

Secret Management

bash

# Add secrets
az containerapp secret set \
  --name myapp-gpu \
  --resource-group MyRG \
  --secrets \
    huggingface-token="<token>" \
    api-key="<key>"

# Reference secrets in environment variables
az containerapp update \
  --name myapp-gpu \
  --resource-group MyRG \
  --set-env-vars \
    HF_TOKEN=secretref:huggingface-token \
    API_KEY=secretref:api-key

Cost Optimization

Scale-to-Zero Configuration

bash

az containerapp create \
  --name myapp-scale-zero \
  --resource-group MyRG \
  --environment myenv \
  --image myregistry.azurecr.io/myapp:latest \
  --min-replicas 0 \
  --max-replicas 10 \
  --scale-rule-name http-scaling \
  --scale-rule-type http \
  --scale-rule-http-concurrency 10

Cost savings: Pay only when requests are being processed. GPU costs are per-second when active.

Right-Sizing Resources

bash

# Start with minimal resources
--cpu 2 --memory 4Gi --gpu-count 1

# Monitor and adjust based on actual usage
az monitor metrics list \
  --resource $(az containerapp show -g MyRG -n myapp-gpu --query id -o tsv) \
  --metric "CpuPercentage,MemoryPercentage"

Use Spot/Preemptible GPUs (Future Feature)

When available, configure spot instances for non-critical workloads to save up to 80% on GPU costs.

Troubleshooting

Check Revision Status

bash

az containerapp revision list \
  --name myapp-gpu \
  --resource-group MyRG \
  --output table

View Revision Details

bash

az containerapp revision show \
  --name <revision-name> \
  --app myapp-gpu \
  --resource-group MyRG

Restart Container App

bash

az containerapp update \
  --name myapp-gpu \
  --resource-group MyRG \
  --force-restart

GPU Not Available

If GPU is not provisioning:

Check region availability: Not all regions support GPU
Verify quota: Request quota increase if needed
Check workload profile: Ensure GPU workload profile is created

Best Practices

✓ Use scale-to-zero for intermittent workloads ✓ Implement health probes (liveness and readiness) ✓ Use managed identities for authentication ✓ Store secrets in Azure Key Vault ✓ Enable Dapr for microservices patterns ✓ Configure appropriate scaling rules ✓ Monitor GPU utilization and adjust resources ✓ Use Container Apps jobs for batch processing ✓ Implement retry logic for transient failures ✓ Use Application Insights for observability

References

Azure Container Apps with GPU support provides the ultimate serverless platform for AI/ML workloads!

Maintainer

JosiahSiegel Core maintainer

Source details

Full Name: JosiahSiegel/claude-plugin-marketplace
Branch: main
Path in repo: plugins/azure-master/skills/container-apps-gpu-2025
License: MIT License

Featured Tools

Join Our Newsletter

Stay updated with the latest AI tools, news, and offers by subscribing to our weekly newsletter.

Recommended Agent Skills

Expand your agent's capabilities with these related and highly-rated skills.

JosiahSiegel/claude-plugin-marketplace

opentofu-guide

Comprehensive OpenTofu expertise including migration from Terraform, state encryption, OpenTofu 1.10/1.11 features (OCI registry, native S3 locking, ephemeral resources, enabled meta-argument), and CI/CD integration. Covers when to use OpenTofu vs Terraform with decision matrix.

22 3

Explore

JosiahSiegel/claude-plugin-marketplace

terraform-tasks

Specialized Terraform task execution skill for autonomous infrastructure operations. Handles code generation, debugging, version management (1.10-1.14+), security scanning, and architecture design across all providers (AWS 6.0, AzureRM 4.x, GCP) and platforms. Covers ephemeral values, Terraform Stacks, policy-as-code, and 2025 best practices.

22 3

Explore

JosiahSiegel/claude-plugin-marketplace

shellcheck-cicd-2025

ShellCheck validation as non-negotiable 2025 workflow practice

22 3

Explore

JosiahSiegel/claude-plugin-marketplace

bash-master

Expert bash/shell scripting system across ALL platforms. PROACTIVELY activate for: (1) ANY bash/shell script task, (2) System automation, (3) DevOps/CI/CD scripts, (4) Build/deployment automation, (5) Script review/debugging, (6) Converting commands to scripts. Provides: Google Shell Style Guide compliance, ShellCheck validation, cross-platform compatibility (Linux/macOS/Windows/containers), POSIX compliance, security hardening, error handling, performance optimization, testing with BATS, and production-ready patterns. Ensures professional-grade, secure, portable scripts every time.

22 3

Explore

JosiahSiegel/claude-plugin-marketplace

process-substitution-fifos

Process substitution, named pipes (FIFOs), and advanced IPC patterns for efficient bash data streaming (2025)

22 3

Explore

JosiahSiegel/claude-plugin-marketplace

modern-automation-patterns

Modern DevOps and CI/CD automation patterns with containers and cloud (2025)

22 3

Explore

Didn't find tool you were looking for?

Search AI Tools

Install this agent skill to your Project

SKILL.md

Azure Container Apps GPU Support - 2025 Features

Overview

Key 2025 Features (Build Announcements)

1. Serverless GPU (GA)

2. Dedicated GPU (GA)

3. Dynamic Sessions with GPU (Early Access)

4. Foundry Models Integration

5. Workflow with Durable Task Scheduler (Preview)

6. Native Azure Functions Support

7. Dapr Integration (GA)

Creating Container Apps with GPU

Basic Container App with Serverless GPU

Production-Ready Container App with GPU

Container Apps Environment Configuration

Environment with Zone Redundancy

Workload Profiles (Dedicated GPU)

GPU Scaling Rules

Custom Prometheus Scaling

Queue-Based Scaling (Azure Service Bus)

Dapr Integration

Enable Dapr on Container App

Dapr State Store (Azure Cosmos DB)

Dapr Pub/Sub (Azure Service Bus)

Service-to-Service Invocation

AI Model Deployment Patterns

OpenAI-Compatible Endpoint

Stable Diffusion Image Generation

Batch Processing Job

Monitoring and Observability

Application Insights Integration

Query Logs

Metrics and Alerts

Security Best Practices

Managed Identity

Secret Management

Cost Optimization

Scale-to-Zero Configuration

Right-Sizing Resources

Use Spot/Preemptible GPUs (Future Feature)

Troubleshooting

Check Revision Status

View Revision Details

Restart Container App

GPU Not Available

Best Practices

References

Recommended Agent Skills

opentofu-guide

terraform-tasks

shellcheck-cicd-2025

bash-master

process-substitution-fifos

modern-automation-patterns