Terraform IaC Expert

Expert in Terraform and Infrastructure as Code for deploying AI infrastructure across multi-cloud environments.

Project Structure

infrastructure/
├── modules/
│   ├── aws-bedrock/
│   ├── azure-openai/
│   ├── oci-genai/
│   └── vector-store/
├── environments/
│   ├── dev/
│   ├── staging/
│   └── prod/
├── shared/
│   ├── networking/
│   └── security/
└── scripts/

Module Overview

Module	Provider	Purpose
`aws-bedrock`	AWS	Bedrock, Knowledge Bases, VPC Endpoints
`azure-openai`	Azure	Azure OpenAI, AI Search, Private Endpoints
`oci-genai`	OCI	DACs, Endpoints, Agents, Knowledge Bases
`vector-store`	Multi	OpenSearch Serverless, Qdrant, Milvus

Full module code: resources/modules.tf

AWS AI Infrastructure

Bedrock Module

hcl

module "aws_ai" {
  source = "./modules/aws-bedrock"

  prefix             = "prod"
  region             = "us-east-1"
  vpc_id             = data.aws_vpc.main.id
  private_subnet_ids = data.aws_subnets.private.ids

  enable_private_endpoint = true
  create_knowledge_base   = true
}

Key Resources

IAM roles for Bedrock access
VPC endpoints for private connectivity
Knowledge bases with OpenSearch

Azure AI Infrastructure

Azure OpenAI Module

hcl

module "azure_ai" {
  source = "./modules/azure-openai"

  openai_name         = "prod-openai"
  location            = "eastus"
  resource_group_name = azurerm_resource_group.ai.name

  gpt4o_capacity      = 100  # TPM
  embedding_capacity  = 50

  enable_private_endpoint = true
}

Key Resources

Cognitive Account (OpenAI kind)
Model deployments (GPT-4o, embeddings)
Private endpoints
Azure AI Search

OCI AI Infrastructure

GenAI Module

hcl

module "oci_ai" {
  source = "./modules/oci-genai"

  prefix         = "prod"
  compartment_id = var.oci_compartment_id
  cluster_type   = "HOSTING"
  unit_count     = 10
  unit_shape     = "LARGE_COHERE"

  create_agent          = true
  create_knowledge_base = true
}

Key Resources

Dedicated AI Clusters (DAC)
Model endpoints
GenAI Agents
Knowledge bases

Vector Store Infrastructure

OpenSearch Serverless (AWS)

hcl

module "vectors" {
  source = "./modules/vector-store/aws-opensearch"

  prefix           = "prod"
  vpc_endpoint_ids = [aws_vpc_endpoint.opensearch.id]
  allowed_principals = [aws_iam_role.bedrock.arn]
}

Multi-Cloud Environment

hcl

# environments/prod/main.tf

terraform {
  backend "s3" {
    bucket = "terraform-state-ai-infra"
    key    = "prod/terraform.tfstate"
    encrypt = true
  }
}

provider "aws" { region = var.aws_region }
provider "azurerm" { features {} }
provider "oci" { ... }

# Deploy to all clouds
module "aws_ai" { source = "../../modules/aws-bedrock" ... }
module "azure_ai" { source = "../../modules/azure-openai" ... }
module "oci_ai" { source = "../../modules/oci-genai" ... }

Best Practices

State Management

Remote state (S3, Azure Blob, OCI Object Storage)
State locking (DynamoDB, Cosmos DB)
Encrypt state at rest
Separate state per environment

Variable Validation

hcl

variable "environment" {
  type = string
  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Environment must be dev, staging, or prod."
  }
}

Sensitive Data

Use sensitive = true for outputs
Reference secrets from secret managers
Never commit .tfvars with secrets

Tagging

hcl

default_tags {
  tags = {
    Environment = var.environment
    Project     = "ai-platform"
    ManagedBy   = "terraform"
  }
}

CI/CD Integration

GitHub Actions

yaml

- uses: hashicorp/setup-terraform@v3
- run: terraform init
- run: terraform plan -out=tfplan
- run: terraform apply -auto-approve tfplan
  if: github.ref == 'refs/heads/main'

Key Patterns

Plan on PR, apply on merge
Use workspaces or directories for environments
Lock state during apply
Store plan artifacts

Managed Kubernetes GPU

EKS GPU Nodes

hcl

eks_managed_node_groups = {
  gpu = {
    instance_types = ["g5.2xlarge"]
    ami_type = "AL2_x86_64_GPU"
    taints = [{ key = "nvidia.com/gpu" ... }]
  }
}

AKS GPU Nodes

hcl

resource "azurerm_kubernetes_cluster_node_pool" "gpu" {
  vm_size = "Standard_NC24ads_A100_v4"
  node_taints = ["nvidia.com/gpu=true:NoSchedule"]
}

OKE GPU Nodes

hcl

resource "oci_containerengine_node_pool" "gpu" {
  node_shape = "BM.GPU.A100-v2.8"
}

Full examples: resources/modules.tf

Resources

Infrastructure as Code for enterprise AI deployments.

Search AI Tools

Terraform IaC Expert

Install this agent skill to your Project

SKILL.md

Terraform IaC Expert

Project Structure

Module Overview

AWS AI Infrastructure

Bedrock Module

Key Resources

Azure AI Infrastructure

Azure OpenAI Module

Key Resources

OCI AI Infrastructure

GenAI Module

Key Resources

Vector Store Infrastructure

OpenSearch Serverless (AWS)

Multi-Cloud Environment

Best Practices

State Management

Variable Validation

Sensitive Data

Tagging

CI/CD Integration

GitHub Actions

Key Patterns

Managed Kubernetes GPU

EKS GPU Nodes

AKS GPU Nodes

OKE GPU Nodes

Resources