Terraform and Infrastructure as Code
State Management
Terraform state tracks the mapping between configuration and real infrastructure. On GCP, the recommended backend is Google Cloud Storage with state locking.
# Backend configuration — GCS with state locking
terraform {
backend "gcs" {
bucket = "my-project-tf-state"
prefix = "terraform/production"
}
required_providers {
google = {
source = "hashicorp/google"
version = "~> 5.0"
}
}
}
provider "google" {
project = "my-project-id"
region = "us-central1"
}
Never store Terraform state locally in production. Use a GCS backend with versioning enabled on the bucket. This enables team collaboration, state locking (prevents concurrent modifications), and state recovery from accidental corruption.
Module Patterns
Google provides Cloud Foundation Toolkit (CFT) modules — production-ready, opinionated Terraform modules for common GCP patterns.
| CFT Module | Purpose | Key Features |
|---|---|---|
| project-factory | Create and configure projects | Billing, APIs, Shared VPC, IAM, labels |
| network | VPC, subnets, routes, firewalls | Shared VPC support, NAT, Private Google Access |
| kubernetes-engine | GKE clusters and node pools | Autopilot/Standard, Workload Identity, private clusters |
| cloud-storage | Buckets with best practices | Versioning, lifecycle, CMEK, uniform access |
| sql-db | Cloud SQL instances | HA, backups, private IP, CMEK |
Terraform in CI/CD
# cloudbuild.yaml — Terraform plan + apply pipeline
steps:
- id: 'tf-init'
name: 'hashicorp/terraform:1.7'
args: ['init']
dir: 'infrastructure/'
- id: 'tf-validate'
name: 'hashicorp/terraform:1.7'
args: ['validate']
dir: 'infrastructure/'
- id: 'tf-plan'
name: 'hashicorp/terraform:1.7'
args: ['plan', '-out=tfplan']
dir: 'infrastructure/'
# Apply only on main branch
- id: 'tf-apply'
name: 'hashicorp/terraform:1.7'
args: ['apply', '-auto-approve', 'tfplan']
dir: 'infrastructure/'
Apigee API Management
Architecture
Apigee is Google Cloud's full-lifecycle API management platform. It sits between API consumers and backend services, providing security, analytics, monetization, and developer portal capabilities.
| Apigee Tier | Deployment | Best For | Key Difference |
|---|---|---|---|
| Apigee X | Google-managed, VPC-peered | Enterprise API programs | Full feature set, global distribution |
| Apigee Hybrid | Runtime on your K8s, management in cloud | Data residency, on-prem backends | Runtime runs in your GKE/Anthos cluster |
| API Gateway | Serverless, fully managed | Simple APIs, Cloud Run/Functions backends | Lightweight, OpenAPI spec-based |
Policies and Features
- OAuth 2.0 / API Keys — Authentication and authorization for API consumers.
- Quota / Spike Arrest — Rate limiting to protect backends. Quota is per-consumer, spike arrest is per-proxy.
- Response Caching — Cache backend responses to reduce latency and backend load.
- Message Transformation — JSON-to-XML, field masking, header manipulation.
- Analytics — API traffic metrics, error rates, latency percentiles, developer adoption.
- Developer Portal — Self-service portal for API consumers to discover, register, and test APIs.
- Monetization — Billing plans for API usage (freemium, pay-per-call, tiered).
Apigee vs API Gateway: Use API Gateway for simple serverless APIs (Cloud Run, Cloud Functions) with basic authentication and rate limiting. Use Apigee when you need a full API program with developer portal, monetization, advanced analytics, or complex policy chains.
Deployment Strategies
Strategy Comparison
| Strategy | How It Works | Risk | Rollback Speed | GCP Support |
|---|---|---|---|---|
| Rolling Update | Gradually replace instances | Medium | Medium (reverse roll) | GKE, MIGs |
| Blue/Green | Full parallel environment, switch traffic | Low | Instant (switch back) | Cloud Run revisions, GKE |
| Canary | Small % of traffic to new version | Lowest | Instant (route to old) | Cloud Deploy, Cloud Run, GKE |
| Recreate | Kill old, start new (downtime) | Highest | Slow (redeploy old) | Basic K8s deployment |
| A/B Testing | Route by user attributes (not random) | Low | Instant (route change) | Istio/Anthos Service Mesh, Cloud Run |
Rollback Patterns
# Cloud Run — instant rollback to previous revision
gcloud run services update-traffic my-api \
--to-revisions=my-api-00005-abc=100 \
--region=us-central1
# GKE — rollback a deployment
kubectl rollout undo deployment/my-app
kubectl rollout status deployment/my-app
# Cloud Deploy — rollback a release
gcloud deploy rollbacks create my-rollback \
--delivery-pipeline=my-pipeline \
--release=release-001 \
--region=us-central1
Gemini Cloud Assist
Capabilities
Gemini Cloud Assist is an AI-powered assistant integrated into the Google Cloud Console. It helps architects and operators with natural-language queries about their infrastructure.
Natural Language Queries
Ask questions about your GCP resources in plain English. "Which VMs are over-provisioned?" or "Show me the firewall rules for project-x."
Troubleshooting
Diagnose issues with network connectivity, permission errors, and service misconfigurations. Suggests remediation steps.
Code Generation
Generate gcloud commands, Terraform configurations, and Cloud Build pipelines from natural-language descriptions.
Cost Insights
Analyze spending patterns, identify optimization opportunities, and explain billing line items in context.
Gemini Cloud Assist is an operational tool, not an architecture replacement. It accelerates implementation by generating configurations and troubleshooting issues, but architecture decisions still require human judgment about trade-offs, compliance, and business requirements.
Cloud SDKs and Tools
gcloud Deep Dive
| Tool | Purpose | Key Commands |
|---|---|---|
| gcloud | Primary CLI for GCP resources | gcloud compute, gcloud container, gcloud run |
| gsutil / gcloud storage | Cloud Storage operations | gsutil cp, gcloud storage cp |
| bq | BigQuery operations | bq query, bq mk, bq load |
| kubectl | Kubernetes cluster management | kubectl apply, kubectl get |
| terraform | Infrastructure as Code | terraform plan, terraform apply |
Client Libraries
Google Cloud provides idiomatic client libraries for Python, Java, Go, Node.js, C#, Ruby, and PHP. These are preferred over REST API calls for application integration.
# Python — Cloud Storage client library
from google.cloud import storage
client = storage.Client(project="my-project")
bucket = client.bucket("my-bucket")
blob = bucket.blob("data/report.csv")
blob.upload_from_filename("/tmp/report.csv")
# Python — BigQuery client library
from google.cloud import bigquery
client = bigquery.Client(project="my-project")
query = """
SELECT name, COUNT(*) as count
FROM `my-project.my_dataset.my_table`
GROUP BY name ORDER BY count DESC LIMIT 10
"""
results = client.query(query).result()
for row in results:
print(f"{row.name}: {row.count}")
Testing Frameworks
For the PCA exam, understand the testing pyramid and how it applies to cloud architectures:
- Infrastructure Tests — Validate Terraform plans with
terraform validate, policy checks with Sentinel or OPA/Gatekeeper, and drift detection. - Container Image Scanning — Artifact Registry automatically scans for CVEs. Binary Authorization enforces that only scanned images deploy to GKE.
- Integration Testing — Test service interactions in a staging environment that mirrors production. Cloud Build can orchestrate multi-service test suites.
- Smoke Testing — Post-deployment health checks to verify the new version serves traffic correctly before promoting.
Binary Authorization is the exam-favorite answer for "ensure only trusted containers run in GKE." It requires container images to be signed by trusted attestors before deployment. Combined with Artifact Registry vulnerability scanning, it forms a complete supply chain security solution.
Exam Tips
"A company wants to manage their cloud infrastructure as code with proper state management, team collaboration, and CI/CD..."
Answer: Terraform with GCS backend for state, Cloud Build for CI/CD pipeline (plan on PR, apply on merge to main), Cloud Foundation Toolkit modules for standardized resource creation.
"An enterprise wants to expose internal APIs to partners with rate limiting, authentication, and a developer portal..."
Answer: Apigee X. Full API management with OAuth 2.0, quota policies, response caching, analytics, and a developer portal. API Gateway is too basic for this use case.
"A team wants to deploy a new version of their Cloud Run service with minimal risk..."
Answer: Canary deployment using Cloud Run traffic splitting. Deploy the new revision, route 10% of traffic to it, monitor error rates and latency in Cloud Monitoring, then gradually increase to 100%. Use gcloud run services update-traffic for instant rollback.
Implementation questions test your ability to execute, not just design. Know the specific gcloud commands, Terraform patterns, and deployment configurations. The exam may present code snippets and ask you to identify errors or choose the correct configuration.