☁️ Google Cloud Professional ML Engineer

Machine Learning
Engineer Path

Master ML engineering on Google Cloud — from BigQuery ML and Vertex AI to production MLOps pipelines. 20 deep-dive study guides mapped to all 6 exam sections.

Begin with Course 01 →
20 Study Guides · 6 Exam Sections · 20 Notebooks · Open Source
All Study Guides
Each guide covers key concepts with hands-on examples.
Section 1 · Architecting Low-Code ML Solutions
COURSE 01 · SECTION 1
Introduction to AI and ML on Google Cloud
AI foundations, Vertex AI overview, and generative AI capabilities on Google Cloud Platform.
Vertex AI · AI Foundations · GenAI on GCP · ML Services
COURSE 02 · SECTION 1
Prepare Data for ML APIs
Using Vision, NLP, Speech, and Translation APIs alongside Dataflow and Dataprep for data preparation.
Vision API · NLP API · Dataflow · Dataprep
COURSE 03 · SECTION 1
Working with Notebooks in Vertex AI
Colab Enterprise, Vertex AI Workbench, and JupyterLab environments for ML development on GCP.
Colab Enterprise · Workbench · JupyterLab · Managed Notebooks
COURSE 04 · SECTION 1
Create ML Models with BigQuery ML
Classification, regression, forecasting, and matrix factorization models built directly in SQL with BigQuery ML.
BigQuery ML · Classification · Forecasting · Matrix Factorization
COURSE 05 · SECTION 1
Engineer Data for Predictive Modeling with BigQuery ML
ETL pipelines, Dataprep, Dataflow integration, and feature engineering techniques for BigQuery ML models.
ETL · Dataprep · Dataflow · Feature Engineering
Section 2 · Collaborating to Manage Data and Models
COURSE 06 · SECTION 2
Feature Engineering
Feature Store, transforming raw data into features, TensorFlow Transform, and feature crosses for ML models.
Feature Store · TF Transform · Feature Crosses · Raw Data
Section 3 · Scaling Prototypes into ML Models
COURSE 07 · SECTION 3
TensorFlow on Google Cloud
The TF ecosystem, input pipelines with tf.data, Keras model building, and distributed training strategies on GCP.
TensorFlow · tf.data · Keras · Distributed Training
COURSE 08 · SECTION 3
Production ML Systems
ML system architecture, adaptable and high-performance systems, hybrid deployment patterns, and troubleshooting production ML.
Architecture · High-Performance · Hybrid ML · Troubleshooting
Section 4 · Serving and Scaling Models
COURSE 09 · SECTION 4
MLOps Getting Started
ML operations fundamentals and getting started with Vertex AI MLOps for model lifecycle management.
MLOps · Vertex AI · Model Lifecycle · CI/CD for ML
COURSE 10 · SECTION 4
MLOps with Vertex AI: Manage Features
Deep dive into Vertex AI Feature Store for feature management, serving, and sharing across ML workflows.
Feature Store · Feature Serving · Feature Sharing · Vertex AI
Section 5 · Automating and Orchestrating ML Pipelines
COURSE 11 · SECTION 5
Introduction to Generative AI
Generative AI fundamentals: what it is, how it works, and its applications in the ML pipeline ecosystem.
GenAI Basics · Foundation Models · Use Cases · GCP GenAI
COURSE 12 · SECTION 5
Introduction to Large Language Models
Core LLM concepts: architecture, training paradigms, prompting techniques, and fine-tuning for enterprise use.
LLM Architecture · Prompting · Fine-Tuning · PaLM / Gemini
COURSE 13 · SECTION 5
MLOps for Generative AI
GenAI-specific MLOps patterns: model versioning, prompt management, evaluation pipelines, and deployment strategies for generative models.
GenAI MLOps · Prompt Mgmt · Model Versioning · Evaluation
COURSE 14 · SECTION 5
MLOps with Vertex AI: Model Evaluation
Model evaluation techniques for both traditional ML and generative AI models using Vertex AI evaluation tools.
Model Evaluation · Vertex AI · Metrics · GenAI Eval
COURSE 15 · SECTION 5
ML Pipelines on Google Cloud
Building and orchestrating ML pipelines with TFX, Kubeflow Pipelines, Cloud Composer, and MLflow on GCP.
TFX · Kubeflow · Cloud Composer · MLflow
Section 6 · Monitoring ML Solutions
COURSE 16 · SECTION 6
Build and Deploy ML Solutions on Vertex AI
AutoML, custom training pipelines, online and batch predictions, and end-to-end deployment on Vertex AI.
AutoML · Custom Training · Predictions · Vertex Pipelines
COURSE 17 · SECTION 6
Build Generative AI Applications
Prompting strategies, retrieval-augmented generation, and building production GenAI applications on Google Cloud.
Prompting · RAG · GenAI Apps · Vertex AI Search
COURSE 18 · SECTION 6
Responsible AI: Fairness and Bias
Fairness metrics, bias detection and mitigation techniques, and building equitable ML systems on Google Cloud.
Fairness Metrics · Bias Detection · Mitigation · What-If Tool
COURSE 19 · SECTION 6
Responsible AI: Interpretability and Transparency
Model explainability techniques including SHAP values, feature attributions, and the What-If Tool for transparent ML.
Explainability · SHAP · What-If Tool · Feature Attribution
COURSE 20 · SECTION 6
Responsible AI: Privacy and Safety
PII handling, differential privacy, federated learning, and safety frameworks for production ML systems.
PII · Differential Privacy · Federated Learning · Safety
Glossary
BigQuery ML
Build and deploy ML models directly in BigQuery using SQL. Supports classification, regression, forecasting, clustering, matrix factorization, and imported TensorFlow models.
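A minimal sketch of the train-then-predict flow in BigQuery ML. The dataset, table, column, and model names are hypothetical; the SQL strings use the standard CREATE MODEL and ML.PREDICT syntax, and the submit helper shows how they would be run with the BigQuery Python client (which requires GCP credentials).

```python
# Hypothetical dataset/table/column names, for illustration only.
TRAIN_MODEL_SQL = """
CREATE OR REPLACE MODEL `my_dataset.churn_model`
OPTIONS (
  model_type = 'logistic_reg',
  input_label_cols = ['churned']
) AS
SELECT tenure_months, monthly_charges, contract_type, churned
FROM `my_dataset.customers`;
"""

PREDICT_SQL = """
SELECT *
FROM ML.PREDICT(MODEL `my_dataset.churn_model`,
                (SELECT tenure_months, monthly_charges, contract_type
                 FROM `my_dataset.new_customers`));
"""

def submit(sql: str) -> None:
    # Requires authenticated GCP credentials; shown for illustration.
    from google.cloud import bigquery
    bigquery.Client().query(sql).result()
```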
Vertex AI
Google Cloud's unified ML platform. Provides tools for dataset management, training (AutoML and custom), model deployment, pipelines, feature store, experiments, and model monitoring.
Feature Store
A centralized repository for organizing, storing, and serving ML features. Ensures consistency between training and serving, supports point-in-time lookups, and enables feature sharing across teams.
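The point-in-time guarantee can be sketched in a few lines: a training example labeled at time t must only see feature values written at or before t, otherwise the label leaks future information. All class and method names below are illustrative, not a feature store API.

```python
from bisect import bisect_right

class FeatureSeries:
    """Toy time-versioned feature storage with point-in-time reads."""

    def __init__(self):
        self._ts, self._vals = [], []

    def write(self, ts, value):
        # Assumes monotonically increasing write timestamps.
        self._ts.append(ts)
        self._vals.append(value)

    def read_as_of(self, ts):
        """Latest value written at or before `ts`; None if nothing yet."""
        i = bisect_right(self._ts, ts)
        return self._vals[i - 1] if i else None

s = FeatureSeries()
s.write(10, 0.2)
s.write(20, 0.5)
# A label observed at t=15 may only use the value written at t=10.
visible_at_training = s.read_as_of(15)
```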
AutoML
Automated machine learning that handles model architecture search, hyperparameter tuning, and feature engineering. Available in Vertex AI for tabular, image, text, and video data.
TFX (TensorFlow Extended)
An end-to-end platform for deploying production ML pipelines. Components include ExampleGen, StatisticsGen, SchemaGen, Transform, Trainer, Tuner, Evaluator, and Pusher.
Kubeflow Pipelines
A platform for building and deploying portable, scalable ML workflows on Kubernetes. Supports DAG-based pipeline definitions with reusable components and experiment tracking.
TPU
Tensor Processing Unit. Google's custom ASIC designed for high-throughput ML training and inference. Available in Cloud TPU pods for distributed training at massive scale.
MLOps
Machine Learning Operations. The practice of applying DevOps principles to ML systems: continuous integration, delivery, and training (CI/CD/CT) for ML models in production.
Model Registry
A central catalog for managing ML model versions, metadata, lineage, and deployment status. Vertex AI Model Registry tracks models from training through deployment.
Dataflow
A fully managed stream and batch data processing service based on Apache Beam. Used for ETL pipelines, data transformation, and real-time feature computation for ML.
Cloud Composer
A managed Apache Airflow service for orchestrating complex workflows. Commonly used to schedule and coordinate ML pipeline steps, data ingestion, and model retraining.
TF Transform
A TFX library for preprocessing data using TensorFlow. Ensures the same transformations are applied during both training and serving, preventing training-serving skew.
Vertex AI Pipelines
A serverless ML workflow orchestration service. Supports both TFX and Kubeflow Pipelines SDK, with built-in lineage tracking and integration with other Vertex AI services.
Model Monitoring
Continuous tracking of model performance in production. Detects data drift, concept drift, and feature skew by comparing serving data distributions against training baselines.
Data Drift
When the statistical properties of production input data diverge from training data over time. A primary cause of model degradation that triggers retraining in MLOps pipelines.
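One common drift score is the Population Stability Index, computed over binned distributions of a feature at training time versus serving time. The thresholds in the comment are an industry rule of thumb, not a GCP-specific setting.

```python
import math

def psi(expected_counts, actual_counts, eps=1e-6):
    """Population Stability Index between two binned distributions.
    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate drift, > 0.25 major drift."""
    e_total, a_total = sum(expected_counts), sum(actual_counts)
    score = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_p = max(e / e_total, eps)  # proportion in training baseline
        a_p = max(a / a_total, eps)  # proportion in serving traffic
        score += (a_p - e_p) * math.log(a_p / e_p)
    return score

same = psi([50, 30, 20], [500, 300, 200])   # identical shape, near zero
shifted = psi([50, 30, 20], [20, 30, 50])   # mass has moved between bins
```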
Training-Serving Skew
Differences between how data is processed during training versus serving. Causes prediction errors in production. TF Transform and Feature Store are key mitigations.
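The mitigation that TF Transform formalizes can be shown with a toy scaler: fit the statistics once on training data, then reuse the exact same fitted function on both the training and serving paths. Names and values here are illustrative.

```python
def fit_scaler(train_values):
    """Fit min-max statistics on training data and return the transform."""
    lo, hi = min(train_values), max(train_values)
    span = (hi - lo) or 1.0
    # The fitted statistics travel with the model artifact.
    return lambda x: (x - lo) / span

train = [10.0, 20.0, 30.0]
scale = fit_scaler(train)

train_features = [scale(x) for x in train]  # training path
serving_feature = scale(25.0)               # serving path, same transform
```

Reimplementing the transform by hand at serving time (e.g. with slightly different min/max) is exactly what produces skew.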
Hyperparameter Tuning
The process of finding optimal model configuration (learning rate, layers, batch size). Vertex AI Vizier provides automated Bayesian optimization for hyperparameter search.
Vertex AI Experiments
A managed experiment tracking service for comparing model training runs. Logs metrics, parameters, and artifacts to help identify the best-performing model configurations.
Explainable AI
Techniques for understanding model predictions. Vertex AI provides feature attributions using Sampled Shapley, Integrated Gradients, and XRAI for image models.
SHAP
SHapley Additive exPlanations. A game-theoretic approach to model interpretability that assigns each feature an importance value for a particular prediction.
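For a handful of features, exact Shapley values can be computed by enumerating every coalition, which makes the definition concrete (production libraries approximate this). The toy linear model below is an assumption for illustration; for linear models the Shapley value of feature j reduces to w_j * (x_j - baseline_j).

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values of `x` vs `baseline` by coalition enumeration."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(len(others) + 1):
            for S in combinations(others, size):
                # Weight of this coalition in the Shapley average.
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                with_i = [x[j] if (j in S or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in S else baseline[j] for j in range(n)]
                phi[i] += w * (predict(with_i) - predict(without_i))
    return phi

predict = lambda v: 1.0 + 2.0 * v[0] + 3.0 * v[1]  # toy linear model
phi = shapley_values(predict, x=[2.0, 1.0], baseline=[0.0, 0.0])
# Additivity: contributions sum to prediction minus baseline prediction.
```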
What-If Tool
An interactive visual tool for exploring ML model behavior. Allows probing model performance, testing fairness across subgroups, and investigating individual predictions without code.
Differential Privacy
A mathematical framework for quantifying privacy guarantees in data analysis and ML. Adds calibrated noise to ensure individual records cannot be identified from model outputs.
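The classic instance is the Laplace mechanism on a counting query: a count has L1 sensitivity 1, so adding Laplace(1/epsilon) noise yields epsilon-differential privacy. The data and epsilon below are toy values.

```python
import math
import random

def dp_count(values, predicate, epsilon, rng):
    """Counting query released via the Laplace mechanism (epsilon-DP)."""
    true_count = sum(1 for v in values if predicate(v))
    # Inverse-CDF sample from Laplace(0, 1/epsilon).
    u = rng.random() - 0.5
    noise = -(1.0 / epsilon) * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))
    return true_count + noise

rng = random.Random(42)
ages = [23, 35, 41, 29, 52, 38]
noisy = dp_count(ages, lambda a: a > 30, epsilon=1.0, rng=rng)
```

Smaller epsilon means more noise and a stronger privacy guarantee; the released count is useful in aggregate but hides any individual's contribution.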
Federated Learning
A distributed ML approach where models are trained across multiple devices without centralizing data. Each device trains locally and only shares model updates, preserving data privacy.
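One round of the basic FedAvg algorithm can be sketched in pure Python for a 1-D linear model y = w * x: each client runs local gradient steps on its private data, and the server averages the resulting weights, weighted by local dataset size. All data values are toy numbers.

```python
def local_update(w, data, lr=0.01, steps=10):
    """Local SGD on one client's private (x, y) pairs; data never leaves."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def fedavg_round(w_global, clients):
    """Server step: average client weights, weighted by dataset size."""
    updates = [(local_update(w_global, d), len(d)) for d in clients]
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total

clients = [
    [(1.0, 2.1), (2.0, 4.0)],               # device A's private data
    [(3.0, 5.9), (4.0, 8.2), (5.0, 9.8)],   # device B's private data
]
w = 0.0
for _ in range(50):
    w = fedavg_round(w, clients)
# w approaches ~2.0, the slope underlying the toy data, without
# any raw examples ever being pooled on the server.
```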
Vertex AI Endpoints
Managed serving infrastructure for deployed models. Supports online (real-time) and batch predictions, traffic splitting for A/B testing, autoscaling, and model versioning.
MLflow
An open-source platform for the ML lifecycle. Provides experiment tracking, model packaging, registry, and deployment. Can be integrated with Vertex AI for hybrid workflows.
Colab Enterprise
A managed Jupyter notebook environment in Google Cloud with enterprise security, VPC-SC support, and integration with BigQuery, GCS, and Vertex AI services.
Dataprep
An intelligent data preparation service (by Trifacta) for visually exploring, cleaning, and transforming structured and unstructured data for ML. Runs on Dataflow under the hood.
Feature Cross
A synthetic feature created by combining two or more features. Enables linear models to learn non-linear relationships. Common in BigQuery ML and wide-and-deep models.
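A hashed cross, sketched below, is the usual way to keep the crossed vocabulary bounded when the joint cardinality explodes (TensorFlow exposes the same idea as hashed feature crossing). The bucket count and feature values are illustrative.

```python
import hashlib

def cross(*features, buckets=1000):
    """Combine categorical values into one hashed synthetic feature."""
    key = "_x_".join(str(f) for f in features)
    # Stable hash (unlike Python's salted hash()) so train/serve agree.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % buckets

# "country x language" becomes a single feature a linear model can weight,
# letting it learn the interaction (e.g. US+en vs US+fr) directly.
f = cross("US", "en")
```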
Distributed Training
Training ML models across multiple machines or accelerators. Strategies include data parallelism (split data) and model parallelism (split model). Supported natively in Vertex AI.
RAG
Retrieval-Augmented Generation. Combines information retrieval with generative models to ground responses in external knowledge. Vertex AI Search and Conversation enables managed RAG.
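The pattern reduces to two steps: retrieve the most relevant passages, then assemble a prompt that grounds the model in them. The corpus and word-overlap scorer below are toy stand-ins for a managed retriever such as Vertex AI Search.

```python
corpus = [
    "Vertex AI Pipelines orchestrates ML workflows serverlessly.",
    "BigQuery ML trains models with SQL inside BigQuery.",
    "Cloud TPU accelerates large-scale model training.",
]

def retrieve(query, docs, k=1):
    """Rank documents by word overlap with the query (toy retriever)."""
    q = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_prompt(query, docs):
    """Ground the generation step in the retrieved context."""
    context = "\n".join(docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

query = "train models with SQL"
prompt = build_prompt(query, retrieve(query, corpus))
# The prompt, not the model's parametric memory, now carries the facts.
```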