Real-Time Fraud Detection — CareerAlign GCP MLE Use Cases

1

The Problem

Credit card fraud costs the global financial industry an estimated $32 billion per year, and that figure is growing as digital payments accelerate. Traditional rule-based fraud detection systems — built on static thresholds like "flag any transaction over $5,000" — are increasingly inadequate. Fraudsters adapt their tactics faster than analysts can write new rules, creating an arms race that rules-based systems are losing.

The True Cost of Fraud

The direct losses from fraudulent transactions are only part of the story. The downstream costs are staggering:

$118B/year in declined legitimate transactions (false positives)
Customer churn from blocked good transactions (13x more likely to switch banks)
Operational cost of manual review teams (average bank employs 200+ fraud analysts)
Regulatory fines for inadequate fraud prevention (PCI-DSS non-compliance)
Reputational damage and loss of consumer trust

Exam Insight

The GCP MLE exam frequently tests your understanding of class imbalance, real-time serving, and model monitoring — all of which are core to fraud detection. This use case maps to multiple exam objectives including ML pipeline design, feature engineering, and production ML systems.

Why Fraud Detection Is Hard

Fraud detection presents a unique combination of ML challenges that makes it one of the most demanding production systems to build and maintain:

Challenge	Impact	GCP Solution
Extreme class imbalance	Only 0.1–0.5% of transactions are fraudulent	SMOTE, class weights, custom loss functions
Real-time latency	Must decide in <100ms to block transactions	Vertex AI Online Prediction, Feature Store
Concept drift	Fraud patterns change weekly/monthly	Vertex AI Model Monitoring, automated retraining
Adversarial actors	Fraudsters actively try to evade detection	Ensemble models, feature diversification
Asymmetric cost	Cost of missing fraud >> cost of false alarm	Custom threshold optimization, cost-sensitive learning
Explainability	Regulators require reasoning for each decline	Vertex Explainable AI, SHAP values

The core constraint is latency. When a customer swipes their card at a point-of-sale terminal, the payment network has approximately 2 seconds to authorize or decline the transaction. Within that window, your fraud model must receive the transaction data, compute features, run inference, and return a decision. In practice, the ML component gets less than 100ms of that budget, and our target is under 50ms.

2

Solution Architecture

Our fraud detection system uses a fully GCP-native architecture optimized for real-time, low-latency prediction. Every component is a managed service, minimizing operational overhead while maximizing reliability.

System Diagram

Real-time fraud detection architecture — Transaction Stream → Pub/Sub → Dataflow → Feature Store + Online Prediction → Decision Engine → Monitoring

Component Breakdown

Each component in the pipeline has a specific role and maps to GCP services you will encounter on the MLE exam:

Cloud Pub/Sub serves as the streaming ingestion layer. Every transaction from every payment terminal, mobile app, and e-commerce checkout is published as a message to a Pub/Sub topic. Pub/Sub provides at-least-once delivery with global availability, handling millions of transactions per second with sub-10ms publish latency. Messages are retained for up to 7 days, providing a buffer for downstream processing failures.

Cloud Dataflow (Apache Beam) performs real-time feature computation. As transaction messages arrive, Dataflow computes sliding-window aggregations (transaction velocity, average amount in last N minutes), joins with user profile data, and enriches the transaction with computed features. Dataflow autoscales horizontally to handle traffic spikes like Black Friday.

Vertex AI Feature Store serves pre-computed user profile features with single-digit millisecond latency. Features like "average transaction amount over 30 days," "typical transaction countries," and "device fingerprint history" are batch-computed and synced to the online store. This avoids recomputing expensive aggregations at inference time.

Vertex AI Online Prediction hosts the trained XGBoost model behind an autoscaling endpoint. The endpoint receives the enriched feature vector and returns a fraud probability score in under 50ms. Multiple model versions can be deployed simultaneously for A/B testing.

Decision Engine applies business rules on top of the model score. A score above 0.85 triggers an automatic decline, 0.5–0.85 routes to manual review, and below 0.5 is approved. These thresholds are tunable per merchant category and customer segment.

Model Monitoring via Vertex AI continuously tracks prediction distributions, feature distributions, and model performance. When concept drift is detected (fraud patterns changing), it triggers an alert via Cloud Functions that can initiate an automated retraining pipeline.

Architecture Note

This architecture separates online features (computed in real-time by Dataflow) from batch features (pre-computed and served by Feature Store). This dual-path approach is a common pattern in production ML systems and is frequently tested on the exam.

3

Feature Engineering for Fraud

Feature engineering is the single most important factor in fraud detection model performance. Raw transaction data (amount, timestamp, merchant ID) is necessary but insufficient. The signal for fraud lies in behavioral patterns — deviations from a customer's established transaction profile.

Velocity Features

Velocity features capture the rate of transactions over sliding time windows. A legitimate customer might make 3–5 transactions per day; a stolen card often generates dozens of transactions in rapid succession as the fraudster tries to extract maximum value before the card is blocked.

# Velocity feature computation in Dataflow (Apache Beam)
def compute_velocity_features(transaction, user_history):
    """Compute transaction velocity over multiple time windows."""
    current_time = transaction['timestamp']

    windows = {
        '1min':  60,
        '5min':  300,
        '1hour': 3600,
        '24hour': 86400,
    }

    features = {}
    for name, seconds in windows.items():
        window_start = current_time - seconds
        window_txns = [
            t for t in user_history
            if t['timestamp'] >= window_start
        ]
        features[f'txn_count_{name}'] = len(window_txns)
        features[f'txn_amount_sum_{name}'] = sum(
            t['amount'] for t in window_txns
        )
        features[f'txn_amount_avg_{name}'] = (
            features[f'txn_amount_sum_{name}'] / max(features[f'txn_count_{name}'], 1)
        )

    # Time since last transaction (seconds)
    features['time_since_last_txn'] = (
        current_time - user_history[-1]['timestamp']
        if user_history else -1
    )

    return features

Geo-Anomaly Features

Geographic anomaly detection identifies physically impossible transaction patterns. If a customer makes a purchase in New York and another in London 30 minutes later, one of those transactions is almost certainly fraudulent. We compute the haversine distance between consecutive transactions and compare it against physically possible travel speeds.

import math

def compute_geo_features(transaction, last_transaction):
    """Detect impossible travel and geographic anomalies."""
    if last_transaction is None:
        return {'geo_distance_km': 0, 'travel_speed_kmh': 0, 'impossible_travel': 0}

    # Haversine distance calculation
    lat1, lon1 = math.radians(transaction['lat']), math.radians(transaction['lon'])
    lat2, lon2 = math.radians(last_transaction['lat']), math.radians(last_transaction['lon'])

    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat/2)**2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon/2)**2
    distance_km = 2 * 6371 * math.asin(math.sqrt(a))

    time_hours = (transaction['timestamp'] - last_transaction['timestamp']) / 3600
    speed_kmh = distance_km / max(time_hours, 0.001)

    return {
        'geo_distance_km': distance_km,
        'travel_speed_kmh': speed_kmh,
        'impossible_travel': 1 if speed_kmh > 900 else 0,  # > commercial flight speed
        'same_country': 1 if transaction['country'] == last_transaction['country'] else 0,
    }

Device Fingerprint Features

Device fingerprinting identifies the hardware and software characteristics of the device used for the transaction. A legitimate customer typically uses 1–3 devices; fraudsters often use virtual machines, emulators, or rapidly switch devices. Key device features include:

Device ID consistency — has this device been seen with this customer before?
Device count — how many unique devices has the customer used in 30 days?
Browser/OS combination — unusual combinations suggest emulators
IP address analysis — VPN/proxy detection, IP geolocation vs billing address
Screen resolution and timezone — mismatches with claimed location

Merchant Category Analysis

Different merchant categories have very different fraud profiles. High-risk categories include digital goods (instant delivery, hard to reverse), wire transfers, and gambling sites. We encode merchant category behavior as features:

def compute_merchant_features(transaction, user_profile):
    """Merchant-based behavioral features."""
    mcc = transaction['merchant_category_code']

    # Has customer ever used this merchant category?
    known_categories = set(user_profile.get('mcc_history', []))

    # High-risk MCC codes (digital goods, gambling, wire transfer)
    high_risk_mccs = {5816, 5817, 5818, 7995, 6012, 4829}

    return {
        'is_new_mcc': 1 if mcc not in known_categories else 0,
        'is_high_risk_mcc': 1 if mcc in high_risk_mccs else 0,
        'mcc_txn_ratio': user_profile.get(f'mcc_{mcc}_ratio', 0.0),
        'mcc_avg_amount': user_profile.get(f'mcc_{mcc}_avg', 0.0),
        'amount_vs_mcc_avg': (
            transaction['amount'] / max(user_profile.get(f'mcc_{mcc}_avg', 1.0), 0.01)
        ),
    }

Feature Engineering Tip

The most powerful fraud features are relative, not absolute. "Transaction amount is $500" is weakly predictive. "Transaction amount is 8x the customer's average" is strongly predictive. Always normalize features against the customer's own behavioral baseline.

4

Handling Extreme Class Imbalance

With a fraud rate of only 0.1–0.5%, a naive model that predicts "not fraud" for every transaction would achieve 99.5%+ accuracy. This is useless. We need specialized techniques to train models that can identify the rare fraudulent transactions while minimizing false positives on the overwhelming majority of legitimate ones.

SMOTE (Synthetic Minority Over-sampling Technique)

SMOTE generates synthetic fraudulent examples by interpolating between existing fraud cases in feature space. For each minority class sample, SMOTE finds its k-nearest neighbors (also minority class) and creates new samples along the line segments connecting them.

from imblearn.over_sampling import SMOTE
from imblearn.combine import SMOTETomek

# Basic SMOTE
smote = SMOTE(
    sampling_strategy=0.3,    # Target 30% minority ratio (not 50%)
    k_neighbors=5,
    random_state=42
)
X_resampled, y_resampled = smote.fit_resample(X_train, y_train)

# SMOTE + Tomek links (removes borderline samples)
smote_tomek = SMOTETomek(
    smote=SMOTE(sampling_strategy=0.3, random_state=42),
    random_state=42
)
X_clean, y_clean = smote_tomek.fit_resample(X_train, y_train)

print(f"Original: {sum(y_train)} fraud / {len(y_train)} total")
print(f"After SMOTE: {sum(y_resampled)} fraud / {len(y_resampled)} total")

Important

Never apply SMOTE to the test set. Synthetic samples must only be generated from training data. Applying SMOTE before the train/test split leads to data leakage and inflated evaluation metrics.

Class Weights

An alternative to resampling is telling the model to penalize misclassification of the minority class more heavily. XGBoost supports this natively via the scale_pos_weight parameter:

import xgboost as xgb

# Calculate class weight ratio
n_negative = sum(y_train == 0)
n_positive = sum(y_train == 1)
scale_weight = n_negative / n_positive  # e.g., 199 for 0.5% fraud rate

model = xgb.XGBClassifier(
    scale_pos_weight=scale_weight,
    max_depth=6,
    learning_rate=0.1,
    n_estimators=500,
    eval_metric='aucpr',   # Use PR-AUC, NOT ROC-AUC for imbalanced data
    early_stopping_rounds=20,
)

model.fit(
    X_train, y_train,
    eval_set=[(X_val, y_val)],
    verbose=50
)

Anomaly Detection Approach

For extremely rare fraud types, supervised classification may not have enough positive examples to learn from. In these cases, an unsupervised anomaly detection model trained only on legitimate transactions can flag outliers as potential fraud. This approach is especially useful for detecting novel fraud patterns that have never been seen before (zero-day fraud).

from sklearn.ensemble import IsolationForest

# Train only on legitimate transactions
legitimate_txns = X_train[y_train == 0]

iso_forest = IsolationForest(
    n_estimators=200,
    contamination=0.005,  # Expected fraud rate
    max_features=0.8,
    random_state=42
)
iso_forest.fit(legitimate_txns)

# Anomaly scores (lower = more anomalous)
anomaly_scores = iso_forest.decision_function(X_test)
predictions = iso_forest.predict(X_test)  # -1 for anomaly, 1 for normal

In production, we often use an ensemble approach: combine the supervised XGBoost model (good at catching known fraud patterns) with an unsupervised anomaly detector (good at catching novel patterns). The decision engine weighs both scores.

5

XGBoost Model Training

Training on Vertex AI

We use Vertex AI Custom Training to train our XGBoost model at scale. Vertex AI manages the compute infrastructure, experiment tracking, and model versioning. The training job reads data from BigQuery, applies feature transformations, handles class imbalance, and outputs a trained model artifact to Cloud Storage.

# Vertex AI Custom Training Job (training script)
from google.cloud import aiplatform
import xgboost as xgb
from sklearn.model_selection import StratifiedKFold
import json, os

# Initialize Vertex AI
aiplatform.init(
    project='your-project-id',
    location='us-central1',
    staging_bucket='gs://your-bucket/staging'
)

# Define hyperparameters
params = {
    'objective': 'binary:logistic',
    'eval_metric': ['aucpr', 'auc'],
    'max_depth': 6,
    'learning_rate': 0.05,
    'subsample': 0.8,
    'colsample_bytree': 0.8,
    'min_child_weight': 5,
    'scale_pos_weight': 199,  # Inverse fraud ratio
    'reg_alpha': 0.1,
    'reg_lambda': 1.0,
    'tree_method': 'hist',   # Fast histogram-based training
    'seed': 42,
}

# Create DMatrix with optimized data format
dtrain = xgb.DMatrix(X_train, label=y_train)
dval = xgb.DMatrix(X_val, label=y_val)
dtest = xgb.DMatrix(X_test, label=y_test)

# Train with early stopping
model = xgb.train(
    params,
    dtrain,
    num_boost_round=1000,
    evals=[(dtrain, 'train'), (dval, 'val')],
    early_stopping_rounds=30,
    verbose_eval=50
)

# Save model artifact
model_path = os.path.join(os.environ.get('AIP_MODEL_DIR', '.'), 'model.bst')
model.save_model(model_path)

Evaluation Metrics

For imbalanced classification, standard accuracy is meaningless. We focus on metrics that specifically measure performance on the minority class:

Metric	What It Measures	Target
Precision-Recall AUC	Overall trade-off between precision and recall	> 0.75
Recall @ 5% FPR	Fraud caught while keeping false positives at 5%	> 0.90
F2-Score	Weighted harmonic mean (recall weighted 2x over precision)	> 0.80
Cost-Weighted Loss	Business-specific: cost of missed fraud vs false positives	Minimize $
ROC-AUC	Discriminative ability across all thresholds	> 0.98

Exam Tip

The MLE exam will ask you which metric to use for imbalanced datasets. PR-AUC (Precision-Recall AUC) is preferred over ROC-AUC because ROC-AUC can be misleadingly high when the negative class dominates. A model with 0.99 ROC-AUC might still have poor precision on the minority class.

6

Vertex AI Feature Store

The Feature Store is critical for bridging the training-serving skew gap. Features computed during training must be served identically during inference. Without a Feature Store, you risk subtle bugs where features are computed differently in the training pipeline (Python/Spark) versus the serving pipeline (Java/Go), leading to degraded model performance in production.

Online Serving Setup

# Create Feature Store for fraud detection
from google.cloud import aiplatform

aiplatform.init(project='your-project', location='us-central1')

# Create the Feature Store instance
feature_store = aiplatform.Featurestore.create(
    featurestore_id='fraud_detection_fs',
    online_store_fixed_node_count=3,  # For low-latency serving
)

# Create entity type for user profiles
user_entity = feature_store.create_entity_type(
    entity_type_id='user_profile',
    description='Customer transaction profiles for fraud detection'
)

# Define features
features_config = {
    'avg_txn_amount_30d':     {'value_type': 'DOUBLE'},
    'txn_count_30d':           {'value_type': 'INT64'},
    'unique_merchants_30d':    {'value_type': 'INT64'},
    'unique_countries_30d':    {'value_type': 'INT64'},
    'avg_txn_amount_7d':      {'value_type': 'DOUBLE'},
    'max_txn_amount_30d':     {'value_type': 'DOUBLE'},
    'typical_txn_hour':       {'value_type': 'DOUBLE'},
    'device_count_30d':       {'value_type': 'INT64'},
    'high_risk_mcc_ratio':    {'value_type': 'DOUBLE'},
    'days_since_first_txn':   {'value_type': 'INT64'},
}

for feat_id, config in features_config.items():
    user_entity.create_feature(
        feature_id=feat_id,
        value_type=config['value_type'],
    )

Batch-to-Online Sync

User profile features are computed daily in a BigQuery batch pipeline and then synced to the Feature Store's online serving layer. This ensures the online store always has fresh data while keeping compute costs manageable:

# Batch ingest from BigQuery to Feature Store
user_entity.ingest_from_bq(
    feature_ids=[
        'avg_txn_amount_30d', 'txn_count_30d',
        'unique_merchants_30d', 'unique_countries_30d',
        'avg_txn_amount_7d', 'max_txn_amount_30d',
        'typical_txn_hour', 'device_count_30d',
        'high_risk_mcc_ratio', 'days_since_first_txn',
    ],
    feature_time='feature_timestamp',
    entity_id_field='user_id',
    bq_source_uri='bq://project.dataset.user_profiles_latest',
)

# Online read at serving time (single-digit ms latency)
user_features = user_entity.read(
    entity_ids=['user_12345'],
    feature_ids=[
        'avg_txn_amount_30d', 'txn_count_30d',
        'unique_merchants_30d', 'max_txn_amount_30d',
    ]
)

Training-Serving Consistency

The Feature Store ensures that the same feature values used during training are available at serving time. This eliminates training-serving skew, one of the most insidious bugs in production ML. The exam tests this concept under the topic of "ML pipeline best practices."

7

Real-Time Prediction Pipeline

Pub/Sub Ingestion

Every transaction event is published to a Pub/Sub topic in a standardized JSON schema. The Pub/Sub topic acts as a durable, ordered buffer between the payment processing system and the ML pipeline:

# Publishing a transaction event to Pub/Sub
from google.cloud import pubsub_v1
import json

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path('project-id', 'transactions')

transaction_event = {
    'transaction_id': 'txn_abc123',
    'user_id': 'user_12345',
    'amount': 459.99,
    'merchant_id': 'merch_789',
    'merchant_category_code': 5411,
    'timestamp': 1709723400,
    'lat': 40.7128,
    'lon': -74.0060,
    'country': 'US',
    'device_id': 'dev_456',
    'channel': 'pos_terminal',
}

future = publisher.publish(
    topic_path,
    json.dumps(transaction_event).encode('utf-8'),
    transaction_id='txn_abc123',  # Attribute for filtering
)
print(f"Published message ID: {future.result()}")

Dataflow Feature Computation

A Dataflow streaming pipeline subscribes to the Pub/Sub topic, computes real-time features, fetches user profile features from the Feature Store, and assembles the complete feature vector for prediction:

# Apache Beam pipeline for real-time feature computation
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    '--runner=DataflowRunner',
    '--project=your-project',
    '--region=us-central1',
    '--streaming',
    '--autoscaling_algorithm=THROUGHPUT_BASED',
])

with beam.Pipeline(options=options) as p:
    transactions = (
        p
        | 'ReadPubSub' >> beam.io.ReadFromPubSub(
            topic='projects/your-project/topics/transactions'
        )
        | 'ParseJSON' >> beam.Map(json.loads)
    )

    # Compute real-time velocity features (windowed)
    velocity = (
        transactions
        | 'KeyByUser' >> beam.Map(lambda t: (t['user_id'], t))
        | 'Window5min' >> beam.WindowInto(
            beam.window.SlidingWindows(300, 60)  # 5min window, 1min slide
        )
        | 'ComputeVelocity' >> beam.ParDo(ComputeVelocityFn())
    )

    # Enrich with Feature Store profile data
    enriched = (
        velocity
        | 'FetchProfile' >> beam.ParDo(FetchFeatureStoreFn())
        | 'AssembleVector' >> beam.ParDo(AssembleFeatureVectorFn())
    )

    # Send to prediction endpoint
    results = (
        enriched
        | 'Predict' >> beam.ParDo(CallPredictionEndpointFn())
        | 'RouteDecision' >> beam.ParDo(DecisionEngineFn())
    )

Online Prediction Endpoint

The trained model is deployed to a Vertex AI endpoint with autoscaling configured for peak transaction volumes:

# Deploy model to Vertex AI endpoint
from google.cloud import aiplatform

# Upload model
model = aiplatform.Model.upload(
    display_name='fraud-detection-xgboost-v2',
    artifact_uri='gs://your-bucket/models/fraud_v2/',
    serving_container_image_uri=(
        'us-docker.pkg.dev/vertex-ai/prediction/'
        'xgboost-cpu.1-7:latest'
    ),
)

# Create endpoint
endpoint = aiplatform.Endpoint.create(
    display_name='fraud-detection-endpoint',
)

# Deploy with autoscaling
model.deploy(
    endpoint=endpoint,
    machine_type='n1-standard-4',
    min_replica_count=2,    # Always-on for low latency
    max_replica_count=10,   # Scale for peak traffic
    traffic_percentage=100,
    deploy_request_timeout=1200,
)

# Make a prediction (in the serving path)
prediction = endpoint.predict(
    instances=[feature_vector],  # Single transaction feature vector
)
fraud_score = prediction.predictions[0]  # Probability of fraud

Latency Optimization

Setting min_replica_count=2 eliminates cold-start latency. The endpoint always has warm instances ready to serve predictions. For fraud detection, a cold start adding 5–10 seconds is unacceptable. The trade-off is higher baseline cost, but for financial services, the cost of downtime far exceeds the cost of always-on compute.

8

Model Monitoring & Drift Detection

Concept Drift in Fraud

Fraud detection models face a unique challenge: the data distribution shifts intentionally. Unlike most ML domains where drift is a natural process (e.g., seasonal changes in demand), fraud patterns shift because adversaries are actively trying to evade your model. This makes continuous monitoring and rapid retraining essential.

Common types of drift in fraud detection:

Feature drift — input distributions change (new transaction patterns post-COVID)
Concept drift — the relationship between features and labels changes
Label drift — the fraud rate itself changes (new attack campaign)
Adversarial drift — fraudsters learn model weaknesses and adapt

# Set up Vertex AI Model Monitoring
from google.cloud import aiplatform

# Configure monitoring job
monitoring_job = aiplatform.ModelDeploymentMonitoringJob.create(
    display_name='fraud-model-monitoring',
    endpoint=endpoint,
    logging_sampling_strategy={
        'random_sample_config': {'sample_rate': 0.1}  # Sample 10% of predictions
    },
    schedule_config={'monitor_interval': {'seconds': 3600}},  # Check hourly
    objective_configs=[
        {
            'deployed_model_id': deployed_model.id,
            'objective_config': {
                'training_dataset': {
                    'bigquery_source': {
                        'data_uri': 'bq://project.dataset.training_data'
                    }
                },
                'training_prediction_skew_detection_config': {
                    'skew_thresholds': {
                        'txn_amount': {'value': 0.3},
                        'velocity_1hour': {'value': 0.2},
                        'geo_distance': {'value': 0.25},
                    }
                },
                'prediction_drift_detection_config': {
                    'drift_thresholds': {
                        'txn_amount': {'value': 0.2},
                        'velocity_1hour': {'value': 0.15},
                    }
                },
            },
        }
    ],
    alert_config={
        'email_alert_config': {
            'user_emails': ['ml-team@company.com']
        }
    },
)

A/B Testing Fraud Models

Before fully deploying a new fraud model, we run it alongside the current production model using Vertex AI's traffic splitting. This allows us to compare real-world performance before committing to a full rollout:

# A/B test: deploy new model with 10% traffic
new_model = aiplatform.Model.upload(
    display_name='fraud-detection-xgboost-v3',
    artifact_uri='gs://your-bucket/models/fraud_v3/',
    serving_container_image_uri=(
        'us-docker.pkg.dev/vertex-ai/prediction/'
        'xgboost-cpu.1-7:latest'
    ),
)

# Deploy with traffic split
endpoint.deploy(
    model=new_model,
    machine_type='n1-standard-4',
    min_replica_count=1,
    max_replica_count=5,
    traffic_percentage=10,  # 10% of traffic to new model
)

# After validating performance, shift all traffic
endpoint.undeploy(deployed_model_id=old_model_deployment_id)
# New model now receives 100% of traffic

Production Safety

Never deploy a new fraud model to 100% traffic immediately. Even with strong offline metrics, real-world performance can differ. Start at 5–10%, monitor for 24–48 hours, then gradually increase. A bad fraud model can either block thousands of legitimate customers (too aggressive) or miss a fraud wave (too permissive).

9

Key GCP Components

This use case integrates multiple GCP services. Each maps to specific exam objectives and demonstrates real-world ML engineering skills:

⚡

Vertex AI

Unified ML platform for training, deploying, and monitoring models. Manages the full lifecycle from experiment tracking through custom training jobs to online prediction endpoints with autoscaling.

📊

Feature Store

Centralized repository for ML features with both batch and online serving. Eliminates training-serving skew by ensuring identical feature computation across pipelines. Single-digit ms online read latency.

📩

Cloud Pub/Sub

Serverless messaging service for streaming ingestion. Handles millions of transaction events per second with at-least-once delivery. Decouples payment systems from ML pipeline.

🌊

Cloud Dataflow

Fully managed stream and batch processing based on Apache Beam. Computes real-time features with sliding window aggregations. Auto-scales based on throughput.

🎯

XGBoost

Gradient boosted decision trees optimized for tabular data. Ideal for fraud detection due to handling of class imbalance (scale_pos_weight), feature importance, and fast inference (<1ms per prediction).

👁

Model Monitoring

Vertex AI Model Monitoring tracks feature drift, prediction drift, and training-serving skew. Automated alerts trigger retraining when fraud patterns change, ensuring the model stays current.

10

Results & Impact

After deploying the ML-based fraud detection system, we measured significant improvements across all key metrics compared to the previous rule-based system:

Metric	Rule-Based (Before)	ML Model (After)	Improvement
Fraud detection rate	82%	94.7%	+15.5%
False positive rate	5.2%	2.1%	-60%
Prediction latency (p99)	120ms	45ms	-62.5%
Annual fraud losses prevented	$3.2M	$8.5M	+$5.3M
Concept drift detection time	2–4 weeks (manual)	<2 hours (automated)	99%+ faster

Business Impact

The 60% reduction in false positives is arguably more valuable than the improved detection rate. Each false positive blocks a legitimate customer, risks them switching to a competitor, and costs the business revenue. At scale, reducing false positives from 5.2% to 2.1% means millions of additional approved legitimate transactions.

Key technical achievements:

94.7% fraud detection rate — catches 12.7% more fraud than rule-based system
60% reduction in false positives — fewer legitimate customers blocked
<45ms prediction latency — well within the 100ms transaction window
$8.5M annual fraud loss prevention — clear ROI for the ML investment
Concept drift detected within 2 hours — vs weeks with manual monitoring
Automated retraining pipeline — model refreshed weekly, emergency retrain on drift alert

11

Production Considerations

Deploying a fraud detection model to production introduces challenges beyond pure ML performance. These operational and regulatory concerns are frequently tested on the GCP MLE exam.

Explainability Requirements

When a transaction is declined, the customer (and often the regulator) needs to know why. "The model said so" is not an acceptable answer. We use Vertex Explainable AI to generate feature attributions for each prediction:

# Get prediction with explanations
prediction = endpoint.explain(
    instances=[feature_vector],
    parameters={'num_neighbors': 10}
)

# Extract feature attributions
explanation = prediction.explanations[0]
attributions = explanation.attributions[0]

# Top reasons for flagging
top_features = sorted(
    attributions.feature_attributions.items(),
    key=lambda x: abs(x[1]),
    reverse=True
)[:3]

# Human-readable explanation
# "Flagged due to: unusual transaction velocity (5x normal),
#  geographic anomaly (NY to London in 30min), high-risk merchant"

Regulatory Compliance (PCI-DSS)

Financial transaction data is subject to PCI-DSS (Payment Card Industry Data Security Standard) requirements. Key compliance considerations for the ML pipeline:

Data encryption — all data encrypted at rest (Cloud KMS) and in transit (TLS 1.3)
Access control — IAM roles restrict who can access training data and model artifacts
Audit logging — Cloud Audit Logs track every access to sensitive data
Data retention — automated deletion of raw transaction data after retention period
Network isolation — VPC Service Controls restrict data exfiltration

Handling Adversarial Attacks

Sophisticated fraudsters may probe the system to learn its boundaries. Countermeasures include:

Rate limiting prediction requests to prevent probing attacks
Feature diversification — use many uncorrelated features so gaming one does not bypass detection
Ensemble models — combine supervised + unsupervised so novel attacks are still caught
Honeypot features — features that adversaries might try to manipulate, triggering additional scrutiny
Randomized thresholds — slight randomization prevents adversaries from learning exact boundaries

Cold-Start Problem

New customers have no transaction history, making behavioral features unavailable. Strategies for cold-start fraud detection:

Fall back to population-level features (average behavior for similar demographics)
Use device/IP reputation scores from third-party services
Apply stricter thresholds for new customers, relaxing as history builds
Use graph-based features linking new accounts to known entities

Cost Trade-Off: False Negatives vs False Positives

The business must explicitly quantify the cost of each error type to set optimal decision thresholds:

Expected Cost = (FN × avg_fraud_amount) + (FP × avg_revenue_lost_per_block) + (TP × investigation_cost)

A typical credit card issuer might set these costs as: missed fraud = $500 average loss, false positive = $15 (lost transaction fee + customer friction), investigation = $3. This asymmetry (33:1 ratio) justifies a model threshold that favors recall over precision.

Model Retraining Frequency

Fraud models must be retrained more frequently than most production ML models. Our recommended schedule:

Trigger	Frequency	Scope
Scheduled retrain	Weekly	Full model retrain on rolling 90-day window
Drift alert	As needed	Emergency retrain triggered by monitoring
Feature refresh	Daily	Update Feature Store with latest user profiles
Full pipeline review	Quarterly	Re-evaluate feature set, model architecture, thresholds

Exam Connection

The MLE exam tests your ability to design retraining triggers and CI/CD for ML. Know the difference between scheduled retraining (time-based), performance-triggered retraining (metric degradation), and data-triggered retraining (drift detection). Vertex AI Pipelines + Cloud Scheduler enables all three.

12

Hands-On Notebook

Put these concepts into practice with the companion Colab notebook. You will generate synthetic transaction data, engineer fraud features, handle class imbalance with SMOTE and class weights, train an XGBoost model, evaluate with precision-recall curves, and optimize the decision threshold for real-world cost trade-offs.

Open the Notebook

Open in Google Colab

The notebook covers all the core techniques discussed in this page with working, end-to-end code. GCP-specific operations (Vertex AI deployment, Feature Store, Pub/Sub) are shown as annotated comments so you can run the ML portions locally while understanding how each step maps to the cloud architecture.

🛠️

Build Your Portfolio

Fork & Extend

Turn this notebook into a portfolio project in 5 steps:

Fork the notebook — Clone the repo and open in Google Colab or locally.
Swap in real data — Replace the synthetic dataset with the IEEE-CIS Fraud Detection dataset from Kaggle (590K transactions with 400+ engineered features). Download it at kaggle.com/c/ieee-fraud-detection.
Add graph-based features — Build a transaction graph linking cards, devices, and email domains. Compute PageRank and community detection scores to capture fraud rings that single-transaction models miss.
Deploy it — Wrap it in a FastAPI app with a transaction payload endpoint that returns a fraud probability score, risk tier (low/medium/high), and top contributing features. Add a Streamlit dashboard for analysts to review flagged transactions.
Write a README — Include architecture diagram, setup instructions, sample outputs, and metrics.

What Hiring Managers Look For

Pro Tip

Fraud detection portfolios stand out when they demonstrate awareness of class imbalance handling (SMOTE, class weights, focal loss) and use precision-recall curves instead of ROC-AUC as the primary evaluation metric. Show that you understand the business trade-off: a model with 95% precision blocks $X in fraud but also creates $Y in customer friction from false declines. Include latency benchmarks proving your model can score transactions within the 100ms SLA that payment processors require.

Public Datasets to Use

IEEE-CIS Fraud Detection — 590K e-commerce transactions with 400+ features including device fingerprints, card hashes, and time-delta patterns. Available on Kaggle. The gold standard for tabular fraud detection projects.
Credit Card Fraud (ULB) — 284K European credit card transactions with 28 PCA-transformed features and only 492 fraud cases (0.17% positive rate). Available on Kaggle. Perfect for demonstrating extreme class imbalance techniques.
PaySim Mobile Money — 6.3M synthetic mobile money transactions modeled after real transaction logs from an African mobile financial service. Available on Kaggle. Great for demonstrating streaming fraud detection on high-volume data.

Deployment Options

Platform	Best For	Effort
Streamlit	Analyst dashboard for reviewing flagged transactions with SHAP explanations	Low
Gradio	Quick demo where users paste transaction JSON and see fraud score with feature importance	Low
FastAPI	Low-latency scoring API returning fraud probability within 50ms for payment gateway integration	Medium
Docker + Cloud Run	Production fraud scoring microservice with auto-scaling for transaction volume spikes	High