Core AI & ML Concepts
Artificial Intelligence (AI) is the broad field of computer science focused on creating systems that can perform tasks that typically require human intelligence. This encompasses reasoning, learning, perception, language understanding, and decision-making. AI systems range from simple rule-based engines to sophisticated deep learning models.
Machine Learning (ML) is a subset of AI where systems learn patterns from data rather than being explicitly programmed. Instead of writing rules, you provide examples and the algorithm discovers the rules. ML powers recommendations, fraud detection, language translation, and thousands of other applications.
Deep Learning (DL) is a subset of ML that uses neural networks with multiple layers (hence "deep") to learn hierarchical representations of data. Deep learning excels at unstructured data — images, text, audio, and video — and is the foundation for modern generative AI.
AI vs ML vs Deep Learning
| Concept | Definition | Example |
|---|---|---|
| AI | Machines performing tasks requiring human-like intelligence | Chess-playing programs, virtual assistants |
| ML | Systems that learn patterns from data without explicit programming | Spam filters, recommendation engines |
| Deep Learning | Neural networks with many layers learning hierarchical features | Image recognition, speech-to-text |
| Generative AI | AI that creates new content (text, images, audio, video, code) | Gemini, Imagen, ChatGPT |
The exam tests your understanding of the nested relationship: AI ⊃ ML ⊃ Deep Learning ⊃ Generative AI. Generative AI is a specialized application of deep learning, which itself is a subset of ML, which is a subset of AI.
NLP Fundamentals
Natural Language Processing (NLP) is the branch of AI focused on enabling computers to understand, interpret, and generate human language. Key NLP tasks include:
Text Classification
Categorizing text into predefined labels. Examples: sentiment analysis, spam detection, topic classification.
Named Entity Recognition
Identifying entities (people, places, organizations, dates) within text. Foundation for information extraction.
Machine Translation
Translating text between languages. Modern approaches use sequence-to-sequence models with attention mechanisms.
Text Generation
Creating new text that is coherent and contextually appropriate. The core task of large language models like Gemini.
What Is Generative AI
Generative AI refers to artificial intelligence systems that can create new content — text, images, audio, video, code, and structured data — rather than merely classifying or analyzing existing data. These models learn the underlying distribution of their training data and generate novel outputs that are statistically similar to, but not copies of, the training examples.
Generative AI represents a paradigm shift from traditional ML. Where traditional systems consume data and produce decisions (classify this email, predict this price), generative systems produce entirely new artifacts. A generative AI model can write a marketing email, generate a product image, compose music, or write working code.
Generative vs Discriminative Models
This is a fundamental distinction tested on the exam. A discriminative model learns the decision boundary between classes — it models P(Y|X), the probability of a label Y given input X. A generative model learns the full joint distribution P(X,Y) or P(X), enabling it to generate new data points.
| Aspect | Discriminative | Generative |
|---|---|---|
| Goal | Classify / predict labels | Generate new content |
| Models | P(Y|X) — conditional | P(X) or P(X,Y) — joint |
| Output | Label, score, category | Text, image, audio, video, code |
| Examples | Logistic regression, SVM, random forest | LLMs (Gemini), GANs, diffusion models |
Many modern AI systems combine both approaches. For example, Gemini is primarily generative but can perform discriminative tasks (classification, sentiment analysis) through prompting. The exam may test whether you understand that generative models can also classify.
Use cases for generative AI span every industry: content creation (marketing copy, articles), code generation and debugging, customer service automation, document summarization, data augmentation, drug discovery, creative design, and educational tutoring. The exam expects you to identify which scenarios are best suited for generative AI versus traditional ML approaches.
Foundation Models & LLMs
A foundation model is a large AI model pre-trained on broad, diverse data that can be adapted to a wide range of downstream tasks. The term was coined by Stanford's HAI in 2021 to describe models like BERT, GPT, and PaLM that serve as a "foundation" upon which many applications are built.
Foundation models exhibit emergent capabilities — abilities that appear only at sufficient scale that were not explicitly trained for. For example, large language models can perform arithmetic, translate between languages they were not specifically trained on, and reason about novel problems. These emergent properties make foundation models uniquely powerful.
Key Characteristics of Foundation Models
Pre-trained on Massive Data
Trained on terabytes of text, images, or multimodal data. This broad training gives them general knowledge and capabilities.
Adaptable (Fine-tunable)
Can be customized for specific tasks through fine-tuning, prompt engineering, or RAG without retraining from scratch.
Multi-task Capable
A single model handles translation, summarization, Q&A, classification, and code generation without separate training.
Emergent Abilities
At large scale, capabilities emerge that were never explicitly trained: in-context learning, chain-of-thought reasoning, tool use.
The Transformer Architecture
The Transformer is the neural network architecture behind virtually all modern LLMs. Introduced in the 2017 paper "Attention Is All You Need" by Vaswani et al., it replaced recurrent architectures (RNNs, LSTMs) with a self-attention mechanism that processes all tokens in parallel rather than sequentially.
Key components of the Transformer:
- Self-Attention — allows each token to attend to every other token, capturing long-range dependencies
- Multi-Head Attention — runs multiple attention operations in parallel to capture different relationship types
- Positional Encoding — injects position information since attention is order-agnostic
- Feed-Forward Networks — applied to each position independently after attention
- Layer Normalization — stabilizes training of deep networks
You do not need to know the math behind transformers for this exam. Focus on understanding that: (1) transformers use self-attention to process all tokens in parallel, (2) this enables them to capture long-range dependencies, and (3) they are the architecture behind Gemini and all major LLMs.
Large Language Models (LLMs) are transformer-based foundation models trained specifically on text data. They predict the next token given a sequence of preceding tokens (autoregressive generation). Modern LLMs have billions to trillions of parameters and are trained on datasets containing trillions of tokens from books, websites, code repositories, and other text sources.
Diffusion Models
Diffusion models are a class of generative models that create data by learning to reverse a noise-adding process. During training, the model learns to progressively denoise a corrupted input. During generation, it starts from pure random noise and iteratively refines it into a coherent output.
Diffusion models power image generation (Imagen), video generation (Veo), and audio synthesis. They produce higher-quality, more diverse outputs than earlier approaches like GANs (Generative Adversarial Networks) and VAEs (Variational Autoencoders), and are more stable to train.
| Model Type | How It Works | Strengths | Google Example |
|---|---|---|---|
| LLM (Transformer) | Autoregressive next-token prediction | Text, code, reasoning, multimodal | Gemini, Gemma |
| Diffusion Model | Iterative denoising from noise | Photorealistic images, video | Imagen, Veo |
| GAN | Generator vs discriminator adversarial training | Fast generation, style transfer | StyleGAN (historical) |
| VAE | Encode to latent space, decode to output | Smooth interpolation, latent control | Used in Imagen pipeline |
ML Approaches
Machine learning approaches are categorized by the type of learning signal available during training. Understanding these distinctions is critical for the exam, as questions may ask when to use each approach.
Supervised, Unsupervised, and Reinforcement Learning
Supervised Learning
Learns from labeled data (input-output pairs). The model maps inputs to known correct outputs. Examples: classification, regression, object detection. Requires labeled datasets.
Unsupervised Learning
Discovers patterns in unlabeled data. No correct answers are provided. Examples: clustering, dimensionality reduction, anomaly detection. Works with raw, unlabeled datasets.
Reinforcement Learning (RL)
Agent learns by interacting with an environment and receiving rewards/penalties. Used in robotics, game playing, and RLHF for LLM alignment.
Semi-Supervised / Self-Supervised
Combines small amounts of labeled data with large amounts of unlabeled data. Self-supervised learning (used in LLM pre-training) creates labels from the data itself (predict next token).
RLHF (Reinforcement Learning from Human Feedback) is a critical technique for aligning LLMs. After pre-training, human evaluators rank model outputs, and a reward model is trained on these preferences. The LLM is then fine-tuned using RL to maximize the reward model's score, making outputs more helpful and less harmful.
ML Lifecycle
The ML lifecycle describes the end-to-end process of developing, deploying, and maintaining ML models. For generative AI projects, the lifecycle includes additional considerations around prompt engineering, evaluation of generated content, and responsible AI guardrails.
| Phase | Activities | Gen AI Considerations |
|---|---|---|
| 1. Problem Definition | Define business problem, success metrics, feasibility | Determine if gen AI is the right approach vs. traditional ML |
| 2. Data Collection | Gather, clean, and prepare training data | Collect examples for fine-tuning, RAG documents, evaluation datasets |
| 3. Model Selection | Choose architecture, model size, pre-trained model | Select from Model Garden: Gemini Pro vs Flash, open-source options |
| 4. Training / Tuning | Train from scratch or fine-tune, optimize hyperparameters | Prompt engineering, supervised fine-tuning, RLHF alignment |
| 5. Evaluation | Test on held-out data, measure metrics | Human evaluation, automated metrics (BLEU, ROUGE), safety testing |
| 6. Deployment | Serve model via API, integrate into applications | Vertex AI Endpoints, grounding configuration, safety filters |
| 7. Monitoring | Track performance, detect drift, retrain as needed | Monitor for hallucinations, bias drift, cost optimization |
Choosing the Right Model
Selecting the right model is a critical decision. The exam tests whether you can match scenarios to appropriate model choices. Key factors include:
- Task type — text generation, image creation, classification, or multimodal
- Latency requirements — real-time (use smaller/faster models like Gemini Flash) vs. batch
- Cost constraints — larger models cost more per request; fine-tuning adds upfront cost
- Data sensitivity — on-premises deployment needs (use open-weight Gemma) vs. managed API
- Quality needs — complex reasoning tasks need larger models (Gemini Pro/Ultra)
- Customization depth — prompt engineering (fast) vs. fine-tuning (slower but more tailored)
Do not assume the largest model is always best. The exam values cost-efficiency. If a task can be accomplished with Gemini Flash and prompt engineering, choosing the more expensive Ultra model is the wrong answer.
Data Fundamentals
Data is the fuel for all AI systems. The exam covers data types, data quality principles, and the distinction between structured/unstructured and labeled/unlabeled data.
Structured vs Unstructured Data
| Type | Definition | Examples | AI Approach |
|---|---|---|---|
| Structured | Organized in rows/columns with a defined schema | Databases, spreadsheets, CSV files | Traditional ML, BigQuery ML |
| Semi-structured | Has some organizational properties but no rigid schema | JSON, XML, email, log files | NLP, parsing, gen AI extraction |
| Unstructured | No predefined schema or organization | Images, audio, video, free text, PDFs | Deep learning, generative AI |
Labeled vs Unlabeled Data
Labeled data has known correct outputs (a tagged image, a classified document). It is required for supervised learning and fine-tuning. Creating labeled data is expensive and time-consuming.
Unlabeled data has no associated labels. It is abundant and cheap. Unsupervised learning and self-supervised pre-training (used for LLMs) work with unlabeled data. LLMs are pre-trained on massive unlabeled text corpora using self-supervised objectives (predict the next token).
Data Quality
Data quality directly impacts model performance. The principle "garbage in, garbage out" applies even more strongly to generative AI, where poor data leads to inaccurate, biased, or hallucinated outputs.
Accuracy
Data correctly represents the real-world entities it describes. Inaccurate training data leads to models that produce wrong answers.
Completeness
No missing values or gaps in coverage. Incomplete data causes models to have blind spots on certain topics or demographics.
Timeliness
Data is current and up-to-date. Stale training data causes models to lack knowledge of recent events (a key reason for grounding/RAG).
Consistency
No contradictions across data sources. Inconsistent data confuses models and leads to unpredictable outputs.
Representativeness
Data reflects the diversity of real-world use. Biased training data leads to models that perform poorly on underrepresented groups.
Privacy Compliance
Data handling complies with regulations (GDPR, CCPA). PII must be properly managed in training data to avoid legal and ethical issues.
Gen AI Landscape Layers
The generative AI technology stack can be understood as a layered architecture. Each layer builds on the one below and serves different stakeholders. The exam expects you to understand where Google's offerings fit within this landscape.
| Layer | Description | Google Cloud Offerings |
|---|---|---|
| Infrastructure | Hardware (GPUs, TPUs) and compute for training/serving | Cloud TPUs, A3 GPU VMs, Hypercomputer |
| Foundation Models | Pre-trained large models that power applications | Gemini, Gemma, Imagen, Veo, Chirp |
| Developer Platform | Tools for building AI apps: APIs, SDKs, fine-tuning, RAG | Vertex AI, Model Garden, Agent Builder |
| Pre-built Applications | Ready-to-use AI products for end users | Gemini App, Gemini for Workspace, Customer Engagement Suite |
The exam frequently asks you to match a scenario to the correct layer. A developer building a custom chatbot uses the Developer Platform layer (Vertex AI). A business user summarizing emails uses the Pre-built Application layer (Gemini for Workspace). A team training their own model uses the Infrastructure layer.
Google Foundation Models
Google offers a comprehensive family of foundation models, each optimized for different modalities and use cases. Understanding the distinctions between these models is heavily tested on the exam.
The Gemini Family
Gemini is Google's most capable multimodal AI model family. It natively handles text, images, audio, video, and code in a single model. Gemini comes in multiple size variants optimized for different use cases:
| Variant | Size | Best For | Key Features |
|---|---|---|---|
| Gemini Ultra | Largest | Complex reasoning, research, multimodal analysis | Highest capability, powers Gemini Advanced |
| Gemini Pro | Medium | General-purpose tasks, balanced quality and speed | Best price-performance ratio for most use cases |
| Gemini Flash | Lighter | High-volume, low-latency applications | Fastest response time, lowest cost per token |
| Gemini Nano | Smallest | On-device (phones, IoT), offline scenarios | Runs locally on Pixel, no cloud dependency |
Gemini is natively multimodal — trained from the ground up on text, images, audio, and video simultaneously. This differs from earlier approaches that bolted vision capabilities onto text models. Gemini can understand a video, read text in an image, process audio, and generate code all within a single prompt.
Gemma, Imagen, and Veo
Beyond Gemini, Google offers specialized foundation models for specific modalities:
Gemma
Google's family of open-weight models built from the same research as Gemini. Available in 2B and 7B parameter sizes. Ideal for on-premises deployment, custom fine-tuning, research, and scenarios requiring full model control. Runs on consumer hardware.
Imagen
Google's text-to-image diffusion model that generates photorealistic images from text descriptions. Available through Vertex AI. Features include inpainting, outpainting, style transfer, and image editing. Enterprise-grade with built-in safety filters.
Veo
Google's video generation model that creates high-quality videos from text or image prompts. Supports various cinematic styles, camera movements, and creative directions. Targeted at media, advertising, and creative production workflows.
Chirp
Google's speech foundation model powering Cloud Speech-to-Text V2. Supports 100+ languages with improved accuracy. Used for transcription, voice commands, and audio understanding workflows.
Gemini = proprietary, managed API, most capable. Gemma = open-weight, self-deployable, smaller. The exam tests when to recommend each: Gemma when customers need on-premises deployment, full model control, or want to avoid vendor lock-in. Gemini when they need maximum capability, multimodal features, or prefer a managed service.
# Quick comparison: Gemini vs Gemma via Vertex AI SDK
from vertexai.generative_models import GenerativeModel
# Gemini Pro - managed API, multimodal
gemini = GenerativeModel("gemini-1.5-pro")
response = gemini.generate_content("Explain quantum computing in 3 sentences.")
# Gemma - open model, deploy to your own endpoint
# Deploy Gemma from Model Garden, then call your custom endpoint
gemma = GenerativeModel("gemma-7b") # from your deployed endpoint
response = gemma.generate_content("Explain quantum computing in 3 sentences.")