📋 Generative AI Learning Path

Generative AI
Learning Path

Your complete pre-reading hub. Seventeen deep-dive study guides covering everything from transformer internals to deploying autonomous multi-agent systems in production — designed for learners who want to go from zero to building production GenAI systems.

Begin with Prerequisites → Jump to Module 1
17
Study Guides
16
Course Modules
2
Capstone Projects
17
Notebooks
🌐
Open Source
Learning Path Flowchart
The dependency graph of the course. Each phase builds on the previous. Click any box to open that study guide.
PREREQUISITES PHASE 1 · FOUNDATIONS PHASE 2 · APIs & TRAINING PHASE 3 · PROMPTING & RAG PHASE 4 · AGENTS PHASE 5 · QUALITY & SAFETY PHASE 6 · CLOUD & TOOLS PHASE 7 · CAPSTONE PROJECTS 00 · PREREQUISITES Python · APIs · Git · ML Basics · Docker MODULE 01 Foundations of Modern GenAI MODULE 02 LLMs, SLMs & Multimodal MODULE 03 API for Accessing LLMs MODULE 04 Fine-Tuning MODULE 05 LLM Hosting & APIs MODULE 06 Prompt Engineering MODULE 07 RAG Systems MODULE 08 Advanced RAG & Multimodal MODULE 09 Agents & Multi-Agent Systems MODULE 10 Evaluation Strategies MODULE 11 Guardrails MODULE 12 Model Context Protocol MODULE 13 AWS Cloud Services MODULE 14 n8n No-Code CAPSTONE · MODULE 15 Document Portal System CAPSTONE · MODULE 16 Autonomous Report Agent required path recommended order leads to capstone
All Study Guides
Each guide covers key concepts with hands-on examples.
Prerequisites
00 · PREREQUISITES
Before You Begin: Prerequisites
Everything you need to know before day one. Covers Python, REST APIs, Git, command-line fluency, NumPy/Pandas, basic neural networks, Docker, and cloud fundamentals.
PythonREST APIsGitML BasicsDocker
Phase 1 · Foundations
MODULE 01
Foundations of Modern GenAI
The bedrock of everything. How text becomes numbers, how transformers process those numbers, and how they generate coherent language. Start here.
TransformersTokenizationEmbeddingsAttention
MODULE 02
LLMs, SLMs & Multimodal Models
The complete model zoo. What separates GPT-4o from LLaMA 3 from Phi-3? When do you choose a small model over a large one? How do vision-language models work?
GPT / ClaudeLLaMA / MistralPhi / GemmaCLIP / LLaVA
Phase 2 · APIs & Training
MODULE 03
API for Accessing LLMs
The practical layer. How to make API calls, control parameters like temperature, manage token costs, switch providers without rewriting code, and use enterprise cloud services.
OpenAI APIGroqStreamingAzure / Bedrock
MODULE 04
Fine-Tuning Techniques
Teaching an LLM new tricks without breaking what it already knows. Covers the full spectrum from LoRA to RLHF, why QLoRA makes fine-tuning accessible, and how to prepare training data.
LoRA / QLoRAPEFTRLHF / DPOUnsloth
MODULE 05
LLM Hosting & API Exposure
Taking a fine-tuned model from a notebook to a production endpoint. SageMaker training jobs, real-time inference endpoints, API Gateway configuration, and client integration patterns.
SageMakerAPI GatewayECS FargateLambda
Phase 3 · Prompting & RAG
MODULE 06
Prompt Engineering
The art and science of getting what you want from an LLM. System prompts, Chain-of-Thought reasoning, ReAct loops, Jinja2 template management, structured JSON outputs, and token cost optimization.
CoTReActFew-shotStructured Output
MODULE 07
RAG Systems
How to give an LLM a memory it doesn't have by retrieving relevant documents at query time. The full pipeline: ingestion, chunking, embedding, vector search, re-ranking, and grounded generation.
Vector DBsChunkingEmbeddingsRe-ranking
MODULE 08
Advanced RAG & Multimodal
RAG systems as they exist in the wild: context window management, semantic caching, RAGAS evaluation metrics, failure modes and how to debug them, and extending RAG to handle images alongside text.
RAGASMMRMultimodalHyDE
Phase 4 · Agents
MODULE 09
Agents & Multi-Agent Systems
Where LLMs stop being chatbots and start being autonomous workers. The Observe-Think-Act loop, tool use, multi-agent coordination with LangGraph and CrewAI, memory management, human-in-the-loop controls, and safety.
LangGraphToolsMemoryHITLSafety
Phase 5 · Quality & Safety
MODULE 10
Evaluation Strategies
Knowing whether your GenAI system actually works — and why standard ML metrics spectacularly fail at this. LLM-as-a-judge patterns, RAGAS for RAG systems, observability pipelines, and the quality-speed-cost triangle.
LLM-as-JudgeRAGASBLEU/ROUGETracing
MODULE 11
Guardrails
Keeping LLM systems safe, predictable, and compliant. Input sanitization, output validation, Pydantic schema enforcement, prompt injection attack vectors and defences, and the available guardrail frameworks.
Input ValidationPydanticPrompt InjectionGuardrails.ai
MODULE 12
Model Context Protocol (MCP)
The emerging standard for how LLMs connect to tools, data, and each other. MCP versus function calling, the Host-Client-Server architecture, transport types, building FastMCP servers, and using MCP as the backbone of agentic systems.
FastMCPSTDIO/SSETool LayerOAuth
Phase 6 · Cloud & Tools
MODULE 13
AWS Cloud Services for GenAI
The AWS ecosystem for building enterprise GenAI applications. SageMaker for training and deployment, Bedrock for managed model access, OpenSearch for vector search, and specialized services for documents, NLP, vision, and speech.
SageMakerBedrockOpenSearchTextract
MODULE 14
No-Code Agent Tools (n8n)
Building powerful GenAI automations without writing a single line of Python. n8n workflows, AI nodes, multi-agent patterns, RAG integration with Pinecone/Supabase, MCP in no-code context, and real-world automation examples.
n8n WorkflowsLLM NodesAgent ChainsRAG
Phase 7 · Capstone Projects
CAPSTONE · MODULE 15
Capstone: Document Portal System
An end-to-end intelligent document analysis platform. Async ingestion pipeline, advanced RAG with query rewriting and MMR, multi-document chat, document comparison engine, Redis caching, LLM routing, and production deployment on AWS ECS/Fargate.
FastAPIRedisECS/FargateCI/CDRAGAS
CAPSTONE · MODULE 16
Capstone: Autonomous Report Agent
A multi-agent system that autonomously researches topics, reads sources, analyses findings, and generates structured reports. LangGraph state graphs, role-based agent design, web search integration, human-in-the-loop checkpoints, and streaming FastAPI backend.
LangGraphCrewAISSE StreamingHITL
Glossary
Transformer
The neural network architecture behind all modern LLMs. Uses self-attention to process entire sequences in parallel, unlike RNNs which processed tokens one at a time.
Token
The atomic unit LLMs work with — roughly 0.75 words in English. "tokenization" splits text into tokens; models are billed per token and have a maximum context window measured in tokens.
Embedding
A vector (list of numbers) that represents the meaning of text. Semantically similar texts produce similar vectors. The foundation of all similarity search and RAG systems.
Self-Attention
The mechanism that lets each token in a sequence "attend" to every other token, learning relationships regardless of positional distance. The key innovation in transformers.
Context Window
The maximum number of tokens an LLM can see at once (prompt + response combined). GPT-4o: 128k; Claude 3.5: 200k. Larger windows cost more and can reduce focus.
Temperature
Controls randomness of LLM output. 0 = deterministic/factual. 1 = creative/varied. For classification tasks, use 0. For creative writing, use 0.7–1.2. Never set above 2.
RAG
Retrieval-Augmented Generation. Giving an LLM access to relevant documents at query time rather than relying solely on its training data. The primary solution to hallucination in domain-specific applications.
Vector Database
A database optimised for storing and querying embeddings via approximate nearest-neighbour search. Key options: Pinecone (managed), Qdrant (open-source), ChromaDB (local), pgvector (PostgreSQL extension).
Fine-Tuning
Continuing to train a pre-trained model on your own data to specialise its knowledge or style. Much cheaper than training from scratch. Can be full (all parameters) or parameter-efficient (LoRA/QLoRA).
LoRA
Low-Rank Adaptation. A PEFT technique that freezes the original model weights and trains only small low-rank matrices inserted into each layer. Reduces trainable parameters by ~99% while preserving quality.
RLHF
Reinforcement Learning from Human Feedback. The technique used to align LLMs with human preferences. Humans rank outputs, a reward model learns from rankings, and the LLM is optimised against that reward model.
DPO
Direct Preference Optimization. A simpler alternative to RLHF that skips the reward model entirely, directly training the LLM on preference pairs. Widely used for instruction-following alignment.
Hallucination
When an LLM confidently generates factually incorrect information. Arises because LLMs generate statistically plausible text, not truth-checked facts. RAG and grounding are the primary mitigations.
Chain-of-Thought (CoT)
A prompting technique where you ask the model to reason step-by-step before answering. Dramatically improves accuracy on multi-step reasoning tasks. "Let's think step by step" is the simplest CoT trigger.
ReAct
Reason + Act. A prompting pattern for agents: the model alternates between Thought (reasoning), Action (tool call), and Observation (tool result) until it reaches a final answer.
Agent
An LLM that can take actions — call tools, browse the web, write code, read files — in a loop until it completes a task. Differs from a chatbot by having autonomy and access to external capabilities.
Tool Use / Function Calling
A capability where the LLM generates a structured call to a pre-defined function (e.g. search_web, run_python, query_database) rather than prose. The function executes and its result is fed back to the model.
LangGraph
A Python library for building stateful, graph-based agent workflows. Represents agent logic as a directed graph of nodes (tasks) and edges (transitions), with built-in support for loops, branching, and human-in-the-loop interrupts.
RAGAS
RAG Assessment — a framework for evaluating RAG systems. Key metrics: Faithfulness (does the answer stay grounded in context?), Answer Relevancy (does it actually address the question?), Context Precision, Context Recall.
Prompt Injection
An attack where malicious text in user input or retrieved documents overrides the system prompt, causing the LLM to ignore its instructions. A critical security concern for any LLM system processing untrusted text.
MCP
Model Context Protocol. An open standard by Anthropic for connecting LLMs to external tools and data sources. Defines a Host-Client-Server architecture with standardised transport (STDIO, SSE, HTTP) and capability types (Tools, Resources, Prompts).
PEFT
Parameter-Efficient Fine-Tuning. A family of techniques (LoRA, QLoRA, prefix tuning, adapters) that fine-tune only a small fraction of model parameters, making fine-tuning feasible on consumer hardware.
Chunking
The process of splitting documents into smaller pieces before embedding for RAG. Critical decisions: chunk size (too small = no context, too large = noisy retrieval), overlap, and whether to chunk at character, sentence, or semantic boundaries.
LLM-as-Judge
Using a capable LLM (e.g. GPT-4o) to evaluate the output of another LLM or system. Produces scores for dimensions like helpfulness, factuality, and safety. Scalable but biased toward verbose and confident answers.
Amazon Bedrock
AWS's managed service for accessing foundation models from Anthropic, Meta, Mistral, Cohere, and Amazon. Provides enterprise-grade security, VPC integration, and access control without needing to manage infrastructure.
Guardrails
Validation and safety layers that sit before and after an LLM in a production pipeline. Prevent harmful inputs from reaching the model, and catch problematic outputs before they reach the user.
System Prompt
The invisible preamble to every conversation that defines the LLM's role, instructions, constraints, and persona. Set by the developer, not the user. Has the highest level of instruction priority in most models.
Quantization
Reducing the precision of model weights (e.g. from 32-bit float to 4-bit int) to shrink model size and increase inference speed, with a small quality trade-off. QLoRA uses quantization to enable fine-tuning on consumer GPUs.