Why These Prerequisites Exist
This course is designed around building real production-grade GenAI systems — not studying theory from a whiteboard. Every module will involve writing code that calls APIs, processes data, orchestrates multi-step pipelines, and deploys to cloud infrastructure. That means the material will not pause to explain what a Python decorator is, what an HTTP status code means, or how to commit code to a repository. These things are used constantly, without narration.
Think of it this way: if you were starting a construction project, the prerequisite would not be "know that hammers exist." It would be "you have used a hammer and can drive a nail reliably." The prerequisites for this course work the same way. You do not need to be an expert at any of these skills — you need to be functionally fluent: able to read code using these concepts, write basic implementations without looking up the fundamentals, and debug when something is slightly off.
This guide exists to close the gap. Each section is organized as: a plain-language explanation of the concept (for readers who learned it once but need a refresher), followed by a deep technical dive (for readers who want to really understand what is happening under the hood), and code examples showing the exact patterns you will see repeatedly throughout this guide.
At the end of each section there is an honest assessment of what "comfortable" means. Do not skip the self-assessment — Module 01 will include a practical exercise for each of these topics, and feeling unprepared at that stage is stressful and puts you behind for the modules that build on these concepts.
Each prerequisite skill is used repeatedly across many course modules. The diagram shows the primary dependency paths.
Python — Intermediate Level
Python is the lingua franca of the entire GenAI ecosystem. Every library you will use — OpenAI's SDK, LangChain, LangGraph, FAISS, ChromaDB, FastAPI, SageMaker — is Python-first. This course does not use Python as a scripting toy; it uses Python as an engineering language, with proper structure, error handling, classes, and asynchronous code.
Knowing Python "a little" is not the same as being ready for this course. Being ready means you can read code in which a function takes another function as an argument (a higher-order function), understand why a class inherits from another class, know what a context manager is without looking it up, and write a generator expression without pausing to think. These patterns appear in the very first code examples in Module 01.
Do not worry if you learned Python for data analysis and never built web services. The course will teach you FastAPI and deployment. What matters is that you are solid on the language's core mechanics: how Python handles scope, how objects work, and how to organize code into readable, reusable units. Everything else is library-level knowledge that you can look up.
Functions, Decorators & Higher-Order Patterns
A function in Python is a first-class object — it can be passed as an argument, returned from another function, and stored in a variable. This is not a theoretical curiosity; the course uses this pattern constantly. LangChain's tool system expects you to pass a function into a decorator that registers it as a callable tool for an agent. FastAPI expects you to decorate functions with route annotations. Understanding that decorators are just syntactic sugar for wrapping a function in another function is the key that makes all of this legible.
A decorator works like this: @some_decorator above a function is exactly equivalent
to writing my_func = some_decorator(my_func) after defining my_func.
The decorator receives the original function, wraps some behavior around it (logging, timing,
authentication checking, route registration), and returns a new function that has the extra
behavior. When you call your decorated function, you are actually calling the wrapper — which
calls the original function internally. This is the single most important Python pattern in the
course; it appears in FastAPI routes, LangChain tools, caching layers, and retry logic.
# Decorators: the course uses this pattern everywhere
import time
from functools import wraps

def timing(func):
    # 'wraps' preserves the original function's name/docstring
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took {elapsed:.3f}s")
        return result
    return wrapper

@timing
def call_llm(prompt: str) -> str:
    # Simulating an API call
    time.sleep(0.5)
    return "LLM response"

# FastAPI uses exactly this pattern for route registration:
# @app.post("/generate") — registers the function as a POST handler
# @tool — registers the function as an agent tool in LangChain
Classes, Inheritance & OOP
The course uses classes extensively, primarily through inheritance — you will subclass
BaseModel from Pydantic to define structured data schemas, subclass
BaseTool or use abstract base classes from LangChain, and define your own
pipeline classes. You do not need to be an expert in Python's full object system, but you
must be comfortable with: defining __init__, understanding
self, calling super().__init__() to initialize a parent class,
and using dataclasses or Pydantic models for clean data structure definitions.
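Pydantic handles most of this for you, but the plain-Python version of the pattern is worth having in your fingers. A minimal sketch of a subclass calling super().__init__() (BasePipeline and RagPipeline are illustrative names, not course or library APIs):

```python
class BasePipeline:
    def __init__(self, name: str):
        self.name = name
        self.steps: list[str] = []   # state owned by the parent class

class RagPipeline(BasePipeline):
    def __init__(self, name: str, top_k: int = 5):
        super().__init__(name)       # without this, self.name/self.steps never exist
        self.top_k = top_k           # subclass-specific state

pipeline = RagPipeline("docs", top_k=3)
print(pipeline.name, pipeline.top_k, pipeline.steps)  # → docs 3 []
```

Forgetting the super().__init__() call is a classic bug: the subclass instance is created, but the parent's attributes are never set, and you get an AttributeError later, far from the actual mistake.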
Pydantic is especially important. It is used in virtually every modern Python API project
because it combines a class-based syntax with automatic validation. You define a class that
inherits from BaseModel, declare your fields with type annotations, and Pydantic
automatically validates input, coerces types, and produces structured error messages. The
course uses Pydantic for: defining request/response schemas in FastAPI, defining structured
LLM output schemas for forced JSON outputs, and defining pipeline configuration objects.
from pydantic import BaseModel, Field
from typing import Optional, List

# Pydantic model — used EVERYWHERE in the course
class DocumentChunk(BaseModel):
    content: str = Field(..., description="The text content of this chunk")
    source: str = Field(..., description="Source file or URL")
    chunk_index: int = Field(0, ge=0)
    embedding: Optional[List[float]] = None
    metadata: dict = Field(default_factory=dict)

# Automatic validation — try passing a string where int is expected:
chunk = DocumentChunk(
    content="Transformers are sequence-to-sequence models...",
    source="module-01.pdf",
    chunk_index=3
)
# chunk.model_dump() → {"content": "...", "source": "...", ...}
Async/Await & Context Managers
Asynchronous Python — using async def, await, and
asyncio — is required for production LLM applications because API calls
are I/O-bound: your program spends most of its time waiting for a network response,
not computing. With async code, Python can handle hundreds of pending API requests
simultaneously in a single thread, switching between them as each one becomes ready.
FastAPI, the web framework the course uses for serving LLM APIs, is asynchronous
by default. Most modern LLM SDK clients expose both sync and async interfaces, and
the course uses both.
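The claim about handling many pending requests in one thread is easy to see with a toy sketch: three simulated half-second "API calls" finish in roughly half a second total, not one and a half (fake_llm_call is a stand-in for a real SDK call):

```python
import asyncio
import time

async def fake_llm_call(i: int) -> str:
    await asyncio.sleep(0.5)   # stands in for waiting on the network
    return f"response {i}"

async def main() -> list[str]:
    # gather schedules all three coroutines concurrently on one event loop
    return await asyncio.gather(*(fake_llm_call(i) for i in range(3)))

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")  # roughly 0.5s total, not 1.5s
```

asyncio.gather also preserves order: results[0] corresponds to the first coroutine, regardless of which finished first.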
Context managers (the with statement) are equally important. They
guarantee that resources are properly cleaned up — database connections are closed,
file handles are released, HTTP sessions are torn down — even when exceptions occur.
The course uses context managers for: managing database session lifecycles, handling
LLM streaming responses, and creating scoped clients for AWS SDK calls. You should
understand the __enter__ and __exit__ protocol and the
contextlib.contextmanager decorator for building your own.
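Under the hood, the with statement just calls a pair of special methods. A minimal hand-rolled context manager showing the __enter__/__exit__ protocol (Timer is an illustrative example, not a course utility):

```python
import time

class Timer:
    """Times the body of a `with` block."""
    def __enter__(self):
        self.start = time.perf_counter()
        return self                        # becomes the `as` target
    def __exit__(self, exc_type, exc, tb):
        self.elapsed = time.perf_counter() - self.start
        return False                       # False: let exceptions propagate

with Timer() as t:
    total = sum(range(100_000))
print(f"summed in {t.elapsed:.6f}s")
```

Note that __exit__ runs even if the body raises; returning False (or None) lets the exception propagate, while returning True would suppress it.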
import asyncio
from anthropic import AsyncAnthropic
from contextlib import asynccontextmanager

# Async LLM call — typical course pattern
async def generate(prompt: str) -> str:
    client = AsyncAnthropic()
    message = await client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}]
    )
    return message.content[0].text

# Context manager for a database session (SQLAlchemy style)
@asynccontextmanager
async def get_db_session():
    session = Session()
    try:
        yield session
        await session.commit()
    except Exception:
        await session.rollback()
        raise
    finally:
        await session.close()

# FastAPI lifespan — sets up/tears down resources on startup/shutdown
@asynccontextmanager
async def lifespan(app):
    # startup: load model, connect to DB, warm up vector index
    yield
    # shutdown: close connections, flush caches
You are ready if you can: write a decorator without looking at documentation, explain why super().__init__() is needed, read async code and explain the execution flow, and write a context manager using @contextmanager.
REST APIs & HTTP
Every LLM you will work with in this course is accessed via a REST API over HTTPS. Whether
you are calling OpenAI's /v1/chat/completions endpoint, Anthropic's
/v1/messages endpoint, or a model you deployed yourself on AWS, the mechanics
are identical: you send an HTTP request with a JSON body, and you get back an HTTP response
with a JSON body. If you do not understand these mechanics, the SDK libraries become black
boxes that break mysteriously.
HTTP is a request/response protocol. Every request has a method (GET, POST, PUT, DELETE, PATCH), a URL identifying the resource, headers (key-value metadata — authentication tokens, content type, API version), and an optional body (for POST/PUT requests, this is where your data goes). Every response has a status code (200 = OK, 201 = Created, 400 = Bad Request, 401 = Unauthorized, 422 = Unprocessable Entity, 429 = Rate Limited, 500 = Server Error), headers, and a body. Understanding status codes is crucial for debugging — a 422 from an OpenAI call means your request body is malformed; a 429 means you are hitting rate limits; a 401 means your API key is wrong or expired.
The course uses the httpx library (not the older requests library)
because it supports both synchronous and asynchronous code. However, in practice you will mostly
use the official SDK libraries (which internally use httpx or requests), not raw HTTP. The reason
to understand raw HTTP is: when the SDK fails, you need to be able to read the error response,
understand what the API actually received, and debug from first principles.
A single LLM API call: your code sends an authenticated POST request; the API routes it to the model and streams or returns the generated text.
import httpx
import os

# Raw HTTP call to Anthropic — what the SDK does internally
async def raw_anthropic_call(prompt: str) -> dict:
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "https://api.anthropic.com/v1/messages",
            headers={
                "x-api-key": os.environ["ANTHROPIC_API_KEY"],
                "anthropic-version": "2023-06-01",
                "content-type": "application/json",
            },
            json={
                "model": "claude-sonnet-4-6",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": prompt}]
            }
        )
        response.raise_for_status()  # raises on 4xx/5xx
        return response.json()       # {"content": [{"text": "..."}], "usage": {...}}
Install httpx and practice making a real call to a free API (OpenWeatherMap, JSONPlaceholder) before starting. Understanding what happens when you get a 401, a timeout, or a malformed JSON response will save you hours of debugging during the course.
Environment Variables & .env Files
API keys are secrets. A secret should never appear in your source code — not even in a private repository, because secrets committed to git leave a permanent audit trail that can be exposed if the repository is ever made public, transferred, or compromised. The course will use four or more API keys simultaneously (OpenAI, Anthropic, Groq, AWS), and every project will require you to manage them cleanly.
The standard pattern is: store secrets as environment variables —
key-value pairs that the operating system provides to your process at runtime. A
.env file (with a leading dot, which makes it hidden on Unix systems)
is a text file that lists these variables in KEY=VALUE format. The
python-dotenv library reads this file and loads its contents into the
process environment when your application starts. Your code then reads the values
using os.environ["KEY"] or os.getenv("KEY", "default").
The .env file must be listed in your .gitignore so it
is never committed to version control.
In production (on AWS or any cloud), you do not use a .env file at all.
You inject environment variables directly into the container or serverless function
via the cloud provider's secrets management service (AWS Secrets Manager, Parameter
Store, or just ECS task definition environment variables). The pattern is identical
from the code's perspective — your code reads from the environment — but the
source of those values changes. This is precisely why the environment
variable pattern is the right abstraction: your code works identically in local
development and production without any changes.
# .env file (NEVER commit this file to git)
# OPENAI_API_KEY=sk-proj-abc123...
# ANTHROPIC_API_KEY=sk-ant-abc123...
# AWS_REGION=us-east-1
from dotenv import load_dotenv
import os
load_dotenv() # reads .env and loads into os.environ
api_key = os.environ["OPENAI_API_KEY"] # raises if missing
region = os.getenv("AWS_REGION", "us-east-1") # returns default if missing
# In FastAPI, load_dotenv() at the top of main.py or in a config module
# In production containers, the env vars are set by the orchestrator
Git — Version Control
Git is the standard way to track changes in code, collaborate with others, and manage deployment pipelines. In this course, you will use git for three things: saving your progress on exercises (committing), sharing code with instructors and teammates (pushing to GitHub or GitLab), and in the deployment modules, triggering CI/CD pipelines that automatically build and deploy your LLM application when you push to a specific branch.
The concepts you must understand: repository (a directory tracked by git), working tree (the files as they exist on disk right now), staging area / index (a snapshot of changes you have selected to include in the next commit), commit (a permanent snapshot in the history, with a unique hash ID), branch (a named pointer to a commit, allowing parallel lines of development), merge (combining two branches), and remote (a copy of the repository on a server like GitHub). You should be able to execute the full workflow: clone, make changes, stage, commit, push — without hesitation.
# The complete daily git workflow
git clone https://github.com/CareerAlign/genai-exercises.git
cd genai-exercises
# Create a branch for your work
git checkout -b module-03-api-exercise
# ... do your work ...
# Check what changed
git status
git diff
# Stage specific files (NEVER stage .env or secrets)
git add src/api_client.py src/models.py
# Commit with a clear message
git commit -m "feat: add retry logic to LLM API client"
# Push to remote
git push -u origin module-03-api-exercise
Always add .env to your .gitignore before your first commit. Many developers have accidentally exposed API keys on GitHub — it is one of the most common and costly security mistakes, and it can result in massive unexpected charges from cloud services.
Command Line & Terminal
All course work happens in the terminal. Deployment scripts, Docker commands, Python virtual environment management, running local servers, managing cloud infrastructure with the AWS CLI — none of this has a reliable GUI equivalent. You need to be genuinely comfortable navigating the file system, running scripts, reading command output, and understanding what a command is doing before you run it.
On macOS and Linux you will use Bash or Zsh. On Windows, you should use WSL2 (Windows
Subsystem for Linux) — the course's shell commands assume a Unix-like environment, and
running natively on Windows PowerShell will cause friction. The commands you must know:
cd, ls, pwd, mkdir, rm,
cp, mv, cat, grep, echo,
export, ps, kill, and piping with |.
You should also understand how to run a Python script (python main.py),
redirect output (> and >>), and run a server in the
background (& at end of command, or a second terminal tab).
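A short warm-up covering pipes, redirection, and environment variables; run it in your own terminal (run.log and API_STAGE are throwaway examples):

```shell
# Pipes: feed one command's output into another; grep -c counts matching lines
printf "INFO ok\nERROR bad\nERROR worse\n" | grep -c "ERROR"   # prints 2

# Redirection: > truncates the file, >> appends to it
echo "first run"  > run.log
echo "second run" >> run.log
cat run.log

# Environment variables: export makes the value visible to child processes
export API_STAGE=dev
python -c "import os; print(os.environ['API_STAGE'])"          # prints dev
```

If each of these behaves as you expect before you run it, you are at the level the course assumes.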
Virtual Environments & pip
Python's package ecosystem is vast, and different projects need different (often conflicting) versions of the same library. Virtual environments solve this by creating an isolated Python installation per project, so the packages you install for this course do not interfere with your system Python or other projects. This is not optional hygiene — it is a prerequisite for the course to run correctly on your machine.
The course uses venv (built into Python) or conda. Each module
will have a requirements.txt listing its dependencies. You install them once
per project with pip install -r requirements.txt. When something breaks
unexpectedly, the first debugging step is almost always: "am I in the right virtual
environment?" You should be able to create, activate, and deactivate virtual environments,
install packages, and list installed packages with pip freeze.
# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate # macOS/Linux
# .venv\Scripts\activate # Windows
# Verify you're using the right Python
which python # should show .venv/bin/python
# Install course dependencies
pip install -r requirements.txt
# Freeze current versions (for reproducible installs)
pip freeze > requirements.txt
NumPy & Pandas Basics
NumPy and Pandas are the two most important data libraries in Python. NumPy provides fast multi-dimensional arrays and mathematical operations. Pandas provides DataFrames — tabular data structures with named columns and powerful query/transform operations. In the context of this course, these libraries show up in specific ways that are worth understanding before you arrive.
The most critical NumPy concept for this course is the embedding vector. When an LLM converts a text chunk into its numeric representation (for storage in a vector database), the result is a Python list of floats — which is represented as a NumPy array. Operations like cosine similarity (computing how similar two text chunks are) are pure NumPy: dot products, norms, and element-wise arithmetic. You will not usually write these from scratch, but you need to read and understand code that does. A working understanding of NumPy arrays — shape, dtype, broadcasting, and the common operations — is sufficient.
Pandas is used for: loading datasets for fine-tuning, processing evaluation results from RAGAS (the RAG evaluation framework), reading CSV/JSON data, and basic exploratory analysis. You should know how to create a DataFrame, filter rows, select columns, apply a function to each row, handle missing values, and write to CSV. If you have used Pandas for any data analysis project before, you are likely already at the right level.
import numpy as np
import pandas as pd

# Embedding vectors are just numpy arrays
emb_a = np.array([0.2, 0.8, -0.3, 0.5])
emb_b = np.array([0.1, 0.9, -0.2, 0.4])

# Cosine similarity — the similarity metric used in RAG
cosine_sim = np.dot(emb_a, emb_b) / (np.linalg.norm(emb_a) * np.linalg.norm(emb_b))
# → ~0.980 (nearly parallel vectors = very similar text)

# Pandas for RAG evaluation results
results = pd.DataFrame({
    "question": ["What is RAG?", "How does chunking work?"],
    "answer": ["RAG is...", "Chunking splits..."],
    "faithfulness": [0.92, 0.78],
    "relevance": [0.88, 0.91],
})

# Filter low-quality results
poor = results[results["faithfulness"] < 0.8]
print(poor[["question", "faithfulness"]])
Neural Networks — Conceptual Understanding
You do not need to implement a neural network from scratch. You do not need to understand backpropagation derivations or the intricacies of gradient descent. What you need is a clear mental model of what a neural network is and how it relates to the LLMs you will be working with. Without this, Module 01 (Foundations of Modern GenAI) will be confusing, and concepts like "attention layers," "weight updates," and "inference vs. training" will feel like opaque jargon.
A neural network is a mathematical function with a very large number of adjustable parameters called weights. During training, you feed it examples (inputs with known correct outputs), compute a measure of how wrong the network's output was (the loss), and slightly adjust the weights to make the output less wrong. Repeat this millions of times on billions of examples, and the network learns to generalize — to produce reasonable outputs for inputs it has never seen.
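The loop just described (predict, measure the error, nudge the weights) fits in a few lines for a one-weight toy "network" learning y = 2x. This is purely illustrative; real LLM training differs in scale and architecture, not in kind:

```python
# Toy gradient descent: one trainable weight w, target function y = 2x
w = 0.0
lr = 0.1                                    # learning rate
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (input, correct output) pairs

for epoch in range(50):
    for x, y in data:
        pred = w * x                # forward pass: the model's guess
        grad = 2 * (pred - y) * x   # gradient of the squared-error loss w.r.t. w
        w -= lr * grad              # adjust the weight to reduce the loss

print(round(w, 3))  # → 2.0
```

An LLM does exactly this with billions of weights and a loss measuring how badly it predicted the next token; inference is just the forward-pass line with w frozen.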
An LLM is a specific type of neural network called a Transformer. It was trained on an enormous amount of text. Its "task" during training was simple: predict the next token (roughly, the next word) given the preceding tokens. After training on enough text, the network learned not just word frequencies but semantics, syntax, reasoning patterns, and factual associations. When you call an LLM API, you are running inference — passing your prompt through the fixed, trained weights to generate the most likely continuation. No learning happens at inference time; the weights are frozen.
Fine-tuning (Module 04) is the exception: you are resuming training on a smaller, task-specific dataset to adjust the weights toward a specific behavior or domain. Understanding the distinction between inference (weights fixed, fast, cheap) and training/fine-tuning (weights updated, slow, expensive) is essential context for every architectural decision the course will ask you to make.
JSON — Reading, Writing & Nested Structures
JSON (JavaScript Object Notation) is the universal data format for web APIs. Every LLM API request and response uses JSON. Every vector database stores metadata as JSON. Every agent tool call is described in JSON. Every RAG pipeline configuration is often serialized as JSON. You will read and write JSON constantly — both as Python data structures and as raw text that needs to be parsed or serialized.
The Python json module converts between Python dictionaries/lists and
JSON strings. json.loads() parses a JSON string into Python objects.
json.dumps() serializes Python objects into a JSON string. JSON objects
map to Python dict; JSON arrays map to list; JSON strings
map to str; JSON numbers to int or float.
The specific patterns the course uses most: deeply nested JSON (LLM responses
have a nested structure like response["content"][0]["text"]),
JSON with optional keys (use .get()), and JSON arrays of objects
(the messages list in chat APIs: [{"role": "user", "content": "..."}]).
import json

# Typical LLM API response (nested JSON)
response_json = """
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {"type": "text", "text": "The capital of France is Paris."}
  ],
  "model": "claude-sonnet-4-6",
  "usage": {"input_tokens": 14, "output_tokens": 9}
}
"""
data = json.loads(response_json)

# Navigating nested structure
text = data["content"][0]["text"]  # "The capital of France is Paris."
tokens_used = data["usage"]["input_tokens"] + data["usage"]["output_tokens"]

# Chat messages format (used in EVERY LLM API call)
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "What about Germany?"},
]
# This is the conversation history you maintain when building chatbots
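The .get() pattern for optional keys mentioned above, as a standalone snippet (stop_reason here is an illustrative field that may or may not be present):

```python
import json

raw = '{"content": [{"type": "text", "text": "Hi"}], "usage": {"input_tokens": 3}}'
data = json.loads(raw)

text = data["content"][0]["text"]                      # required key: index directly
output_tokens = data["usage"].get("output_tokens", 0)  # optional key: supply a default
stop_reason = data.get("stop_reason")                  # missing key → None, no KeyError

print(text, output_tokens, stop_reason)  # → Hi 0 None
```

Use direct indexing when a missing key is a genuine error you want to surface, and .get() when absence is a normal case your code should tolerate.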
Tools Checklist for Day One
Install and verify these tools before starting. Each item below includes the verification command so you can confirm everything is working.
Python 3.11+
Verify: python --version. Must be 3.11 or newer. Some libraries require 3.11+ for type annotation syntax.
pip & venv
Verify: pip --version and python -m venv --help. Both are bundled with Python 3.11+.
API Keys Set Up
Get keys from: OpenAI Platform, Anthropic Console, Groq Console, OpenRouter. Store in .env file.
AWS CLI
Install: pip install awscli. Verify: aws --version. Configure with aws configure.
Jupyter
Install: pip install jupyter ipykernel. Verify: jupyter notebook opens in browser.
Git 2.40+
Verify: git --version. Configure: git config --global user.name and user.email.
Core Python Packages
# Run this in your project virtual environment
pip install openai anthropic google-generativeai
pip install litellm tiktoken
pip install langchain langchain-openai langchain-anthropic
pip install numpy pandas matplotlib scikit-learn
pip install chromadb qdrant-client
pip install fastapi uvicorn python-dotenv httpx
pip install boto3 sagemaker
pip install jupyter ipykernel
pip install pydantic ragas
# Verify key packages loaded correctly
python -c "import anthropic, openai, langchain, chromadb, fastapi; print('All OK')"
The course repository includes a 101.ipynb notebook that covers NumPy, Pandas, and Python fundamentals interactively. Work through it before starting — it is calibrated to the exact level of knowledge the course assumes.
- Python 3.11+ installed and accessible as python or python3
- At least one virtual environment created and activated successfully
- All core packages installed (verify with the command above)
- API keys for OpenAI and Anthropic stored in a .env file
- Git configured with your name and email; at least one repository cloned
- AWS CLI installed and configured with your credentials
- Jupyter notebook opens in browser without errors
- You can make a successful API call to at least one LLM provider
Interview Ready
How to Explain This in 2 Minutes
Building production GenAI systems is fundamentally a software engineering discipline, not a research exercise. Before you ever touch a large language model, you need fluency in the tools that surround it — Python for orchestration, REST APIs for communicating with model endpoints, environment variables for secrets management, Git for version control, and libraries like NumPy and Pandas for data manipulation. You also need a conceptual grasp of neural networks — not enough to derive backpropagation, but enough to understand why a model produces embeddings, what a softmax layer does, and why temperature affects output randomness. These aren't nice-to-haves; every single module in a GenAI pipeline assumes you can write async Python, parse JSON responses, manage virtual environments, and navigate a terminal without hesitation. Getting these foundations right means you spend your time solving actual AI engineering problems instead of fighting your toolchain.
Likely Interview Questions
| Question | What They're Really Asking |
|---|---|
| Why do GenAI projects use async Python, and when would you choose it over synchronous code? | Do you understand I/O-bound vs CPU-bound workloads and how LLM API calls benefit from concurrency? |
| How do you securely manage API keys in a project that calls multiple LLM providers? | Do you follow production security practices or do you hardcode secrets? |
| Explain what a virtual environment is and why it matters in ML/AI projects. | Can you manage dependency isolation and avoid the "works on my machine" problem? |
| What is a neural network embedding, and why does it matter for tasks like search or RAG? | Do you have the conceptual ML foundation to understand vector representations? |
| Walk me through how you would consume a streaming REST API response from an LLM endpoint. | Can you integrate with real-world LLM APIs beyond just calling a wrapper library? |
Model Answers
1. Why async Python for GenAI? —
LLM API calls are I/O-bound — you send a prompt and wait hundreds of milliseconds to seconds for a response.
With synchronous code, your program blocks during that wait. Using async/await, you can
fire off multiple LLM calls concurrently with asyncio.gather(), dramatically improving throughput.
This matters in production when you need to process batches of documents or run parallel chains. You would
not use async for CPU-bound work like heavy NumPy computation — that needs multiprocessing instead.
import asyncio
import httpx

async def call_llm(client, prompt):
    resp = await client.post("/v1/chat/completions", json={"prompt": prompt})
    return resp.json()

async def main():
    # BASE_URL and prompts are assumed to be defined in the surrounding module
    async with httpx.AsyncClient(base_url=BASE_URL) as client:
        results = await asyncio.gather(*[call_llm(client, p) for p in prompts])
2. Secure API key management —
Never hardcode keys. Store them in a .env file that is listed in .gitignore, then load
them at runtime with python-dotenv. In production, use a secrets manager like AWS Secrets Manager
or environment variables injected by your deployment platform. When calling multiple providers (OpenAI, Anthropic,
Cohere), each key gets its own variable — OPENAI_API_KEY, ANTHROPIC_API_KEY, etc. — and
your code reads them via os.getenv() with a clear error if any are missing.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env into os.environ
api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    raise EnvironmentError("OPENAI_API_KEY not set")
3. Virtual environments —
A virtual environment is an isolated Python installation with its own site-packages. AI projects are
particularly sensitive to dependency conflicts — PyTorch, transformers, LangChain, and dozens of other packages
each pin specific versions. Without isolation, installing one project's dependencies can break another's. I create
a venv per project using python -m venv .venv, activate it, then pin dependencies with
pip freeze > requirements.txt so that any team member can reproduce the exact environment.
4. Neural network embeddings — An embedding is a dense vector representation of input data — text, images, or anything else — produced by a neural network's internal layers. Instead of treating words as discrete symbols, embeddings place semantically similar content near each other in vector space. This is the foundation of RAG systems: you embed documents and queries into the same space, then retrieve the closest matches using cosine similarity or dot product. Without understanding embeddings, you cannot reason about why retrieval works, why chunk size matters, or how to debug poor search results.
5. Consuming a streaming REST response —
LLM endpoints often support server-sent events (SSE) so tokens arrive incrementally. You make a POST request with
"stream": true, then iterate over the response line by line. Each line prefixed with data:
contains a JSON chunk with the next token. This lets you display output to users in real time rather than waiting
for the entire completion. In Python, httpx or requests both support streaming iteration.
import json
import httpx

with httpx.stream("POST", url, json={"stream": True, **payload}) as resp:
    for line in resp.iter_lines():
        # skip blank keep-alive lines and the final "data: [DONE]" sentinel
        if line.startswith("data: ") and line != "data: [DONE]":
            chunk = json.loads(line[6:])
            print(chunk["choices"][0]["delta"]["content"], end="")
System Design Scenario
You are setting up the development environment for a new team of five engineers who will build a multi-model GenAI application that calls OpenAI, Anthropic, and a self-hosted model. The app processes financial documents using Pandas, stores embeddings in a vector database, and deploys on AWS. Design the project structure, environment management, and secrets handling strategy.
A strong answer should cover:
- Monorepo vs multi-repo — a single repo with clear package boundaries, a shared pyproject.toml, and a Git branching strategy
- Dependency management — pinned requirements per service, separate venvs or Docker containers, and a CI step that validates environment reproducibility
- Secrets architecture — .env files for local dev (gitignored), AWS Secrets Manager for production, and a config module that validates all keys at startup
- Data pipeline tooling — Pandas for preprocessing, NumPy for numerical operations, and a clear boundary between data prep code and LLM orchestration code
- Onboarding automation — a Makefile or setup script that creates the venv, installs dependencies, copies .env.example to .env, and runs a smoke test against each API
Common Mistakes
- Committing API keys to Git — Even if you delete them in a later commit, they remain in history. Always use .env files with .gitignore, and rotate any key that has ever been committed.
- Confusing async with parallel — asyncio runs in a single thread and only helps with I/O waits. CPU-heavy work like tokenization on large datasets needs multiprocessing or offloading to a worker. Candidates often claim async "makes things parallel" — it does not.
- Installing packages globally instead of in a virtual environment — This leads to version conflicts that surface as mysterious import errors. Every project should have its own isolated environment, especially in AI work where library version mismatches can silently change model behavior.