Chapter 1 (in progress)

How ChatGPT Understands Your Questions (Draft / Notes)

Updated July 1, 2026 · 5 min read

⚠️ This is a raw notes / draft version. I'm writing this to keep up with the ChaiCode GenAI cohort deadline and maintain my streak — this is NOT the final article. A proper, detailed, well-structured version of this piece is coming soon. Treat this as thinking-out-loud, not a polished blog post. If something reads rough or incomplete, that's intentional — it's a draft.

Why I'm even writing this

Every time I type something into ChatGPT and it replies in like two seconds with something that actually makes sense — I used to just accept that as magic. But "magic" is a lazy answer. If I'm going to call myself someone who builds with AI, I should at least understand what's happening between me hitting Enter and the response showing up. So — notes on that.

1. What is an LLM?

LLM = Large Language Model.

Break the phrase down instead of memorizing it:

Language → it deals with text/words
Model → it's a mathematical structure that's learned patterns from data
Large → because it's trained on a massive amount of text and has billions of parameters (numeric knobs that get tuned during training)

Problem it solves: Before LLMs, computers were bad at understanding loosely structured human language — sarcasm, context, ambiguity, "what did they mean by that." LLMs are trained to predict language well enough that they can hold a conversation, summarize, translate, write code, etc.

Popular examples: GPT (OpenAI), Claude (Anthropic), Gemini (Google), LLaMA (Meta), Mistral.

Daily life applications:

Chatbots / assistants (ChatGPT, Claude)
Autocomplete / writing help (Grammarly-ish tools, Notion AI)
Code generation (Cursor, Copilot)
Customer support bots
Summarizing long docs/emails

Note to self: expand this section later with a proper "why now": why did LLMs suddenly become good around 2022–23 and not earlier? Transformers are a big part of the answer, but they date from 2017, so I want to explain the timeline better in the final version.

2. What Happens When You Send a Message to ChatGPT?

Rough flow, step by step:

You type a prompt. Just plain text from your side.
It gets tokenized. Your sentence is broken into smaller chunks (tokens — more on this below) because the model doesn't read words, it reads numbers.
The model processes it. It runs your tokens through its neural network (a Transformer, see section 5) and computes a probability for every possible next token. One token is picked from those probabilities and appended to the input, then the same thing happens for the next token, and the next.
Response is generated token by token. This is why sometimes you can literally see it "typing" — it's generating one token at a time, not printing a pre-written answer.

Why responses aren't copied from the internet:

This is the part people get wrong the most. The model isn't searching a database and pasting a match. It learned patterns of language during training — grammar, facts, reasoning style — and then it's generating a brand new sequence of tokens based on probability. It's less "copy-paste" and more "predict what comes next, over and over, based on everything I've learned."
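A rough sketch of that "predict what comes next, over and over" loop. Everything here is made up for illustration: the NEXT_TOKEN_PROBS table is hand-written, and it only looks at the previous token, whereas a real LLM computes its probabilities with a neural network over the whole conversation so far. The loop shape (pick a token, append it, repeat) is the part that carries over.

```python
import random

# Hypothetical next-token probabilities, written by hand for illustration.
# A real LLM computes these with a neural network over the entire context.
NEXT_TOKEN_PROBS = {
    "<start>": {"the": 1.0},
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "slept": 0.3},
    "dog": {"sat": 0.5, "slept": 0.5},
    "sat": {"<end>": 1.0},
    "slept": {"<end>": 1.0},
}


def generate(seed=None, max_tokens=10):
    """Generate tokens one at a time by sampling from the probability table."""
    rng = random.Random(seed)
    tokens = []
    current = "<start>"
    for _ in range(max_tokens):
        options = NEXT_TOKEN_PROBS[current]
        # Pick the next token in proportion to its probability.
        current = rng.choices(list(options), weights=list(options.values()))[0]
        if current == "<end>":
            break
        tokens.append(current)
    return tokens


print(" ".join(generate(seed=0)))
```

Nothing in the table stores a finished sentence. Each run assembles one from probabilities, and different seeds can give different sentences, which is the "not copy-paste" point in miniature.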

[DIAGRAM IDEA 1: User → Prompt → LLM → Response]

3. Why Computers Don't Understand Human Language

Computers, at the core, only understand numbers — specifically binary (0s and 1s). Text is just symbols to a computer unless it's converted into some numeric representation first.

Text vs numbers: "Hello" means nothing to a CPU. But a numeric ID or vector representing "Hello" — that, it can do math on.
Why numbers are needed: Because everything a model does under the hood is math — matrix multiplications, probability calculations. You can't multiply the word "cat" by 0.7. You can multiply a number that represents "cat."
So the very first job before any "understanding" happens is: convert text into numbers.
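A tiny sketch of "text as numbers". The character codes below are standard Unicode values, but they are just one possible numeric representation, not the one LLMs use (that's tokens, next section). The cat_vector values are arbitrary numbers I made up to show that math works once a word has a numeric stand-in.

```python
text = "Hello"

# Each character has a standard numeric code (its Unicode code point).
code_points = [ord(ch) for ch in text]
print(code_points)  # [72, 101, 108, 108, 111]

# A made-up numeric vector standing in for the word "cat".
# The values are arbitrary; real models learn theirs during training.
cat_vector = [0.2, -1.0, 0.5]

# You can't multiply the string "cat" by 0.7, but you can scale its vector.
scaled = [0.7 * x for x in cat_vector]
print(scaled)
```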

That's where tokens come in.

4. Tokenization

What tokens are: Small pieces of text — could be a whole word, part of a word, or even a single character/punctuation mark — that get mapped to a number (token ID).

Why it's needed: Because the model needs a fixed vocabulary of numeric chunks to work with. It can't handle raw text directly. Tokenizing text is the translation step from human language → machine-readable format.

Words vs Tokens:

Not always 1 word = 1 token.
Common short words might be 1 token.
Longer/rarer words often get split into multiple tokens.

Simple example (rough, illustrative):

"I love coding" → might become tokens like: I, love, coding (3 tokens, close to word-level)
"unbelievable" → might get split into un, believ, able (subword tokens)

The point is: tokenization isn't always clean word boundaries. That's why LLMs sometimes struggle with things like counting letters in a word — they're not "seeing" letters the way we do, they're seeing tokens.
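To make the splitting concrete, here is a toy tokenizer. It is not a real one: VOCAB is a tiny hypothetical vocabulary I wrote by hand, and the splitting rule is plain greedy longest-match. GPT-style models use byte pair encoding with a learned vocabulary, so their actual splits will differ (for example, real tokenizers usually attach the space to the following word instead of treating it as its own token).

```python
# Hypothetical vocabulary: token text -> token ID. Real vocabularies are
# learned from data and contain tens of thousands of entries.
VOCAB = {"I": 0, "love": 1, "coding": 2, "un": 3, "believ": 4, "able": 5, " ": 6}


def tokenize(text):
    """Split text into the longest vocabulary pieces, scanning left to right."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible piece first, then shrink it.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in VOCAB:
                tokens.append(piece)
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return tokens


def encode(text):
    """Map text to the list of token IDs the model would actually receive."""
    return [VOCAB[token] for token in tokenize(text)]


print(tokenize("unbelievable"))  # ['un', 'believ', 'able']
print(encode("I love coding"))   # [0, 6, 1, 6, 2]
```

The model only ever receives the ID list, so the individual letters inside "believ" are never handed to it directly. That is the letter-counting problem in miniature.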

[DIAGRAM IDEA 2: Text → Tokens → Transformer → Response]

Note to self: get an actual tokenizer example (tiktoken or similar) and show real token splits in the final version instead of guessing.

5. Transformers

What a Transformer is: The neural network architecture that basically every modern LLM is built on. Introduced in the 2017 paper "Attention Is All You Need."

Why it changed AI: Before Transformers, models processed text mostly sequentially (one word after another, remembering context step-by-step), which made it hard to handle long-range relationships in text efficiently. Attention already existed as an add-on to those sequential models, but Transformers made it the core of the architecture: the model looks at all tokens in the input at once and figures out which ones matter most to each other, regardless of distance in the sentence.

How it helps understand language:

Attention lets the model work out relationships between words. In "The trophy didn't fit in the suitcase because it was too big", does "it" refer to the trophy or the suitcase? (The trophy, since that is the thing that's too big.) That kind of contextual disambiguation is exactly what attention helps with.
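A minimal sketch of how attention scores "which tokens matter to this one", using the scaled dot-product formula from the Transformer paper. The vectors are made up by hand so that "it" lines up with "trophy"; a real model learns its vectors during training, and nothing here is measured from one. This also only computes the attention weights, not the full layer (which goes on to mix value vectors using those weights).

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over all keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


# Hand-picked 2-number vectors, purely illustrative.
keys = {
    "trophy": [1.0, 0.2],
    "suitcase": [0.3, 1.0],
    "big": [0.1, 0.1],
}
query_for_it = [2.0, 0.5]

weights = attention_weights(query_for_it, list(keys.values()))
for token, weight in zip(keys, weights):
    print(f"{token:>8}: {weight:.2f}")
```

With these numbers, "trophy" gets the largest weight, meaning the representation of "it" would draw mostly on "trophy". The distance between the two words in the sentence plays no role in the calculation.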

Why almost every modern LLM uses Transformers:

Because it parallelizes well (fast to train on GPUs), scales well with more data/parameters, and captures long-range context way better than older architectures like RNNs/LSTMs.

[DIAGRAM IDEA 3: Context window visualization]

[DIAGRAM IDEA 4: Low Temperature vs High Temperature output comparison]

[DIAGRAM IDEA 5: Complete high-level LLM workflow]

Note to self: this section deserves the most expansion in the final version — attention mechanism deserves its own mini-breakdown with a visual, and I want to actually explain temperature/context window properly instead of just leaving them as diagram placeholders.

Closing (draft note)

This whole piece is basically me thinking through: text → tokens → numbers → Transformer → prediction → response. That's the skeleton. The real article will flesh out attention, temperature, context windows, and actual tokenizer examples with proper diagrams instead of placeholder brackets.

Note to self: in the final article, I have to include all the resources — videos, articles, blogs, and any other sources I actually used to understand and write this — under a proper References section. Right now this draft has none listed, need to go back through my notes/history and compile them properly.

Again — this is a notes/draft version. Full detailed article coming soon.