LLMs in practice: capabilities, limits, hallucinations

Classify which tasks an AI assistant is reliably strong at (summarizing, rewriting, translating, reformatting, first drafts) so you hand it work it can actually do well.

Time: 20–25 min
Type: exercise
Bloom: Apply → Create
XP: 100

Figure: Concept architecture for Lesson 2.3, LLMs in practice: capabilities, limits, hallucinations

You'll be able to

Classify which tasks an AI assistant is reliably strong at (summarizing, rewriting, translating, reformatting, first drafts) so you hand it work it can actually do well.
Diagnose the predictable ways it fails so you catch them before they cost you: hallucination (confident, well-written, and wrong), stale knowledge (it does not know what happened after it was trained), and losing track of long documents.
Adjust how you use the tool based on those limits: trust it on fluency, verify it on facts, and never let a confident tone stand in for being correct.

Hook

You paste an AI answer into a client document and send it. Two days later the client writes back: the regulation it cited does not exist. The model said it in the same confident voice as everything else. Why does it sound so sure when it is wrong, and what does that change about how you use it?

Prompt Lab · runs here · Claude

Your task: Write a prompt that asks Claude to recommend the right AI setup for a real task you're facing, then weigh its answer against this lesson, "LLMs in practice: capabilities, limits, hallucinations."

A strong prompt: role · context · task · format · example

Diagram · generated brief: Create a clean diagram contrasting where an AI assistant is reliable versus where it fails, for a professional audience. Show a central split: on one side, a 'Trust it' lane labeled with language tasks the tool is strong at (summarize, rewrite, trans…

Exercise · scenario

A regional hospital system is deploying an LLM-powered chatbot to answer patient questions about medication side effects and drug interactions. During testing, the bot occasionally provides confident-sounding responses about rare drug combinations that contradict the hospital's pharmaceutical database. When developers check the training data, they find no specific information about these combinations. The bot generates plausible-sounding medical terminology and citation formats that don't correspond to real studies.

Deliverable

Add a page to your AI Fluency Playbook called **Where I Trust It, Where I Check It**. Catalog three tasks from your own work where you use, or could use, an AI assistant. For each one, write down: (1) the task in plain words (summarize, draft, translate, look up a fact, calculate), (2) whether it sits in the tool's strength or its blind spot, (3) the failure you would watch for if it is in the blind spot (hallucination, stale knowledge, or losing the thread on long material), and (4) the one verification step you will run before that output goes anywhere with your name on it.
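If you prefer to keep your playbook page as structured data, the four fields in the deliverable could be sketched roughly as below. This is only an illustration: the `PlaybookEntry` class, the `problems` helper, and the example entries are hypothetical names invented here, not part of the lesson.

```python
from dataclasses import dataclass

# The three failure modes named in this lesson.
FAILURES = {"hallucination", "stale knowledge", "losing the thread"}


@dataclass
class PlaybookEntry:
    task: str          # (1) the task in plain words
    zone: str          # (2) "strength" or "blind spot"
    failure: str       # (3) failure to watch for; empty if zone is "strength"
    verification: str  # (4) the one check you run before the output goes out


def problems(entry: PlaybookEntry) -> list[str]:
    """Return the gaps in an entry; an empty list means it is complete."""
    found = []
    if entry.zone not in {"strength", "blind spot"}:
        found.append("zone must be 'strength' or 'blind spot'")
    if entry.zone == "blind spot" and entry.failure not in FAILURES:
        found.append("name the failure you would watch for")
    if not entry.verification.strip():
        found.append("add a verification step")
    return found


# Hypothetical example entries, not taken from the lesson.
page = [
    PlaybookEntry("summarize meeting notes", "strength", "",
                  "skim the notes for anything the summary dropped"),
    PlaybookEntry("look up a regulation", "blind spot", "hallucination",
                  "find the regulation in the official source"),
    PlaybookEntry("calculate a quarterly total", "blind spot", "", ""),
]

for entry in page:
    print(entry.task, "->", problems(entry) or "complete")
```

The third entry is left incomplete on purpose, so the check reports both the missing failure mode and the missing verification step.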

Model answer

Hallucination due to pattern completion without grounding
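One way to picture what "grounding" could mean for the hospital scenario is a gate that releases an answer only when a trusted record backs it up. The sketch below is an assumption-heavy illustration, not how any real clinical system works: `KNOWN_INTERACTIONS`, `KNOWN_REFERENCES`, `grounded_reply`, the drug names, and the reference IDs are all hypothetical placeholders.

```python
# Hypothetical stand-ins for the hospital's pharmaceutical database and its
# list of verified references. A real system would query actual data stores.
KNOWN_INTERACTIONS = {
    frozenset({"drug_a", "drug_b"}): "Placeholder guidance text from the database.",
}
KNOWN_REFERENCES = {"REF-001", "REF-002"}


def grounded_reply(drug_1, drug_2, model_refs):
    """Answer from the database record, or escalate when nothing backs the answer."""
    pair = frozenset({drug_1.lower(), drug_2.lower()})
    record = KNOWN_INTERACTIONS.get(pair)
    if record is None:
        # The model may still produce fluent text here; that is the risk.
        return "ESCALATE: no database record for this combination."
    unknown = [ref for ref in model_refs if ref not in KNOWN_REFERENCES]
    if unknown:
        return "ESCALATE: unverified references " + ", ".join(unknown)
    # Reply with the database wording, not the model's own wording.
    return record


print(grounded_reply("drug_a", "drug_b", ["REF-001"]))
print(grounded_reply("drug_a", "drug_c", []))
print(grounded_reply("drug_a", "drug_b", ["REF-999"]))
```

The design choice worth noticing is that a missing record leads to escalation to a person rather than to a generated answer, which is the opposite of what pattern completion does by default.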

Practice · Scenarios

Scenario 1 of 8

An e-commerce platform uses an LLM to generate product descriptions from manufacturer specifications. The system excels at creating engaging copy for electronics and apparel, transforming technical specs into customer-friendly language. However, when tasked with creating original product comparison charts or calculating shipping cost optimization across multiple warehouses, the model produces inconsistent numerical results and logical errors in multi-step reasoning, even when the calculations are explicitly shown in the prompt.

Step 1 · Classify

Context window too small for mathematical operations
Hallucination in numerical content generation
Fundamental limitation in mathematical and logical reasoning
Token encoding issues with numerical data
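Whatever you pick above, a practical verification step for the numbers in this scenario is to recompute them in ordinary code and compare the result with what the model claimed. The following is a minimal sketch under invented assumptions: the warehouse names, the cost figures in `COSTS_CENTS`, and the function names are hypothetical and do not come from the scenario.

```python
# Hypothetical per-warehouse rates in cents; invented for illustration only.
COSTS_CENTS = {
    "north": {"base": 500, "per_kg": 120},
    "south": {"base": 350, "per_kg": 150},
    "west": {"base": 800, "per_kg": 90},
}


def shipping_cost(warehouse, weight_kg):
    """Base fee plus a per-kilogram charge, in whole cents."""
    rate = COSTS_CENTS[warehouse]
    return rate["base"] + rate["per_kg"] * weight_kg


def cheapest(weight_kg):
    """Name of the warehouse with the lowest cost for this weight."""
    return min(COSTS_CENTS, key=lambda name: shipping_cost(name, weight_kg))


def check_model_claim(weight_kg, claimed_warehouse, claimed_cents):
    """True only if the model's warehouse and cost both match the recomputation."""
    expected_warehouse = cheapest(weight_kg)
    expected_cents = shipping_cost(expected_warehouse, weight_kg)
    return (claimed_warehouse, claimed_cents) == (expected_warehouse, expected_cents)


# A model answer of "north, 980 cents" for a 4 kg order fails the check,
# because recomputing gives south at 950 cents.
print(check_model_claim(4, "north", 980))
print(check_model_claim(4, "south", 950))
```

The model can still write the customer-facing explanation; the figure itself comes from the deterministic calculation.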


Capstone artifact · auto-graded

Submit your work for review

Paste your capstone artifact below. You'll get back a 4-level rubric grade, per-criterion feedback, and three concrete edits to strengthen it.
