aryem.dev

An Agentic AI Tutor: Adaptive Learning Paths with Mem0 and Generative UI

Most “AI tutors” today are chat windows with a system prompt that says “you are a helpful tutor.” That’s fine for one session. It falls apart on the second one — when the learner comes back, the model has forgotten everything it knew about them, and the next lesson starts from zero.

We wanted something different at EG-Labs: a tutor that remembers what a learner knows, adapts the next lesson to what they don’t, and renders interactive UI inline instead of just streaming text. Three architectural decisions shaped the product:

  1. Long-term agent memory via Mem0 — so the tutor accumulates a learner model across sessions
  2. Generative UI via JSON-streamed React components — so lessons aren’t walls of text
  3. A planning loop, not a chat loop — the agent decides what to teach next, not the user

Here’s the design and what we learned shipping it.

The problem with chat-shaped tutors

A chat-shaped tutor has two fatal limits.

Memory is a context-window hack. You can stuff prior conversation into the prompt, but that scales to maybe a dozen sessions before token cost and latency become unbearable. And it’s not actually a learner model — it’s a transcript. The model can’t query it (“has this learner mastered subjunctive?”) without re-reading the whole thing.

Text is the wrong primitive. A learner who’s stuck on quadratics doesn’t need a wall of text explaining the discriminant; they need a slider that lets them drag b and watch the parabola shift, then a question, then a hint, then a check. Chat UIs reduce all of that to bullet points and code blocks.

We needed memory that was a model of the learner, not a transcript. And we needed a UI layer that the agent could compose, not just narrate.

Memory: Mem0 as the learner model

Mem0 is the right shape for this. Instead of dumping conversation history, it extracts and stores structured facts and preferences as memories tied to a user_id, then retrieves the relevant subset at query time.

In our setup, every tutor turn writes back to Mem0 in three buckets:

await mem0.add({
  userId: learner.id,
  memories: [
    // What the learner has demonstrated they know
    { type: "mastery", concept: "linear_equations", confidence: 0.85 },

    // What they got wrong, and how
    { type: "misconception",
      concept: "fraction_addition",
      pattern: "adds numerators and denominators independently" },

    // What the learner *prefers* — pacing, depth, examples
    { type: "preference",
      key: "explanation_style",
      value: "shows worked example before formula" },
  ],
});

When the next session starts, the agent doesn’t see “here’s the chat history.” It sees:

const context = await mem0.search({
  userId: learner.id,
  query: `Plan next lesson for ${currentTopic}`,
});
// → returns top-k memories relevant to the next decision

This is the single biggest unlock in the architecture. The agent’s working context is now a concise learner model — a few hundred tokens of structured facts — instead of an ever-growing transcript. Cost is bounded. Reasoning is sharper. And the same model can answer “should we revisit fractions?” because the misconception is in the retrieved set.
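To make "a few hundred tokens of structured facts" concrete, here is a minimal sketch of turning retrieved memories into a compact block for the planner prompt. It assumes the search results come back in the same three shapes that were written; `LearnerMemory` and `formatLearnerModel` are hypothetical names, not part of Mem0.

```typescript
// Assumed memory shapes, mirroring the three buckets written on each turn.
type LearnerMemory =
  | { type: "mastery"; concept: string; confidence: number }
  | { type: "misconception"; concept: string; pattern: string }
  | { type: "preference"; key: string; value: string };

// Render each retrieved memory as one short line, so the planner prompt
// carries a compact learner model instead of a transcript.
function formatLearnerModel(memories: LearnerMemory[]): string {
  const lines = memories.map((m) => {
    switch (m.type) {
      case "mastery":
        return `KNOWS ${m.concept} (confidence ${m.confidence.toFixed(2)})`;
      case "misconception":
        return `MISCONCEPTION ${m.concept}: ${m.pattern}`;
      case "preference":
        return `PREFERS ${m.key}: ${m.value}`;
    }
  });
  return lines.join("\n");
}
```

One line per memory keeps the prompt size proportional to the retrieved top-k rather than to the number of past sessions.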

Memory hygiene

A few non-obvious things we got wrong before getting right:

Don’t write a memory for every turn. Early on we wrote whatever the model surfaced. Mem0 grew quickly with low-signal junk like “learner asked a follow-up question.” We now have a small judge model that decides whether a turn produced anything durable — a new fact, a corrected misconception, a stable preference — and only those get written.
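A rough sketch of that write gate, with the judge injected as an async function so that any model call can sit behind it. `Judge` and `selectDurable` are hypothetical names, and the judge's actual prompt and criteria are not shown here.

```typescript
// A judge answers one question: did this candidate capture something durable?
// In practice this would wrap a small model call; here it is just a function.
type Judge<T> = (candidate: T) => Promise<boolean>;

// Run the judge over every candidate memory from a turn and keep only the
// ones it approves. Candidates are judged concurrently; order is preserved.
async function selectDurable<T>(candidates: T[], judge: Judge<T>): Promise<T[]> {
  const verdicts = await Promise.all(candidates.map((c) => judge(c)));
  return candidates.filter((_, i) => verdicts[i] === true);
}
```

Only the survivors of `selectDurable` would be passed on to the memory write, so a turn that produces nothing durable writes nothing.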

Confidence decays. A “mastery” memory from three months ago shouldn’t weight the same as one from yesterday. We attach an assessed_at timestamp and decay confidence on a half-life when retrieving for plan decisions. This is the difference between a tutor that thinks the learner is good at fractions and one that thinks the learner was.
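One way to sketch the decay step, assuming plain exponential half-life decay applied at read time. The 30-day default is an illustrative value, and `MasteryMemory` and `decayedConfidence` are hypothetical names; only `confidence` and `assessed_at` come from the description above.

```typescript
// Assumed shape of a stored mastery memory.
interface MasteryMemory {
  concept: string;
  confidence: number;  // confidence recorded at assessment time, in [0, 1]
  assessed_at: string; // ISO 8601 timestamp of the assessment
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Exponential decay: the stored confidence halves every `halfLifeDays`.
// The stored value is never mutated; decay is computed when retrieving.
function decayedConfidence(
  memory: MasteryMemory,
  now: Date,
  halfLifeDays = 30, // illustrative default, not a recommended value
): number {
  const ageMs = now.getTime() - Date.parse(memory.assessed_at);
  const ageDays = Math.max(0, ageMs / MS_PER_DAY); // clamp future timestamps
  return memory.confidence * Math.pow(0.5, ageDays / halfLifeDays);
}
```

With these assumptions, a 0.85 mastery assessed exactly one half-life ago is treated as 0.425 when planning, while the original assessment stays intact in the store.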

Resolve contradictions. When new evidence contradicts an old memory (“learner just got 4/4 on linear equations after previously struggling”), we explicitly merge — old memory annotated as superseded, new one written. Mem0 handles this if you ask it to; the trick is asking.
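The supersede-and-write merge can be illustrated on a plain in-memory array. This is a sketch of the bookkeeping only, not of Mem0's own update mechanism; `StoredMemory`, `supersede`, and `activeMemories` are hypothetical names, and treating every active memory on the same concept as contradicted is a simplification.

```typescript
// Assumed record shape for this sketch.
interface StoredMemory {
  id: string;
  concept: string;
  type: "mastery" | "misconception";
  detail: string;
  superseded_by?: string; // id of the memory that replaced this one
}

// Annotate every still-active memory on the same concept as superseded,
// then append the new evidence. Returns a new array; the input is untouched.
function supersede(store: StoredMemory[], incoming: StoredMemory): StoredMemory[] {
  const updated = store.map((m) =>
    m.concept === incoming.concept && m.superseded_by === undefined
      ? { ...m, superseded_by: incoming.id }
      : m,
  );
  return [...updated, incoming];
}

// Memories that should still influence planning.
function activeMemories(store: StoredMemory[]): StoredMemory[] {
  return store.filter((m) => m.superseded_by === undefined);
}
```

Keeping the old record, rather than deleting it, preserves the history that the learner once struggled with the concept.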

Generative UI: components as agent output

The tutor doesn’t just stream text. It streams structured plans for UI that the frontend renders as live React components.

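We considered two paths: let the agent author HTML directly, or have it emit JSON blocks from a small, fixed schema that the frontend maps to prebuilt React components.

We went with the second.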

The schema

Every “block” the tutor emits matches one of a small set of schemas:

type LessonBlock =
  | { kind: "explanation"; markdown: string }
  | { kind: "interactive_chart"; equation: string; controls: Slider[] }
  | { kind: "worked_example"; steps: Step[]; reveal: "tap" | "auto" }
  | { kind: "check"; question: string; choices: string[]; correct: number; hint?: string }
  | { kind: "free_response"; prompt: string; rubric: Rubric }
  | { kind: "summary"; key_points: string[]; mastery_delta: ConceptDelta[] };

The agent’s structured output is a sequence of these. The frontend renders them, streams the next one in as it arrives, and the user interacts with each in turn.

Streaming and rendering

We use the Vercel AI SDK’s streamObject to emit blocks one at a time. Each completed block is rendered immediately — the user starts seeing the explanation while the chart is still being generated.

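import { streamObject } from "ai";

const { partialObjectStream } = streamObject({
  model: tutorModel,
  schema: lessonBlockSchema,
  prompt: planNextBlock(learner, lastInteraction),
});

for await (const partialBlock of partialObjectStream) {
  // Each iteration yields a fuller snapshot of the same block, so replace
  // the in-progress block rather than appending a new one every time.
  // (renderInProgressBlock is an illustrative helper name.)
  renderInProgressBlock(partialBlock);
}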

Two pleasant surprises:

  1. The schema disciplines the model. With the JSON contract enforced by the SDK, the model produces fewer “I’m sorry, I can’t render an interactive chart” hedges. The contract says it can; the model figures it out.
  2. The frontend stays dumb. We have ~7 component types and a renderer that maps kind to component. Adding a new block type is one component and one schema entry. No agent logic changes.
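The kind-to-component mapping can be sketched as an exhaustive switch over the block union. To keep the example free of React, each branch returns a description string where a real renderer would return a component. `Slider`, `Step`, `Rubric`, and `ConceptDelta` are placeholder shapes, and `describeBlock` is a hypothetical name.

```typescript
// Placeholder shapes: assumptions for this sketch only.
type Slider = { name: string; min: number; max: number };
type Step = { text: string };
type Rubric = { criteria: string[] };
type ConceptDelta = { concept: string; delta: number };

type LessonBlock =
  | { kind: "explanation"; markdown: string }
  | { kind: "interactive_chart"; equation: string; controls: Slider[] }
  | { kind: "worked_example"; steps: Step[]; reveal: "tap" | "auto" }
  | { kind: "check"; question: string; choices: string[]; correct: number; hint?: string }
  | { kind: "free_response"; prompt: string; rubric: Rubric }
  | { kind: "summary"; key_points: string[]; mastery_delta: ConceptDelta[] };

// Stand-in for the renderer: one branch per block kind.
function describeBlock(block: LessonBlock): string {
  switch (block.kind) {
    case "explanation":
      return `Explanation: ${block.markdown}`;
    case "interactive_chart":
      return `Chart of ${block.equation} with ${block.controls.length} slider(s)`;
    case "worked_example":
      return `Worked example, ${block.steps.length} step(s), reveal on ${block.reveal}`;
    case "check":
      return `Check: ${block.question} (${block.choices.length} choices)`;
    case "free_response":
      return `Free response: ${block.prompt}`;
    case "summary":
      return `Summary with ${block.key_points.length} key point(s)`;
    default: {
      // Compile-time guard: adding a kind without a branch fails type checking.
      const unhandled: never = block;
      throw new Error(`Unhandled block: ${JSON.stringify(unhandled)}`);
    }
  }
}
```

The `never` assignment is what keeps "one component and one schema entry" honest: the compiler flags any schema entry that has no matching branch.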

What I’d avoid

Don’t let the agent author HTML. We tried this once. It was a mistake. HTML is too expressive — the model will inject inline styles, attempt scripts, drift away from the design system. JSON with a constrained vocabulary is the right level.

Don’t make every block interactive. Early prototypes had the agent insert a chart or quiz every other block because we’d implied that interactivity = quality. Learners hated it. The pacing — explanation → check → explanation → worked example → check — matters more than the diversity of UI types.

The planning loop

Underneath the memory and the UI is the actual decision the agent makes: what should this learner do next?

A naive loop reacts to the previous answer:

loop:
  user_answer = await user_response()
  next_block = LLM("respond appropriately")

A planning loop projects forward:

plan = LLM(
  "Given the learner model, the current topic, and the lesson goal,
   produce the next 3-5 blocks. Each block should advance toward the
   goal. After each interactive block, decide whether to continue,
   re-explain, or branch into a prerequisite."
)

while plan is not empty:
  block = plan.pop_first()
  emit(block)
  if block.kind == "check":
    result = await user_response()
    update_learner_model(result)
    if result.suggests_revisit:
      # Replace the remaining blocks; the loop continues on the new plan
      plan = replan_from(block, learner_model)

The plan is a committed direction the agent can change when evidence demands it. This produces lessons that feel coherent — there’s a destination — without being rigid: a misconception revealed at block 3 reroutes blocks 4 and 5.
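A minimal TypeScript sketch of the same loop, with the model calls and UI hooks injected so the control flow can be read (and tested) on its own. Everything here is hypothetical naming: `Block`, `TutorDeps`, `runLesson`, and the `maxBlocks` safety cap are assumptions added for the example.

```typescript
type Block = { kind: "explanation" | "worked_example" | "check"; label: string };
type CheckResult = { suggestsRevisit: boolean };

// Everything the loop needs from the outside world.
interface TutorDeps {
  emit(block: Block): void;
  awaitAnswer(block: Block): Promise<CheckResult>;
  updateLearnerModel(block: Block, result: CheckResult): void;
  replan(after: Block): Promise<Block[]>; // would call the planner model
}

// Walk the plan; after each check, either continue or swap in a new plan.
// Returns the labels of the blocks that were actually emitted.
async function runLesson(
  initialPlan: Block[],
  deps: TutorDeps,
  maxBlocks = 20, // assumed cap so repeated replanning cannot loop forever
): Promise<string[]> {
  let queue = [...initialPlan];
  const emitted: string[] = [];
  while (emitted.length < maxBlocks) {
    const block = queue.shift();
    if (block === undefined) break; // plan exhausted
    deps.emit(block);
    emitted.push(block.label);
    if (block.kind !== "check") continue;
    const result = await deps.awaitAnswer(block);
    deps.updateLearnerModel(block, result);
    if (result.suggestsRevisit) {
      // Discard the remaining blocks and continue on the new plan.
      queue = await deps.replan(block);
    }
  }
  return emitted;
}
```

Using a queue instead of iterating the plan directly is what lets a replan take effect: the blocks after a failed check are dropped and the loop continues on the replacement.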

What I’d do differently

Three honest critiques of where we landed:

1. Evaluating memory-driven decisions is hard. It’s straightforward to evaluate “is this answer correct” or “did the agent stay in persona.” It’s much harder to evaluate “did the tutor make the right decision given this learner model,” because the right answer depends on counterfactual learner trajectories. We don’t have a great answer here yet. Our current process is hand-labeled tutor sessions plus expert review, which is slow.

2. Cold-start is awkward. A new learner has no memories. The first session has to be calibration-heavy, but if it’s too assessment-shaped it feels like a placement test, not a tutor. We’re still tuning.

3. The agent thinks in topics; learners think in goals. A learner who says “I want to pass my exam in two weeks” doesn’t want a topic walk — they want a curriculum. We have a separate curriculum-planner that emits a topic sequence, which the tutor agent then traverses. Splitting “what to learn” from “how to learn it” was a late, valuable refactor.

The takeaway

Three pieces, each independently boring, that compose into something interesting:

  1. Memory that is a learner model, not a transcript
  2. Output that is schema-constrained UI, not free text
  3. A planning loop that commits to a direction and replans on evidence

The fashionable bits — RAG, agents, multimodal — are downstream of these decisions. If the memory is a transcript and the UI is text, you’ve built a chatbot with extra steps. Get the memory shape and the output shape right first.