NLU Teacher (LLM) MVP

This document describes the minimal teacher-in-the-loop implementation for AdaOS NLU.

Pipeline (MVP)

Router emits nlp.intent.detect.request (text + webspace_id + request_id).
nlu.pipeline tries:
built-in + dynamic regex (fast, deterministic)
if nothing matches, it delegates to the Rasa service (nlp.intent.detect.rasa)
If an intent is found -> nlp.intent.detected { via: "regex" | "regex.dynamic" | "rasa" }.
If no intent is obtained -> nlp.intent.not_obtained { reason, via, ... }.
Teacher bridge reacts to nlp.intent.not_obtained and emits:
nlp.teacher.request { webspace_id, request }
Teacher runtimes store state for UI inspection (YJS, per webspace):
data.nlu_teacher.events[] (includes llm.request / llm.response)
data.nlu_teacher.candidates[] (regex rules / skill candidates / scenario candidates)
data.nlu_teacher.revisions[] (proposed dataset revisions)
data.nlu_teacher.llm_logs[] (request/response logs; debugging)
Teacher state is also persisted on disk so it survives YJS reload/reset:
.adaos/state/skills/nlu_teacher/<webspace_id>.json
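
The flow above can be sketched as two small functions. This is a minimal illustration, not the AdaOS implementation: the rule tables, the synchronous `rasa` callable (the real Rasa stage is a separate service reached via nlp.intent.detect.rasa), the `slots` and `text` fields, the `reason` value, and the contents of `request` in nlp.teacher.request are all assumptions made for the example.

```python
import re
from typing import Callable, Optional

# Hypothetical rule tables of (intent, compiled pattern). Real rules come from
# nlu.pipeline built-ins, skills, scenarios, and the per-webspace cache.
BUILTIN_RULES = [
    ("desktop.open_weather", re.compile(r"\b(?:weather|погода)\b", re.IGNORECASE)),
]
DYNAMIC_RULES = []


def detect_intent(request: dict, rasa: Callable[[str], Optional[str]]) -> tuple:
    """Return (topic, payload) for one nlp.intent.detect.request."""
    text = request["text"]
    base = {
        "webspace_id": request["webspace_id"],
        "request_id": request["request_id"],
        "text": text,  # assumption: the text is carried along for the teacher
    }
    # Stage 1: built-in rules first, then dynamic rules (fast, deterministic).
    for via, rules in (("regex", BUILTIN_RULES), ("regex.dynamic", DYNAMIC_RULES)):
        for intent, pattern in rules:
            match = pattern.search(text)
            if match:
                payload = {**base, "intent": intent, "slots": match.groupdict(), "via": via}
                return "nlp.intent.detected", payload
    # Stage 2: fall back to Rasa, modelled here as a plain callable.
    intent = rasa(text)
    if intent is not None:
        return "nlp.intent.detected", {**base, "intent": intent, "slots": {}, "via": "rasa"}
    # The "reason" value is a placeholder, not a documented code.
    return "nlp.intent.not_obtained", {**base, "reason": "no_match", "via": "rasa"}


def teacher_bridge(topic: str, payload: dict) -> Optional[tuple]:
    """Turn nlp.intent.not_obtained into nlp.teacher.request; ignore other topics."""
    if topic != "nlp.intent.not_obtained":
        return None
    request = {"text": payload["text"], "request_id": payload["request_id"]}
    return "nlp.teacher.request", {"webspace_id": payload["webspace_id"], "request": request}
```

Keeping the regex stages ahead of Rasa and the teacher means that any rule the teacher later contributes short-circuits both slower stages for the same utterance.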

Set env vars on hub:

LLM teacher receives a compact context snapshot (per webspace), including:

current scenario id
scenario-level NLU (scenario.json:nlu)
catalog of apps/widgets (with origins) + installed ids
built-in regex rules (nlu.pipeline)
existing regex rules (from skills/scenarios + legacy per-webspace cache)
routing hints (intent_routes: scenario intent -> callSkill topic -> skill)
system actions visible in the current scenario (system_actions) and a published host action catalog (host_actions)
skill manifests (skills_manifest: tools/events/llm_policy summary for installed skills)

Goal: prefer improving existing intents (regex rule / dataset revision) over creating a new capability, when possible.

Apply can be triggered from UI or programmatically:

apply a proposed dataset revision:
nlp.teacher.revision.apply { revision_id, intent, examples[], slots }
apply a teacher candidate:
nlp.teacher.candidate.apply { candidate_id, target? }
for regex_rule candidates the runtime delegates to nlp.teacher.regex_rule.apply { intent, pattern, target? }
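
The delegation step for regex_rule candidates might look like the sketch below. The candidate record shape (`kind`, `regex_rule`, `target`) and the in-memory `candidates` lookup are assumptions for illustration; only the event names and the `{ intent, pattern, target? }` payload come from the description above.

```python
def on_candidate_apply(payload: dict, candidates: dict) -> list:
    """Handle nlp.teacher.candidate.apply and return the events to emit.

    `candidates` maps candidate_id -> stored candidate record (hypothetical shape).
    """
    candidate = candidates[payload["candidate_id"]]
    # An explicit target in the apply payload wins over the stored suggestion.
    target = payload.get("target") or candidate.get("target")
    if candidate.get("kind") != "regex_rule":
        return []  # other candidate kinds are out of scope for this sketch
    event = {
        "intent": candidate["regex_rule"]["intent"],
        "pattern": candidate["regex_rule"]["pattern"],
    }
    if target is not None:
        event["target"] = target  # target is optional in the delegated event
    return [("nlp.teacher.regex_rule.apply", event)]
```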

The teacher does not “bake” regexes into the hub code. A rule is stored as data owned by a workspace artifact:

Skill-owned (preferred): .adaos/workspace/skills/<skill>/skill.yaml → nlu.regex_rules[]
Scenario-owned: .adaos/workspace/scenarios/<scenario>/scenario.json → nlu.regex_rules[]
Legacy runtime cache: the rule is also mirrored into YJS data.nlu.regex_rules[] so that it starts matching immediately after Apply.

Every rule has a stable identity: id="rx.<uuid>".
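
As a rough sketch of how a rule could be appended to its owner, the helper below works on an already-parsed skill.yaml or scenario.json document. The helper name, the use of a random UUID (uuid4), and the exact rule fields beyond `id` are assumptions; the document only fixes the `rx.<uuid>` identity and the `nlu.regex_rules[]` location.

```python
import re
import uuid


def add_regex_rule(owner_doc: dict, intent: str, pattern: str) -> dict:
    """Append a rule to owner_doc["nlu"]["regex_rules"] and return it.

    `owner_doc` is the parsed content of the owning artifact
    (skill.yaml or scenario.json); loading and saving are left out.
    """
    re.compile(pattern)  # fail early: raises re.error on an invalid pattern
    rule = {
        "id": f"rx.{uuid.uuid4()}",  # assumption: a random (version 4) UUID
        "intent": intent,
        "pattern": pattern,
    }
    owner_doc.setdefault("nlu", {}).setdefault("regex_rules", []).append(rule)
    return rule
```

Because the rule lives in the artifact's own data, removing or editing it later only requires looking it up by `id`, with no change to hub code.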

When the teacher proposes a regex rule, it should also propose a storage target:

Prefer the skill that actually handles the intent (derived from scenario intent callSkill actions + skill events.subscribe).
If the intent triggers host/system behavior (callHost), the target is usually the scenario.

Apply supports a UI override (“Apply to Scenario”), in addition to an LLM-suggested target.
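
The targeting preference can be expressed as a small heuristic. This is only a sketch: the shape of `intent_routes` entries (`action`, `skill`) and the `"scenario"` target type are assumed for the example; the `{"type": "skill", "id": ...}` form follows the target format used elsewhere in this document.

```python
def suggest_target(intent: str, intent_routes: dict, scenario_id: str) -> dict:
    """Pick a storage target for a proposed regex rule.

    `intent_routes` maps an intent to a hypothetical route record such as
    {"action": "callSkill", "skill": "weather_skill"} or {"action": "callHost"}.
    """
    route = intent_routes.get(intent) or {}
    if route.get("action") == "callSkill" and route.get("skill"):
        # The skill that actually handles the intent owns the rule.
        return {"type": "skill", "id": route["skill"]}
    # Host/system behavior (callHost) or an unknown route: fall back to the scenario.
    return {"type": "scenario", "id": scenario_id}
```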

Skills can opt into automatic application of teacher-proposed regex rules:

If enabled and the candidate target is that skill, the hub auto-emits nlp.teacher.candidate.apply after a candidate is proposed.

Each time the dynamic regex stage matches, the hub appends a JSONL record to:

This is intended for later cleanup/optimization (identify dead rules, consolidate duplicates, etc.).
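
A cleanup pass over such a log could be as simple as counting matches per rule. The record schema is not specified here, so the `rule_id` field below is purely hypothetical, and the log is read from any iterable of lines rather than from a specific path.

```python
import io
import json
from collections import Counter


def rule_hit_counts(lines) -> Counter:
    """Count matches per rule from JSONL lines (one JSON object per line)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        counts[record["rule_id"]] += 1  # "rule_id" is an assumed field name
    return counts


def dead_rules(known_rule_ids, counts: Counter) -> list:
    """Return the known rule ids that never matched, sorted for stable output."""
    return sorted(rid for rid in known_rule_ids if counts[rid] == 0)


# Usage with an in-memory log standing in for the real file.
sample_log = io.StringIO('{"rule_id": "rx.a"}\n{"rule_id": "rx.a"}\n{"rule_id": "rx.b"}\n')
hits = rule_hit_counts(sample_log)
unused = dead_rules(["rx.a", "rx.b", "rx.c"], hits)
```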

Utterance: Покажи температуру в Берлине ("Show the temperature in Berlin")

Assume the built-in weather regex only matches погода ("weather") / weather, so the regex stage misses the intent.

Expected teacher decision:

decision="propose_regex_rule"
regex_rule.intent="desktop.open_weather"
regex_rule.pattern should be a Python regex with named capture groups, e.g. (?P<city>...)
target should usually be the owning skill (e.g. {"type":"skill","id":"weather_skill"})

After you click Apply (UI emits nlp.teacher.candidate.apply):

the rule is persisted into the chosen owner (skill/scenario) and mirrored into data.nlu.regex_rules
the next time the same utterance is sent, nlu.pipeline should resolve it with via="regex.dynamic", without calling the LLM
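
To make the example concrete, here is one pattern the teacher might plausibly propose and how the dynamic regex stage could match it. The pattern itself, the placeholder rule id, and the `match_dynamic` helper are illustrative assumptions, not output captured from the system.

```python
import re

# A hypothetical applied rule, in the shape mirrored into data.nlu.regex_rules.
rule = {
    "id": "rx.00000000-0000-0000-0000-000000000000",  # placeholder id
    "intent": "desktop.open_weather",
    "pattern": r"покажи\s+температуру\s+в\s+(?P<city>[\w-]+(?:\s[\w-]+)*)",
}


def match_dynamic(text: str, rules: list):
    """Return a detection result for the first matching dynamic rule, else None."""
    for r in rules:
        m = re.search(r["pattern"], text, re.IGNORECASE)
        if m:
            return {
                "intent": r["intent"],
                "slots": m.groupdict(),  # named capture groups become slots
                "via": "regex.dynamic",
                "rule_id": r["id"],
            }
    return None


result = match_dynamic("Покажи температуру в Берлине", [rule])
# result["slots"] is {"city": "Берлине"}: the city is captured as written,
# in its inflected form; any normalization would be up to the skill.
```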