# NLU in AdaOS
This document describes the current production MVP direction for intent detection in AdaOS.
## MVP baseline

- Pipeline: `regex -> rasa (service-skill) -> teacher (LLM in the loop)`
- System boundary: there is a single NLU runtime codebase; only data varies per scenario/skill.
- Transport: intent detection is integrated into the AdaOS event bus (not CLI-only).
## Event flow (high level)

- UI / Telegram / Voice publishes
  `nlp.intent.detect.request { text, webspace_id, request_id, _meta... }`.
- `nlu.pipeline` tries regex rules:
  - built-in rules (`nlu.pipeline`)
  - dynamic rules loaded centrally from:
    - workspace scenarios (`scenario.json:nlu.regex_rules`)
    - workspace skills (`skill.yaml:nlu.regex_rules`)
    - legacy per-webspace cache (`data.nlu.regex_rules`)
- If regex does not match, `nlp.intent.detect.rasa` is emitted (delegates to the Rasa service skill).
- If an intent is found:
  `nlp.intent.detected { intent, confidence, slots, text, webspace_id, request_id, via }`
- If no intent is obtained:
  `nlp.intent.not_obtained { reason, text, via, webspace_id, request_id }`
- The router emits a human-friendly `io.out.chat.append` and records the request for the NLU Teacher.
- If the teacher is enabled, `nlp.teacher.request { webspace_id, request }` is emitted for teacher runtimes.
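The regex-first stage of this flow can be sketched in-process as follows. This is a minimal illustration, assuming a `publish()` helper standing in for the AdaOS event bus; the rule shape and the sample built-in pattern are illustrative, not the real AdaOS rule set.

```python
import re
import uuid

EMITTED = []  # captured (topic, payload) pairs, standing in for the bus

def publish(topic: str, payload: dict) -> None:
    EMITTED.append((topic, payload))

# Hypothetical rule list: built-in + dynamically loaded rules would be
# merged here; named groups in the pattern become slots.
RULES = [
    {
        "id": "rx.builtin.timer",  # illustrative built-in rule
        "pattern": re.compile(r"set a timer for (?P<minutes>\d+) minutes"),
        "intent": "timer.set",
    },
]

def handle_detect_request(payload: dict) -> None:
    """Try regex rules first; on a miss, delegate to the Rasa service skill."""
    text = payload["text"]
    for rule in RULES:
        m = rule["pattern"].search(text)
        if m:
            publish("nlp.intent.detected", {
                "intent": rule["intent"],
                "confidence": 1.0,  # regex matches are treated as exact
                "slots": m.groupdict(),
                "text": text,
                "webspace_id": payload["webspace_id"],
                "request_id": payload["request_id"],
                "via": "regex.dynamic",
            })
            return
    publish("nlp.intent.detect.rasa", payload)  # no regex match: fall through

handle_detect_request({
    "text": "set a timer for 5 minutes",
    "webspace_id": "ws1",
    "request_id": str(uuid.uuid4()),
})
```

A non-matching text would instead produce a `nlp.intent.detect.rasa` event carrying the original payload.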
## Rasa as a service-skill

Rasa is treated as a service-type skill (separate Python/venv, managed lifecycle) to avoid dependency conflicts with the hub runtime.

The hub supervises:

- health checks
- crash frequency
- request failures/timeouts

Issues can trigger:

- `skill.service.issue`
- `skill.service.doctor.request` -> `skill.service.doctor.report` (an LLM doctor can be plugged in later)
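The escalation step can be sketched as a small fold over health-probe results. The event names come from the document; the failure threshold, the state dict shape, and the `publish` callback are illustrative assumptions.

```python
def on_probe(state: dict, healthy: bool, publish) -> None:
    """Fold one health-probe result into the supervisor state."""
    if healthy:
        state["failures"] = 0  # any success resets the streak
        return
    state["failures"] += 1
    if state["failures"] >= state["max_failures"]:
        publish("skill.service.issue",
                {"skill": state["skill"], "reason": "health_check_failed"})
        # request a diagnosis; an LLM doctor may later answer with
        # skill.service.doctor.report
        publish("skill.service.doctor.request", {"skill": state["skill"]})
        state["failures"] = 0  # reset so the issue is not re-emitted every tick
```

Resetting the counter after escalation keeps the supervisor from flooding the bus while the doctor works.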
## Teacher-in-the-loop (LLM)

When neither regex nor Rasa produces an intent, AdaOS calls an LLM teacher to:

- propose a dataset revision (existing intent + new examples + slots), or
- propose a regex rule to improve the `regex` stage, or
- propose a new capability (skill / scenario candidate), or
- decide to ignore (non-actionable).
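One possible shape for a teacher decision, derived from the four outcomes above; every field name here is an assumption for illustration, not the actual AdaOS schema.

```python
from typing import Literal, TypedDict

class TeacherDecision(TypedDict, total=False):
    # Which of the four outcomes the teacher chose (hypothetical labels):
    kind: Literal["dataset_revision", "regex_rule", "new_capability", "ignore"]
    intent: str          # target intent for a dataset revision
    examples: list[str]  # new training examples
    slots: dict          # proposed slot annotations
    pattern: str         # proposed regex for the regex stage
    candidate_name: str  # proposed skill/scenario candidate
    reason: str          # why the request was ignored

decision: TeacherDecision = {
    "kind": "regex_rule",
    "intent": "lights.off",
    "pattern": r"turn off the lights",
}
```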
The teacher receives scenario + skill context, including:

- current scenario NLU (`scenario.json:nlu`)
- installed catalog (apps/widgets + origins)
- existing dynamic regex rules (from scenarios/skills + the legacy per-webspace cache)
- built-in regex rules (`nlu.pipeline`)
- selected skill-level NLU artifacts (e.g. `interpreter/intents.yml`)
- intent routing hints (`intent_routes`: scenario intent -> `callSkill` topic -> skill)
- system/host actions catalog (`system_actions`, `host_actions`)
Teacher state is projected into YJS under `data.nlu_teacher.*` for UI inspection, and also persisted on disk under `.adaos/state/skills/nlu_teacher/<webspace_id>.json` so it survives YJS reload/reset.
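The on-disk persistence can be sketched as a plain JSON round-trip. The directory layout mirrors the path above; the write-then-rename detail and the helper names are assumptions.

```python
import json
from pathlib import Path

STATE_DIR = Path(".adaos/state/skills/nlu_teacher")

def persist_teacher_state(webspace_id: str, state: dict) -> Path:
    STATE_DIR.mkdir(parents=True, exist_ok=True)
    path = STATE_DIR / f"{webspace_id}.json"
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state, ensure_ascii=False, indent=2), encoding="utf-8")
    tmp.replace(path)  # rename is atomic on POSIX, so readers never see half a file
    return path

def load_teacher_state(webspace_id: str) -> dict:
    """Restore state after a YJS reload/reset; empty dict if nothing was saved."""
    path = STATE_DIR / f"{webspace_id}.json"
    return json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
```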
## Web UI: NLU Teacher

In the default web desktop scenario the NLU Teacher UI is a schema-driven modal:

- Tabs: User requests / Candidates
- Grouping:
  - User requests: grouped by `request_id`
  - Candidates: grouped by `candidate.name`, then by `request_id`
- Logs: groups show event payloads inline (raw JSON)
- Apply actions:
  - `nlp.teacher.revision.apply`
  - `nlp.teacher.candidate.apply`:
    - for `regex_rule` candidates: persists the rule into a workspace owner (preferably a skill), then mirrors it into `data.nlu.regex_rules` as a runtime cache so the next request matches immediately (`via="regex.dynamic"`)
    - for `skill`/`scenario` candidates: creates a development plan item
- A successful apply emits `ui.notify` with the owner (skill/scenario) where the rule was installed.
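The apply path for `regex_rule` candidates can be sketched as a two-step write: persist into the owning skill's manifest (source of truth), then mirror into the runtime cache. The in-memory dict and list below stand in for `skill.yaml` and the `data.nlu.regex_rules` cache; the function and field names are illustrative assumptions.

```python
import uuid

def apply_regex_candidate(candidate: dict, skill_manifest: dict,
                          runtime_cache: list) -> dict:
    rule = {
        "id": f"rx.{uuid.uuid4()}",   # every rule gets an rx.<uuid> identity
        "pattern": candidate["pattern"],
        "intent": candidate["intent"],
    }
    # 1) source of truth: nlu.regex_rules[] in the owner's manifest
    skill_manifest.setdefault("nlu", {}).setdefault("regex_rules", []).append(rule)
    # 2) runtime cache: matched with via="regex.dynamic" on the very next request
    runtime_cache.append(rule)
    return rule
```

Writing the manifest first means the cache can always be rebuilt from disk, while the mirror step avoids a reload before the rule takes effect.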
## Dynamic regex rules (current contract)

- Storage (source of truth):
  - skill: `.adaos/workspace/skills/<skill>/skill.yaml` -> `nlu.regex_rules[]`
  - scenario: `.adaos/workspace/scenarios/<scenario>/scenario.json` -> `nlu.regex_rules[]`
- Rule identity:
  - every rule has `id="rx.<uuid>"`
- Observability:
  - every `regex.dynamic` match appends a JSONL record to `state/nlu/regex_usage.jsonl` (`webspace_id`, `scenario_id`, `rule_id`, `intent`, slots…)
- Optional trust policy:
  - `skill.yaml: llm_policy.autoapply_nlu_teacher=true` enables automatic Apply for teacher-proposed regex candidates targeting that skill
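The observability record can be sketched as a simple append. The file location and the listed field names follow the contract above; the timestamp field and helper name are assumptions.

```python
import json
import time
from pathlib import Path

USAGE_LOG = Path("state/nlu/regex_usage.jsonl")

def record_regex_usage(webspace_id: str, scenario_id: str, rule_id: str,
                       intent: str, slots: dict) -> None:
    USAGE_LOG.parent.mkdir(parents=True, exist_ok=True)
    record = {
        "ts": time.time(),  # assumed extra field, not in the listed contract
        "webspace_id": webspace_id,
        "scenario_id": scenario_id,
        "rule_id": rule_id,
        "intent": intent,
        "slots": slots,
    }
    # one JSON object per line (JSONL): append-only, easy to tail and grep
    with USAGE_LOG.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```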
## Later (not MVP)
- Rhasspy / offline NLU
- Retriever-style NLU (graph/context retrieval)
- Multi-step, stateful NLU workflows across scenarios