NLU Roadmap Checklist

Current implementation estimate: 49% of the practical AdaOS NLU roadmap. The target architecture now treats Neural NLU as a default-installed provider, but most of the productionization checklist is still open.

Phase 1: Baseline Runtime

[x] Regex-first pipeline with dynamic scenario/skill regex rules.
[x] Optional neural delegation event (nlp.intent.detect.neural) behind ADAOS_NLU_NEURAL.
[x] Rasa NLU service-skill isolated from the hub Python environment.
[x] Rasa service-skill prepared in A/B skill runtime slots.
[x] Confidence/fallback path to nlp.intent.not_obtained.
[x] Baseline desktop intents for opening modals and node-scoped modals.
[ ] Remove runtime-provider delivery through src/adaos/interpreter_data.
[ ] Ensure parse bridges only discover/start installed service skills and do not mutate workspace skills or A/B slots on demand.

Phase 2: Operator Feedback Loop

[x] NLU Teacher stores not-obtained requests per webspace.
[x] Teacher can apply regex candidates into scenario/skill-owned artifacts.
[x] Teacher can apply dataset revisions into scenario training content.
[x] Dry-run phrase probe API for Teacher UI:
  - POST /api/nlu/teacher/{webspace_id}/probe
  - regex-first, optional Rasa fallback
  - returns intent_ranking, entities, slots, stages
  - does not dispatch actions
[x] Human verification checklist separates current API/CLI checks from target UI behavior.
[ ] UI field for "check phrase" wired to the probe endpoint.
[ ] UI buttons: "correct", "fix", "save example".
[ ] Operator-approved positive feedback stored with audit metadata.
[ ] Route accepted feedback to the owning artifact: skill, scenario, system action catalog, or named-entity source.
[ ] Add explicit correction targets for core/client actions that are not implemented as skills.
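The probe contract listed above (regex-first, optional Rasa fallback, ranking/entities/slots/stages, no dispatch) fits in a side-effect-free function. The following is a minimal sketch under stated assumptions, not the AdaOS handler: `probe_phrase`, `ProbeResult`, the `(intent, pattern)` rule tuples, and the `desktop.open_modal` intent name are hypothetical, and the fallback callable is assumed to return a dict shaped like Rasa's `/model/parse` output (`intent`, `entities`, `intent_ranking`).

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ProbeResult:
    # Hypothetical response shape mirroring the probe fields listed above.
    intent_ranking: list[dict]
    entities: list[dict]
    slots: dict
    stages: list[dict] = field(default_factory=list)


def probe_phrase(
    text: str,
    regex_rules: list[tuple[str, str]],
    rasa_parse: Optional[Callable[[str], dict]] = None,
) -> ProbeResult:
    """Dry run: classify `text` and report evidence; never dispatches an action."""
    stages = [{"stage": "request", "text": text}]

    # Regex first: the first matching rule wins, named groups become slots.
    for intent, pattern in regex_rules:
        match = re.search(pattern, text, re.IGNORECASE)
        if match:
            slots = {k: v for k, v in match.groupdict().items() if v is not None}
            stages.append({"stage": "regex", "matched": intent})
            return ProbeResult(
                intent_ranking=[{"intent": intent, "confidence": 1.0}],
                entities=[{"entity": k, "value": v} for k, v in slots.items()],
                slots=slots,
                stages=stages,
            )
    stages.append({"stage": "regex", "matched": None})

    # Optional fallback; assumes a Rasa-style parse dict (hypothetical adapter).
    if rasa_parse is None:
        return ProbeResult(intent_ranking=[], entities=[], slots={}, stages=stages)
    parsed = rasa_parse(text)
    top = (parsed.get("intent") or {}).get("name")
    stages.append({"stage": "rasa", "intent": top})
    ranking = [
        {"intent": r["name"], "confidence": r["confidence"]}
        for r in parsed.get("intent_ranking", [])
    ]
    entities = parsed.get("entities", [])
    slots = {e["entity"]: e["value"] for e in entities}
    return ProbeResult(ranking, entities, slots, stages)
```

Because nothing is dispatched, the Teacher UI could call such a probe repeatedly while an operator edits rules.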

Phase 3: Observability

[x] data.nlu_trace.items[] stores request/detected/not-obtained events.
[x] Stage trace event nlu.trace.stage records:
  - request
  - regex
  - pipeline delegate
  - rasa
  - dispatcher action/reject
[ ] Trace UI shows the full path: voice text -> regex/neural/Rasa -> intent -> action.
[ ] Add per-stage latency and service timing.
[ ] Add golden phrase regression reports.
[ ] Add neural usage statistics: request count, latency, confidence histogram, accept/abstain/reject counts, fallback ratio, and per-intent confusion evidence.
[ ] Add named-entity canonicalization statistics: hit/miss/ambiguity counts and unresolved spans.
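Per-stage latency could be captured by wrapping each stage in a timing context. A minimal sketch, assuming a hypothetical `StageTrace` helper whose `items` list stands in for `data.nlu_trace.items[]`; the record fields (`request_id`, `stage`, `latency_ms`) are illustrative, not the actual `nlu.trace.stage` schema.

```python
from __future__ import annotations

import time
from contextlib import contextmanager
from typing import Callable, Iterator


class StageTrace:
    """Per-request stage recorder (hypothetical helper, not the AdaOS API)."""

    def __init__(self, request_id: str, clock: Callable[[], float] = time.perf_counter):
        self.request_id = request_id
        self.items: list[dict] = []  # stand-in for data.nlu_trace.items[]
        self._clock = clock

    @contextmanager
    def stage(self, name: str, **extra) -> Iterator[dict]:
        start = self._clock()
        record = {"request_id": self.request_id, "stage": name, **extra}
        try:
            # The caller can attach outcome fields (intent, reject reason, ...).
            yield record
        finally:
            # Latency is recorded even when the stage raises.
            record["latency_ms"] = round((self._clock() - start) * 1000.0, 3)
            self.items.append(record)
```

An injectable `clock` keeps timing deterministic in tests, and the `finally` branch records latency even when a stage such as the Rasa call raises.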

Human Verification Gates

[x] Currently implemented behavior has a manual checklist: nlu-human-verification.md.
[x] Documentation marks which NLU Teacher behaviors are current UI, backend/API only, or target architecture.
[ ] NLU Teacher UI can run a phrase probe without terminal access.
[ ] NLU Teacher UI shows stage trace, ranking, entities, slots, lookup matches, confidence, and action preview.
[ ] NLU Teacher UI supports Correct/Fix/Save example with target selection and audit metadata.
[ ] Template correction flow uses stable ids and stale-write fingerprints.

Phase 4: Dynamic Lookups and Template Inventory

[x] Export baseline desktop lookup tables from workspace/packaged desktop manifests:
  - modal_id
  - node_ref
  - app_id
  - scenario_id
  - webspace_id
[x] Feed lookup tables into Rasa training data.
[x] Expose lookup tables for Teacher/LLM inspection:
  - GET /api/nlu/teacher/{webspace_id}/lookups
[x] Overlay live YJS desktop registry values on top of manifest lookups for the Teacher API.
[ ] Expose stable template ids for regex, Rasa examples, neural labels, and lookup sets.
[ ] Implement stale-write protection using template fingerprints.
[ ] Define the system action catalog for core/client commands such as move, hide, open, pin, switch, and other shell actions.
[ ] Include system action examples in NLU authoring context without treating those actions as user skills.
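Stable template ids plus fingerprints give a compare-and-swap write: a patch carries the fingerprint of the version it was prepared against and is refused if the stored template has changed since. A minimal sketch, assuming the fingerprint is SHA-256 over canonical JSON; `TemplateStore`, `StaleWriteError`, and the `regex:desktop.open_modal` id format are hypothetical.

```python
from __future__ import annotations

import hashlib
import json


def template_fingerprint(template: dict) -> str:
    """Hash a canonical JSON encoding, so key order does not change the fingerprint."""
    canonical = json.dumps(template, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class StaleWriteError(Exception):
    """Raised when a patch was prepared against an outdated template."""


class TemplateStore:
    """In-memory stand-in for template storage keyed by stable template ids."""

    def __init__(self, templates: dict[str, dict]):
        self._templates = dict(templates)

    def get(self, template_id: str) -> tuple[dict, str]:
        template = self._templates[template_id]
        return template, template_fingerprint(template)

    def apply_patch(self, template_id: str, expected_fingerprint: str, new_template: dict) -> str:
        # Compare-and-swap: refuse the write if the template changed meanwhile.
        current_fp = template_fingerprint(self._templates[template_id])
        if current_fp != expected_fingerprint:
            raise StaleWriteError(
                f"{template_id}: expected {expected_fingerprint[:12]}, found {current_fp[:12]}"
            )
        self._templates[template_id] = new_template
        return template_fingerprint(new_template)
```

The same check could back both the Teacher correction flow and an apply-patch operation, so two operators (or an operator and an LLM) cannot silently overwrite each other.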

Phase 4a: Runtime Named Entities and Canonicalization

[x] Add a named-entity read model over devices, nodes, browsers, webspaces, scenarios, skills, apps, and modals.
[x] Add a deterministic resolver that maps display names, observed names, and aliases to canonical refs before model dispatch.
[x] Add entity masking so model-facing text can use placeholders such as {device}, {webspace}, and {scenario}.
[x] Add ambiguity handling instead of silently choosing between conflicting aliases.
[x] Add Teacher/probe output for resolved entities, unresolved spans, canonical refs, and ambiguity evidence.
[x] Add regression tests proving alias and device-name changes do not require Rasa/neural retraining.
[x] Track the full target design in Named Entities and Canonical Naming.
[ ] Feed canonicalized text and entity evidence into the neural provider contract.
[ ] Ensure Rasa and neural training fingerprints exclude runtime aliases by default.
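The deterministic resolver and masking step can be illustrated with a longest-match alias scan. A minimal sketch, assuming a hypothetical `alias_index` that maps lowercase aliases to `(entity_class, canonical_ref)` pairs; `resolve_and_mask` and the `device:abc123`-style refs are illustrative only.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Resolution:
    masked_text: str
    resolved: list[dict] = field(default_factory=list)
    ambiguous: list[dict] = field(default_factory=list)


def resolve_and_mask(text: str, alias_index: dict[str, list[tuple[str, str]]]) -> Resolution:
    """Map known aliases to canonical refs and mask them as {class} placeholders.

    alias_index: lowercase alias -> [(entity_class, canonical_ref), ...] (hypothetical shape).
    """
    if not alias_index:
        return Resolution(masked_text=text)
    # Longest aliases first so "kitchen tablet" wins over "tablet" at the same position.
    alternation = "|".join(re.escape(a) for a in sorted(alias_index, key=len, reverse=True))
    pattern = re.compile(rf"(?<!\w)({alternation})(?!\w)", re.IGNORECASE)

    result = Resolution(masked_text="")
    pieces, last = [], 0
    for match in pattern.finditer(text):
        surface = match.group(1)
        candidates = sorted(set(alias_index[surface.lower()]))
        pieces.append(text[last:match.start()])
        if len(candidates) == 1:
            entity_class, ref = candidates[0]
            pieces.append("{" + entity_class + "}")
            result.resolved.append({"class": entity_class, "surface": surface, "ref": ref})
        else:
            # Never pick silently: keep the span unmasked and report the evidence.
            pieces.append(surface)
            result.ambiguous.append({"surface": surface, "candidates": candidates})
        last = match.end()
    pieces.append(text[last:])
    result.masked_text = "".join(pieces)
    return result
```

Renaming a device only changes `alias_index`; the model-facing text still reads `{device}`, which is why alias changes need no Rasa/neural retraining.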

Phase 5: MCP-Assisted Authoring

[ ] MCP Server modal issues a scoped NLU authoring token.
[ ] Root resolves the token to its subnet/zone/capabilities.
[ ] Root MCP surfaces:
  - nlu.describe_pipeline
  - nlu.check_phrase
  - nlu.list_templates
  - nlu.get_template
  - nlu.preview_template_patch
  - nlu.apply_template_patch
  - desktop.registry.lookup
  - skill.describe_tools
[ ] LLM receives current template inventory before proposing changes.
[ ] Template patches are previewed and operator-approved before apply.

Phase 6: Neural NLU Provider

Provider Boundary

[ ] Move neural_nlu_service_skill out of src/adaos/interpreter_data into normal registry/workspace skill delivery.
[ ] Add default-on adaos install preparation for Neural NLU.
[ ] Add --no-neural-nlu install option for constrained devices.
[ ] Make the neural bridge discover/start only installed service skills.
[ ] Remove hot-path workspace mutation/bootstrap from neural parse handling.
[ ] Keep provider dependencies (torch, faiss-cpu, etc.) out of the hub root venv.

Inference Contract

[ ] Freeze /parse request/response schema with top_intent, confidence, alternatives, slots, model_id, and evidence.
[ ] Pass named-entity canonicalization evidence into /parse.
[ ] Return matched examples, score components, and canonicalized text in evidence.
[ ] Add confidence gates for accept/abstain/reject.
[ ] Add neural abstain/error fallback to Rasa.
[ ] Route Rasa miss/low confidence to NLU Teacher.
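The inference contract and fallback chain could look like this. A minimal sketch: `ParseResponse` uses the field names from the frozen-schema item, but the types, the `gate`/`route` helpers, and the 0.75/0.35/0.6 thresholds are assumptions. The checklist specifies Rasa fallback only for neural abstain/error, so this sketch sends a reject straight to the Teacher path; that is a design choice, not settled behavior.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ParseResponse:
    # Field names follow the /parse contract above; the types are assumptions.
    top_intent: Optional[str]
    confidence: float
    alternatives: list[dict] = field(default_factory=list)
    slots: dict = field(default_factory=dict)
    model_id: str = ""
    evidence: dict = field(default_factory=dict)


def gate(resp: ParseResponse, accept_at: float = 0.75, reject_below: float = 0.35) -> str:
    """Three-way confidence gate; thresholds are placeholders, not tuned values."""
    if resp.top_intent is None or resp.confidence < reject_below:
        return "reject"
    if resp.confidence >= accept_at:
        return "accept"
    return "abstain"


def route(
    text: str,
    neural_parse: Callable[[str], ParseResponse],
    rasa_parse: Callable[[str], ParseResponse],
    rasa_min: float = 0.6,
) -> tuple[str, Optional[str]]:
    """Return (stage, intent); stage "teacher" stands for nlp.intent.not_obtained."""
    try:
        resp = neural_parse(text)
        decision = gate(resp)
    except Exception:  # provider down, timeout, bad payload: treated as an error
        resp, decision = None, "error"

    if decision == "accept":
        return "neural", resp.top_intent
    if decision == "reject":
        # Assumption: reject skips Rasa and goes straight to Teacher review.
        return "teacher", None

    # Abstain or error: fall back to Rasa, then to the Teacher queue on a miss.
    rasa = rasa_parse(text)
    if rasa.top_intent is not None and rasa.confidence >= rasa_min:
        return "rasa", rasa.top_intent
    return "teacher", None
```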

Notebook Approach Port

[ ] Port masking logic into provider-owned runtime code.
[ ] Port Char-CNN + BiLSTM model loader.
[ ] Fix and test special-token compatibility between training and runtime.
[ ] Port supervised-contrastive embedding projection usage.
[ ] Add FAISS positive example index.
[ ] Add FAISS negative example indexes.
[ ] Add weighted ranker over softmax, k-NN similarity, and action/skill priors.
[ ] Add intent/action id mapping from research labels to AdaOS canonical intents and system actions.
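The weighted ranker combines softmax probability, nearest-positive similarity, and an action/skill prior per intent, and subtracts evidence from negative examples. A minimal sketch in pure Python: brute-force cosine search stands in for the FAISS indexes, and the score formula, weights, `neg_penalty`, and `k` are assumptions rather than the notebook's actual values. Returning the per-intent components makes them available as `/parse` evidence.

```python
from __future__ import annotations

import math


def softmax(logits: dict[str, float]) -> dict[str, float]:
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - peak) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_scores(query: list[float], examples: list[tuple[str, list[float]]], k: int = 3) -> dict[str, float]:
    """Best similarity per intent among the k nearest examples (negatives count as 0).

    Brute-force stand-in for a FAISS index; `examples` is [(intent, embedding), ...].
    """
    nearest = sorted(((cosine(query, emb), intent) for intent, emb in examples), reverse=True)[:k]
    scores: dict[str, float] = {}
    for sim, intent in nearest:
        scores[intent] = max(scores.get(intent, 0.0), sim)
    return scores


def rank(logits, query, positives, negatives, priors, weights=(0.5, 0.4, 0.1), neg_penalty=0.5, k=3):
    """Weighted score per intent; returns [(intent, score, components)] best first."""
    probs = softmax(logits)
    pos = knn_scores(query, positives, k)
    neg = knn_scores(query, negatives, k)
    w_soft, w_knn, w_prior = weights
    ranked = []
    for intent, p in probs.items():
        components = {
            "softmax": p,
            "knn": pos.get(intent, 0.0),
            "prior": priors.get(intent, 0.0),
            "negative": neg.get(intent, 0.0),
        }
        score = (
            w_soft * components["softmax"]
            + w_knn * components["knn"]
            + w_prior * components["prior"]
            - neg_penalty * components["negative"]
        )
        ranked.append((intent, score, components))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked
```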

Artifacts and ModelOps

[ ] Define node-level active model layout owned by the service skill runtime.
[ ] Store model.pt, labels.json/intents_manifest.json, vocab.json, faiss.index, examples_manifest.jsonl, ranker_config.json, and metrics.json.
[ ] Add immutable model_id and model provenance metadata.
[ ] Add rollback pointer for the node-level active model.
[ ] Add golden phrase regression report before model promotion.
[ ] Add quality gates using accuracy, macro-F1, abstain rate, and latency.
[ ] Defer per-locale/webspace/profile models until usage statistics justify the added operational complexity.
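Model promotion gates can be expressed as a function over a golden phrase run. A minimal sketch, assuming predictions of `None` mean abstain, a nearest-rank p95 latency, and illustrative threshold names; `promotion_gate` is hypothetical, and the resulting metrics dict is only one possible shape for `metrics.json`.

```python
from __future__ import annotations

import math
from typing import Optional


def macro_f1(gold: list[str], pred: list[Optional[str]]) -> float:
    """Unweighted mean of per-intent F1 over the intents present in `gold`."""
    f1s = []
    for label in sorted(set(gold)):
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(f1s) / len(f1s) if f1s else 0.0


def promotion_gate(gold, pred, latencies_ms, thresholds):
    """Score a golden phrase run; `None` in `pred` means the model abstained."""
    n = len(gold)
    ordered = sorted(latencies_ms)
    metrics = {
        "accuracy": sum(1 for g, p in zip(gold, pred) if g == p) / n,
        "macro_f1": macro_f1(gold, pred),
        "abstain_rate": sum(1 for p in pred if p is None) / n,
        # Nearest-rank 95th percentile.
        "p95_latency_ms": ordered[math.ceil(0.95 * len(ordered)) - 1],
    }
    passed = (
        metrics["accuracy"] >= thresholds["min_accuracy"]
        and metrics["macro_f1"] >= thresholds["min_macro_f1"]
        and metrics["abstain_rate"] <= thresholds["max_abstain_rate"]
        and metrics["p95_latency_ms"] <= thresholds["max_p95_latency_ms"]
    )
    return passed, metrics
```

Counting abstentions against recall (but not precision) keeps a model from passing the macro-F1 gate simply by abstaining more, while the separate abstain-rate bound catches that behavior directly.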

Usage Statistics

[ ] Record neural request count and latency per stage.
[ ] Record confidence distributions and threshold bands.
[ ] Record accept/abstain/reject counts per intent.
[ ] Record fallback ratio neural -> Rasa -> Teacher.
[ ] Record canonicalization hit/miss/ambiguity counts for neural requests.
[ ] Record abstained/rejected samples for Teacher review and retraining.
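The usage counters above map onto a few in-memory aggregates. A minimal sketch with a hypothetical `NeuralUsageStats` class: confidence is bucketed into deciles, the fallback ratio is defined here as the share of requests not resolved by the neural stage, and abstained/rejected phrases go into a bounded review queue. Per-stage latency is left to the stage trace.

```python
from __future__ import annotations

from collections import Counter, defaultdict, deque
from typing import Optional


class NeuralUsageStats:
    """In-memory counters for neural NLU usage (hypothetical, not an AdaOS API)."""

    def __init__(self, max_review_samples: int = 500):
        self.requests = 0
        self.decisions = defaultdict(Counter)   # intent -> Counter of accept/abstain/reject
        self.confidence_histogram = Counter()   # decile bucket 0..9 -> count
        self.final_stage = Counter()            # neural / rasa / teacher
        self.canonicalization = Counter()       # hit / miss / ambiguous
        # Bounded queue of abstained/rejected phrases for Teacher review.
        self.review_samples = deque(maxlen=max_review_samples)

    def record(self, text: str, intent: Optional[str], confidence: float,
               decision: str, final_stage: str, canonicalization: str) -> None:
        self.requests += 1
        self.decisions[intent or "<none>"][decision] += 1
        bucket = min(int(confidence * 10), 9)  # confidence 1.0 lands in the top bucket
        self.confidence_histogram[bucket] += 1
        self.final_stage[final_stage] += 1
        self.canonicalization[canonicalization] += 1
        if decision in ("abstain", "reject"):
            self.review_samples.append({"text": text, "intent": intent, "confidence": confidence})

    def fallback_ratio(self) -> float:
        """Share of requests not resolved by the neural stage."""
        if not self.requests:
            return 0.0
        return 1.0 - self.final_stage["neural"] / self.requests
```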

Training Data Feedback

[ ] Export skill-owned examples from skills.
[ ] Export scenario-owned examples from scenarios.
[ ] Export core/client command examples from the system action catalog.
[ ] Export named-entity classes as masks, not as local alias training data.
[ ] Let Teacher-approved corrections update regex, Neural, and Rasa datasets through the owning artifact.
[ ] Rebuild/reindex the neural provider from curated examples after approved changes.

Immediate Next Steps

Remove src/adaos/interpreter_data from the provider delivery path and document the migration path for existing experimental templates.
Add default-on Neural NLU install preparation plus a --no-neural-nlu escape hatch.
Port the full notebook ranker into the neural provider: masking, Char-CNN/BiLSTM, FAISS positives/negatives, priors, and evidence.
Define the system action catalog for core/client commands and include it in NLU authoring context.
Add neural usage statistics and stage latency before making rollout decisions about per-locale/webspace/profile models.
Wire the Teacher UI "Check phrase" flow to show canonicalization, neural, Rasa, and action-preview evidence.
Add a "save correct example" backend action with skill/scenario/system-action target selection and audit metadata.
Add golden phrase reports and model promotion gates.

Last Completed Slice

Rasa is packaged as an optional default-on service-skill and installed into skill runtime slots.
NLU Teacher has a dry-run phrase probe API with regex-first and optional Rasa fallback.
NLU Teacher exposes baseline desktop lookup tables for modal_id, node_ref, app_id, scenario_id, and webspace_id.
Teacher lookup API overlays live YJS values from ui.application.modals, registry.merged.modals, data.catalog.apps, data.installed.apps, data.nodes, and ui.current_scenario.
Rasa export writes native lookup tables and data/lookup_tables.json; the lookup summary is included in the training fingerprint.
Runtime emits stage trace events for regex, pipeline delegation, Rasa, and dispatcher actions/rejects.
Trace items are persisted to data.nlu_trace.items[] for the future UI timeline.
NLU documentation now includes a human verification checklist and clearly separates current UI, backend/API-only behavior, and target UI.