Commit Graph

9 Commits

Author SHA1 Message Date
serversdown f3530cf4ae feat: separate CHAT_MODEL (gpt-4o) for persona fidelity
Mid-size models (gpt-4o-mini, qwen2.5-14b) resist persona instructions —
help-desk closers and feelings-disclaimers leak through regardless. Route live
chat to a stronger model while keeping bulk consolidation cheap:

- config: CHAT_MODEL (default gpt-4o), distinct from CLOUD_MODEL (gpt-4o-mini)
- llm.complete gains a `model` override; chat.respond uses chat_model on cloud,
  consolidation paths keep cloud_model
- persona: reword the "no sign-off" rule so genuine questions are welcome and
  only reflexive customer-service closers are discouraged

Verified: on gpt-4o she owns her mood without disclaimers and drops most
help-desk tails — clearly more in-character than mini/qwen.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 21:05:47 +00:00
serversdown aae95bfa6c fix: point MI50 backend at 10.0.0.42 (avoid terra-mechanics conflict)
CT202's old static 10.0.0.44 collided with the terra-mechanics dev VM (tmi-dev).
Reassigned CT202 to 10.0.0.42 and repointed MI50_BASE_URL accordingly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 05:52:15 +00:00
serversdown 30185f3fd8 feat: MI50 as a Lyra backend (OpenAI-compatible local GPU)
The MI50 box (CT202) runs an OpenAI-compatible llama.cpp server on
10.0.0.44:8080. Wire it in as a third backend:

- llm.complete gains backend="mi50" (OpenAI client pointed at MI50_BASE_URL)
- config: MI50_BASE_URL (default http://10.0.0.44:8080/v1) + MI50_MODEL
- chat.respond labels the model per backend; web _backend_for maps "mi50"
- UI backend selector adds "MI50 — local GPU"

Verified end-to-end: llm.complete(backend="mi50") returns from the live server.
See homelab-inference memory for the box topology.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 05:37:22 +00:00
serversdown d7c258eba0 feat: tiered, compacting memory (phase 1.5)
Older sessions fade to a general idea; details stay retrievable.

- memory: summaries table (one compacted gist per session, embedded), plus
  store_summary/get_summary/recall_summaries and unsummarized_count (tracks
  exchanges newer than the current summary)
- lyra/summary.py: summarize_session compacts a session's raw turns into a
  third-person gist (default SUMMARY_BACKEND=local, so compaction is free);
  maybe_summarize re-summarizes once SUMMARIZE_AFTER new turns accumulate
- chat.build_messages now layers context in tiers: persona -> gists of other
  sessions -> a few sharp raw cross-session details -> current session raw
  turns -> new message; respond() compacts the session after each turn
- web: POST /sessions/{id}/summarize to compact on demand
- summarization activity surfaces in the live log

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 18:52:58 +00:00
serversdown 3b9e0bb1e0 feat: persona chat loop, web UI, and local (Ollama) embeddings
Phase 1 — persona + persistent memory chat loop:
- lyra/persona.py + personas/lyra.md: editable identity/voice (friend-first,
  honest, never invents poker math)
- lyra/chat.py: turn loop assembling persona + cross-session recall + recent
  context, persisting both sides to SQLite
- lyra/session.py, lyra/__main__.py: session lifecycle + `lyra` REPL

Phase 1.25 — reuse the old web UI:
- vendored the prior single-page UI into lyra/web/static, repointed to
  same-origin
- lyra/web/server.py (FastAPI): serves the UI and backs its endpoint contract
  (/v1/chat/completions, session CRUD, health, inert thinking-stream) with the
  new chat loop + memory; SQLite stays the single source of truth
- `lyra-web` console script

Local backends — test for free, no OpenAI key:
- llm.embed routes via EMBED_BACKEND (cloud=OpenAI, local=Ollama /api/embed)
- simplified UI backend selector to Local (Ollama) / Cloud (OpenAI), default local
- memory connection opened check_same_thread=False for the threaded server

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 18:36:31 +00:00
Claude 6a1255dfdb feat: LLM router with local (Ollama) and cloud (OpenAI) backends
- lyra.config.load() reads env into a frozen Config dataclass
- lyra.llm.complete(messages, backend) routes to Ollama /api/chat or
  OpenAI chat completions
- lyra.llm.embed(texts) calls OpenAI embeddings
- .env.example switched from Anthropic to OpenAI to match available key
2026-05-16 06:10:48 +00:00
Claude b2523c2561 chore: project scaffold (uv, .env.example, README, lyra package) 2026-05-16 06:01:08 +00:00
Claude faf4e8a1aa chore: nuke legacy code, keep design docs for restart
Preserved on the archive branch. Keeping only the architecture and
design thinking that survives the rewrite:

- docs/ARCH_v0-6-1.md (Inner Self / Executive / Chat / Persona model)
- docs/ARCHITECTURE_v0-6-0.md (predecessor architecture)
- docs/PROJECT_SUMMARY.md (project history and rationale)
- docs/PROJECT_LYRA_COMPLETE_BREAKDOWN.md (detailed design notes)
- docs/ENVIRONMENT_VARIABLES.md (multi-backend env conventions)
- docs/LLMS.md
- docs/TRILLIUM_API.md (for future tool integration)

Removed: all service code (cortex, core/relay, neomem, rag, sandbox,
persona-sidecar), docker-compose, migration/logging docs, stale root
test scripts, CHANGELOG.
2026-05-16 05:57:07 +00:00
claude a5f3e0248a env cleanup round 2 2025-11-26 03:18:15 -05:00