feat: persona chat loop, web UI, and local (Ollama) embeddings

Phase 1 — persona + persistent memory chat loop:
- lyra/persona.py + personas/lyra.md: editable identity/voice (friend-first,
  honest, never invents poker math)
- lyra/chat.py: turn loop assembling persona + cross-session recall + recent
  context, persisting both sides to SQLite
- lyra/session.py, lyra/__main__.py: session lifecycle + `lyra` REPL

Phase 1.25 — reuse the old web UI:
- vendored the prior single-page UI into lyra/web/static, repointed to
  same-origin
- lyra/web/server.py (FastAPI): serves the UI and backs its endpoint contract
  (/v1/chat/completions, session CRUD, health, inert thinking-stream) with the
  new chat loop + memory; SQLite stays the single source of truth
- `lyra-web` console script

Local backends — test for free, no OpenAI key:
- llm.embed routes via EMBED_BACKEND (cloud=OpenAI, local=Ollama /api/embed)
- simplified UI backend selector to Local (Ollama) / Cloud (OpenAI), default local
- memory connection opened check_same_thread=False for the threaded server

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-06-15 18:36:31 +00:00
parent 6d88505697
commit 3b9e0bb1e0
17 changed files with 2973 additions and 4 deletions
+5 -1
View File
@@ -16,7 +16,9 @@ class Config:
local_model: str
openai_api_key: str
cloud_model: str
embed_model: str
embed_backend: str # "cloud" (OpenAI) or "local" (Ollama)
embed_model: str # OpenAI embedding model
local_embed_model: str # Ollama embedding model
db_path: Path
@@ -26,6 +28,8 @@ def load() -> Config:
local_model=os.getenv("LOCAL_MODEL", "qwen2.5:7b-instruct"),
openai_api_key=os.getenv("OPENAI_API_KEY", ""),
cloud_model=os.getenv("CLOUD_MODEL", "gpt-4o-mini"),
embed_backend=os.getenv("EMBED_BACKEND", "cloud").lower(),
embed_model=os.getenv("EMBED_MODEL", "text-embedding-3-small"),
local_embed_model=os.getenv("LOCAL_EMBED_MODEL", "nomic-embed-text"),
db_path=Path(os.getenv("LYRA_DB_PATH", "data/lyra.db")),
)