Files
project-lyra/lyra/llm.py
T
serversdown 3b9e0bb1e0 feat: persona chat loop, web UI, and local (Ollama) embeddings
Phase 1 — persona + persistent memory chat loop:
- lyra/persona.py + personas/lyra.md: editable identity/voice (friend-first,
  honest, never invents poker math)
- lyra/chat.py: turn loop assembling persona + cross-session recall + recent
  context, persisting both sides to SQLite
- lyra/session.py, lyra/__main__.py: session lifecycle + `lyra` REPL

Phase 1.25 — reuse the old web UI:
- vendored the prior single-page UI into lyra/web/static, repointed to
  same-origin
- lyra/web/server.py (FastAPI): serves the UI and backs its endpoint contract
  (/v1/chat/completions, session CRUD, health, inert thinking-stream) with the
  new chat loop + memory; SQLite stays the single source of truth
- `lyra-web` console script

Local backends — test for free, no OpenAI key:
- llm.embed routes via EMBED_BACKEND (cloud=OpenAI, local=Ollama /api/embed)
- simplified UI backend selector to Local (Ollama) / Cloud (OpenAI), default local
- memory connection opened check_same_thread=False for the threaded server

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 18:36:31 +00:00

60 lines
1.9 KiB
Python

"""LLM router: local (Ollama) chat, cloud (OpenAI) chat + embeddings."""
from __future__ import annotations
from typing import Literal, TypedDict
import httpx
from openai import OpenAI
from lyra.config import load
class Message(TypedDict):
role: Literal["system", "user", "assistant"]
content: str
Backend = Literal["local", "cloud"]
def complete(messages: list[Message], backend: Backend = "local") -> str:
cfg = load()
if backend == "cloud":
if not cfg.openai_api_key:
raise RuntimeError("OPENAI_API_KEY is not set")
client = OpenAI(api_key=cfg.openai_api_key)
resp = client.chat.completions.create(model=cfg.cloud_model, messages=messages)
return resp.choices[0].message.content or ""
resp = httpx.post(
f"{cfg.local_base_url}/api/chat",
json={"model": cfg.local_model, "messages": messages, "stream": False},
timeout=120,
)
resp.raise_for_status()
return resp.json()["message"]["content"]
def embed(texts: list[str]) -> list[list[float]]:
"""Embed texts using the configured backend (EMBED_BACKEND: "cloud" or "local").
Note: OpenAI and Ollama embeddings live in different vector spaces (and
dimensions). A given database is tied to whichever backend created it — don't
switch EMBED_BACKEND against an existing DB or cosine recall will break.
"""
cfg = load()
if cfg.embed_backend == "local":
resp = httpx.post(
f"{cfg.local_base_url}/api/embed",
json={"model": cfg.local_embed_model, "input": texts},
timeout=120,
)
resp.raise_for_status()
return resp.json()["embeddings"]
if not cfg.openai_api_key:
raise RuntimeError("OPENAI_API_KEY is not set")
client = OpenAI(api_key=cfg.openai_api_key)
resp = client.embeddings.create(model=cfg.embed_model, input=texts)
return [d.embedding for d in resp.data]