Initial clean commit - unified Lyra stack

This commit is contained in:
serversdwn
2025-11-16 03:17:32 -05:00
commit 94fb091e59
270 changed files with 74200 additions and 0 deletions

643
core/CHANGELOG.md Normal file

@@ -0,0 +1,643 @@
# Project Lyra — Modular Changelog
All notable changes to Project Lyra are organized by component.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/)
and adheres to [Semantic Versioning](https://semver.org/).
Last Updated: 2025-11-13
---
## 🧠 Lyra-Core ##############################################################################
## [Lyra_RAG v0.1.0] 2025-11-07
### Added
- Initial standalone RAG module for Project Lyra.
- Persistent ChromaDB vector store (`./chromadb`).
- Importer `rag_chat_import.py` with:
- Recursive folder scanning and category tagging.
- Smart chunking (~5 k chars).
- SHA-1 deduplication and chat-ID metadata.
- Timestamp fields (`file_modified`, `imported_at`).
- Background-safe operation (`nohup`/`tmux`).
- 68 Lyra-category chats imported:
- **6,556 new chunks added**
- **1,493 duplicates skipped**
- **7,997 total vectors** now stored.
### API
- `/rag/search` FastAPI endpoint implemented (port 7090).
- Supports natural-language queries and returns top related excerpts.
- Added answer synthesis step using `gpt-4o-mini`.
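- A minimal client sketch (hedged; the `query` and optional `where` fields follow the curl example in the README, and the port comes from this entry):
```python
# Illustrative client for /rag/search; assumes a JSON body with "query" and an
# optional "where" filter, returning a synthesized answer plus excerpts as JSON.
import requests

RAG_URL = "http://127.0.0.1:7090/rag/search"  # adjust host to where rag_api.py runs

def rag_search(query: str, where: dict | None = None) -> dict:
    payload = {"query": query}
    if where:
        payload["where"] = where  # e.g. {"category": "lyra"}
    resp = requests.post(RAG_URL, json=payload, timeout=60)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    print(rag_search("What is the current state of Cortex?", {"category": "lyra"}))
```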
### Verified
- Successful recall of Lyra-Core development history (v0.3.0 snapshot).
- Correct metadata and category tagging for all new imports.
### Next Planned
- Optional `where` filter parameter for category/date queries.
- Graceful “no results” handler for empty retrievals.
- `rag_docs_import.py` for PDFs and other document types.
## [Lyra Core v0.3.2 + Web UI v0.2.0] - 2025-10-28
### Added
- **New UI**
- Cleaned up UI look and feel.
- **Sessions**
- Sessions now persist over time.
- Ability to create new sessions or load sessions from a previous instance.
- Switching sessions updates what the prompt sends to Relay (messages from other sessions are not included).
- Relay is correctly wired in.
## [Lyra-Core 0.3.1] - 2025-10-09
### Added
- **NVGRAM Integration (Full Pipeline Reconnected)**
- Replaced legacy Mem0 service with NVGRAM microservice (`nvgram-api` @ port 7077).
- Updated `server.js` in Relay to route all memory ops via `${NVGRAM_API}/memories` and `/search`.
- Added `.env` variable:
```
NVGRAM_API=http://nvgram-api:7077
```
- Verified end-to-end Lyra conversation persistence:
- `relay → nvgram-api → postgres/neo4j → relay → ollama → ui`
- ✅ Memories stored, retrieved, and re-injected successfully.
### Changed
- Renamed `MEM0_URL` → `NVGRAM_API` across all relay environment configs.
- Updated Docker Compose service dependency order:
- `relay` now depends on `nvgram-api` healthcheck.
- Removed `mem0` references and volumes.
- Minor cleanup to Persona fetch block (null-checks and safer default persona string).
### Fixed
- Relay startup no longer crashes when NVGRAM is unavailable — deferred connection handling.
- `/memories` POST failures no longer crash Relay; now logged gracefully as `relay error Error: memAdd failed: 500`.
- Improved injected prompt debugging (`DEBUG_PROMPT=true` now prints clean JSON).
### Goals / Next Steps
- Add salience visualization (e.g., memory weights displayed in injected system message).
- Begin schema alignment with NVGRAM v0.1.2 for confidence scoring.
- Add relay auto-retry for transient 500 responses from NVGRAM.
---
## [Lyra-Core] v0.3.1 - 2025-09-27
### Changed
- Removed salience filter logic; Cortex is now the default annotator.
- All user messages stored in Mem0; no discard tier applied.
### Added
- Cortex annotations (`metadata.cortex`) now attached to memories.
- Debug logging improvements:
- Pretty-print Cortex annotations
- Injected prompt preview
- Memory search hit list with scores
- `.env` toggle (`CORTEX_ENABLED`) to bypass Cortex when needed.
### Fixed
- Parsing failures from Markdown-wrapped Cortex JSON via fence cleaner.
- Relay no longer “hangs” on malformed Cortex outputs.
---
### [Lyra-Core] v0.3.0 — 2025-09-26
#### Added
- Implemented **salience filtering** in Relay:
- `.env` configurable: `SALIENCE_ENABLED`, `SALIENCE_MODE`, `SALIENCE_MODEL`, `SALIENCE_API_URL`.
- Supports `heuristic` and `llm` classification modes.
- LLM-based salience filter integrated with Cortex VM running `llama-server`.
- Logging improvements:
- Added debug logs for salience mode, raw LLM output, and unexpected outputs.
- Fail-closed behavior for unexpected LLM responses.
- Successfully tested with **Phi-3.5-mini** and **Qwen2-0.5B-Instruct** as salience classifiers.
- Verified end-to-end flow: Relay → salience filter → Mem0 add/search → Persona injection → LLM reply.
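- The salience gate itself lives in the Node.js relay (`server.js`); the Python sketch below only illustrates the fail-closed, LLM-mode gating described above, with the endpoint default and prompt wording as assumptions:
```python
# Illustration only: fail-closed salience check against an OpenAI-compatible
# llama-server endpoint (SALIENCE_API_URL / SALIENCE_MODEL come from .env).
import os
import requests

SALIENCE_API_URL = os.getenv("SALIENCE_API_URL", "http://cortex-vm:8080/v1/chat/completions")  # assumed default
SALIENCE_MODEL = os.getenv("SALIENCE_MODEL", "phi-3.5-mini")  # hot-swappable classifier model

def is_salient(message: str) -> bool:
    """Ask the classifier for a one-word verdict; anything unexpected counts as not salient."""
    try:
        resp = requests.post(SALIENCE_API_URL, json={
            "model": SALIENCE_MODEL,
            "messages": [
                {"role": "system", "content": "Reply with exactly one word: salient or ignore."},
                {"role": "user", "content": message},
            ],
        }, timeout=30)
        verdict = resp.json()["choices"][0]["message"]["content"].strip().lower()
        # Truncated outputs such as "sali" or "fi" fail this check and are treated as not salient.
        return verdict == "salient"
    except Exception:
        return False  # fail closed on network errors or malformed responses
```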
#### Changed
- Refactored `server.js` to gate `mem.add()` calls behind salience filter.
- Updated `.env` to support `SALIENCE_MODEL`.
#### Known Issues
- Small models (e.g. Qwen2-0.5B) tend to over-classify as "salient".
- Phi-3.5-mini sometimes returns truncated tokens ("sali", "fi").
- CPU-only inference is functional but limited; larger models recommended once GPU is available.
---
### [Lyra-Core] v0.2.0 — 2025-09-24
#### Added
- Migrated Relay to use `mem0ai` SDK instead of raw fetch calls.
- Implemented `sessionId` support (client-supplied, fallback to `default`).
- Added debug logs for memory add/search.
- Cleaned up Relay structure for clarity.
---
### [Lyra-Core] v0.1.0 — 2025-09-23
#### Added
- First working MVP of **Lyra Core Relay**.
- Relay service accepts `POST /v1/chat/completions` (OpenAI-compatible).
- Memory integration with Mem0:
- `POST /memories` on each user message.
- `POST /search` before LLM call.
- Persona Sidecar integration (`GET /current`).
- OpenAI GPT + Ollama (Mythomax) support in Relay.
- Simple browser-based chat UI (talks to Relay at `http://<host>:7078`).
- `.env` standardization for Relay + Mem0 + Postgres + Neo4j.
- Working Neo4j + Postgres backing stores for Mem0.
- Initial MVP relay service with raw fetch calls to Mem0.
- Dockerized with basic healthcheck.
#### Fixed
- Resolved crash loop in Neo4j by restricting env vars (`NEO4J_AUTH` only).
- Relay now correctly reads `MEM0_URL` and `MEM0_API_KEY` from `.env`.
#### Known Issues
- No feedback loop (thumbs up/down) yet.
- Forget/delete flow is manual (via memory IDs).
- Memory latency ~14s depending on embedding model.
---
## 🧩 lyra-neomem (used to be NVGRAM / Lyra-Mem0) ##############################################################################
## [NeoMem 0.1.2] - 2025-10-27
### Changed
- **Renamed NVGRAM to NeoMem**
- All future updates will be under the name NeoMem.
- Features have not changed.
## [NVGRAM 0.1.1] - 2025-10-08
### Added
- **Async Memory Rewrite (Stability + Safety Patch)**
- Introduced `AsyncMemory` class with fully asynchronous vector and graph store writes.
- Added **input sanitation** to prevent embedding errors (`'list' object has no attribute 'replace'`).
- Implemented `flatten_messages()` helper in API layer to clean malformed payloads.
- Added structured request logging via `RequestLoggingMiddleware` (FastAPI middleware).
- Health endpoint (`/health`) now returns structured JSON `{status, version, service}`.
- Startup logs now include **sanitized embedder config** with API keys masked for safety:
```
>>> Embedder config (sanitized): {'provider': 'openai', 'config': {'model': 'text-embedding-3-small', 'api_key': '***'}}
✅ Connected to Neo4j on attempt 1
🧠 NVGRAM v0.1.1 — Neural Vectorized Graph Recall and Memory initialized
```
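For reference, a minimal sketch of the two pieces named above (the real NVGRAM code may differ in detail):
```python
# Sketch of the /health route and request-timing middleware described in this entry.
import time
from fastapi import FastAPI, Request
from starlette.middleware.base import BaseHTTPMiddleware

app = FastAPI()

class RequestLoggingMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        start = time.monotonic()
        response = await call_next(request)
        elapsed_ms = (time.monotonic() - start) * 1000
        # e.g. "POST /memories -> 200 (11189.1 ms)"
        print(f"{request.method} {request.url.path} -> {response.status_code} ({elapsed_ms:.1f} ms)")
        return response

app.add_middleware(RequestLoggingMiddleware)

@app.get("/health")
async def health():
    return {"status": "ok", "version": "0.1.1", "service": "nvgram-api"}
```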
### Changed
- Replaced synchronous `Memory.add()` with async-safe version supporting concurrent vector + graph writes.
- Normalized indentation and cleaned duplicate `main.py` references under `/nvgram/` vs `/nvgram/server/`.
- Removed redundant `FastAPI()` app reinitialization.
- Updated internal logging to INFO-level timing format, e.g. `2025-10-08 21:48:45 [INFO] POST /memories -> 200 (11189.1 ms)`.
- Deprecated `@app.on_event("startup")` (FastAPI deprecation warning) → will migrate to `lifespan` handler in v0.1.2.
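- The planned `lifespan` replacement looks roughly like this (sketch; `connect_to_neo4j()` is a placeholder for NVGRAM's real startup work):
```python
# Sketch of FastAPI's lifespan pattern, the planned replacement for @app.on_event("startup").
from contextlib import asynccontextmanager
from fastapi import FastAPI

async def connect_to_neo4j() -> None:
    ...  # placeholder: retrying Neo4j connection, config sanitation, etc.

@asynccontextmanager
async def lifespan(app: FastAPI):
    await connect_to_neo4j()  # startup work
    yield                     # application serves requests
    # shutdown work (close pools, flush logs) goes here

app = FastAPI(lifespan=lifespan)
```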
### Fixed
- Eliminated repeating 500 error from OpenAI embedder caused by non-string message content.
- Masked API key leaks from boot logs.
- Ensured Neo4j reconnects gracefully on first retry.
### Goals / Next Steps
- Integrate **salience scoring** and **embedding confidence weight** fields in Postgres schema.
- Begin testing with full Lyra Relay + Persona Sidecar pipeline for live session memory recall.
- Migrate from deprecated `on_event` → `lifespan` pattern in 0.1.2.
---
## [NVGRAM 0.1.0] - 2025-10-07
### Added
- **Initial fork of Mem0 → NVGRAM**:
- Created a fully independent local-first memory engine based on Mem0 OSS.
- Renamed all internal modules, Docker services, and environment variables from `mem0` → `nvgram`.
- New service name: **`nvgram-api`**, default port **7077**.
- Maintains same API endpoints (`/memories`, `/search`) for drop-in compatibility with Lyra Core.
- Uses **FastAPI**, **Postgres**, and **Neo4j** as persistent backends.
- Verified clean startup:
```
✅ Connected to Neo4j on attempt 1
INFO: Uvicorn running on http://0.0.0.0:7077
```
- `/docs` and `/openapi.json` confirmed reachable and functional.
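- A quick drop-in compatibility check (Python `requests`; payload fields mirror the Mem0-style bodies used by Relay):
```python
# Exercise the /memories and /search endpoints on the new nvgram-api service.
import requests

NVGRAM_API = "http://localhost:7077"

# Store a memory for a user
requests.post(f"{NVGRAM_API}/memories", json={
    "messages": [{"role": "user", "content": "Likes coffee in the morning"}],
    "user_id": "brian",
}, timeout=60).raise_for_status()

# Search it back
hits = requests.post(f"{NVGRAM_API}/search", json={
    "query": "coffee",
    "user_id": "brian",
}, timeout=60).json()
print(hits)
```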
### Changed
- Removed dependency on the external `mem0ai` SDK — all logic now local.
- Re-pinned requirements:
- fastapi==0.115.8
- uvicorn==0.34.0
- pydantic==2.10.4
- python-dotenv==1.0.1
- psycopg>=3.2.8
- ollama
- Adjusted `docker-compose` and `.env` templates to use new NVGRAM naming and image paths.
### Goals / Next Steps
- Integrate NVGRAM as the new default backend in Lyra Relay.
- Deprecate remaining Mem0 references and archive old configs.
- Begin versioning as a standalone project (`nvgram-core`, `nvgram-api`, etc.).
---
## [Lyra-Mem0 0.3.2] - 2025-10-05
### Added
- Support for **Ollama LLM reasoning** alongside OpenAI embeddings:
- Introduced `LLM_PROVIDER=ollama`, `LLM_MODEL`, and `OLLAMA_HOST` in `.env.3090`.
- Verified local 3090 setup using `qwen2.5:7b-instruct-q4_K_M`.
- Split processing pipeline:
- Embeddings → OpenAI `text-embedding-3-small`
- LLM → Local Ollama (`http://10.0.0.3:11434/api/chat`).
- Added `.env.3090` template for self-hosted inference nodes.
- Integrated runtime diagnostics and seeder progress tracking:
- File-level + message-level progress bars.
- Retry/back-off logic for timeouts (3 attempts).
- Event logging (`ADD / UPDATE / NONE`) for every memory record.
- Expanded Docker health checks for Postgres, Qdrant, and Neo4j containers.
- Added GPU-friendly long-run configuration for continuous seeding (validated on RTX 3090).
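- The retry/back-off posting described above, sketched as a helper (3 attempts with exponential back-off; the endpoint URL is an assumption to be adjusted per deployment):
```python
# Sketch of the seeder's retry/back-off loop for posting one record to /memories.
import os
import time
import requests

MEM0_API = os.getenv("MEM0_API", "http://localhost:8000")  # assumption: point at your mem0-api service

def post_with_retry(record: dict, attempts: int = 3) -> dict | None:
    """POST one memory record, backing off on timeouts; returns the API response or None."""
    for attempt in range(1, attempts + 1):
        try:
            resp = requests.post(f"{MEM0_API}/memories", json=record, timeout=120)
            resp.raise_for_status()
            return resp.json()  # caller logs ADD / UPDATE / NONE events from the response
        except (requests.Timeout, requests.ConnectionError):
            if attempt == attempts:
                return None
            time.sleep(2 ** attempt)  # simple exponential back-off between retries
```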
### Changed
- Updated `main.py` configuration block to load:
- `LLM_PROVIDER`, `LLM_MODEL`, and `OLLAMA_BASE_URL`.
- Fallback to OpenAI if Ollama unavailable.
- Adjusted `docker-compose.yml` mount paths to correctly map `/app/main.py`.
- Normalized `.env` loading so `mem0-api` and host environment share identical values.
- Improved seeder logging and progress telemetry for clearer diagnostics.
- Added explicit `temperature` field to `DEFAULT_CONFIG['llm']['config']` for tuning future local inference runs.
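- A hedged sketch of the dual-provider configuration this describes (Mem0-style provider dicts; exact keys in the real `main.py` may differ, and the OpenAI fallback path is omitted):
```python
# Assumed shape of the dual-provider config: OpenAI embeddings + local Ollama LLM.
import os
from dotenv import load_dotenv

load_dotenv()  # .env / .env.3090 values shared by mem0-api and the host

DEFAULT_CONFIG = {
    "embedder": {
        "provider": "openai",
        "config": {"model": "text-embedding-3-small", "api_key": os.getenv("OPENAI_API_KEY")},
    },
    "llm": {
        "provider": os.getenv("LLM_PROVIDER", "ollama"),
        "config": {
            "model": os.getenv("LLM_MODEL", "qwen2.5:7b-instruct-q4_K_M"),
            "ollama_base_url": os.getenv("OLLAMA_BASE_URL", "http://10.0.0.3:11434"),
            "temperature": 0.1,  # explicit temperature field (value illustrative)
        },
    },
}
```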
### Fixed
- Resolved crash during startup:
`TypeError: OpenAIConfig.__init__() got an unexpected keyword argument 'ollama_base_url'`.
- Corrected mount type mismatch (file vs directory) causing `OCI runtime create failed` errors.
- Prevented duplicate or partial postings when retry logic triggered multiple concurrent requests.
- “Unknown event” warnings now safely ignored (no longer break seeding loop).
- Confirmed full dual-provider operation in logs (`api.openai.com` + `10.0.0.3:11434/api/chat`).
### Observations
- Stable GPU utilization: ~8 GB VRAM @ 92 % load, ≈ 67 °C under sustained seeding.
- Next revision will re-format seed JSON to preserve `role` context (user vs assistant).
---
## [Lyra-Mem0 0.3.1] - 2025-10-03
### Added
- HuggingFace TEI integration (local 3090 embedder).
- Dual-mode environment switch between OpenAI cloud and local.
- CSV export of memories from Postgres (`payload->>'data'`).
### Fixed
- `.env` CRLF vs LF line ending issues.
- Local seeding now possible via huggingface server running
---
## [Lyra-Mem0 0.3.0]
### Added
- Support for **Ollama embeddings** in Mem0 OSS container:
- Added ability to configure `EMBEDDER_PROVIDER=ollama` and set `EMBEDDER_MODEL` + `OLLAMA_HOST` via `.env`.
- Mounted `main.py` override from host into container to load custom `DEFAULT_CONFIG`.
- Installed `ollama` Python client into custom API container image.
- `.env.3090` file created for external embedding mode (3090 machine):
- EMBEDDER_PROVIDER=ollama
- EMBEDDER_MODEL=mxbai-embed-large
- OLLAMA_HOST=http://10.0.0.3:11434
- Workflow to support **multiple embedding modes**:
1. Fast LAN-based 3090/Ollama embeddings
2. Local-only CPU embeddings (Lyra Cortex VM)
3. OpenAI fallback embeddings
### Changed
- `docker-compose.yml` updated to mount local `main.py` and `.env.3090`.
- Built **custom Dockerfile** (`mem0-api-server:latest`) extending base image with `pip install ollama`.
- Updated `requirements.txt` to include `ollama` package.
- Adjusted Mem0 container config so `main.py` pulls environment variables with `dotenv` (`load_dotenv()`).
- Tested new embeddings path with curl `/memories` API call.
### Fixed
- Resolved container boot failure caused by missing `ollama` dependency (`ModuleNotFoundError`).
- Fixed config overwrite issue where rebuilding container restored stock `main.py`.
- Worked around Neo4j error (`vector.similarity.cosine(): mismatched vector dimensions`) by confirming OpenAI vs. Ollama embedding vector sizes and planning to standardize at 1536-dim.
---
## [Lyra-Mem0 v0.2.1]
### Added
- **Seeding pipeline**:
- Built Python seeder script to bulk-insert raw Cloud Lyra exports into Mem0.
- Implemented incremental seeding option (skip existing memories, only add new ones).
- Verified insert process with Postgres-backed history DB and curl `/memories/search` sanity check.
- **Ollama embedding support** in Mem0 OSS container:
- Added configuration for `EMBEDDER_PROVIDER=ollama`, `EMBEDDER_MODEL`, and `OLLAMA_HOST` via `.env`.
- Created `.env.3090` profile for LAN-connected 3090 machine with Ollama.
- Set up three embedding modes:
1. Fast LAN-based 3090/Ollama
2. Local-only CPU model (Lyra Cortex VM)
3. OpenAI fallback
### Changed
- Updated `main.py` to load configuration from `.env` using `dotenv` and support multiple embedder backends.
- Mounted host `main.py` into container so local edits persist across rebuilds.
- Updated `docker-compose.yml` to mount `.env.3090` and support swap between profiles.
- Built **custom Dockerfile** (`mem0-api-server:latest`) including `pip install ollama`.
- Updated `requirements.txt` with `ollama` dependency.
- Adjusted startup flow so container automatically connects to external Ollama host (LAN IP).
- Added logging to confirm model pulls and embedding requests.
### Fixed
- Seeder process originally failed on old memories — now skips duplicates and continues batch.
- Resolved container boot error (`ModuleNotFoundError: ollama`) by extending image.
- Fixed overwrite issue where stock `main.py` replaced custom config during rebuild.
- Worked around Neo4j `vector.similarity.cosine()` dimension mismatch by investigating OpenAI (1536-dim) vs Ollama (1024-dim) schemas.
### Notes
- To fully unify embedding modes, a Hugging Face / local model with **1536-dim embeddings** will be needed (to match OpenAI's schema and avoid Neo4j errors).
- Current Ollama model (`mxbai-embed-large`) works, but returns 1024-dim vectors.
- Seeder workflow validated but should be wrapped in a repeatable weekly run for full Cloud→Local sync.
---
## [Lyra-Mem0 v0.2.0] - 2025-09-30
### Added
- Standalone **Lyra-Mem0** stack created at `~/lyra-mem0/`
- Includes **Postgres (pgvector)**, **Qdrant**, **Neo4j**, and **SQLite** for history tracking.
- Added working `docker-compose.mem0.yml` and custom `Dockerfile` for building the Mem0 API server.
- Verified REST API functionality:
- `POST /memories` works for adding memories.
- `POST /search` works for semantic search.
- Successful end-to-end test with persisted memory:
*"Likes coffee in the morning"* → retrievable via search. ✅
### Changed
- Split architecture into **modular stacks**:
- `~/lyra-core` (Relay, Persona-Sidecar, etc.)
- `~/lyra-mem0` (Mem0 OSS memory stack)
- Removed old embedded mem0 containers from the Lyra-Core compose file.
- Added Lyra-Mem0 section in README.md.
### Next Steps
- Wire **Relay → Mem0 API** (integration not yet complete).
- Add integration tests to verify persistence and retrieval from within Lyra-Core.
---
## 🧠 Lyra-Cortex ##############################################################################
## [Cortex v0.5] - 2025-11-13
### Added
- **New `reasoning.py` module**
- Async reasoning engine.
- Accepts user prompt, identity, RAG block, and reflection notes.
- Produces draft internal answers.
- Uses primary backend (vLLM).
- **New `reflection.py` module**
- Fully async.
- Produces actionable JSON “internal notes.”
- Enforces strict JSON schema and fallback parsing.
- Forces cloud backend (`backend_override="cloud"`).
- Integrated `refine.py` into Cortex reasoning pipeline:
- New stage between reflection and persona.
- Runs exclusively on primary vLLM backend (MI50).
- Produces final, internally consistent output for downstream persona layer.
- **Backend override system**
- Each LLM call can now select its own backend.
- Enables multi-LLM cognition: Reflection → cloud, Reasoning → primary.
- **identity loader**
- Added `identity.py` with `load_identity()` for consistent persona retrieval.
- **ingest_handler**
- Async stub created for future Intake → NeoMem → RAG pipeline.
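- A simplified sketch of the override mechanism (the real `llm_router.py` builds distinct payloads and parsing per backend; `httpx` and the function signature are assumptions):
```python
# Per-call backend override: Reflection forces "cloud", Reasoning stays on "primary".
# Env vars hold FULL endpoint URLs, as described under Changed below.
import os
import httpx

BACKENDS = {
    "primary": os.getenv("LLM_PRIMARY_URL"),      # e.g. http://10.0.0.43:8000/v1/completions
    "secondary": os.getenv("LLM_SECONDARY_URL"),  # e.g. http://10.0.0.3:11434/api/generate
    "cloud": os.getenv("LLM_CLOUD_URL"),          # e.g. https://api.openai.com/v1/chat/completions
}

async def call_llm(prompt: str, backend_override: str | None = None) -> str:
    url = BACKENDS[backend_override or "primary"]
    # Only the vLLM completions payload is shown; chat-style backends need a "messages" body.
    payload = {"model": os.getenv("LLM_PRIMARY_MODEL"), "prompt": prompt, "max_tokens": 400}
    async with httpx.AsyncClient(timeout=120) as client:
        resp = await client.post(url, json=payload)
        resp.raise_for_status()
        return resp.json()["choices"][0]["text"]
```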
### Changed
- Unified LLM backend URL handling across Cortex:
- ENV variables must now contain FULL API endpoints.
- Removed all internal path-appending (e.g. `.../v1/completions`).
- `llm_router.py` rewritten to use env-provided URLs as-is.
- Ensures consistent behavior between draft, reflection, refine, and persona.
- **Rebuilt `main.py`**
- Removed old annotation/analysis logic.
- New structure: load identity → get RAG → reflect → reason → return draft+notes.
- Routes now clean and minimal (`/reason`, `/ingest`, `/health`).
- Async path throughout Cortex.
- **Refactored `llm_router.py`**
- Removed old fallback logic during overrides.
- OpenAI requests now use `/v1/chat/completions`.
- Added proper OpenAI Authorization headers.
- Distinct payload format for vLLM vs OpenAI.
- Unified, correct parsing across models.
- **Simplified Cortex architecture**
- Removed deprecated “context.py” and old reasoning code.
- Relay completely decoupled from smart behavior.
- Updated environment specification:
- `LLM_PRIMARY_URL` now set to `http://10.0.0.43:8000/v1/completions`.
- `LLM_SECONDARY_URL` remains `http://10.0.0.3:11434/api/generate` (Ollama).
- `LLM_CLOUD_URL` set to `https://api.openai.com/v1/chat/completions`.
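- Skeleton of the rebuilt flow (load identity → get RAG → reflect → reason → return draft+notes); the helper bodies below are placeholders standing in for the real `identity.py`, `rag.py`, `reflection.py`, and `reasoning.py` modules:
```python
# Skeleton of the /reason route; helper signatures are assumptions based on this entry.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ReasonRequest(BaseModel):
    prompt: str
    session_id: str = "default"

async def get_rag_block(prompt: str) -> str:
    return ""  # placeholder: query_rag() + format_rag_block() from rag.py

async def reflect(prompt: str, rag_block: str) -> dict:
    # placeholder: reflection.py, forced to the cloud backend
    return {"insight": "", "evaluation": "", "next_action": ""}

async def reason(prompt: str, identity: str, rag_block: str, notes: dict) -> str:
    return ""  # placeholder: reasoning.py, primary vLLM backend

@app.post("/reason")
async def reason_route(req: ReasonRequest):
    identity = "Lyra"  # placeholder: load_identity() from identity.py
    rag_block = await get_rag_block(req.prompt)
    notes = await reflect(req.prompt, rag_block)
    draft = await reason(req.prompt, identity, rag_block, notes)
    return {"draft_output": draft, "reflection": notes}
```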
### Fixed
- Resolved endpoint conflict where:
- Router expected base URLs.
- Refine expected full URLs.
- Refine always fell back due to hitting incorrect endpoint.
- Fixed by standardizing full-URL behavior across entire system.
- Reflection layer no longer fails silently (previously returned `[""]` due to MythoMax).
- Resolved 404/401 errors caused by incorrect OpenAI URL endpoints.
- No more double-routing through vLLM during reflection.
- Corrected async/sync mismatch in multiple locations.
- Eliminated double-path bug (`/v1/completions/v1/completions`) caused by previous router logic.
### Removed
- Legacy `annotate`, `reason_check` glue logic from old architecture.
- Old backend probing junk code.
- Stale imports and unused modules leftover from previous prototype.
### Verified
- Cortex → vLLM (MI50) → refine → final_output now functioning correctly.
- refine shows `used_primary_backend: true` and no fallback.
- Manual curl test confirms endpoint accuracy.
### Known Issues
- refine sometimes prefixes output with `"Final Answer:"`; next version will sanitize this.
- hallucinations in draft_output persist due to weak grounding (fix in reasoning + RAG planned).
### Pending / Known Issues
- **RAG service does not exist** — requires containerized FastAPI service.
- Reasoning layer lacks self-revision loop (deliberate thought cycle).
- No speak/persona generation layer yet (`speak.py` planned).
- Intake summaries not yet routing into RAG or reflection layer.
- No refinement engine between reasoning and speak.
### Notes
This is the largest structural change to Cortex so far.
It establishes:
- multi-model cognition
- clean layering
- identity + reflection separation
- correct async code
- deterministic backend routing
- predictable JSON reflection
The system is now ready for:
- refinement loops
- persona-speaking layer
- containerized RAG
- long-term memory integration
- true emergent-behavior experiments
## [Cortex v0.4.1] - 2025-11-05
### Added
- **RAG integration**
- Added `rag.py` with `query_rag()` and `format_rag_block()`.
- Cortex now queries the local RAG API (`http://10.0.0.41:7090/rag/search`) for contextual augmentation.
- Synthesized answers and top excerpts are injected into the reasoning prompt.
### Changed
- **Revised /reason endpoint.**
- Now builds unified context blocks:
- [Intake] → recent summaries
- [RAG] → contextual knowledge
- [User Message] → current input
- Calls call_llm() for the first pass, then reflection_loop() for meta-evaluation.
- Returns cortex_prompt, draft_output, final_output, and normalized reflection.
- **Reflection Pipeline Stability**
- Cleaned parsing to normalize JSON vs. text reflections.
- Added fallback handling for malformed or non-JSON outputs (see the parsing sketch after this entry's list).
- Log system improved to show raw JSON, extracted fields, and normalized summary.
- **Async Summarization (Intake v0.2.1)**
- Intake summaries now run in background threads to avoid blocking Cortex.
- Summaries (L1–L∞) logged asynchronously with `[BG]` tags.
- **Environment & Networking Fixes**
- Verified .env variables propagate correctly inside the Cortex container.
- Confirmed Docker network connectivity between Cortex, Intake, NeoMem, and RAG (shared serversdown_lyra_net).
- Adjusted localhost calls to service-IP mapping (10.0.0.41 for Cortex host).
- **Behavioral Updates**
- Cortex now performs conversation reflection (on user intent) and self-reflection (on its own answers).
- RAG context successfully grounds reasoning outputs.
- Intake and NeoMem confirmed receiving summaries via /add_exchange.
- Log clarity pass: all reflective and contextual blocks clearly labeled.
- **Known Gaps / Next Steps**
- NeoMem Tuning
- Improve retrieval latency and relevance.
- Implement a dedicated /reflections/recent endpoint for Cortex.
- Migrate to Cortex-first ingestion (Relay → Cortex → NeoMem).
- **Cortex Enhancements**
- Add persistent reflection recall (use prior reflections as meta-context).
- Improve reflection JSON structure ("insight", "evaluation", "next_action" → guaranteed fields).
- Tighten temperature and prompt control for factual consistency.
- **RAG Optimization**
- Add source ranking, filtering, and multi-vector hybrid search.
- Cache RAG responses per session to reduce duplicate calls.
- **Documentation / Monitoring**
- Add health route for RAG and Intake summaries.
- Include internal latency metrics in /health endpoint.
- Consolidate logs into a unified "Lyra Cortex Console" for tracing all module calls.
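Sketch of the JSON-vs-text normalization referenced under "Reflection Pipeline Stability" above (field names follow the guaranteed-fields schema; the real Cortex code may differ):
```python
# Normalize a reflection reply: prefer strict JSON, fall back to extracting a JSON
# block, and finally treat the whole reply as a plain-text insight.
import json
import re

REQUIRED = ("insight", "evaluation", "next_action")

def normalize_reflection(raw: str) -> dict:
    text = raw.strip()
    # Strip Markdown code fences if the model wrapped its JSON
    text = re.sub(r"^```(?:json)?|```$", "", text, flags=re.MULTILINE).strip()
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        match = re.search(r"\{.*\}", text, flags=re.DOTALL)
        try:
            data = json.loads(match.group(0)) if match else {}
        except json.JSONDecodeError:
            data = {}
        if not data:
            data = {"insight": text}  # non-JSON reflections survive as plain text
    return {key: data.get(key, "") for key in REQUIRED}
```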
## [Cortex v0.3.0] - 2025-10-31
### Added
- **Cortex Service (FastAPI)**
- New standalone reasoning engine (`cortex/main.py`) with endpoints:
- `GET /health` reports active backend + NeoMem status.
- `POST /reason` evaluates `{prompt, response}` pairs.
- `POST /annotate` experimental text analysis.
- Background NeoMem health monitor (5-minute interval).
- **Multi-Backend Reasoning Support**
- Added environment-driven backend selection via `LLM_FORCE_BACKEND`.
- Supports:
- **Primary** → vLLM (MI50 node @ 10.0.0.43)
- **Secondary** → Ollama (3090 node @ 10.0.0.3)
- **Cloud** → OpenAI API
- **Fallback** → llama.cpp (CPU)
- Introduced per-backend model variables:
`LLM_PRIMARY_MODEL`, `LLM_SECONDARY_MODEL`, `LLM_CLOUD_MODEL`, `LLM_FALLBACK_MODEL`.
- **Response Normalization Layer**
- Implemented `normalize_llm_response()` to merge streamed outputs and repair malformed JSON.
- Handles Ollama's multi-line streaming and MythoMax's missing-punctuation issues.
- Prints concise debug previews of merged content.
- **Environment Simplification**
- Each service (`intake`, `cortex`, `neomem`) now maintains its own `.env` file.
- Removed reliance on shared/global env file to prevent cross-contamination.
- Verified Docker Compose networking across containers.
### Changed
- Refactored `reason_check()` to dynamically switch between **prompt** and **chat** mode depending on backend.
- Enhanced startup logs to announce active backend, model, URL, and mode.
- Improved error handling with clearer “Reasoning error” messages.
### Fixed
- Corrected broken vLLM endpoint routing (`/v1/completions`).
- Stabilized cross-container health reporting for NeoMem.
- Resolved JSON parse failures caused by streaming chunk delimiters.
---
## Next Planned [v0.4.0]
### Planned Additions
- **Reflection Mode**
- Introduce `REASONING_MODE=factcheck|reflection`.
- Output schema:
```json
{ "insight": "...", "evaluation": "...", "next_action": "..." }
```
- **Cortex-First Pipeline**
- UI → Cortex → [Reflection + Verifier + Memory] → Speech LLM → User.
- Allows Lyra to “think before speaking.”
- **Verifier Stub**
- New `/verify` endpoint for search-based factual grounding.
- Asynchronous external truth checking.
- **Memory Integration**
- Feed reflective outputs into NeoMem.
- Enable “dream” cycles for autonomous self-review.
---
**Status:** 🟢 Stable core; multi-backend reasoning operational.
**Next milestone:** *v0.4.0 — Reflection Mode + Thought Pipeline orchestration.*
---
### [Intake] v0.1.0 - 2025-10-27
- Receives messages from Relay and summarizes them in a cascading format.
- Continues to summarize smaller batches of exchanges while also generating large-scale conversational summaries (L20).
- Currently logs summaries to a .log file in /project-lyra/intake-logs/
**Next Steps**
- Feed Intake output into NeoMem.
- Generate a daily/hourly/etc. overall summary (e.g., "Today Brian and Lyra worked on x, y, and z").
- Generate session-aware summaries, each with its own intake hopper.
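A toy illustration of the cascading trigger levels (the actual Intake trigger logic may differ; it also runs summaries in background threads and writes them to the log file noted above):
```python
# Toy sketch: smaller windows are summarized often, larger ones less often, so the
# bot stays aware of the conversation without replaying the whole chat each turn.
SUMMARY_LEVELS = (2, 5, 10, 15, 20)  # exchange counts at which a summary is produced

def due_levels(exchange_count: int) -> list[int]:
    """Return the summary levels triggered by the latest exchange."""
    return [level for level in SUMMARY_LEVELS if exchange_count % level == 0]

# Example: after the 20th exchange, the L2, L5, L10 and L20 summaries are all refreshed.
assert due_levels(20) == [2, 5, 10, 20]
```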
### [Lyra-Cortex] v0.2.0 — 2025-09-26
#### Added
- Integrated **llama-server** on dedicated Cortex VM (Proxmox).
- Verified Phi-3.5-mini-instruct_Uncensored-Q4_K_M running with 8 vCPUs.
- Benchmarked Phi-3.5-mini performance:
- ~18 tokens/sec CPU-only on Ryzen 7 7800X.
- Salience classification functional but sometimes inconsistent ("sali", "fi", "jamming").
- Tested **Qwen2-0.5B-Instruct GGUF** as alternative salience classifier:
- Much faster throughput (~350 tokens/sec prompt, ~100 tokens/sec eval).
- More responsive but over-classifies messages as “salient.”
- Established `.env` integration for model ID (`SALIENCE_MODEL`), enabling hot-swap between models.
#### Known Issues
- Small models tend to drift or over-classify.
- CPU-only 7B+ models expected to be slow; GPU passthrough recommended for larger models.
- Need to set up a `systemd` service for `llama-server` to auto-start on VM reboot.
---
### [Lyra-Cortex] v0.1.0 — 2025-09-25
#### Added
- First deployment as dedicated Proxmox VM (5 vCPU / 18 GB RAM / 100 GB SSD).
- Built **llama.cpp** with `llama-server` target via CMake.
- Integrated **Phi-3.5 Mini Instruct (Uncensored, Q4_K_M GGUF)** model.
- Verified **API compatibility** at `/v1/chat/completions`.
- Local test successful via `curl` → ~523 token response generated.
- Performance benchmark: ~11.5 tokens/sec (CPU-only on Ryzen 7800X).
- Confirmed usable for salience scoring, summarization, and lightweight reasoning.

71
core/PROJECT_SUMMARY.md Normal file

@@ -0,0 +1,71 @@
# Lyra Core — Project Summary
## v0.4 (2025-10-03)
### 🧠 High-Level Architecture
- **Lyra Core (v0.3.1)** — Orchestration layer.
- Accepts chat requests (`/v1/chat/completions`).
- Routes through Cortex for subconscious annotation.
- Stores everything in Mem0 (no discard).
- Fetches persona + relevant memories.
- Injects context back into LLM.
- **Cortex (v0.3.0)** — Subconscious annotator.
- Runs locally via `llama.cpp` (Phi-3.5-mini Q4_K_M).
- Strict JSON schema:
```json
{
"sentiment": "positive" | "neutral" | "negative",
"novelty": 0.0-1.0,
"tags": ["keyword", "keyword"],
"notes": "short string"
}
```
- Normalizes keys (lowercase).
- Strips Markdown fences before parsing.
- Configurable via `.env` (`CORTEX_ENABLED=true|false`).
- Currently generates annotations, but not yet persisted into Mem0 payloads (stored as empty `{cortex:{}}`).
- **Mem0 (v0.4.0)** — Persistent memory layer.
- Handles embeddings, graph storage, and retrieval.
- Dual embedder support:
- **OpenAI Cloud** (`text-embedding-3-small`, 1536-dim).
- **HuggingFace TEI** (gte-Qwen2-1.5B-instruct, 1536-dim, hosted on 3090).
- Environment toggle for provider (`.env.openai` vs `.env.3090`).
- Memory persistence in Postgres (`payload` JSON).
- CSV export pipeline confirmed (id, user_id, data, created_at).
- **Persona Sidecar**
- Provides personality, style, and protocol instructions.
- Injected at runtime into Core prompt building.
---
### 🚀 Recent Changes
- **Mem0**
- Added HuggingFace TEI integration (local 3090 embedder).
- Enabled dual-mode environment switch (OpenAI cloud ↔ local TEI).
- Fixed `.env` line ending mismatch (CRLF vs LF).
- Added memory dump/export commands for Postgres.
- **Core/Relay**
- No major changes since v0.3.1 (still routing input → Cortex → Mem0).
- **Cortex**
- Still outputs annotations, but not yet persisted into Mem0 payloads.
---
### 📈 Versioning
- **Lyra Core** → v0.3.1
- **Cortex** → v0.3.0
- **Mem0** → v0.4.0
---
### 📋 Next Steps
- [ ] Wire Cortex annotations into Mem0 payloads (`cortex` object).
- [ ] Add “export all memories” script to standard workflow.
- [ ] Consider async embedding for faster `mem.add`.
- [ ] Build visual diagram of data flow (Core ↔ Cortex ↔ Mem0 ↔ Persona).
- [ ] Explore larger LLMs for Cortex (Qwen2-7B, etc.) for richer subconscious annotation.

265
core/README.md Normal file

@@ -0,0 +1,265 @@
##### Project Lyra - README v0.3.0 - needs fixing #####
Lyra is a modular persistent AI companion system.
It provides memory-backed chat using **NeoMem** + **Relay** + **Persona Sidecar**,
with optional subconscious annotation powered by **Cortex VM** running local LLMs.
## Mission Statement ##
The point of Project Lyra is to give an AI chatbot more abilities than a typical chatbot. Typical chatbots are essentially amnesiac and forget everything about your project. Lyra helps keep projects organized and remembers everything you have done. Think of her as a notepad/schedule/database/co-creator/collaborator with its own executive function: say something in passing, and Lyra remembers it and reminds you of it later.
---
## Structure ##
Project Lyra exists as a series of Docker containers that run independently of each other but are all networked together. Just as the brain has regions, Lyra has modules:
## A. VM 100 - lyra-core:
1. **Core v0.3.1 - Docker Stack**
- Relay (docker container) - The main harness that connects the modules together and accepts input from the user.
- UI (HTML) - How the user communicates with Lyra. At the moment it is a typical instant-message interface, but plans are to make it much more than that.
- Persona (docker container) - The personality of Lyra; set how you want her to behave and give specific instructions for output. Basically prompt injection.
- All of this is built and controlled by a single .env and docker-compose.lyra.yml.
2. **NeoMem v0.1.0 - Docker Stack**
- NeoMem is Lyra's main long-term memory database. It is a fork of Mem0 OSS and uses vector and graph databases.
- NeoMem launches with a single separate docker-compose.neomem.yml.
## B. VM 101 - lyra-cortex:
3. **Cortex - VM containing Docker stack**
- This is the working reasoning layer of Lyra.
- Built to be flexible in deployment; run it locally or remotely (via WAN/LAN).
- Intake v0.1.0 (docker container) - gives conversations context and purpose.
- Intake takes the last N exchanges and summarizes them into coherent short-term memories.
- Uses a cascading summarization setup that quantizes the exchanges. Summaries occur at L2, L5, L10, L15, L20, etc.
- Keeps the bot aware of what is going on without having to send it the whole chat every time.
- Cortex - Docker container containing:
- Reasoning Layer
- TBD
- Reflect (docker container) - Not yet implemented; on the roadmap.
- Calls back to NeoMem after N exchanges and N summaries and edits memories created during the initial messaging step. This helps keep memories to coherent thoughts and reduces the noise.
- Can be done actively and asynchronously, or on a time basis (think human sleep and dreams).
- This stage is not yet built; this is just an idea.
## C. Remote LLM APIs:
4. **AI Backends**
- Lyra doesn't run models herself; she calls out to APIs.
- Endlessly customizable as long as the backend outputs to the same schema.
---
## 🚀 Features ##
# Lyra-Core VM (VM100)
- **Relay**:
- The main harness and orchestrator of Lyra.
- OpenAI-compatible endpoint: `POST /v1/chat/completions`
- Injects persona + relevant memories into every LLM call
- Routes all memory storage/retrieval through **NeoMem**
- Logs spans (`neomem.add`, `neomem.search`, `persona.fetch`, `llm.generate`)
- **NeoMem (Memory Engine)**:
- Forked from Mem0 OSS and fully independent.
- Drop-in compatible API (`/memories`, `/search`).
- Local-first: runs on FastAPI with Postgres + Neo4j.
- No external SDK dependencies.
- Default service: `neomem-api` (port 7077).
- Capable of adding new memories and updating previous ones: compares existing embeddings and performs in-place updates when a memory is judged to be a semantic match.
- **UI**:
- Lightweight static HTML chat page.
- Connects to Relay at `http://<host>:7078`.
- Nice cyberpunk theme!
- Saves and loads sessions, which then in turn send to relay.
# Beta Lyrae (RAG Memory DB) - added 11-3-25
- **RAG Knowledge DB - Beta Lyrae (sheliak)**
- This module implements the **Retrieval-Augmented Generation (RAG)** layer for Project Lyra.
- It serves as the long-term searchable memory store that Cortex and Relay can query for relevant context before reasoning or response generation.
The system uses:
- **ChromaDB** for persistent vector storage
- **OpenAI Embeddings (`text-embedding-3-small`)** for semantic similarity
- **FastAPI** (port 7090) for the `/rag/search` REST endpoint
- Directory Layout
rag/
├── rag_chat_import.py # imports JSON chat logs
├── rag_docs_import.py # (planned) PDF/EPUB/manual importer
├── rag_build.py # legacy single-folder builder
├── rag_query.py # command-line query helper
├── rag_api.py # FastAPI service providing /rag/search
├── chromadb/ # persistent vector store
├── chatlogs/ # organized source data
│ ├── poker/
│ ├── work/
│ ├── lyra/
│ ├── personal/
│ └── ...
└── import.log # progress log for batch runs
- **OpenAI chatlog importer**
- Takes JSON-formatted chat logs and imports them into the RAG store.
- **Features include:**
- Recursive folder indexing with **category detection** from directory name
- Smart chunking for long messages (5,000 chars per slice)
- Automatic deduplication using SHA-1 hash of file + chunk (see the sketch after the metadata example below)
- Timestamps for both file modification and import time
- Full progress logging via tqdm
- Safe to run in background with `nohup … &`
- Metadata per chunk:
```json
{
"chat_id": "<sha1 of filename>",
"chunk_index": 0,
"source": "chatlogs/lyra/0002_cortex_LLMs_11-1-25.json",
"title": "cortex LLMs 11-1-25",
"role": "assistant",
"category": "lyra",
"type": "chat",
"file_modified": "2025-11-06T23:41:02",
"imported_at": "2025-11-07T03:55:00Z"
}
```
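- A condensed sketch of the chunking + SHA-1 deduplication step from the feature list above (the real importer hashes file + chunk and attaches the full metadata shown; exact hash inputs are an assumption):
```python
# Smart chunking plus stable SHA-1 ids: re-importing the same file yields the same
# ids, so existing vectors can be skipped instead of duplicated.
import hashlib

CHUNK_SIZE = 5000  # ~5,000 characters per slice

def chunk_text(text: str, size: int = CHUNK_SIZE) -> list[str]:
    return [text[i:i + size] for i in range(0, len(text), size)]

def chunk_id(source_path: str, index: int) -> str:
    return hashlib.sha1(f"{source_path}:{index}".encode("utf-8")).hexdigest()

ids = [chunk_id("chatlogs/lyra/0002_cortex_LLMs_11-1-25.json", i)
       for i, _ in enumerate(chunk_text("x" * 12000))]
print(ids)  # three ids for three ~5k-character chunks
```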
# Cortex VM (VM101, CT201)
- **CT201 main reasoning orchestrator.**
- This is the internal brain of Lyra.
- Running in a privileged LXC.
- Currently a locally served LLM running on a Radeon Instinct MI50, using a customized build of vLLM that lets it use ROCm.
- Accessible via 10.0.0.43:8000/v1/completions.
- **Intake v0.1.1**
- Receives messages from Relay and summarizes them in a cascading format.
- Continues to summarize smaller batches of exchanges while also generating large-scale conversational summaries (L20).
- Intake then sends summaries to Cortex for self-reflection and to NeoMem for memory consolidation.
- **Reflect**
- TBD
# Self hosted vLLM server #
- **Stack Flow**
- [Proxmox Host]
└── loads AMDGPU driver
└── boots CT201 (order=2)
[CT201 GPU Container]
├── lyra-start-vllm.sh → starts vLLM ROCm model server
├── lyra-vllm.service → runs the above automatically
├── lyra-core.service → launches Cortex + Intake Docker stack
└── Docker Compose → runs Cortex + Intake containers
[Cortex Container]
├── Listens on port 7081
├── Talks to NVGRAM (mem API) + Intake
└── Main relay between Lyra UI ↔ memory ↔ model
[Intake Container]
├── Listens on port 7080
├── Summarizes every few exchanges
├── Writes summaries to /app/logs/summaries.log
└── Future: sends summaries → Cortex for reflection
# Additional information available in the trilium docs. #
---
## 📦 Requirements
- Docker + Docker Compose
- Postgres + Neo4j (for NeoMem)
- Access to an OpenAI- or Ollama-style API.
- OpenAI API key (for Relay fallback LLMs)
**Dependencies:**
- fastapi==0.115.8
- uvicorn==0.34.0
- pydantic==2.10.4
- python-dotenv==1.0.1
- psycopg>=3.2.8
- ollama
---
## 🔌 Integration Notes
Lyra-Core connects to neomem-api:8000 inside Docker or localhost:7077 locally.
API endpoints remain identical to Mem0 (/memories, /search).
History and entity graphs managed internally via Postgres + Neo4j.
---
## 🧱 Architecture Snapshot
User → Relay → Cortex
[RAG Search]
[Reflection Loop]
Intake (async summaries)
NeoMem (persistent memory)
**Cortex v0.4.1 introduces the first fully integrated reasoning loop.**
- Data Flow:
- User message enters Cortex via /reason.
- Cortex assembles context:
- Intake summaries (short-term memory)
- RAG contextual data (knowledge base)
- LLM generates initial draft (call_llm).
- Reflection loop critiques and refines the answer.
- Intake asynchronously summarizes and sends snapshots to NeoMem.
RAG API Configuration: set `RAG_API_URL` in `.env` (default: `http://localhost:7090`).
---
## Setup and Operation ##
## Beta Lyrae - RAG memory system ##
**Requirements**
- Env: Python 3.10+
- Dependencies: `pip install chromadb openai tqdm python-dotenv fastapi uvicorn jq`
- Persistent storage path: ./chromadb (can be moved to /mnt/data/lyra_rag_db)
**Import Chats**
- Chats need to be formatted into the following structure:
```
{
  "messages": [
    {
      "role": "user",
      "content": "Message here"
    },
    {
      "role": "assistant",
      "content": "Message here"
    }
  ]
}
```
- Organize the chats into categorical folders. This step is optional, but it helped me keep it straight.
- Run `python3 rag_chat_import.py`; chats will then be imported automatically. For reference, it took 32 minutes to import 68 chat logs (approx. 10.3 MB).
**Build API Server**
- Run `rag_build.py`; this automatically builds the ChromaDB using data saved in the /chatlogs/ folder (docs folder to be added in the future).
- Run `rag_api.py` or `uvicorn rag_api:app --host 0.0.0.0 --port 7090`
**Query**
- Run `python3 rag_query.py "Question here?"`
- For testing, a curl command can reach it too:
```
curl -X POST http://127.0.0.1:7090/rag/search \
-H "Content-Type: application/json" \
-d '{
"query": "What is the current state of Cortex and Project Lyra?",
"where": {"category": "lyra"}
}'
```
## 📖 License
NeoMem is a derivative work based on the Mem0 OSS project (Apache 2.0).
This fork retains the original Apache 2.0 license and adds local modifications.
© 2025 Terra-Mechanics / ServersDown Labs. All modifications released under Apache 2.0.

File diff suppressed because one or more lines are too long

Binary file not shown.

43
core/docker-compose.yml Normal file

@@ -0,0 +1,43 @@
services:
  relay:
    build:
      context: ./relay
    container_name: relay
    restart: always
    ports:
      - "7078:7078"
    env_file:
      - .env
    volumes:
      - ./relay/sessions:/app/sessions
    networks:
      - lyra-net
  # persona-sidecar:
  #   build:
  #     context: ./persona-sidecar
  #   container_name: persona-sidecar
  #   env_file:
  #     - .env
  #   ports:
  #     - "7080:7080"
  #   volumes:
  #     - ./persona-sidecar/personas.json:/app/personas.json:rw
  #   restart: unless-stopped
  #   networks:
  #     - lyra-net
  lyra-ui:
    image: nginx:alpine
    container_name: lyra-ui
    restart: unless-stopped
    ports:
      - "8081:80"
    volumes:
      - ./ui:/usr/share/nginx/html:ro
    networks:
      - lyra-net
networks:
  lyra-net:
    external: true


@@ -0,0 +1,14 @@
FROM node:18-alpine
WORKDIR /app
# install deps
COPY package.json ./package.json
RUN npm install --production
# copy code + config
COPY persona-server.js ./persona-server.js
COPY personas.json ./personas.json
EXPOSE 7080
CMD ["node", "persona-server.js"]


@@ -0,0 +1,8 @@
{
"name": "persona-sidecar",
"version": "0.1.0",
"type": "module",
"dependencies": {
"express": "^4.19.2"
}
}


@@ -0,0 +1,78 @@
// persona-server.js — Persona Sidecar v0.1.0 (Docker Lyra)
// Node 18+, Express REST
import express from "express";
import fs from "fs";
const app = express();
app.use(express.json());
const PORT = process.env.PORT || 7080;
const CONFIG_FILE = process.env.PERSONAS_FILE || "./personas.json";
// allow JSON with // and /* */ comments
function parseJsonWithComments(raw) {
return JSON.parse(
raw
.replace(/\/\*[\s\S]*?\*\//g, "") // block comments
.replace(/^\s*\/\/.*$/gm, "") // line comments
);
}
function loadConfig() {
const raw = fs.readFileSync(CONFIG_FILE, "utf-8");
return parseJsonWithComments(raw);
}
function saveConfig(cfg) {
fs.writeFileSync(CONFIG_FILE, JSON.stringify(cfg, null, 2));
}
// GET /persona → active persona JSON
app.get("/persona", (_req, res) => {
try {
const cfg = loadConfig();
const active = cfg.active;
const persona = cfg.personas?.[active];
if (!persona) return res.status(404).json({ error: "Active persona not found" });
res.json({ active, persona });
} catch (err) {
res.status(500).json({ error: String(err.message || err) });
}
});
// GET /personas → all personas
app.get("/personas", (_req, res) => {
try {
const cfg = loadConfig();
res.json(cfg.personas || {});
} catch (err) {
res.status(500).json({ error: String(err.message || err) });
}
});
// POST /persona/select { name }
app.post("/persona/select", (req, res) => {
try {
const { name } = req.body || {};
if (!name) return res.status(400).json({ error: "Missing 'name'" });
const cfg = loadConfig();
if (!cfg.personas || !cfg.personas[name]) {
return res.status(404).json({ error: `Persona '${name}' not found` });
}
cfg.active = name;
saveConfig(cfg);
res.json({ ok: true, active: name });
} catch (err) {
res.status(500).json({ error: String(err.message || err) });
}
});
// health + fallback
app.get("/_health", (_req, res) => res.json({ ok: true, time: new Date().toISOString() }));
app.use((_req, res) => res.status(404).json({ error: "no such route" }));
app.listen(PORT, () => {
console.log(`Persona Sidecar listening on :${PORT}`);
});


@@ -0,0 +1,17 @@
{
// v0.1.0 default active persona
"active": "Lyra",
// Personas available to the service
"personas": {
"Lyra": {
"name": "Lyra",
"style": "warm, slyly supportive, collaborative confidante",
"protocols": ["Project logs", "Confidence Bank", "Scar Notes"]
}
}
// Placeholders for later (commented out for now)
// "Doyle": { "name": "Doyle", "style": "gritty poker grinder", "protocols": [] },
// "Mr GPT": { "name": "Mr GPT", "style": "direct, tactical mentor", "protocols": [] }
}

16
core/relay/.dockerignore Normal file

@@ -0,0 +1,16 @@
# Ignore node_modules - Docker will rebuild them inside
node_modules
npm-debug.log
yarn-error.log
*.log
# Ignore environment files
.env
.env.local
# Ignore OS/editor cruft
.DS_Store
*.swp
*.swo
.vscode
.idea

18
core/relay/Dockerfile Normal file

@@ -0,0 +1,18 @@
# relay/Dockerfile
FROM node:18-alpine
# Create app directory
WORKDIR /app
# Copy package.json and install deps first (better caching)
COPY package.json ./
RUN npm install
# Copy the rest of the app
COPY . .
# Expose port
EXPOSE 7078
# Run the server
CMD ["npm", "start"]

73
core/relay/lib/cortex.js Normal file

@@ -0,0 +1,73 @@
// relay/lib/cortex.js
import fetch from "node-fetch";
const REFLECT_URL = process.env.CORTEX_URL || "http://localhost:7081/reflect";
const INGEST_URL = process.env.CORTEX_URL_INGEST || "http://localhost:7081/ingest";
export async function reflectWithCortex(userInput, memories = []) {
const body = { prompt: userInput, memories };
try {
const res = await fetch(REFLECT_URL, {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify(body),
timeout: 120000,
});
const rawText = await res.text();
console.log("🔎 [Cortex-Debug] rawText from /reflect →", rawText.slice(0, 300));
if (!res.ok) {
throw new Error(`HTTP ${res.status}: ${rawText.slice(0, 200)}`);
}
let data;
try {
data = JSON.parse(rawText);
} catch (err) {
// Fallback ① try to grab a JSON-looking block
const match = rawText.match(/\{[\s\S]*\}/);
if (match) {
try {
data = JSON.parse(match[0]);
} catch {
data = { reflection_raw: rawText.trim(), notes: "partial parse" };
}
} else {
// Fallback ② if it's already an object (stringified Python dict)
try {
const normalized = rawText
.replace(/'/g, '"') // convert single quotes
.replace(/None/g, 'null'); // convert Python None
data = JSON.parse(normalized);
} catch {
data = { reflection_raw: rawText.trim(), notes: "no JSON found" };
}
}
}
if (typeof data !== "object") {
data = { reflection_raw: rawText.trim(), notes: "non-object response" };
}
console.log("🧠 Cortex reflection normalized:", data);
return data;
} catch (e) {
console.warn("⚠️ Cortex reflect failed:", e.message);
return { error: e.message, reflection_raw: "" };
}
}
export async function ingestToCortex(user, assistant, reflection = {}, sessionId = "default") {
const body = { turn: { user, assistant }, reflection, session_id: sessionId };
try {
const res = await fetch(INGEST_URL, {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify(body),
timeout: 120000,
});
console.log(`📤 Sent exchange to Cortex ingest (${res.status})`);
} catch (e) {
console.warn("⚠️ Cortex ingest failed:", e.message);
}
}

93
core/relay/lib/llm.js Normal file

@@ -0,0 +1,93 @@
async function tryBackend(backend, messages) {
if (!backend.url || !backend.model) throw new Error("missing url/model");
const isOllama = backend.type === "ollama";
const isOpenAI = backend.type === "openai";
const isVllm = backend.type === "vllm";
const isLlamaCpp = backend.type === "llamacpp";
let endpoint = backend.url;
let headers = { "Content-Type": "application/json" };
if (isOpenAI) headers["Authorization"] = `Bearer ${OPENAI_API_KEY}`;
// Choose correct endpoint automatically
if (isOllama && !endpoint.endsWith("/api/chat")) endpoint += "/api/chat";
if ((isVllm || isLlamaCpp) && !endpoint.endsWith("/v1/completions")) endpoint += "/v1/completions";
if (isOpenAI && !endpoint.endsWith("/v1/chat/completions")) endpoint += "/v1/chat/completions";
// Build payload based on backend style
const body = (isVllm || isLlamaCpp)
? {
model: backend.model,
prompt: messages.map(m => m.content).join("\n"),
max_tokens: 400,
temperature: 0.3,
}
: isOllama
? { model: backend.model, messages, stream: false }
: { model: backend.model, messages, stream: false };
const resp = await fetch(endpoint, {
method: "POST",
headers,
body: JSON.stringify(body),
timeout: 120000,
});
if (!resp.ok) throw new Error(`${backend.key} HTTP ${resp.status}`);
const raw = await resp.text();
// 🧩 Normalize replies
let reply = "";
try {
if (isOllama) {
// Ollama sometimes returns NDJSON lines; merge them
const merged = raw
.split("\n")
.filter(line => line.trim().startsWith("{"))
.map(line => JSON.parse(line))
.map(obj => obj.message?.content || obj.response || "")
.join("");
reply = merged.trim();
} else {
const data = JSON.parse(raw);
console.log("🔍 RAW LLM RESPONSE:", JSON.stringify(data, null, 2));
reply =
data?.choices?.[0]?.text?.trim() ||
data?.choices?.[0]?.message?.content?.trim() ||
data?.message?.content?.trim() ||
"";
}
} catch (err) {
reply = `[parse error: ${err.message}]`;
}
return { reply, raw, backend: backend.key };
}
// ------------------------------------
// Export the main call helper
// ------------------------------------
export async function callSpeechLLM(messages) {
const backends = [
{ key: "primary", type: "vllm", url: process.env.LLM_PRIMARY_URL, model: process.env.LLM_PRIMARY_MODEL },
{ key: "secondary",type: "ollama", url: process.env.LLM_SECONDARY_URL,model: process.env.LLM_SECONDARY_MODEL },
{ key: "cloud", type: "openai", url: process.env.LLM_CLOUD_URL, model: process.env.LLM_CLOUD_MODEL },
{ key: "fallback", type: "llamacpp", url: process.env.LLM_FALLBACK_URL, model: process.env.LLM_FALLBACK_MODEL },
];
for (const b of backends) {
if (!b.url || !b.model) continue;
try {
console.log(`🧠 Trying backend: ${b.key.toUpperCase()} (${b.url})`);
const out = await tryBackend(b, messages);
console.log(`✅ Success via ${b.key.toUpperCase()}`);
return out;
} catch (err) {
console.warn(`⚠️ ${b.key.toUpperCase()} failed: ${err.message}`);
}
}
throw new Error("all_backends_failed");
}

8799
core/relay/package-lock.json generated Normal file

File diff suppressed because it is too large

16
core/relay/package.json Normal file

@@ -0,0 +1,16 @@
{
"name": "lyra-relay",
"version": "0.1.0",
"type": "module",
"main": "server.js",
"scripts": {
"start": "node server.js"
},
"dependencies": {
"cors": "^2.8.5",
"dotenv": "^16.6.1",
"express": "^4.18.2",
"mem0ai": "^2.1.38",
"node-fetch": "^3.3.2"
}
}

350
core/relay/server.js Normal file

@@ -0,0 +1,350 @@
import express from "express";
import dotenv from "dotenv";
import cors from "cors";
import fs from "fs";
import path from "path";
import { reflectWithCortex, ingestToCortex } from "./lib/cortex.js";
dotenv.config();
const sessionsDir = path.join(process.cwd(), "sessions");
if (!fs.existsSync(sessionsDir)) fs.mkdirSync(sessionsDir);
const app = express();
app.use(cors());
app.use(express.json());
// Cache and normalize env flags/values once
const {
NEOMEM_API,
MEM0_API_KEY,
OPENAI_API_KEY,
OLLAMA_URL,
PERSONA_URL,
CORTEX_ENABLED,
PORT: PORT_ENV,
DEBUG_PROMPT,
} = process.env;
const PORT = Number(PORT_ENV) || 7078;
const cortexEnabled = String(CORTEX_ENABLED).toLowerCase() === "true";
const debugPrompt = String(DEBUG_PROMPT).toLowerCase() === "true";
// Basic env validation warnings (non-fatal)
if (!NEOMEM_API || !MEM0_API_KEY) {
console.warn("⚠️ NeoMem configuration missing: NEOMEM_API or MEM0_API_KEY not set.");
}
/* ------------------------------
Helpers for NeoMem REST API
--------------------------------*/
// Small helper for fetch with timeout + JSON + error detail
async function fetchJSON(url, options = {}, timeoutMs = 30000) {
const controller = new AbortController();
const t = setTimeout(() => controller.abort(), timeoutMs);
try {
const resp = await fetch(url, { ...options, signal: controller.signal });
const text = await resp.text();
const parsed = text ? JSON.parse(text) : null;
if (!resp.ok) {
const msg = parsed?.error || parsed?.message || text || resp.statusText;
throw new Error(`${resp.status} ${msg}`);
}
return parsed;
} finally {
clearTimeout(t);
}
}
async function memAdd(content, userId, sessionId, cortexData) {
const url = `${NEOMEM_API}/memories`;
const payload = {
messages: [{ role: "user", content }],
user_id: userId,
// run_id: sessionId,
metadata: { source: "relay", cortex: cortexData },
};
return fetchJSON(url, {
method: "POST",
headers: {
"Content-Type": "application/json",
Authorization: `Bearer ${MEM0_API_KEY}`,
},
body: JSON.stringify(payload),
});
}
async function memSearch(query, userId, sessionId) {
const url = `${NEOMEM_API}/search`;
const payload = { query, user_id: userId };
return fetchJSON(url, {
method: "POST",
headers: {
"Content-Type": "application/json",
Authorization: `Bearer ${MEM0_API_KEY}`,
},
body: JSON.stringify(payload),
});
}
/* ------------------------------
Utility to time spans
--------------------------------*/
async function span(name, fn) {
const start = Date.now();
try {
return await fn();
} finally {
console.log(`${name} took ${Date.now() - start}ms`);
}
}
/* ------------------------------
Healthcheck
--------------------------------*/
app.get("/_health", (req, res) => {
res.json({ ok: true, time: new Date().toISOString() });
});
/* ------------------------------
Sessions
--------------------------------*/
// List all saved sessions
app.get("/sessions", (_, res) => {
const list = fs.readdirSync(sessionsDir)
.filter(f => f.endsWith(".json"))
.map(f => f.replace(".json", ""));
res.json(list);
});
// Load a single session
app.get("/sessions/:id", (req, res) => {
const file = path.join(sessionsDir, `${req.params.id}.json`);
if (!fs.existsSync(file)) return res.json([]);
res.json(JSON.parse(fs.readFileSync(file, "utf8")));
});
// Save or update a session
app.post("/sessions/:id", (req, res) => {
const file = path.join(sessionsDir, `${req.params.id}.json`);
fs.writeFileSync(file, JSON.stringify(req.body, null, 2));
res.json({ ok: true });
});
/* ------------------------------
Chat completion endpoint
--------------------------------*/
app.post("/v1/chat/completions", async (req, res) => {
try {
const { model, messages, sessionId: clientSessionId } = req.body || {};
if (!Array.isArray(messages) || !messages.length) {
return res.status(400).json({ error: "invalid_messages" });
}
if (!model || typeof model !== "string") {
return res.status(400).json({ error: "invalid_model" });
}
const sessionId = clientSessionId || "default";
const userId = "brian"; // fixed for now
console.log(`🛰️ Incoming request. Session: ${sessionId}`);
// Find last user message efficiently
const lastUserMsg = [...messages].reverse().find(m => m.role === "user")?.content;
if (!lastUserMsg) {
return res.status(400).json({ error: "no_user_message" });
}
// 1. Cortex Reflection (new pipeline)
/*let reflection = {};
try {
console.log("🧠 Reflecting with Cortex...");
const memoriesPreview = []; // we'll fill this in later with memSearch
reflection = await reflectWithCortex(lastUserMsg, memoriesPreview);
console.log("🔍 Reflection:", reflection);
} catch (err) {
console.warn("⚠️ Cortex reflect failed:", err.message);
reflection = { error: err.message };
}*/
// 2. Search memories
/* let memorySnippets = [];
await span("mem.search", async () => {
if (NEOMEM_API && MEM0_API_KEY) {
try {
const { results } = await memSearch(lastUserMsg, userId, sessionId);
if (results?.length) {
console.log(`📚 Mem0 hits: ${results.length}`);
results.forEach((r, i) =>
console.log(` ${i + 1}) ${r.memory} (score ${Number(r.score).toFixed(3)})`)
);
memorySnippets = results.map((r, i) => `${i + 1}) ${r.memory}`);
} else {
console.log("😴 No memories found");
}
} catch (e) {
console.warn("⚠️ mem.search failed:", e.message);
}
}
});*/
// 3. Fetch persona
/* let personaText = "Persona: Lyra 🤖 friendly, concise, poker-savvy.";
await span("persona.fetch", async () => {
try {
if (PERSONA_URL) {
const data = await fetchJSON(PERSONA_URL);
if (data?.persona) {
const name = data.persona.name ?? "Lyra";
const style = data.persona.style ?? "friendly, concise";
const protocols = Array.isArray(data.persona.protocols) ? data.persona.protocols.join(", ") : "";
personaText = `Persona: ${name} 🤖 ${style}. Protocols: ${protocols}`.trim();
}
}
} catch (err) {
console.error("💥 persona.fetch failed", err);
}
}); */
// 1. Ask Cortex to build the final prompt
let cortexPrompt = "";
try {
console.log("🧠 Requesting prompt from Cortex...");
const response = await fetch(`${process.env.CORTEX_API_URL || "http://10.0.0.41:7081"}/reason`, {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
prompt: lastUserMsg,
session_id: sessionId,
user_id: userId
})
});
const data = await response.json();
cortexPrompt = data.full_prompt || data.prompt || "";
console.log("🧩 Cortex returned prompt");
} catch (err) {
console.warn("⚠️ Cortex prompt build failed:", err.message);
}
// 4. Build final messages
const injectedMessages = [
{ role: "system", content: cortexPrompt || "You are Lyra." },
...messages,
];
if (debugPrompt) {
console.log("\n==== Injected Prompt ====");
console.log(JSON.stringify(injectedMessages, null, 2));
console.log("=========================\n");
}
// 5. Call LLM (OpenAI or Ollama)
const isOllama = model.startsWith("ollama:");
const llmUrl = isOllama
? `${OLLAMA_URL}/api/chat`
: "https://api.openai.com/v1/chat/completions";
const llmHeaders = isOllama
? { "Content-Type": "application/json" }
: {
"Content-Type": "application/json",
Authorization: `Bearer ${OPENAI_API_KEY}`,
};
const llmBody = {
model: isOllama ? model.replace("ollama:", "") : model,
messages: injectedMessages, // built above: Cortex system prompt + conversation history
stream: false,
};
const data = await fetchJSON(llmUrl, {
method: "POST",
headers: llmHeaders,
body: JSON.stringify(llmBody),
});
// Extract the assistant reply once; Ollama nests it under message.content, OpenAI under choices[0]
const assistantReply = isOllama
? data?.message?.content
: data?.choices?.[0]?.message?.content || data?.choices?.[0]?.text || "";
// 🧠 Send exchange back to Cortex for ingest
try {
await ingestToCortex(lastUserMsg, assistantReply || "", {}, sessionId);
console.log("📤 Sent exchange back to Cortex ingest");
} catch (err) {
console.warn("⚠️ Cortex ingest failed:", err.message);
}
// 💾 Save exchange to session log
try {
const logFile = path.join(sessionsDir, `${sessionId}.jsonl`);
const entry = JSON.stringify({
ts: new Date().toISOString(),
turn: [
{ role: "user", content: lastUserMsg },
{ role: "assistant", content: assistantReply || "" }
]
}) + "\n";
fs.appendFileSync(logFile, entry, "utf8");
console.log(`🧠 Logged session exchange → ${logFile}`);
} catch (e) {
console.warn("⚠️ Session log write failed:", e.message);
}
// 🔄 Forward user↔assistant exchange to Intake summarizer
if (process.env.INTAKE_API_URL) {
try {
const intakePayload = {
session_id: sessionId,
turns: [
{ role: "user", content: lastUserMsg },
{ role: "assistant", content: assistantReply || "" }
]
};
await fetch(process.env.INTAKE_API_URL, {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify(intakePayload),
});
console.log("📨 Sent exchange to Intake summarizer");
} catch (err) {
console.warn("⚠️ Intake post failed:", err.message);
}
}
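// Normalize the Ollama reply into the OpenAI chat.completion shape so the UI can treat both backends the same.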
if (isOllama) {
res.json({
id: "ollama-" + Date.now(),
object: "chat.completion",
created: Math.floor(Date.now() / 1000),
model,
choices: [
{
index: 0,
message: data?.message || { role: "assistant", content: "" },
finish_reason: "stop",
},
],
});
} else {
res.json(data);
}
} catch (err) {
console.error("💥 relay error", err);
res.status(500).json({ error: "relay_failed", detail: err.message });
}
});
/* ------------------------------
Start server
--------------------------------*/
app.listen(PORT, () => {
console.log(`Relay listening on port ${PORT}`);
});

View File

@@ -0,0 +1,50 @@
[
{
"role": "user",
"content": "Hey im testing stuff! how goes it?"
},
{
"role": "assistant",
"content": "Hey! I'm here and ready to help. What are you testing?"
},
{
"role": "user",
"content": "testing the logic loop for your brain!"
},
{
"role": "assistant",
"content": "Sounds interesting! How can I assist you with that?"
},
{
"role": "user",
"content": "Just keep answering me!@"
},
{
"role": "assistant",
"content": "Absolutely! I'm here to respond. Whats next?"
},
{
"role": "user",
"content": "how about now how are you?"
},
{
"role": "assistant",
"content": "I'm doing well, thanks for asking! How about you?"
},
{
"role": "user",
"content": "are you able to tell me what your prompts are saying to you?"
},
{
"role": "assistant",
"content": "I cant share specific internal prompts or system messages, but I can tell you that Im designed to understand and respond to your questions as best as I can! What else would you like to know?"
},
{
"role": "user",
"content": "ok thanks how is the cheese i gave you?"
},
{
"role": "assistant",
"content": "I appreciate the gesture! If I could taste it, Im sure it would be delightful. What kind of cheese was it?"
}
]

View File

@@ -0,0 +1,6 @@
{"ts":"2025-11-12T20:41:32.497Z","turn":[{"role":"user","content":"Hey im testing stuff! how goes it?"},{"role":"assistant","content":"Hey! I'm here and ready to help. What are you testing?"}]}
{"ts":"2025-11-12T20:59:32.992Z","turn":[{"role":"user","content":"testing the logic loop for your brain!"},{"role":"assistant","content":"Sounds interesting! How can I assist you with that?"}]}
{"ts":"2025-11-12T22:08:09.644Z","turn":[{"role":"user","content":"Just keep answering me!@"},{"role":"assistant","content":"Absolutely! I'm here to respond. Whats next?"}]}
{"ts":"2025-11-12T22:08:48.693Z","turn":[{"role":"user","content":"how about now how are you?"},{"role":"assistant","content":"I'm doing well, thanks for asking! How about you?"}]}
{"ts":"2025-11-12T22:10:46.435Z","turn":[{"role":"user","content":"are you able to tell me what your prompts are saying to you?"},{"role":"assistant","content":"I cant share specific internal prompts or system messages, but I can tell you that Im designed to understand and respond to your questions as best as I can! What else would you like to know?"}]}
{"ts":"2025-11-12T23:10:16.321Z","turn":[{"role":"user","content":"ok thanks how is the cheese i gave you?"},{"role":"assistant","content":"I appreciate the gesture! If I could taste it, Im sure it would be delightful. What kind of cheese was it?"}]}

View File

@@ -0,0 +1,122 @@
[
{
"role": "user",
"content": "Yo yo yo wusup"
},
{
"role": "assistant",
"content": "Hey Brian! Not much, just here to help you out. What's up with you?"
},
{
"role": "user",
"content": "werkin on codin'"
},
{
"role": "assistant",
"content": "Nice! What are you working on? Need any help with coding or anything related to your projects?"
},
{
"role": "user",
"content": "working on a little program that saves our chats so you can search them again in different chats"
},
{
"role": "assistant",
"content": "That sounds like a great idea! Having a program to save and search through our chats could really enhance the way you interact with Lyra. Do you have a specific approach in mind for how you'll implement the search functionality?"
},
{
"role": "user",
"content": "just adding a simple logger into the code you run on"
},
{
"role": "assistant",
"content": "That sounds like a solid plan! Implementing a simple logger can make it easier to keep track of conversations. Are you thinking of any specific features for the search functionality, like filtering by keywords or date?"
},
{
"role": "user",
"content": "i'll like use a basic word search, possibly integrate it into a rag system i have built"
},
{
"role": "assistant",
"content": "Integrating a basic word search into your RAG system sounds like a smart approach! It could enhance the way you retrieve specific chats. Are you planning to use any specific libraries or tools for your RAG system?"
},
{
"role": "user",
"content": "no clue!"
},
{
"role": "assistant",
"content": "No worries! If you need any suggestions on libraries or tools that could help you with implementing the RAG system or the word search feature, just let me know. I'm here to help!"
},
{
"role": "user",
"content": "just having some coding issues, i think i was editing the wrong version of a file LOL"
},
{
"role": "assistant",
"content": "That can definitely happen! It can be tricky to keep track of different versions. Have you made any backups or version control for your projects? That might help avoid mix-ups in the future!"
},
{
"role": "user",
"content": "i have everything syncing back to my main dev computer, i just do it manually. i have no idea how to use git"
},
{
"role": "assistant",
"content": "(no reply)"
},
{
"role": "user",
"content": "i have everything syncing back to my main dev computer, i just do it manually. i have no idea how to use git"
},
{
"role": "assistant",
"content": "(no reply)"
},
{
"role": "user",
"content": "i have everything syncing back to my main dev computer, i just do it manually. i have no idea how to use git"
},
{
"role": "assistant",
"content": "Manually syncing is a good start, but learning Git can really help streamline your workflow and keep track of changes. If you're interested, I can provide some resources or tips on how to get started with Git! It can make managing different versions of your code much easier."
},
{
"role": "user",
"content": "what is the point of git? how can it streamline my work?"
},
{
"role": "assistant",
"content": "Git is a version control system that helps you keep track of changes in your code over time. Here are some ways it can streamline your work:\n\n1. **Version Tracking**: Git allows you to save versions of your code so you can revert back to previous versions if needed. This is especially helpful if you make a change that breaks something.\n\n2. **Collaboration**: If you ever work with others, Git makes it easy to collaborate on projects. You can work on separate branches, merge changes, and resolve conflicts.\n\n3. **Branching**: You can create branches for new features or experiments without affecting the main codebase. Once you're satisfied with the changes, you can merge them back.\n\n4. **Snapshotting**: Git takes snapshots of your code at different points in time, so you can easily see what your code looked like at any point.\n\n5. **Easy Backup**: By pushing your code to a remote repository (like GitHub or GitLab), you can back up your work and access it from anywhere.\n\nLearning Git might take a bit of time, but it can greatly enhance your coding efficiency and help you manage your projects better. If you're interested, I can point you to some beginner-friendly tutorials!"
},
{
"role": "user",
"content": "say i am writing one program on my main computer, and then im working on one that lives in a VM. Can i put them both in the same git?"
},
{
"role": "assistant",
"content": "Yes, you can use the same Git repository for your code on both your main computer and the VM. Heres how you can do it:\n\n1. **Create a Git Repository**: You can create a Git repository on your main computer for your program. Use the command `git init` in your project's directory to initialize it.\n\n2. **Commit Changes**: As you make changes to your code on your main computer, you can use `git add .` to stage changes and `git commit -m \"your message\"` to commit them.\n\n3. **Clone the Repository on your VM**: You can then clone the Git repository on your VM using `git clone <repository-url>`. This gives you a copy of the same repository on your VM.\n\n4. **Work on Both**: You can now work on either machine. When you make changes on one, commit those changes, and then pull the latest changes from the other machine to keep everything in sync.\n\n5. **Push and Pull**: Use `git push` to push your changes to a remote repository (like GitHub) if you're using one. Use `git pull` on the other machine to pull the latest code.\n\nJust be careful about making sure you commit and push your changes before switching between machines to avoid conflicts. This setup allows you to maintain a single version of your code across both systems. If you need help with specific commands or setups, just let me know!"
},
{
"role": "user",
"content": "hmm i guess that might be useful. Im writting code that is running on a VM. I want to test and experiment while i write the code. The VM is headless though so i've been doing all my editing and work via SSH and SFTP."
},
{
"role": "assistant",
"content": "Using a VM and working through SSH and SFTP is a solid way to manage your coding environment! Git can definitely be beneficial in this setup as well. Heres how it can help:\n\n1. **Easier Code Management**: With Git, you can track changes to your code directly within the VM, making it easier to manage versions as you experiment.\n\n2. **Experiment Branches**: You can create branches for different experiments. This lets you test new features or fixes without affecting your main code, and you can easily switch back and forth between branches.\n\n3. **Collaborative Testing**: If you ever work with others or want to test things collaboratively, you can push changes to a remote repository, making it easy for others to pull and test the latest code.\n\n4. **Integrated with SSH**: You can run Git commands directly from your SSH terminal on the VM, making it straightforward to use. You wont need any additional tools.\n\n5. **Seamless Remote Access**: Since youre using SFTP, you can clone the repository to your VM. This way, you can easily pull updates you've made on your main computer, or push changes made on the VM back to the main repository.\n\nOverall, integrating Git into your workflow can enhance your coding experience while working in a headless environment. If you decide to go for it, I can help with the initial setup or any Git-related questions you might have!"
},
{
"role": "user",
"content": "so is git like a compressed version of a directory?"
},
{
"role": "assistant",
"content": "Not exactly, but that's a good way to think about some of its features! Here's a more precise breakdown:\n\n1. **Snapshots, Not Compression**: Git takes \"snapshots\" of your files at different points in time, meaning it records a full version of your directory's contents every time you make a commit. While this does involve storing the files, it doesn't compress them in the traditional sense—Git keeps track of changes efficiently, so only the differences between versions are stored after the first snapshot.\n\n2. **Directory Structure**: Git is designed to track changes in your source code, including the entire directory structure, file contents, and metadata. It allows you to manage multiple versions of a directory and the files within it.\n\n3. **Branching and Merging**: Git allows you to create branches, which can be thought of as \"parallel versions\" of your directory. Each branch can have different sets of changes, and you can merge these branches to integrate different features or fixes.\n\n4. **History Tracking**: Git also maintains a history of changes, so you can navigate through previous versions of your directory. This includes seeing what changes were made, who made them, and when they occurred.\n\nThe combination of these features makes Git a powerful tool for version control, allowing you to manage your projects and code changes more effectively. If you're curious about any specific aspect of Git or how it works, feel free to ask!"
},
{
"role": "user",
"content": "hmm i still dont see the point of it. atleast not for this project. This project is mostly about learning how shit works, im not planning to push this as a product or collab with anyone"
},
{
"role": "assistant",
"content": "That makes total sense! If this project is primarily for learning and experimentation, you might not need the full functionality of Git right now. Many developers start without it, especially when theyre working on smaller projects or learning something new.\n\nHere are a few reasons you might still consider using Git, even for personal learning projects:\n\n1. **Tracking Changes**: It can help you understand what changes you made over time. You can easily go back to a previous point if something breaks or if you want to see how you arrived at your current solution.\n\n2. **Learning Opportunity**: Getting familiar with Git now can make it easier when you start working on larger projects or collaborating in the future. It's a valuable skill in the software development world.\n\n3. **Experimentation**: If you're trying out different ideas, branches can help you manage experiments without affecting your main work.\n\nHowever, if you're comfortable managing your changes manually and dont see the benefit for this particular project, thats completely okay! Focus on what helps you learn best. If you have specific goals or areas you want to explore, Im here to help with that too!"
}
]

View File

@@ -0,0 +1,6 @@
{"ts":"2025-11-12T20:18:06.822Z","turn":[{"role":"user","content":"i have everything syncing back to my main dev computer, i just do it manually. i have no idea how to use git"},{"role":"assistant","content":"Manually syncing is a good start, but learning Git can really help streamline your workflow and keep track of changes. If you're interested, I can provide some resources or tips on how to get started with Git! It can make managing different versions of your code much easier."}]}
{"ts":"2025-11-12T20:29:20.003Z","turn":[{"role":"user","content":"what is the point of git? how can it streamline my work?"},{"role":"assistant","content":"Git is a version control system that helps you keep track of changes in your code over time. Here are some ways it can streamline your work:\n\n1. **Version Tracking**: Git allows you to save versions of your code so you can revert back to previous versions if needed. This is especially helpful if you make a change that breaks something.\n\n2. **Collaboration**: If you ever work with others, Git makes it easy to collaborate on projects. You can work on separate branches, merge changes, and resolve conflicts.\n\n3. **Branching**: You can create branches for new features or experiments without affecting the main codebase. Once you're satisfied with the changes, you can merge them back.\n\n4. **Snapshotting**: Git takes snapshots of your code at different points in time, so you can easily see what your code looked like at any point.\n\n5. **Easy Backup**: By pushing your code to a remote repository (like GitHub or GitLab), you can back up your work and access it from anywhere.\n\nLearning Git might take a bit of time, but it can greatly enhance your coding efficiency and help you manage your projects better. If you're interested, I can point you to some beginner-friendly tutorials!"}]}
{"ts":"2025-11-12T20:30:55.329Z","turn":[{"role":"user","content":"say i am writing one program on my main computer, and then im working on one that lives in a VM. Can i put them both in the same git?"},{"role":"assistant","content":"Yes, you can use the same Git repository for your code on both your main computer and the VM. Heres how you can do it:\n\n1. **Create a Git Repository**: You can create a Git repository on your main computer for your program. Use the command `git init` in your project's directory to initialize it.\n\n2. **Commit Changes**: As you make changes to your code on your main computer, you can use `git add .` to stage changes and `git commit -m \"your message\"` to commit them.\n\n3. **Clone the Repository on your VM**: You can then clone the Git repository on your VM using `git clone <repository-url>`. This gives you a copy of the same repository on your VM.\n\n4. **Work on Both**: You can now work on either machine. When you make changes on one, commit those changes, and then pull the latest changes from the other machine to keep everything in sync.\n\n5. **Push and Pull**: Use `git push` to push your changes to a remote repository (like GitHub) if you're using one. Use `git pull` on the other machine to pull the latest code.\n\nJust be careful about making sure you commit and push your changes before switching between machines to avoid conflicts. This setup allows you to maintain a single version of your code across both systems. If you need help with specific commands or setups, just let me know!"}]}
{"ts":"2025-11-12T20:33:16.511Z","turn":[{"role":"user","content":"hmm i guess that might be useful. Im writting code that is running on a VM. I want to test and experiment while i write the code. The VM is headless though so i've been doing all my editing and work via SSH and SFTP."},{"role":"assistant","content":"Using a VM and working through SSH and SFTP is a solid way to manage your coding environment! Git can definitely be beneficial in this setup as well. Heres how it can help:\n\n1. **Easier Code Management**: With Git, you can track changes to your code directly within the VM, making it easier to manage versions as you experiment.\n\n2. **Experiment Branches**: You can create branches for different experiments. This lets you test new features or fixes without affecting your main code, and you can easily switch back and forth between branches.\n\n3. **Collaborative Testing**: If you ever work with others or want to test things collaboratively, you can push changes to a remote repository, making it easy for others to pull and test the latest code.\n\n4. **Integrated with SSH**: You can run Git commands directly from your SSH terminal on the VM, making it straightforward to use. You wont need any additional tools.\n\n5. **Seamless Remote Access**: Since youre using SFTP, you can clone the repository to your VM. This way, you can easily pull updates you've made on your main computer, or push changes made on the VM back to the main repository.\n\nOverall, integrating Git into your workflow can enhance your coding experience while working in a headless environment. If you decide to go for it, I can help with the initial setup or any Git-related questions you might have!"}]}
{"ts":"2025-11-12T20:34:30.196Z","turn":[{"role":"user","content":"so is git like a compressed version of a directory?"},{"role":"assistant","content":"Not exactly, but that's a good way to think about some of its features! Here's a more precise breakdown:\n\n1. **Snapshots, Not Compression**: Git takes \"snapshots\" of your files at different points in time, meaning it records a full version of your directory's contents every time you make a commit. While this does involve storing the files, it doesn't compress them in the traditional sense—Git keeps track of changes efficiently, so only the differences between versions are stored after the first snapshot.\n\n2. **Directory Structure**: Git is designed to track changes in your source code, including the entire directory structure, file contents, and metadata. It allows you to manage multiple versions of a directory and the files within it.\n\n3. **Branching and Merging**: Git allows you to create branches, which can be thought of as \"parallel versions\" of your directory. Each branch can have different sets of changes, and you can merge these branches to integrate different features or fixes.\n\n4. **History Tracking**: Git also maintains a history of changes, so you can navigate through previous versions of your directory. This includes seeing what changes were made, who made them, and when they occurred.\n\nThe combination of these features makes Git a powerful tool for version control, allowing you to manage your projects and code changes more effectively. If you're curious about any specific aspect of Git or how it works, feel free to ask!"}]}
{"ts":"2025-11-12T20:40:33.207Z","turn":[{"role":"user","content":"hmm i still dont see the point of it. atleast not for this project. This project is mostly about learning how shit works, im not planning to push this as a product or collab with anyone"},{"role":"assistant","content":"That makes total sense! If this project is primarily for learning and experimentation, you might not need the full functionality of Git right now. Many developers start without it, especially when theyre working on smaller projects or learning something new.\n\nHere are a few reasons you might still consider using Git, even for personal learning projects:\n\n1. **Tracking Changes**: It can help you understand what changes you made over time. You can easily go back to a previous point if something breaks or if you want to see how you arrived at your current solution.\n\n2. **Learning Opportunity**: Getting familiar with Git now can make it easier when you start working on larger projects or collaborating in the future. It's a valuable skill in the software development world.\n\n3. **Experimentation**: If you're trying out different ideas, branches can help you manage experiments without affecting your main work.\n\nHowever, if you're comfortable managing your changes manually and dont see the benefit for this particular project, thats completely okay! Focus on what helps you learn best. If you have specific goals or areas you want to explore, Im here to help with that too!"}]}

39
core/relay/test-llm.js Normal file
View File

@@ -0,0 +1,39 @@
// test-llm.js
import path from "path";
import { fileURLToPath } from "url";
import dotenv from "dotenv";
import { callSpeechLLM } from "./lib/llm.js";
// ───────────────────────────────────────────────
// 🔧 Load environment
// ───────────────────────────────────────────────
const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);
const envPath = path.join(__dirname, "../.env");
dotenv.config({ path: envPath });
console.log("🔧 Using .env from:", envPath);
console.log("🔧 LLM_FORCE_BACKEND =", process.env.LLM_FORCE_BACKEND);
console.log("🔧 LLM_PRIMARY_URL =", process.env.LLM_PRIMARY_URL);
// ───────────────────────────────────────────────
// 🧪 Run a simple test message
// ───────────────────────────────────────────────
async function testLLM() {
console.log("🧪 Testing LLM helper...");
const messages = [
{ role: "user", content: "Say hello in five words or less." }
];
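// Exercise the shared LLM helper and report which backend handled the request.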
try {
const { reply, backend } = await callSpeechLLM(messages);
console.log(`✅ Reply: ${reply || "[no reply]"}`);
console.log(`Backend used: ${backend || "[unknown]"}`);
} catch (err) {
console.error("💥 Test failed:", err.message);
}
}
testLLM();

270
core/ui/index.html Normal file
View File

@@ -0,0 +1,270 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<title>Lyra Core Chat</title>
<link rel="stylesheet" href="style.css" />
<!-- PWA -->
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no" />
<meta name="mobile-web-app-capable" content="yes" />
<meta name="apple-mobile-web-app-capable" content="yes" />
<meta name="apple-mobile-web-app-status-bar-style" content="black-translucent" />
<link rel="manifest" href="manifest.json" />
</head>
<body>
<div id="chat">
<!-- Model selector -->
<div id="model-select">
<label for="model">Model:</label>
<select id="model">
<option value="gpt-4o-mini">GPT-4o-mini (OpenAI)</option>
<option value="ollama:nollama/mythomax-l2-13b:Q5_K_S">Ollama MythoMax (3090)</option>
</select>
<div id="theme-toggle">
<button id="toggleThemeBtn">🌙 Dark Mode</button>
</div>
</div>
<!-- Session selector -->
<div id="session-select">
<label for="sessions">Session:</label>
<select id="sessions"></select>
<button id="newSessionBtn"> New</button>
<button id="renameSessionBtn">✏️ Rename</button>
</div>
<!-- Status -->
<div id="status">
<span id="status-dot"></span>
<span id="status-text">Checking Relay...</span>
</div>
<!-- Chat messages -->
<div id="messages"></div>
<!-- Input box -->
<div id="input">
<input id="userInput" type="text" placeholder="Type a message..." autofocus />
<button id="sendBtn">Send</button>
</div>
</div>
<script>
const RELAY_BASE = "http://10.0.0.40:7078";
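// Relay base URL is hard-coded to the LAN address; adjust it if the Relay runs elsewhere.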
const API_URL = `${RELAY_BASE}/v1/chat/completions`;
function generateSessionId() {
return "sess-" + Math.random().toString(36).substring(2, 10);
}
let history = [];
let currentSession = localStorage.getItem("currentSession") || null;
let sessions = JSON.parse(localStorage.getItem("sessions") || "[]");
function saveSessions() {
localStorage.setItem("sessions", JSON.stringify(sessions));
localStorage.setItem("currentSession", currentSession);
}
function renderSessions() {
const select = document.getElementById("sessions");
select.innerHTML = "";
sessions.forEach(s => {
const opt = document.createElement("option");
opt.value = s.id;
opt.textContent = s.name;
if (s.id === currentSession) opt.selected = true;
select.appendChild(opt);
});
}
function getSessionName(id) {
const s = sessions.find(s => s.id === id);
return s ? s.name : id;
}
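// Fetch a session's saved history from the Relay and re-render the chat pane.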
async function loadSession(id) {
try {
const res = await fetch(`${RELAY_BASE}/sessions/${id}`);
const data = await res.json();
history = Array.isArray(data) ? data : [];
const messagesEl = document.getElementById("messages");
messagesEl.innerHTML = "";
history.forEach(m => addMessage(m.role, m.content));
addMessage("system", `📂 Loaded session: ${getSessionName(id)}${history.length} message(s)`);
} catch (e) {
addMessage("system", `Failed to load session: ${e.message}`);
}
}
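// Persist the current in-memory history back to the Relay for this session.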
async function saveSession() {
if (!currentSession) return;
try {
await fetch(`${RELAY_BASE}/sessions/${currentSession}`, {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify(history)
});
} catch (e) {
addMessage("system", `Failed to save session: ${e.message}`);
}
}
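// Send the user's message: append it locally, persist the session, POST the full history to the Relay, then render the reply.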
async function sendMessage() {
const inputEl = document.getElementById("userInput");
const msg = inputEl.value.trim();
if (!msg) return;
inputEl.value = "";
addMessage("user", msg);
history.push({ role: "user", content: msg });
await saveSession(); // ✅ persist both user + assistant messages
const model = document.getElementById("model").value;
// make sure we always include a stable user_id
let userId = localStorage.getItem("userId");
if (!userId) {
userId = "brian"; // use whatever ID you seeded Mem0 with
localStorage.setItem("userId", userId);
}
const body = {
model: model,
messages: history,
sessionId: currentSession,
userId: userId
};
try {
const resp = await fetch(API_URL, {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify(body)
});
const data = await resp.json();
const reply = data.choices?.[0]?.message?.content || "(no reply)";
addMessage("assistant", reply);
history.push({ role: "assistant", content: reply });
await saveSession();
} catch (err) {
addMessage("system", "Error: " + err.message);
}
}
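// Append a chat bubble for the given role and auto-scroll only if the user is already near the bottom.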
function addMessage(role, text) {
const messagesEl = document.getElementById("messages");
const msgDiv = document.createElement("div");
msgDiv.className = `msg ${role}`;
msgDiv.textContent = text;
messagesEl.appendChild(msgDiv);
// only auto-scroll if user is near bottom
const threshold = 120;
const isNearBottom = messagesEl.scrollHeight - messagesEl.scrollTop - messagesEl.clientHeight < threshold;
if (isNearBottom) {
messagesEl.scrollTo({ top: messagesEl.scrollHeight, behavior: "smooth" });
}
}
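// Poll the Relay's /_health endpoint and update the status dot/text accordingly.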
async function checkHealth() {
try {
const resp = await fetch(API_URL.replace("/v1/chat/completions", "/_health"));
if (resp.ok) {
document.getElementById("status-dot").className = "dot ok";
document.getElementById("status-text").textContent = "Relay Online";
} else {
throw new Error("Bad status");
}
} catch (err) {
document.getElementById("status-dot").className = "dot fail";
document.getElementById("status-text").textContent = "Relay Offline";
}
}
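// Wire up the UI once the DOM is ready: theme toggle, session management, health polling, and send handlers.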
document.addEventListener("DOMContentLoaded", () => {
// Dark mode toggle
const btn = document.getElementById("toggleThemeBtn");
btn.addEventListener("click", () => {
document.body.classList.toggle("dark");
const isDark = document.body.classList.contains("dark");
btn.textContent = isDark ? "☀️ Light Mode" : "🌙 Dark Mode";
localStorage.setItem("theme", isDark ? "dark" : "light");
});
if (localStorage.getItem("theme") === "dark") {
document.body.classList.add("dark");
btn.textContent = "☀️ Light Mode";
}
// Sessions
// Populate dropdown initially
renderSessions();
// Ensure we have at least one session
if (!currentSession) {
const id = generateSessionId();
const name = "default";
sessions.push({ id, name });
currentSession = id;
saveSessions();
renderSessions();
}
// Load current session history (if it exists on Relay)
loadSession(currentSession);
// Switch session
document.getElementById("sessions").addEventListener("change", async e => {
currentSession = e.target.value;
history = [];
saveSessions();
addMessage("system", `Switched to session: ${getSessionName(currentSession)}`);
await loadSession(currentSession); // ✅ load the chat history from Relay
});
// Create new session
document.getElementById("newSessionBtn").addEventListener("click", () => {
const name = prompt("Enter new session name:");
if (!name) return;
const id = generateSessionId();
sessions.push({ id, name });
currentSession = id;
history = [];
saveSessions();
renderSessions();
addMessage("system", `Created session: ${name}`);
});
// Rename session
document.getElementById("renameSessionBtn").addEventListener("click", () => {
const session = sessions.find(s => s.id === currentSession);
if (!session) return;
const newName = prompt("Rename session:", session.name);
if (!newName) return;
session.name = newName;
saveSessions();
renderSessions();
addMessage("system", `Session renamed to: ${newName}`);
});
// Health check
checkHealth();
setInterval(checkHealth, 10000);
// Input events
document.getElementById("sendBtn").addEventListener("click", sendMessage);
document.getElementById("userInput").addEventListener("keypress", e => {
if (e.key === "Enter") sendMessage();
});
});
</script>
</body>
</html>

20
core/ui/manifest.json Normal file
View File

@@ -0,0 +1,20 @@
{
"name": "Lyra Chat",
"short_name": "Lyra",
"start_url": "./index.html",
"display": "standalone",
"background_color": "#181818",
"theme_color": "#181818",
"icons": [
{
"src": "icon-192.png",
"sizes": "192x192",
"type": "image/png"
},
{
"src": "icon-512.png",
"sizes": "512x512",
"type": "image/png"
}
]
}

175
core/ui/style.css Normal file
View File

@@ -0,0 +1,175 @@
:root {
--bg-dark: #0a0a0a;
--bg-panel: rgba(255, 115, 0, 0.1);
--accent: #ff6600;
--accent-glow: 0 0 12px #ff6600cc;
--text-main: #e6e6e6;
--text-fade: #999;
--font-console: "IBM Plex Mono", monospace;
}
body {
margin: 0;
background: var(--bg-dark);
color: var(--text-main);
font-family: var(--font-console);
height: 100vh;
display: flex;
justify-content: center;
align-items: center;
}
#chat {
width: 95%;
max-width: 900px;
height: 95vh;
display: flex;
flex-direction: column;
border: 1px solid var(--accent);
border-radius: 10px;
box-shadow: var(--accent-glow);
background: linear-gradient(180deg, rgba(255,102,0,0.05) 0%, rgba(0,0,0,0.9) 100%);
overflow: hidden;
}
/* Header sections */
#model-select, #session-select, #status {
display: flex;
align-items: center;
gap: 8px;
padding: 8px 12px;
border-bottom: 1px solid var(--accent);
background-color: rgba(255, 102, 0, 0.05);
}
#status {
justify-content: flex-start;
border-top: 1px solid var(--accent);
}
label, select, button {
font-family: var(--font-console);
font-size: 0.9rem;
color: var(--text-main);
background: transparent;
border: 1px solid var(--accent);
border-radius: 4px;
padding: 4px 8px;
}
button:hover, select:hover {
box-shadow: 0 0 8px var(--accent);
cursor: pointer;
}
/* Chat area */
#messages {
flex: 1;
padding: 16px;
overflow-y: auto;
display: flex;
flex-direction: column;
gap: 8px;
scroll-behavior: smooth;
}
/* Messages */
.msg {
max-width: 80%;
padding: 10px 14px;
border-radius: 8px;
line-height: 1.4;
word-wrap: break-word;
box-shadow: 0 0 8px rgba(255,102,0,0.2);
}
.msg.user {
align-self: flex-end;
background: rgba(255,102,0,0.15);
border: 1px solid var(--accent);
}
.msg.assistant {
align-self: flex-start;
background: rgba(255,102,0,0.08);
border: 1px solid rgba(255,102,0,0.5);
}
.msg.system {
align-self: center;
font-size: 0.8rem;
color: var(--text-fade);
}
/* Input bar */
#input {
display: flex;
border-top: 1px solid var(--accent);
background: rgba(255, 102, 0, 0.05);
padding: 10px;
}
#userInput {
flex: 1;
background: transparent;
color: var(--text-main);
border: 1px solid var(--accent);
border-radius: 4px;
padding: 8px;
}
#sendBtn {
margin-left: 8px;
}
/* Relay status dot */
#status {
display: flex;
align-items: center;
margin: 10px 0;
gap: 8px;
font-family: monospace;
color: #f5f5f5;
}
#status-dot {
width: 10px;
height: 10px;
border-radius: 50%;
display: inline-block;
}
@keyframes pulseGreen {
0% { box-shadow: 0 0 5px #00ff66; opacity: 0.9; }
50% { box-shadow: 0 0 20px #00ff99; opacity: 1; }
100% { box-shadow: 0 0 5px #00ff66; opacity: 0.9; }
}
.dot.ok {
background: #00ff66;
animation: pulseGreen 2s infinite ease-in-out;
}
/* Offline state stays solid red */
.dot.fail {
background: #ff3333;
box-shadow: 0 0 10px #ff3333;
}
/* Dropdown (session selector) styling */
select {
background-color: #1a1a1a;
color: #f5f5f5;
border: 1px solid #b84a12;
border-radius: 6px;
padding: 4px 6px;
font-size: 14px;
}
select option {
background-color: #1a1a1a;
color: #f5f5f5;
}
/* Hover/focus for better visibility */
select:focus,
select:hover {
outline: none;
border-color: #ff7a33;
background-color: #222;
}