docs updated v0.7.0

2025-12-22 01:40:24 -05:00
parent b4613ac30c
commit f1471cde84
3 changed files with 478 additions and 38 deletions
--- a/README.md
+++ b/README.md
@@ -1,10 +1,12 @@
-# Project Lyra - README v0.6.0
+# Project Lyra - README v0.7.0

 Lyra is a modular persistent AI companion system with advanced reasoning capabilities and autonomous decision-making.
 It provides memory-backed chat using **Relay** + **Cortex** with integrated **Autonomy System**,
 featuring a multi-stage reasoning pipeline powered by HTTP-based LLM backends.

-**Current Version:** v0.6.0 (2025-12-18)
+**NEW in v0.7.0:** Standard Mode for simple chatbot functionality, UI backend selection, and server-side session persistence
+
+**Current Version:** v0.7.0 (2025-12-21)

 > **Note:** As of v0.6.0, NeoMem is **disabled by default** while we work out integration hiccups in the pipeline. The autonomy system is being refined independently before full memory integration.

@@ -25,14 +27,18 @@ Project Lyra operates as a **single docker-compose deployment** with multiple Do
 - Coordinates all module interactions
 - OpenAI-compatible endpoint: `POST /v1/chat/completions`
 - Internal endpoint: `POST /chat`
-- Routes messages through Cortex reasoning pipeline
+- Dual-mode routing: Standard Mode (simple chat) or Cortex Mode (full reasoning)
+- Server-side session persistence with file-based storage
+- Session management API: `GET/POST/PATCH/DELETE /sessions`
 - Manages async calls to Cortex ingest
 - *(NeoMem integration currently disabled in v0.6.0)*

 **2. UI** (Static HTML)
 - Browser-based chat interface with cyberpunk theme
-- Connects to Relay
-- Saves and loads sessions
+- **NEW:** Mode selector (Standard/Cortex) in header
+- **NEW:** Settings modal with backend selection and session management
+- **NEW:** Light/Dark mode toggle (dark by default)
+- Server-synced session management (persists across browsers and reboots)
 - OpenAI-compatible message format

 **3. NeoMem** (Python/FastAPI) - Port 7077 - **DISABLED IN v0.6.0**
@@ -49,15 +55,22 @@ Project Lyra operates as a **single docker-compose deployment** with multiple Do
 - Primary reasoning engine with multi-stage pipeline and autonomy system
 - **Includes embedded Intake module** (no separate service as of v0.5.1)
 - **Integrated Autonomy System** (NEW in v0.6.0) - See Autonomy System section below
-- **4-Stage Processing:**
-  1. **Reflection** - Generates meta-awareness notes about conversation
-  2. **Reasoning** - Creates initial draft answer using context
-  3. **Refinement** - Polishes and improves the draft
-  4. **Persona** - Applies Lyra's personality and speaking style
+- **Dual Operating Modes:**
+  - **Standard Mode** (NEW in v0.7.0) - Simple chatbot with context retention
+    - Bypasses reflection, reasoning, refinement stages
+    - Direct LLM call with conversation history
+    - User-selectable backend (SECONDARY, OPENAI, or custom)
+    - Faster responses for coding and practical tasks
+  - **Cortex Mode** - Full 4-stage reasoning pipeline
+    1. **Reflection** - Generates meta-awareness notes about conversation
+    2. **Reasoning** - Creates initial draft answer using context
+    3. **Refinement** - Polishes and improves the draft
+    4. **Persona** - Applies Lyra's personality and speaking style
 - Integrates with Intake for short-term context via internal Python imports
 - Flexible LLM router supporting multiple backends via HTTP
 - **Endpoints:**
-  - `POST /reason` - Main reasoning pipeline
+  - `POST /reason` - Main reasoning pipeline (Cortex Mode)
+  - `POST /simple` - Direct LLM chat (Standard Mode) **NEW in v0.7.0**
  - `POST /ingest` - Receives conversation exchanges from Relay
  - `GET /health` - Service health check
  - `GET /debug/sessions` - Inspect in-memory SESSIONS state
@@ -129,12 +142,38 @@ The autonomy system operates in coordinated layers, all maintaining state in `se

 ---

-## Data Flow Architecture (v0.6.0)
+## Data Flow Architecture (v0.7.0)

-### Normal Message Flow:
+### Standard Mode Flow (NEW in v0.7.0):

 ```
-User (UI) → POST /v1/chat/completions
+User (UI) → POST /v1/chat/completions {mode: "standard", backend: "SECONDARY"}
+  ↓
+Relay (7078)
+  ↓ POST /simple
+Cortex (7081)
+  ↓ (internal Python call)
+Intake module → get_recent_messages() (last 20 messages)
+  ↓
+Direct LLM call (user-selected backend: SECONDARY/OPENAI/custom)
+  ↓
+Returns simple response to Relay
+  ↓
+Relay → POST /ingest (async)
+  ↓
+Cortex → add_exchange_internal() → SESSIONS buffer
+  ↓
+Relay → POST /sessions/:id (save session to file)
+  ↓
+Relay → UI (returns final response)
+
+Note: Bypasses reflection, reasoning, refinement, persona stages
+```
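
The Standard Mode path above boils down to: fetch recent history, append the new user turn, make one LLM call. Below is a minimal Python sketch of the message-assembly step only. `build_standard_messages` and the in-memory `history` list are hypothetical stand-ins, not Lyra's code; only the 20-message window comes from the flow above.

```python
from typing import Dict, List

Message = Dict[str, str]


# Hypothetical helper, not taken from the Lyra codebase. The default window
# of 20 mirrors the "last 20 messages" step in the Standard Mode flow.
def build_standard_messages(history: List[Message], user_text: str,
                            limit: int = 20) -> List[Message]:
    """Return the most recent `limit` history messages plus the new user turn."""
    recent = history[-limit:] if limit > 0 else []
    return recent + [{"role": "user", "content": user_text}]


# Example: 30 stored messages are trimmed to the latest 20 before the call.
history = [{"role": "user", "content": f"msg {i}"} for i in range(30)]
messages = build_standard_messages(history, "Hello!")
print(len(messages))
```

The resulting list is what would be handed to the selected backend in a single request, with no reflection, reasoning, or refinement pass in between.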
+
+### Cortex Mode Flow (Full Reasoning):
+
+```
+User (UI) → POST /v1/chat/completions {mode: "cortex"}
  ↓
 Relay (7078)
  ↓ POST /reason
@@ -158,11 +197,26 @@ Cortex → add_exchange_internal() → SESSIONS buffer
  ↓
 Autonomy System → Update self_state.json (pattern tracking)
  ↓
+Relay → POST /sessions/:id (save session to file)
+  ↓
 Relay → UI (returns final response)

 Note: NeoMem integration disabled in v0.6.0
 ```

+### Session Persistence Flow (NEW in v0.7.0):
+
+```
+UI loads → GET /sessions → Relay → List all sessions from files → UI dropdown
+User sends message → POST /sessions/:id → Relay → Save to sessions/*.json
+User renames session → PATCH /sessions/:id/metadata → Relay → Update *.meta.json
+User deletes session → DELETE /sessions/:id → Relay → Remove session files
+
+Sessions stored in: core/relay/sessions/
+- {sessionId}.json (conversation history)
+- {sessionId}.meta.json (name, timestamps, metadata)
+```
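
To make the two-file layout above concrete, here is a small Python sketch of a file-backed session store. It is illustrative only: the `FileSessionStore` class and its method names are assumptions, not Relay's implementation. Only the `{sessionId}.json` and `{sessionId}.meta.json` naming comes from the flow above.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


class FileSessionStore:
    """Hypothetical store: one history file and one metadata file per session."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    # A real service should validate session_id before using it in a path.
    def _history_path(self, session_id: str) -> Path:
        return self.root / f"{session_id}.json"

    def _meta_path(self, session_id: str) -> Path:
        return self.root / f"{session_id}.meta.json"

    def save(self, session_id: str, history: List[Dict[str, Any]]) -> None:
        self._history_path(session_id).write_text(json.dumps(history), encoding="utf-8")

    def load(self, session_id: str) -> List[Dict[str, Any]]:
        path = self._history_path(session_id)
        if not path.exists():
            return []
        return json.loads(path.read_text(encoding="utf-8"))

    def rename(self, session_id: str, name: str) -> None:
        path = self._meta_path(session_id)
        meta = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
        meta["name"] = name
        path.write_text(json.dumps(meta), encoding="utf-8")

    def list_ids(self) -> List[str]:
        # Skip the *.meta.json companions so each session is listed once.
        return sorted(p.stem for p in self.root.glob("*.json")
                      if not p.name.endswith(".meta.json"))

    def delete(self, session_id: str) -> None:
        self._history_path(session_id).unlink(missing_ok=True)
        self._meta_path(session_id).unlink(missing_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    store = FileSessionStore(Path(tmp) / "sessions")
    store.save("sess-abc123", [{"role": "user", "content": "Hello!"}])
    store.rename("sess-abc123", "First chat")
    print(store.list_ids())
    store.delete("sess-abc123")
    print(store.list_ids())
```

Each method maps loosely onto one of the four operations in the flow (list, save, rename, delete). The sketch leaves out timestamps and any locking a concurrent server would need.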
+
 ### Cortex 4-Stage Reasoning Pipeline:

 1. **Reflection** (`reflection.py`) - Cloud LLM (OpenAI)
@@ -196,6 +250,14 @@ Note: NeoMem integration disabled in v0.6.0
 - OpenAI-compatible endpoint: `POST /v1/chat/completions`
 - Internal endpoint: `POST /chat`
 - Health check: `GET /_health`
+- **NEW:** Dual-mode routing (Standard/Cortex)
+- **NEW:** Server-side session persistence with CRUD API
+- **NEW:** Session management endpoints:
+  - `GET /sessions` - List all sessions
+  - `GET /sessions/:id` - Retrieve session history
+  - `POST /sessions/:id` - Save session history
+  - `PATCH /sessions/:id/metadata` - Update session metadata
+  - `DELETE /sessions/:id` - Delete session
 - Async non-blocking calls to Cortex
 - Shared request handler for code reuse
 - Comprehensive error handling
@@ -210,19 +272,35 @@ Note: NeoMem integration disabled in v0.6.0

 **UI**:
 - Lightweight static HTML chat interface
-- Cyberpunk theme
-- Session save/load functionality
+- Cyberpunk theme with light/dark mode toggle
+- **NEW:** Mode selector (Standard/Cortex) in header
+- **NEW:** Settings modal (⚙ button) with:
+  - Backend selection for Standard Mode (SECONDARY/OPENAI/custom)
+  - Session management (view, delete sessions)
+  - Theme toggle (dark mode default)
+- **NEW:** Server-synced session management
+  - Sessions persist across browsers and reboots
+  - Rename sessions with custom names
+  - Delete sessions with confirmation
+  - Automatic session save on every message
 - OpenAI message format support

 ### Reasoning Layer

-**Cortex** (v0.5.1):
-- Multi-stage reasoning pipeline (reflection → reasoning → refine → persona)
+**Cortex** (v0.7.0):
+- **NEW:** Dual operating modes:
+  - **Standard Mode** - Simple chat with context (`/simple` endpoint)
+    - User-selectable backend (SECONDARY, OPENAI, or custom)
+    - Full conversation history via Intake integration
+    - Bypasses reasoning pipeline for faster responses
+  - **Cortex Mode** - Full reasoning pipeline (`/reason` endpoint)
+    - Multi-stage processing: reflection → reasoning → refine → persona
+    - Per-stage backend selection
+    - Autonomy system integration
 - Flexible LLM backend routing via HTTP
-- Per-stage backend selection
 - Async processing throughout
 - Embedded Intake module for short-term context
-- `/reason`, `/ingest`, `/health`, `/debug/sessions`, `/debug/summary` endpoints
+- `/reason`, `/simple`, `/ingest`, `/health`, `/debug/sessions`, `/debug/summary` endpoints
 - Lenient error handling - never fails the chat pipeline

 **Intake** (Embedded Module):
@@ -327,7 +405,28 @@ The following LLM backends are accessed via HTTP (not part of docker-compose):

 ## Version History

-### v0.6.0 (2025-12-18) - Current Release
+### v0.7.0 (2025-12-21) - Current Release
+**Major Features: Standard Mode + Backend Selection + Session Persistence**
+- ✅ Added Standard Mode for simple chatbot functionality
+- ✅ UI mode selector (Standard/Cortex) in header
+- ✅ Settings modal with backend selection for Standard Mode
+- ✅ Server-side session persistence with file-based storage
+- ✅ Session management UI (view, rename, delete sessions)
+- ✅ Light/Dark mode toggle (dark by default)
+- ✅ Context retention in Standard Mode via Intake integration
+- ✅ Fixed modal positioning and z-index issues
+- ✅ Cortex `/simple` endpoint for direct LLM calls
+- ✅ Session CRUD API in Relay
+- ✅ Full backward compatibility - Cortex Mode unchanged
+
+**Key Changes:**
+- Standard Mode skips the reflection, reasoning, and refinement stages for faster responses
+- Sessions now sync across browsers and survive container restarts
+- User can select SECONDARY (Ollama), OPENAI, or custom backend for Standard Mode
+- Theme preference and backend selection persisted in localStorage
+- Session files stored in `core/relay/sessions/` directory
+
+### v0.6.0 (2025-12-18)
 **Major Feature: Autonomy System (Phase 1, 2, and 2.5)**
 - ✅ Added autonomous decision-making framework
 - ✅ Implemented executive planning and goal-setting layer
@@ -394,30 +493,39 @@ The following LLM backends are accessed via HTTP (not part of docker-compose):

 ---

-## Known Issues (v0.6.0)
+## Known Issues (v0.7.0)

-### Temporarily Disabled (v0.6.0)
+### Temporarily Disabled
 - **NeoMem disabled by default** - Being refined independently before full integration
  - PostgreSQL + pgvector storage inactive
  - Neo4j graph database inactive
  - Memory persistence endpoints not active
 - RAG service (Beta Lyrae) currently disabled in docker-compose.yml

-### Non-Critical
-- Session management endpoints not fully implemented in Relay
-- Full autonomy system integration still being refined
-- Memory retrieval integration pending NeoMem re-enablement
+### Standard Mode Limitations
+- No reflection, reasoning, or refinement stages (by design)
+- DeepSeek R1 not recommended for Standard Mode (generates reasoning artifacts)
+- No RAG integration (same as Cortex Mode - currently disabled)
+- No NeoMem memory storage (same as Cortex Mode - currently disabled)
+
+### Session Management Limitations
+- Sessions are stored in the container filesystem; without a volume mount they are lost when the container is removed or recreated
+- No session import/export functionality yet
+- No session search or filtering
+- Old localStorage sessions don't automatically migrate to server

 ### Operational Notes
 - **Single-worker constraint**: Cortex must run with single Uvicorn worker to maintain SESSIONS state
  - Multi-worker scaling requires migrating SESSIONS to Redis or shared storage
 - Diagnostic endpoints (`/debug/sessions`, `/debug/summary`) available for troubleshooting
+- Backend selection only affects Standard Mode - Cortex Mode uses environment-configured backends

 ### Future Enhancements
 - Re-enable NeoMem integration after pipeline refinement
 - Full autonomy system maturation and optimization
 - Re-enable RAG service integration
-- Implement full session persistence
+- Session import/export functionality
+- Session search and filtering UI
 - Migrate SESSIONS to Redis for multi-worker support
 - Add request correlation IDs for tracing
 - Comprehensive health checks across all services
@@ -457,17 +565,56 @@ The following LLM backends are accessed via HTTP (not part of docker-compose):
   curl http://localhost:7077/health
   ```

-4. Access the UI at `http://localhost:7078`
+4. Access the UI at `http://localhost:8081`
+
+### Using the UI
+
+**Mode Selection:**
+- Use the **Mode** dropdown in the header to switch between:
+  - **Standard** - Simple chatbot for coding and practical tasks
+  - **Cortex** - Full reasoning pipeline with autonomy features
+
+**Settings Menu:**
+1. Click the **⚙ Settings** button in the header
+2. **Backend Selection** (Standard Mode only):
+   - Choose **SECONDARY** (Ollama/Qwen on 3090) - Fast, local
+   - Choose **OPENAI** (GPT-4o-mini) - Cloud-based, high quality
+   - Enter custom backend name for advanced configurations
+3. **Session Management**:
+   - View all saved sessions with message counts and timestamps
+   - Click 🗑️ to delete unwanted sessions
+4. **Theme Toggle**:
+   - Click **🌙 Dark Mode** or **☀️ Light Mode** to switch themes
+
+**Session Management:**
+- Sessions automatically save on every message
+- Use the **Session** dropdown to switch between sessions
+- Click **➕ New** to create a new session
+- Click **✏️ Rename** to rename the current session
+- Sessions persist across browsers and container restarts

 ### Test

-**Test Relay → Cortex pipeline:**
+**Test Standard Mode:**
 ```bash
 curl -X POST http://localhost:7078/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
+    "mode": "standard",
+    "backend": "SECONDARY",
+    "messages": [{"role": "user", "content": "Hello!"}],
+    "sessionId": "test"
+  }'
+```
+
+**Test Cortex Mode (Full Reasoning):**
+```bash
+curl -X POST http://localhost:7078/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "mode": "cortex",
    "messages": [{"role": "user", "content": "Hello Lyra!"}],
-    "session_id": "test"
+    "sessionId": "test"
  }'
 ```
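
The two curl requests differ only in `mode` and in whether `backend` is sent. A small Python sketch of building those bodies follows. The `build_chat_payload` helper is hypothetical; the field names (`mode`, `backend`, `messages`, `sessionId`) are taken from the curl examples above.

```python
import json
from typing import Any, Dict, Optional


# Hypothetical helper for illustration; not part of Relay or the UI.
def build_chat_payload(mode: str, content: str, session_id: str,
                       backend: Optional[str] = None) -> Dict[str, Any]:
    """Build a request body for POST /v1/chat/completions."""
    if mode not in ("standard", "cortex"):
        raise ValueError(f"unknown mode: {mode!r}")
    payload: Dict[str, Any] = {
        "mode": mode,
        "messages": [{"role": "user", "content": content}],
        "sessionId": session_id,
    }
    # Backend selection only applies to Standard Mode, so it is omitted otherwise.
    if mode == "standard" and backend is not None:
        payload["backend"] = backend
    return payload


print(json.dumps(build_chat_payload("standard", "Hello!", "test", backend="SECONDARY"), indent=2))
print(json.dumps(build_chat_payload("cortex", "Hello Lyra!", "test"), indent=2))
```

Dropping `backend` for Cortex Mode is a choice made in this sketch. How Relay treats a `backend` field sent with `"mode": "cortex"` is not stated in the README.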

@@ -492,6 +639,21 @@ curl http://localhost:7081/debug/sessions
 curl "http://localhost:7081/debug/summary?session_id=test"
 ```

+**List all sessions:**
+```bash
+curl http://localhost:7078/sessions
+```
+
+**Get session history:**
+```bash
+curl http://localhost:7078/sessions/sess-abc123
+```
+
+**Delete a session:**
+```bash
+curl -X DELETE http://localhost:7078/sessions/sess-abc123
+```
+
 All backend databases (PostgreSQL and Neo4j) are automatically started as part of the docker-compose stack.

 ---
@@ -515,12 +677,13 @@ OPENAI_API_KEY=sk-...

 **Module-specific backend selection:**
 ```bash
-CORTEX_LLM=SECONDARY      # Use Ollama for reasoning
-INTAKE_LLM=PRIMARY        # Use llama.cpp for summarization
-SPEAK_LLM=OPENAI          # Use OpenAI for persona
-NEOMEM_LLM=PRIMARY        # Use llama.cpp for memory
-UI_LLM=OPENAI             # Use OpenAI for UI
-RELAY_LLM=PRIMARY         # Use llama.cpp for relay
+CORTEX_LLM=SECONDARY         # Use Ollama for reasoning
+INTAKE_LLM=PRIMARY           # Use llama.cpp for summarization
+SPEAK_LLM=OPENAI             # Use OpenAI for persona
+NEOMEM_LLM=PRIMARY           # Use llama.cpp for memory
+UI_LLM=OPENAI                # Use OpenAI for UI
+RELAY_LLM=PRIMARY            # Use llama.cpp for relay
+STANDARD_MODE_LLM=SECONDARY  # Default backend for Standard Mode (NEW in v0.7.0)
 ```
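
One plausible way `STANDARD_MODE_LLM` could interact with the UI's backend selection is sketched below in Python. The precedence (request value first, then the environment variable, then `SECONDARY`) is an assumption made for illustration, and `resolve_standard_backend` is a hypothetical name. The README only states that the variable is the default backend for Standard Mode.

```python
import os
from typing import Mapping, Optional


# Hypothetical resolution order; the real precedence is not documented here.
def resolve_standard_backend(requested: Optional[str],
                             env: Mapping[str, str] = os.environ) -> str:
    """Pick the backend for a Standard Mode request."""
    if requested:
        return requested  # explicit choice from the request or UI
    return env.get("STANDARD_MODE_LLM", "SECONDARY")  # assumed final fallback


print(resolve_standard_backend("OPENAI", env={}))
print(resolve_standard_backend(None, env={"STANDARD_MODE_LLM": "PRIMARY"}))
```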

 ### Database Configuration
@@ -541,6 +704,7 @@ NEO4J_PASSWORD=neomemgraph
 NEOMEM_API=http://neomem-api:7077
 CORTEX_API=http://cortex:7081
 CORTEX_REASON_URL=http://cortex:7081/reason
+CORTEX_SIMPLE_URL=http://cortex:7081/simple      # NEW in v0.7.0
 CORTEX_INGEST_URL=http://cortex:7081/ingest
 RELAY_URL=http://relay:7078
 ```
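
With both `CORTEX_REASON_URL` and `CORTEX_SIMPLE_URL` defined, dual-mode routing reduces to choosing one URL per request. A minimal Python sketch follows. `cortex_url_for_mode` is a hypothetical helper, and rejecting unknown modes is an assumption (Relay may instead fall back to a default).

```python
from typing import Mapping

# Defaults copied from the environment block above.
DEFAULT_URLS = {
    "CORTEX_REASON_URL": "http://cortex:7081/reason",
    "CORTEX_SIMPLE_URL": "http://cortex:7081/simple",
}

# Standard Mode goes to /simple, Cortex Mode to /reason.
MODE_TO_ENV_KEY = {
    "standard": "CORTEX_SIMPLE_URL",
    "cortex": "CORTEX_REASON_URL",
}


def cortex_url_for_mode(mode: str, env: Mapping[str, str]) -> str:
    """Return the Cortex endpoint a request in the given mode is forwarded to."""
    key = MODE_TO_ENV_KEY.get(mode)
    if key is None:
        raise ValueError(f"unknown mode: {mode!r}")
    return env.get(key, DEFAULT_URLS[key])


print(cortex_url_for_mode("standard", env={}))
print(cortex_url_for_mode("cortex", env={}))
```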
@@ -685,7 +849,10 @@ NeoMem is a derivative work based on Mem0 OSS (Apache 2.0).
 ### Debugging Tips
 - Enable verbose logging: `VERBOSE_DEBUG=true` in `.env`
 - Check Cortex logs: `docker logs cortex -f`
+- Check Relay logs: `docker logs relay -f`
 - Inspect SESSIONS: `curl http://localhost:7081/debug/sessions`
 - Test summarization: `curl "http://localhost:7081/debug/summary?session_id=test"`
-- Check Relay logs: `docker logs relay -f`
+- List sessions: `curl http://localhost:7078/sessions`
+- Test Standard Mode: `curl -X POST http://localhost:7078/v1/chat/completions -H "Content-Type: application/json" -d '{"mode":"standard","backend":"SECONDARY","messages":[{"role":"user","content":"test"}],"sessionId":"test"}'`
 - Monitor Docker network: `docker network inspect lyra_net`
+- Check session files: `ls -la core/relay/sessions/`