doc: add readme.md
275
README.md
Normal file
@@ -0,0 +1,275 @@
# TMI RAG (rag-tmi)

A lightweight **local Retrieval-Augmented Generation (RAG) system** that indexes and searches technical documentation and source code across the Terra-Mechanics development workspace.

It lets Claude Code, Codex, or any other LLM assistant retrieve relevant context from engineering documentation before answering questions.

The goal is to create a **searchable semantic memory layer** for projects such as:

- Terra-View
- Seismo Relay
- Series 3 / Minimate protocol research
- Modem / SLM documentation
- Reverse engineering notes
- Parser documentation

Instead of manually searching repos or notes, the system retrieves relevant information using **vector similarity search**.

---

# How It Works

The system follows a standard RAG architecture.

### 1. Indexing

`ingest.py`:

1. Reads directories listed in `sources.yaml`
2. Recursively scans for supported files (`.md`, `.txt`, `.py`)
3. Splits content into chunks
4. Generates embeddings using OpenAI
5. Stores vectors in a FAISS index

Result:

```
index/
  index.faiss
  meta.pkl
```

This becomes the **semantic database**.
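
The chunking step (step 3) is the part most worth tuning. Below is a minimal sketch of fixed-size chunking with overlap, assuming character-based chunks. The function name `chunk_text` and the `size` and `overlap` defaults are illustrative assumptions, not values taken from `ingest.py`. Overlap keeps text that straddles a chunk boundary retrievable from either side.

```python
def chunk_text(text, size=1000, overlap=200):
    """Split text into chunks of at most `size` characters.

    Consecutive chunks share `overlap` characters. Both defaults are
    assumptions made for this sketch, not settings from ingest.py.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and 0 <= overlap < size")

    step = size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        # Stop once this chunk reaches the end of the text, so the
        # last chunk is never a fragment already covered by the previous one.
        if start + size >= len(text):
            break
        start += step
    return chunks
```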

---

### 2. Querying

`query.py`:

1. Embeds the query text
2. Searches the FAISS vector index
3. Returns the most relevant chunks of text

These results can be pasted into an LLM chat (Claude Code, Codex, etc.) as context.

Example:

```
python query.py "checksum algorithm"
```
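
Conceptually, the search step is a nearest-neighbor lookup: the query vector is compared against every stored chunk vector and the closest ones win. The sketch below shows that idea in plain Python with toy 2-D vectors. It is a stand-in for illustration only: the real system delegates this to FAISS, and whether `query.py` ranks by cosine similarity or by L2 distance is not stated here, so cosine similarity is an assumption. The names `cosine_similarity` and `top_k` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=3):
    """Return (row_index, score) pairs for the k most similar vectors."""
    scored = [(i, cosine_similarity(query_vec, vec))
              for i, vec in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for real embeddings (made up for this sketch).
toy_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(top_k([1.0, 0.1], toy_vectors, k=2))
```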

---

# Repository Structure

```
rag-tmi/
│
├─ ingest.py
├─ query.py
├─ sources.yaml
├─ requirements.txt
├─ README.md
│
├─ index/        # Generated FAISS index (not committed)
│
└─ .venv/        # Local Python environment (not committed)
```

---

# Installation

Create a virtual environment:

```
python -m venv .venv
```

Activate it.

Windows (PowerShell):

```
.venv\Scripts\activate
```

On macOS or Linux, run `source .venv/bin/activate` instead.

Install dependencies:

```
pip install -r requirements.txt
```

---

# Environment Variables

Create a `.env` file in the project root:

```
OPENAI_API_KEY=your_key_here
```

The key is loaded automatically by `python-dotenv`.
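
A missing or placeholder key usually surfaces later as an opaque API error, so it can help to check for it up front. The sketch below assumes the environment has already been populated (for example by calling `load_dotenv()` from `python-dotenv` at startup). The helper name `get_api_key` is hypothetical and is not claimed to exist in `ingest.py` or `query.py`.

```python
import os


def get_api_key(env=None):
    """Return the OpenAI API key, or raise a clear error if it is missing.

    `env` defaults to os.environ; passing a dict makes the check testable.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    # Treat the README's placeholder value the same as a missing key.
    if not key or key == "your_key_here":
        raise RuntimeError(
            "OPENAI_API_KEY is not set; add it to .env in the project root."
        )
    return key
```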

---

# Running the Indexer

Build the vector database:

```
python ingest.py
```

The script will:

- scan all configured sources
- generate embeddings
- build a FAISS index

You should see progress like:

```
Embedding 72 chunks
```

The index will be written to:

```
index/index.faiss
index/meta.pkl
```
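
A FAISS index stores vectors, not the text they came from, so the chunk text and source paths have to live in a side file; `meta.pkl` presumably plays that role. The sketch below assumes the metadata is a plain list in which position `i` describes vector `i`. That layout, the record fields `source` and `text`, and the helper names are all assumptions made for illustration, not the confirmed format written by `ingest.py`.

```python
import pickle
from pathlib import Path


def save_metadata(records, path):
    """Pickle the list of chunk records, creating the folder if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(records, fh)


def load_metadata(path):
    """Load chunk records. Only unpickle files you created yourself."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


def lookup(records, row_ids):
    """Map vector row ids from a search back to their chunk records.

    FAISS pads missing neighbors with -1, so out-of-range ids are skipped.
    """
    return [records[i] for i in row_ids if 0 <= i < len(records)]
```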

---

# Querying the Database

Run:

```
python query.py "frame checksum"
```

The system will return the most relevant chunks from indexed documents.

These chunks can be copied into Claude or Codex to provide **accurate context** for responses.
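
When pasting results into a chat, it helps to label each chunk with its source so the assistant can cite where a fact came from. The sketch below is one possible way to format retrieved chunks. The record fields `source` and `text`, the function name `format_context`, and the `max_chars` budget are assumptions for illustration; the actual output of `query.py` may look different.

```python
def format_context(results, max_chars=4000):
    """Join retrieved chunks into one labeled block of context.

    `results` is a list of dicts with 'source' and 'text' keys, ordered
    from most to least relevant. Chunks are added until the character
    budget would be exceeded.
    """
    parts = []
    used = 0
    for rank, item in enumerate(results, start=1):
        block = f"[{rank}] Source: {item['source']}\n{item['text'].strip()}"
        if used + len(block) > max_chars:
            break  # keep the higher-ranked chunks, drop the rest
        parts.append(block)
        used += len(block)
    return "\n\n".join(parts)
```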

---

# sources.yaml

This file defines which directories are indexed.

Example:

```yaml
sources:
  - ../terra-view
  - ../seismo-relay
  - ../series3-agent
  - ../protocol-docs
```

Paths are relative to the `rag-tmi` directory.
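
Relative entries only resolve correctly if they are joined to the `rag-tmi` directory rather than to whatever directory the script happens to be launched from. The sketch below shows that resolution step. It takes the already-parsed list of entries (in practice these would come from reading `sources.yaml`, for example with PyYAML's `yaml.safe_load`); the function name `resolve_sources` is hypothetical.

```python
import os
from pathlib import Path


def resolve_sources(entries, base_dir):
    """Resolve source entries against base_dir (the rag-tmi directory).

    Absolute entries are kept as they are; relative ones are joined to
    base_dir and normalized so that '..' segments collapse.
    """
    base = Path(base_dir)
    resolved = []
    for entry in entries:
        path = Path(entry)
        if not path.is_absolute():
            path = base / path
        resolved.append(Path(os.path.normpath(path)))
    return resolved
```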

---

# Supported File Types

Currently indexed:

```
.md
.txt
.py
```

Additional types can be added inside `ingest.py`.
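
As a rough picture of what extension filtering looks like, the sketch below walks a directory tree and yields only files with an allowed suffix. The names `SUPPORTED_EXTENSIONS`, `SKIPPED_DIRS`, and `iter_source_files` are hypothetical, and skipping `.venv`, `.git`, and `__pycache__` is an assumption about sensible defaults rather than documented behavior of `ingest.py`. Adding a type then amounts to extending the set.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".md", ".txt", ".py"}
# Assumed defaults: folders that rarely contain documentation worth indexing.
SKIPPED_DIRS = {".venv", ".git", "__pycache__"}


def iter_source_files(root, extensions=SUPPORTED_EXTENSIONS):
    """Yield files under root whose suffix is in `extensions`."""
    root = Path(root)
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # Check only the parent folders, not the file name itself.
        parents = path.relative_to(root).parts[:-1]
        if SKIPPED_DIRS.intersection(parents):
            continue
        if path.suffix.lower() in extensions:
            yield path
```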

---

# Git Notes

The following files and directories should **not be committed**. Add them to `.gitignore`:

```
.venv/
.env
index/
__pycache__/
```

---

# Why This Exists

Large engineering projects accumulate knowledge across:

- source code
- documentation
- research notes
- reverse engineering logs

Traditional search tools rely on **exact text matches**.

RAG enables **semantic search**, meaning queries like:

```
"frame checksum"
```

can retrieve documentation that contains phrases like:

```
payload CRC
frame validation
checksum calculation
```

even if the exact words do not match.

---

# Future Improvements

Possible upgrades:

- CLI command wrapper (`rag "query"`)
- automatic repo indexing
- incremental indexing
- MCP server for AI tool access
- reranking model
- code-aware chunking
- web UI for search

---

# Philosophy

This tool acts as a **local memory layer** for engineering work.

Instead of searching through repos manually, developers and AI assistants can retrieve the most relevant information instantly.

The system is intentionally simple:

- local
- transparent
- easy to rebuild
- no heavy frameworks

It follows the principle that **small tools that solve real problems are better than complex systems that are hard to maintain.**