Local-first transaction enrichment:
- Normalize messy bank “payee” strings into stable canonical payees
- Infer Category and Bas-Lux (Basic/Luxury) from what you've already labeled
- Improve over time via seeding + feedback (stored in a local SQLite DB)
This runs as a FastAPI service backed by a local sentence-transformers embedding model and nearest-neighbor matching against payees you’ve stored in data/enrichment.db.
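Nearest-neighbor matching means scoring a new payee's embedding against every stored payee's embedding and keeping the closest ones. The sketch below is a brute-force cosine-similarity version in plain Python, in the spirit of the pure-Python fallback backend; it is not the service's code. The function names are hypothetical, and the toy 3-dimensional vectors stand in for real all-MiniLM-L6-v2 embeddings (which are 384-dimensional).

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_payees(
    query: Sequence[float], stored: Dict[str, Sequence[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Brute-force k-nearest neighbors: score every stored payee, keep the top k."""
    scored = [(name, cosine(query, vec)) for name, vec in stored.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings for canonical payees already in the DB.
stored = {
    "AMAZON": [0.9, 0.1, 0.0],
    "MYKIDSSPENDING": [0.0, 0.2, 0.95],
    "NETFLIX": [0.1, 0.9, 0.1],
}
# Pretend embedding of the raw string "AMZN Mktp US*2A3B4C5D".
query = [0.85, 0.15, 0.05]
matches = nearest_payees(query, stored, k=2)
print(matches)  # AMAZON ranks first
```

The top similarity score is a natural candidate for a payee confidence value; a vector index such as vec0 does the same ranking without scanning every row in Python.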
Prereqs: Docker Desktop.
docker compose up --build
- Web UI (when the React build is present): http://localhost:8000/
- Load Transactions CSV requires Posted Date/Date, Amount, and Payee/Description (optional: Category, Bas-Lux, Source; Id is always generated as sha256(date + description + amount)).
- API docs: http://localhost:8000/docs
- Data persists in ./data/enrichment.db (delete it to reset learning)
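The Id rule above can be reproduced with the standard library. This is a minimal sketch assuming the three fields are concatenated as strings with no separator, UTF-8 encoded, and hex-encoded; the service's exact formatting (especially of the amount) is an assumption here, and transaction_id is a hypothetical helper name.

```python
import hashlib


def transaction_id(date: str, description: str, amount: str) -> str:
    """sha256(date + description + amount) as a hex string.

    Assumption: plain concatenation, no separator, UTF-8 bytes.
    """
    raw = f"{date}{description}{amount}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


tid = transaction_id("2025-11-03", "AMZN Mktp US*2A3B4C5D", "-48.12")
print(tid)
```

Because the amount is hashed as text, "-48.12" and "-48.120" would yield different Ids, and two identical purchases on the same day would share one.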
Note: the compose file pins platform: linux/amd64 for compatibility; on Apple Silicon this may run under emulation.
Prereqs: Python 3.11+.
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
uvicorn main:app --reload --port 8000
Notes:
- First run may download the embedding model (sentence-transformers/all-MiniLM-L6-v2).
- If SQLite extension loading isn't available, the service falls back to a pure-Python KNN backend. Check GET /health → vector_backend (vec0 vs python).
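A fallback like this typically hinges on whether Python's sqlite3 module can load extensions at all: Connection.enable_load_extension only exists when Python was built with loadable-extension support. The sketch below shows one way to make that decision; it assumes the sqlite-vec Python package (sqlite_vec.load) provides vec0, and pick_vector_backend is a hypothetical name, not the service's function.

```python
import sqlite3


def pick_vector_backend() -> str:
    """Return "vec0" if sqlite-vec loads into SQLite, else "python" (hypothetical logic)."""
    conn = sqlite3.connect(":memory:")
    try:
        # AttributeError here means this Python build cannot load SQLite extensions.
        conn.enable_load_extension(True)
        import sqlite_vec  # assumption: the sqlite-vec package supplies the vec0 table

        sqlite_vec.load(conn)
        conn.enable_load_extension(False)
        return "vec0"
    except (AttributeError, ImportError, sqlite3.Error):
        # No extension support, package missing, or the extension failed to load.
        return "python"
    finally:
        conn.close()


backend = pick_vector_backend()
print(backend)
```

Some macOS system and Homebrew Python builds lack extension loading, which is one common way to end up on the python backend outside Docker.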
Prereqs: Node 20+.
cd client
npm install
npm run dev
The dev server runs at http://localhost:5175 and proxies API calls to VITE_API_BASE_URL (default: http://localhost:8000).
Health check:
curl http://localhost:8000/health | jq .
Enrich a batch:
curl -X POST http://localhost:8000/enrich_batch \
-H "Content-Type: application/json" \
-d '{
"transactions": [
{"transaction_id": "1", "payee_raw": "MYKIDSSPENDING.C WK4051295 ***********SAAS", "amount": -12.34},
{"transaction_id": "2", "payee_raw": "AMZN Mktp US*2A3B4C5D", "amount": -48.12}
]
}' | jq .
Seed knowledge (canonical payees + aliases):
curl -X POST http://localhost:8000/insert \
-H "Content-Type: application/json" \
-d '{
"type": "canonical",
"payee": "MYKIDSSPENDING",
"category": "Kids / Allowance",
"bas_lux": "Basic"
}' | jq .
# Optional: map a raw string to the canonical payee
curl -X POST http://localhost:8000/insert \
-H "Content-Type: application/json" \
-d '{
"type": "alias",
"alias": "MYKIDSSPENDING.C WK4051295 ***********SAAS",
"canonical": "MYKIDSSPENDING"
}' | jq .
Provide corrections (feedback loop):
curl -X POST http://localhost:8000/feedback \
-H "Content-Type: application/json" \
-d '{
"raw_payee": "MYKIDSSPENDING.C WK4051295 ***********SAAS",
"corrected_payee": "MYKIDSSPENDING",
"corrected_category": "Kids / Allowance",
"corrected_bas_lux": "Basic"
}' | jq .
The helper script client/enrich_transactions.py reads CSV or Excel and writes an enriched output file.
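Conceptually, a client like this splits the input rows into batches and POSTs each batch to /enrich_batch using the payload shape from the curl example. The sketch below is a stdlib-only approximation (csv, json, urllib) rather than the script itself, which uses pandas and requests; the Payee/Description fallback, sequential transaction_id values, and helper names are assumptions. It builds the requests without sending them.

```python
import csv
import io
import json
import urllib.request
from typing import Dict, Iterator, List


def read_rows(text: str) -> List[Dict[str, str]]:
    """Parse CSV text into one dict per row, keyed by the header."""
    return list(csv.DictReader(io.StringIO(text)))


def batches(rows: List[Dict[str, str]], size: int) -> Iterator[List[Dict[str, str]]]:
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def build_request(api: str, batch: List[Dict[str, str]], offset: int) -> urllib.request.Request:
    """Build (but do not send) a POST /enrich_batch request for one batch."""
    transactions = [
        {
            "transaction_id": str(offset + i),  # assumption: row index as the id
            "payee_raw": row.get("Payee") or row.get("Description", ""),
            "amount": float(row["Amount"]),
        }
        for i, row in enumerate(batch)
    ]
    body = json.dumps({"transactions": transactions}).encode("utf-8")
    return urllib.request.Request(
        f"{api}/enrich_batch",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


sample = (
    "Posted Date,Amount,Payee\n"
    "2025-11-01,-12.34,MYKIDSSPENDING.C WK4051295\n"
    "2025-11-02,-48.12,AMZN Mktp US*2A3B4C5D\n"
    "2025-11-03,-9.99,NETFLIX.COM\n"
)
rows = read_rows(sample)
size = 2
reqs = [build_request("http://localhost:8000", b, n * size) for n, b in enumerate(batches(rows, size))]
# Sending one requires the service to be running:
# with urllib.request.urlopen(reqs[0]) as resp:
#     print(json.load(resp))
```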
Install client dependencies (not in requirements.txt):
pip install pandas openpyxl requests tqdm
CSV input:
python client/enrich_transactions.py \
--input /path/to/all_transactions.csv \
--api http://localhost:8000 \
--batch-size 200
XLSX input (reads “All transactions” sheet by default):
python client/enrich_transactions.py \
--input /path/to/Expenses-2025.11.xlsx \
--sheet "All transactions" \
--api http://localhost:8000 \
--batch-size 200
Bootstrap the service DB from your existing Category column (and Bas-Lux if present):
python client/enrich_transactions.py \
--input /path/to/Expenses-2025.11.xlsx \
--sheet "All transactions" \
--api http://localhost:8000 \
--seed-from-category
Notes:
- If the API reports payee_count=0, the client will auto-seed from Category/Bas-Lux by default (disable with --no-auto-seed).
- If the API already has payees but reports bas_lux_count=0, seed just Bas-Lux, without touching categories, via --seed-from-bas-lux.
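The seeding rules in these notes reduce to a small decision function. This sketch encodes them as stated: explicit --seed-from-category or an empty DB with auto-seed on triggers a full seed, --seed-from-bas-lux seeds only Bas-Lux, and an existing DB with bas_lux_count=0 just produces a hint. SeedPlan and plan_seeding are hypothetical names, not the client's actual code.

```python
from dataclasses import dataclass


@dataclass
class SeedPlan:
    seed_categories: bool  # seed canonical payees + Category
    seed_bas_lux: bool     # seed Bas-Lux labels (if the column is present)
    hint: str = ""         # advice to print when no seeding happens


def plan_seeding(
    payee_count: int,
    bas_lux_count: int,
    *,
    seed_from_category: bool = False,
    seed_from_bas_lux: bool = False,
    auto_seed: bool = True,
) -> SeedPlan:
    # Full seed: requested explicitly, or the DB is empty and auto-seed is on.
    if seed_from_category or (auto_seed and payee_count == 0):
        return SeedPlan(seed_categories=True, seed_bas_lux=True)
    # Bas-Lux only: leaves existing categories untouched.
    if seed_from_bas_lux:
        return SeedPlan(seed_categories=False, seed_bas_lux=True)
    hint = ""
    if payee_count > 0 and bas_lux_count == 0:
        hint = "bas_lux_count=0: consider --seed-from-bas-lux"
    return SeedPlan(seed_categories=False, seed_bas_lux=False, hint=hint)


print(plan_seeding(0, 0))
print(plan_seeding(120, 0))
```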
Custom output path:
python client/enrich_transactions.py \
--input all_transactions.csv \
--output all_transactions_enriched.csv \
--api http://localhost:8000
What you get in the output:
- Payee Clean (what the service compares/embeds)
- Normalized Payee (canonical name)
- AI Category, AI Bas-Lux
- Payee Confidence, Category Confidence, Bas-Lux Confidence, Bas-Lux Method
- Needs Review (true if low confidence and/or Uncategorized)
- Enrich Error (per-row error text if something failed)
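The Needs Review flag combines two conditions: any low confidence score, or an Uncategorized result. Here is a minimal sketch over one output row keyed by the column names above; the 0.6 cutoff and the choice of which confidence columns count are assumptions, since the service's actual threshold isn't documented here.

```python
from typing import Any, Dict

# Hypothetical cutoff; the real service threshold may differ.
REVIEW_THRESHOLD = 0.6
CONFIDENCE_COLUMNS = ("Payee Confidence", "Category Confidence", "Bas-Lux Confidence")


def needs_review(row: Dict[str, Any], threshold: float = REVIEW_THRESHOLD) -> bool:
    """True if the category is missing/Uncategorized or any confidence is below threshold."""
    category = (row.get("AI Category") or "").strip()
    uncategorized = category == "" or category.lower() == "uncategorized"
    scores = [row.get(col) for col in CONFIDENCE_COLUMNS]
    low_confidence = any(s is not None and s < threshold for s in scores)
    return uncategorized or low_confidence


row = {
    "AI Category": "Kids / Allowance",
    "Payee Confidence": 0.93,
    "Category Confidence": 0.88,
    "Bas-Lux Confidence": 0.80,
}
print(needs_review(row))
```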
Run unit tests:
python -m unittest discover -s tests