tx_enrich

Local-first transaction enrichment:

  • Normalize messy bank “payee” strings into stable canonical payees
  • Infer Category and Bas-Lux (Basic/Luxury) from what you’ve already labeled
  • Improve over time via seeding + feedback (stored in a local SQLite DB)

The service is a FastAPI app. It embeds payee strings with a local sentence-transformers model and matches them by nearest-neighbor search against the payees you’ve stored in data/enrichment.db.

Setup

Option A: Docker (recommended)

Prereqs: Docker Desktop.

docker compose up --build
  • Web UI (when the React build is present): http://localhost:8000/
  • Load Transactions: the CSV must contain a Posted Date or Date column, an Amount column, and a Payee or Description column. Category, Bas-Lux, and Source are optional. Id is always generated as sha256(date + description + amount).
  • API docs: http://localhost:8000/docs
  • Data persists in ./data/enrichment.db (delete it to reset learning)
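
The Id derivation mentioned above can be sketched in Python. This is only an illustration: the README does not say how the three fields are formatted or joined before hashing, so the plain concatenation below is an assumption, and make_transaction_id is a hypothetical helper name rather than a function from this repository.

```python
import hashlib


def make_transaction_id(date: str, description: str, amount: str) -> str:
    """Return the hex SHA-256 digest of date + description + amount.

    Assumption: the fields are concatenated as plain strings with no
    separator and encoded as UTF-8. The service may normalize them
    differently, in which case the digests will not match.
    """
    raw = f"{date}{description}{amount}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Same inputs always produce the same 64-character hex Id.
    print(make_transaction_id("2025-11-03", "AMZN Mktp US*2A3B4C5D", "-48.12"))
```

Because the Id is derived from the row contents, re-importing the same export should yield the same Ids, while two genuinely identical rows (same date, description, and amount) would collide.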

Note: the compose file pins platform: linux/amd64 for compatibility; on Apple Silicon this may run under emulation.

Option B: Run locally (Python)

Prereqs: Python 3.11+.

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
uvicorn main:app --reload --port 8000

Notes:

  • First run may download the embedding model (sentence-transformers/all-MiniLM-L6-v2).
  • If SQLite extension loading isn’t available, the service falls back to a pure-Python KNN backend. Check the vector_backend field returned by GET /health (vec0 or python).
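
When the fallback is active, matching amounts to a brute-force nearest-neighbor scan. Below is a minimal sketch of that idea, assuming cosine similarity over embedding vectors. The service's actual metric and data layout are not documented here; nearest_payees and the toy 3-dimensional vectors are hypothetical (real all-MiniLM-L6-v2 embeddings have 384 dimensions).

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_payees(
    query: Sequence[float],
    stored: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every stored payee against the query and return the top k."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in stored.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy vectors standing in for real embeddings (assumption).
    stored = {
        "AMAZON": [1.0, 0.0, 0.0],
        "NETFLIX": [0.0, 1.0, 0.0],
        "MYKIDSSPENDING": [0.0, 0.0, 1.0],
    }
    print(nearest_payees([0.9, 0.1, 0.0], stored, k=2))
```

A linear scan like this is typically fine for a few thousand stored payees; an indexed backend mainly matters as the table grows.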

Optional: React UI (dev)

Prereqs: Node 20+.

cd client
npm install
npm run dev

The dev server runs at http://localhost:5175 and proxies API calls to VITE_API_BASE_URL (default: http://localhost:8000).

Using the API

Health check:

curl http://localhost:8000/health | jq .

Enrich a batch:

curl -X POST http://localhost:8000/enrich_batch \
  -H "Content-Type: application/json" \
  -d '{
    "transactions": [
      {"transaction_id": "1", "payee_raw": "MYKIDSSPENDING.C WK4051295 ***********SAAS", "amount": -12.34},
      {"transaction_id": "2", "payee_raw": "AMZN Mktp US*2A3B4C5D", "amount": -48.12}
    ]
  }' | jq .
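
The same request can be made from Python using only the standard library. This sketch mirrors the curl call above; the function names are hypothetical, and because the response schema is not described in this README, the result is returned as parsed JSON without assuming any fields.

```python
import json
import urllib.request
from typing import Any, Dict, List


def build_enrich_payload(rows: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Shape rows into the request body used by the curl example."""
    return {
        "transactions": [
            {
                "transaction_id": str(row["transaction_id"]),
                "payee_raw": row["payee_raw"],
                "amount": float(row["amount"]),
            }
            for row in rows
        ]
    }


def post_json(url: str, payload: Dict[str, Any], timeout: float = 30.0) -> Any:
    """POST a JSON body and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def enrich_batch(api_base: str, rows: List[Dict[str, Any]]) -> Any:
    """Send one batch to /enrich_batch (requires the service to be running)."""
    return post_json(f"{api_base.rstrip('/')}/enrich_batch", build_enrich_payload(rows))
```

With the service running, enrich_batch("http://localhost:8000", rows) sends the batch, where rows is a list of dicts with transaction_id, payee_raw, and amount keys.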

Seed knowledge (canonical payees + aliases):

curl -X POST http://localhost:8000/insert \
  -H "Content-Type: application/json" \
  -d '{
    "type": "canonical",
    "payee": "MYKIDSSPENDING",
    "category": "Kids / Allowance",
    "bas_lux": "Basic"
  }' | jq .

# Optional: map a raw string to the canonical payee
curl -X POST http://localhost:8000/insert \
  -H "Content-Type: application/json" \
  -d '{
    "type": "alias",
    "alias": "MYKIDSSPENDING.C WK4051295 ***********SAAS",
    "canonical": "MYKIDSSPENDING"
  }' | jq .

Provide corrections (feedback loop):

curl -X POST http://localhost:8000/feedback \
  -H "Content-Type: application/json" \
  -d '{
    "raw_payee": "MYKIDSSPENDING.C WK4051295 ***********SAAS",
    "corrected_payee": "MYKIDSSPENDING",
    "corrected_category": "Kids / Allowance",
    "corrected_bas_lux": "Basic"
  }' | jq .

Enrich a CSV/XLSX export (client script)

The helper script client/enrich_transactions.py reads CSV or Excel and writes an enriched output file.

Install client dependencies (not in requirements.txt):

pip install pandas openpyxl requests tqdm

CSV input:

python client/enrich_transactions.py \
  --input /path/to/all_transactions.csv \
  --api http://localhost:8000 \
  --batch-size 200
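
The --batch-size flag implies that rows are sent to the API in chunks. How the script does this internally is not shown here; a minimal sketch of such chunking, with iter_batches as a hypothetical helper, could look like:

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int = 200) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


if __name__ == "__main__":
    rows = list(range(450))
    # 450 rows at a batch size of 200 gives batches of 200, 200, and 50.
    print([len(batch) for batch in iter_batches(rows, 200)])
```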

XLSX input (reads the “All transactions” sheet by default):

python client/enrich_transactions.py \
  --input /path/to/Expenses-2025.11.xlsx \
  --sheet "All transactions" \
  --api http://localhost:8000 \
  --batch-size 200

Bootstrap the service DB from your existing Category column (and Bas-Lux if present):

python client/enrich_transactions.py \
  --input /path/to/Expenses-2025.11.xlsx \
  --sheet "All transactions" \
  --api http://localhost:8000 \
  --seed-from-category

Notes:

  • If the API reports payee_count=0, the client will auto-seed from Category/Bas-Lux by default (disable with --no-auto-seed).
  • If the API already has payees but reports bas_lux_count=0, seed just Bas-Lux without touching categories via --seed-from-bas-lux.
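
The two notes above boil down to a small decision rule. The sketch below is an illustration, not the client's code: choose_seed_mode is a hypothetical name, the counts are assumed to arrive in a dict such as a parsed API status response, and the Bas-Lux branch only mirrors the manual --seed-from-bas-lux recommendation (the README does not say the client applies it automatically).

```python
from typing import Any, Dict, Optional


def choose_seed_mode(status: Dict[str, Any], auto_seed: bool = True) -> Optional[str]:
    """Suggest a seeding mode from the counts the API reports.

    Returns "category" (seed Category and Bas-Lux), "bas_lux"
    (seed Bas-Lux only), or None (no seeding suggested).
    """
    payee_count = status.get("payee_count")
    bas_lux_count = status.get("bas_lux_count")

    if payee_count == 0:
        # Empty database: auto-seed unless --no-auto-seed was given.
        return "category" if auto_seed else None
    if bas_lux_count == 0:
        # Payees exist but no Bas-Lux labels: seed Bas-Lux only.
        return "bas_lux"
    return None


if __name__ == "__main__":
    print(choose_seed_mode({"payee_count": 0, "bas_lux_count": 0}))
    print(choose_seed_mode({"payee_count": 120, "bas_lux_count": 0}))
```

Missing counts are treated as unknown, so the helper suggests nothing rather than seeding on incomplete information.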

Custom output path:

python client/enrich_transactions.py \
  --input all_transactions.csv \
  --output all_transactions_enriched.csv \
  --api http://localhost:8000

What you get in the output:

  • Payee Clean (what the service compares/embeds)
  • Normalized Payee (canonical name)
  • AI Category, AI Bas-Lux
  • Payee Confidence, Category Confidence, Bas-Lux Confidence, Bas-Lux Method
  • Needs Review (true if low confidence and/or Uncategorized)
  • Enrich Error (per-row error text if something failed)
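
The Needs Review rule can be read as a simple predicate. The sketch below assumes a threshold of 0.6 and that both the payee and category confidences are consulted; neither detail is stated in this README, and needs_review is a hypothetical name.

```python
UNCATEGORIZED = "Uncategorized"


def needs_review(
    category: str,
    payee_confidence: float,
    category_confidence: float,
    threshold: float = 0.6,  # assumed value, not taken from the service
) -> bool:
    """Flag a row when any confidence is low or the category is Uncategorized."""
    low_confidence = min(payee_confidence, category_confidence) < threshold
    return low_confidence or category == UNCATEGORIZED


if __name__ == "__main__":
    print(needs_review("Groceries", 0.92, 0.88))
    print(needs_review("Groceries", 0.92, 0.41))
    print(needs_review("Uncategorized", 0.95, 0.95))
```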

Development

Run unit tests:

python -m unittest discover -s tests
