A web application for chatting with your bookmarks. It downloads them from Pinboard, fetches each page, and builds embeddings so you can ask questions about their content.
- Chat interface for interacting with your bookmarks
- Bookmark management with tags and search
- Settings for customizing the AI model and parameters
- Dark mode support
- Responsive design
- Clone the repository
- Install dependencies:
npm install
- Run the development server:
npm run dev
- Open http://localhost:3000 in your browser
- Python Environment: Make sure you have Python 3.9+ installed
- API Keys: You'll need:
- Pinboard API token (from https://pinboard.in/settings/password)
- OpenAI API key (from https://platform.openai.com/api-keys)
- Create a virtual environment:
python3 -m venv .venv
source .venv/bin/activate
- Install dependencies:
pip install -r packages/fetcher/requirements.txt
pip install -r packages/embedder/requirements.txt
- Set up your environment variables. Create a .env file in the project root:
# Pinboard API token (get from https://pinboard.in/settings/password)
PINBOARD_TOKEN=your_username:your_api_token
# OpenAI API key (get from https://platform.openai.com/api-keys)
OPENAI_API_KEY=sk-your-openai-api-key
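Before running the scripts, it can help to confirm that both keys are actually present. The following is a minimal sketch, not part of the project: `parse_env` and `missing_keys` are hypothetical helper names, and the parser only handles plain `KEY=value` lines like the ones above.

```python
import os
from pathlib import Path

# The two variables named in the .env example above.
REQUIRED_KEYS = ("PINBOARD_TOKEN", "OPENAI_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines, ignoring blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [k for k in REQUIRED_KEYS if not values.get(k)]


if __name__ == "__main__":
    env_path = Path(".env")
    values = parse_env(env_path.read_text()) if env_path.exists() else {}
    # Variables already exported in the shell take precedence over the file.
    values.update({k: os.environ[k] for k in REQUIRED_KEYS if k in os.environ})
    missing = missing_keys(values)
    print("Missing: " + ", ".join(missing) if missing else "All required keys are set.")
```

Run it from the project root so that the relative `.env` path resolves.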
Run the pinboard fetcher to download all your bookmarks:
python3 packages/fetcher/pinboard_fetcher.py
This will:
- Fetch all your bookmarks from Pinboard
- Save them to bookmarks.json in the project root
- Show progress and any errors
Options:
--force: Force a fresh download (ignores cache)
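The download step with its cache and `--force` behaviour might look roughly like the sketch below. This is an illustration, not the code in `pinboard_fetcher.py`: it assumes the Pinboard v1 `posts/all` endpoint with `auth_token` and `format=json`, treats the output file itself as the cache, and uses the hypothetical function names `build_request_url` and `load_bookmarks`.

```python
import json
import urllib.request
from pathlib import Path
from urllib.parse import urlencode

API_URL = "https://api.pinboard.in/v1/posts/all"


def build_request_url(token: str) -> str:
    """Build the posts/all URL; token has the form username:api_token."""
    return f"{API_URL}?{urlencode({'auth_token': token, 'format': 'json'})}"


def load_bookmarks(token: str, cache: Path, force: bool = False) -> list:
    """Return cached bookmarks unless force is set or no cache exists."""
    if cache.exists() and not force:
        return json.loads(cache.read_text(encoding="utf-8"))
    # No usable cache (or force requested): download and overwrite the cache.
    with urllib.request.urlopen(build_request_url(token), timeout=30) as resp:
        bookmarks = json.load(resp)
    cache.write_text(json.dumps(bookmarks, indent=2), encoding="utf-8")
    return bookmarks
```

With this shape, a second run without `force=True` never touches the network, which is the behaviour the `--force` option is described as overriding.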
Run the page fetcher to download the content of each bookmarked page:
python3 packages/fetcher/page_fetcher.py
This will:
- Read the bookmarks from bookmarks.json
- Download and parse the content of each webpage
- Save the content to data/cache/pages/ with a filename based on the URL hash
- Show progress and skip already downloaded pages
Options:
- --force: Force re-download of all pages (ignores cache)
- --limit N: Only process the first N bookmarks (useful for testing)
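To make the hash-based filenames and the skip logic concrete, here is a small sketch. The hash algorithm and file extension used by `page_fetcher.py` are not stated here, so SHA-256 and `.json` are assumptions, and `cache_path` and `select_pending` are hypothetical names.

```python
import hashlib
from pathlib import Path

PAGES_DIR = Path("data/cache/pages")


def cache_path(url: str, pages_dir: Path = PAGES_DIR) -> Path:
    """Map a URL to a stable cache filename (SHA-256 and .json are assumed)."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return pages_dir / f"{digest}.json"


def select_pending(urls, pages_dir=PAGES_DIR, force=False, limit=None):
    """Apply the limit first, then skip URLs whose cache file already exists."""
    selected = list(urls[:limit]) if limit is not None else list(urls)
    if force:
        return selected  # force: re-download everything that was selected
    return [u for u in selected if not cache_path(u, pages_dir).exists()]
```

Hashing the URL gives every page a fixed-length, filesystem-safe name, so checking whether a page was already downloaded reduces to a single `exists()` call.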
Run the embedder to create embeddings for all the downloaded content:
python3 packages/embedder/embedder.py
This will:
- Read the cached page content
- Split text into chunks of 500 tokens each using TextChunker
- Create embeddings for each chunk using OpenAI's text-embedding-3-small model
- Save embeddings to data/embeddings/ with a filename based on the URL hash
- Track progress to avoid re-processing already embedded pages
Options:
--clean: Clear progress tracking and start fresh
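The chunking step can be pictured with the simplified sketch below. It is not the project's TextChunker: it treats whitespace-separated words as tokens, whereas the real chunker presumably counts model tokens, and `chunk_tokens` and `chunk_text` are hypothetical names.

```python
def chunk_tokens(tokens: list[str], chunk_size: int = 500) -> list[list[str]]:
    """Split a token list into consecutive chunks of at most chunk_size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def chunk_text(text: str, chunk_size: int = 500) -> list[str]:
    """Chunk text using whitespace-separated words as a stand-in for tokens."""
    return [" ".join(chunk) for chunk in chunk_tokens(text.split(), chunk_size)]


if __name__ == "__main__":
    sample = " ".join(f"word{i}" for i in range(1200))
    print([len(chunk.split()) for chunk in chunk_text(sample)])
```

Each resulting chunk would then be sent to the embedding model separately, so the chunk size bounds the amount of text behind any single embedding.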
- API Key Errors: Make sure your .env file is in the project root and contains valid API keys
- Rate Limiting: If you hit rate limits:
  - The scripts include delays between requests
  - You can increase delays in the code if needed
  - Consider using --limit to process fewer bookmarks at once
- Memory Issues: For large bookmark collections:
  - Process in smaller batches using --limit
  - Ensure you have sufficient disk space for cached content
- Network Issues: If downloads fail:
  - Check your internet connection
  - Some sites may block automated requests
  - Use --force to retry failed downloads
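If you do adjust the delays in the code, one common pattern is retrying with exponential backoff. The sketch below illustrates that general technique and is not what the scripts currently do; `with_retries` is a hypothetical helper and its default values are arbitrary.

```python
import time


def with_retries(func, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**n seconds and try again."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter exists only so the waiting can be replaced in tests; in normal use, wrap a single request, for example `with_retries(lambda: fetch(url))`, where `fetch` stands for whatever function performs the download.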
- Bookmarks: bookmarks.json (project root)
- Cached pages: data/cache/pages/
- Embeddings: data/embeddings/
- Progress tracking: data/cache/embedder_progress.json
- Each script shows progress as it runs
- Check the console output for any errors
- Use the --clean flag to restart from scratch if needed
The scripts are designed to be resumable - you can stop and restart them, and they'll continue from where they left off.
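Resumability of this kind usually comes down to recording each finished item and skipping it on the next run. The following is a minimal sketch under assumptions: the actual format of `data/cache/embedder_progress.json` is not documented here, so a plain JSON list of item IDs is used, and `load_done`, `mark_done`, and `process_all` are hypothetical names.

```python
import json
from pathlib import Path


def load_done(progress_file: Path) -> set[str]:
    """Load the IDs of already processed items, or an empty set."""
    if progress_file.exists():
        return set(json.loads(progress_file.read_text(encoding="utf-8")))
    return set()


def mark_done(progress_file: Path, done: set[str], item_id: str) -> None:
    """Record item_id and write the progress file via a temporary file."""
    done.add(item_id)
    progress_file.parent.mkdir(parents=True, exist_ok=True)
    tmp = progress_file.with_suffix(".tmp")
    tmp.write_text(json.dumps(sorted(done)), encoding="utf-8")
    tmp.replace(progress_file)  # swap in the new file in a single rename


def process_all(item_ids, progress_file: Path, handle) -> None:
    """Run handle() on every item that is not yet marked as done."""
    done = load_done(progress_file)
    for item_id in item_ids:
        if item_id in done:
            continue  # finished in an earlier run
        handle(item_id)
        mark_done(progress_file, done, item_id)
```

Saving progress after every item means an interrupted run loses at most the item that was in flight, and deleting the progress file has the same effect as starting fresh.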
- Next.js 14
- React 18
- TypeScript
- Tailwind CSS
- Zustand
- Heroicons
src/
├── app/ # Next.js app directory
├── components/ # React components
│ ├── chat/ # Chat-related components
│ ├── bookmarks/ # Bookmark-related components
│ ├── settings/ # Settings-related components
│ └── layout/ # Layout components
├── store/ # Zustand store
├── types/ # TypeScript types
└── utils/ # Utility functions
MIT