Bookmark Chat

A web application that lets you chat with your bookmarks. It downloads them from Pinboard, fetches each page, and builds embeddings from the page content.

Features

Chat interface for interacting with your bookmarks
Bookmark management with tags and search
Settings for customizing the AI model and parameters
Dark mode support
Responsive design

Getting Started

Clone the repository
Install dependencies:
```
npm install
```
Run the development server:
```
npm run dev
```
Open http://localhost:3000 in your browser

Creating data from your Pinboard account

Prerequisites

Python Environment: Make sure you have Python 3.9+ installed
API Keys: You'll need:
- Pinboard API token (from https://pinboard.in/settings/password)
- OpenAI API key (from https://platform.openai.com/api-keys)

Step 1: Environment Setup

Create a virtual environment:

python3 -m venv .venv
source .venv/bin/activate

Install dependencies:

pip install -r packages/fetcher/requirements.txt
pip install -r packages/embedder/requirements.txt

Set up your environment variables. Create a .env file in the project root:

# Pinboard API token (get from https://pinboard.in/settings/password)
PINBOARD_TOKEN=your_username:your_api_token

# OpenAI API key (get from https://platform.openai.com/api-keys)
OPENAI_API_KEY=sk-your-openai-api-key
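Before running the scripts, it can help to confirm that both keys are actually readable. The following is a minimal sketch, not the project's own loader: `load_env_file` and `missing_keys` are hypothetical helpers, and it assumes the `.env` file uses plain `KEY=VALUE` lines as shown above.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain ":" or "=".
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str],
                 required=("PINBOARD_TOKEN", "OPENAI_API_KEY")) -> list[str]:
    """Return required keys that are absent or empty (os.environ is a fallback)."""
    return [k for k in required if not (values.get(k) or os.environ.get(k))]


if __name__ == "__main__":
    missing = missing_keys(load_env_file())
    print("Missing keys:", ", ".join(missing) if missing else "none")
```

Running this from the project root prints which of the two keys, if any, still need to be set.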

Step 2: Download Bookmarks from Pinboard

Run the pinboard fetcher to download all your bookmarks:

python3 packages/fetcher/pinboard_fetcher.py

This will:

Fetch all your bookmarks from Pinboard
Save them to bookmarks.json in the project root
Show progress and any errors

Options:

--force: Force a fresh download (ignores cache)
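For orientation, the request behind this step can be sketched as follows. The `posts/all` endpoint and the `auth_token` and `format` parameters come from Pinboard's public v1 API; whether `pinboard_fetcher.py` builds its request this way is an assumption, and `build_posts_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Pinboard v1 endpoint that returns every bookmark in the account.
PINBOARD_ALL_POSTS = "https://api.pinboard.in/v1/posts/all"


def build_posts_url(token: str) -> str:
    """Build the posts/all URL for a token of the form username:api_token."""
    if ":" not in token:
        raise ValueError("expected a token of the form username:api_token")
    query = urlencode({"auth_token": token, "format": "json"})
    return f"{PINBOARD_ALL_POSTS}?{query}"


if __name__ == "__main__":
    # Placeholder token; substitute the value of PINBOARD_TOKEN from .env.
    print(build_posts_url("your_username:your_api_token"))
```

Fetching the resulting URL returns a JSON array of bookmarks, which is the kind of data the script saves to bookmarks.json.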

Step 3: Download Page Content

Run the page fetcher to download the content of each bookmarked page:

python3 packages/fetcher/page_fetcher.py

This will:

Read the bookmarks from bookmarks.json
Download and parse the content of each webpage
Save the content to data/cache/pages/, with each filename based on a hash of the URL
Show progress and skip already downloaded pages

Options:

--force: Force re-download of all pages (ignores cache)
--limit N: Only process the first N bookmarks (useful for testing)
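The caching behavior described above can be sketched roughly as below. This is an illustration under stated assumptions: the README only says filenames are based on a URL hash, so SHA-256 and the `.json` extension are guesses, and `cache_path_for` and `urls_to_fetch` are hypothetical names.

```python
import hashlib
from pathlib import Path

CACHE_DIR = Path("data/cache/pages")


def cache_path_for(url: str, cache_dir: Path = CACHE_DIR) -> Path:
    """Map a URL to a stable cache filename (SHA-256 and .json are assumptions)."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return cache_dir / f"{digest}.json"


def urls_to_fetch(urls, cache_dir: Path = CACHE_DIR, force: bool = False, limit=None):
    """Apply --limit first, then skip URLs that are already cached unless --force."""
    selected = list(urls)
    if limit is not None:
        selected = selected[:limit]
    if force:
        return selected
    return [u for u in selected if not cache_path_for(u, cache_dir).exists()]
```

Hashing the URL gives every page a fixed-length, filesystem-safe name, so a rerun can tell which pages are already downloaded by checking whether the file exists.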

Step 4: Create Embeddings

Run the embedder to create embeddings for all the downloaded content:

python3 packages/embedder/embedder.py

This will:

Read the cached page content
Split the text into chunks of 500 tokens each using TextChunker
Create embeddings for each chunk using OpenAI's text-embedding-3-small model
Save embeddings to data/embeddings/, with each filename based on a hash of the URL
Track progress to avoid re-processing already embedded pages

Options:

--clean: Clear progress tracking and start fresh
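To illustrate fixed-size chunking, here is a minimal sketch. It is not the project's TextChunker: it treats whitespace-separated words as tokens, whereas the real chunker presumably counts model tokens and may split on smarter boundaries. `chunk_tokens` and `chunk_text` are hypothetical names.

```python
def chunk_tokens(tokens: list[str], chunk_size: int = 500) -> list[list[str]]:
    """Split a token list into consecutive chunks of at most chunk_size tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def chunk_text(text: str, chunk_size: int = 500) -> list[str]:
    """Chunk text using whitespace-separated words as a stand-in for tokens."""
    return [" ".join(chunk) for chunk in chunk_tokens(text.split(), chunk_size)]


if __name__ == "__main__":
    sample = " ".join(f"word{i}" for i in range(1200))
    # 1200 words split into chunks of 500, 500, and 200 words.
    print([len(chunk.split()) for chunk in chunk_text(sample)])
```

Each chunk would then be sent to the embedding model separately, which keeps every request within a predictable size.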

Common Issues:

API Key Errors: Make sure your .env file is in the project root and contains valid API keys
Rate Limiting: If you hit rate limits:
- The scripts include delays between requests
- You can increase delays in the code if needed
- Consider using --limit to process fewer bookmarks at once
Memory Issues: For large bookmark collections:
- Process in smaller batches using --limit
- Ensure you have sufficient disk space for cached content
Network Issues: If downloads fail:
- Check your internet connection
- Some sites may block automated requests
- Use --force to retry failed downloads
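If you do adjust the delays for rate limits or flaky downloads, exponential backoff is one common pattern. The sketch below is generic and not taken from the project's scripts; `retry_with_backoff` and its parameters are hypothetical.

```python
import time


def retry_with_backoff(func, attempts: int = 4, base_delay: float = 1.0,
                       sleep=time.sleep):
    """Call func(), retrying on any exception with delays of 1x, 2x, 4x base_delay."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            # Re-raise once the final attempt has failed.
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

Wrapping a single request in `retry_with_backoff(lambda: fetch(url))`, where `fetch` stands for whatever download function you use, spaces out retries instead of hitting the server again immediately.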

File Locations:

Bookmarks: bookmarks.json (project root)
Cached pages: data/cache/pages/
Embeddings: data/embeddings/
Progress tracking: data/cache/embedder_progress.json

Monitoring Progress:

Each script shows progress as it runs
Check the console output for any errors
Use the embedder's --clean flag to clear progress tracking and restart embedding from scratch if needed

The scripts are designed to be resumable: you can stop and restart them, and they will continue from where they left off.
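Resumability of this kind is often implemented with a small progress file. The following is a sketch under assumptions: the actual format of data/cache/embedder_progress.json is not documented here, so a JSON list of completed IDs is a guess, and `load_done` and `mark_done` are hypothetical helpers.

```python
import json
from pathlib import Path


def load_done(progress_file: Path) -> set[str]:
    """Return the IDs recorded as finished, or an empty set on a first run."""
    if not progress_file.exists():
        return set()
    return set(json.loads(progress_file.read_text(encoding="utf-8")))


def mark_done(progress_file: Path, done: set[str], item_id: str) -> None:
    """Record one finished item, writing via a temp file so a crash cannot truncate it."""
    done.add(item_id)
    progress_file.parent.mkdir(parents=True, exist_ok=True)
    tmp = progress_file.with_suffix(".tmp")
    tmp.write_text(json.dumps(sorted(done)), encoding="utf-8")
    tmp.replace(progress_file)
```

A script built this way loads the set at startup, skips any item already in it, and calls `mark_done` after each item completes. Deleting the file has the same effect as a fresh start.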

Tech Stack

Next.js 14
React 18
TypeScript
Tailwind CSS
Zustand
Heroicons

Project Structure

src/
  ├── app/              # Next.js app directory
  ├── components/       # React components
  │   ├── chat/        # Chat-related components
  │   ├── bookmarks/   # Bookmark-related components
  │   ├── settings/    # Settings-related components
  │   └── layout/      # Layout components
  ├── store/           # Zustand store
  ├── types/           # TypeScript types
  └── utils/           # Utility functions

License

MIT
