Notion exports with chapter titles #101

Harsh14901 · 2025-09-04T16:20:09Z

Summary of proposed feature implementations

The database now records the formatting options that were used when each page was uploaded. Previously, if a page was first written with enable_location=False and the script was later run again with enable_location=True, the page content was left unchanged. Now the change is detected and the page is rewritten. The new database can be found here. A change in any of the following fields is detected and triggers a rewrite:
- enable_location
- enable_highlight_date
- number of highlights (#highlights)
- last highlighted date
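A minimal sketch of the change check, assuming the stored values for a page can be read back into a simple record. The `UploadState` class and its field names are hypothetical stand-ins for the actual database columns. The idea is to compare the stored snapshot against the one the current run would produce, and rewrite the page if they differ.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass(frozen=True)
class UploadState:
    """Hypothetical record of what was last written to a Notion page.

    Field names are illustrative, not the PR's actual database columns.
    """
    enable_location: bool
    enable_highlight_date: bool
    highlight_count: int
    last_highlighted: str  # ISO date of the most recent highlight


def changed_fields(stored: UploadState, current: UploadState) -> List[str]:
    """Return the names of tracked fields whose values differ."""
    return [
        f.name
        for f in fields(UploadState)
        if getattr(stored, f.name) != getattr(current, f.name)
    ]


def needs_rewrite(stored: Optional[UploadState], current: UploadState) -> bool:
    """A page with no stored state is always written; any changed field forces a rewrite."""
    return stored is None or bool(changed_fields(stored, current))
```

For example, if the stored state has `enable_location=False` and the current run uses `True` with everything else equal, `changed_fields` returns `['enable_location']` and `needs_rewrite` returns `True`.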

Example new page:

Uploading with separate_blocks=True used to be slower. It is now optimized to use the full Notion API capacity of 100 blocks per request. I intend separate_blocks=True to become the preferred approach and the default, since the Notion API limits a paragraph to 100 characters anyway (as noted in one of the original author's comments).
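The batching can be sketched as splitting the block list into slices of at most 100 and sending one request per slice. This assumes each highlight becomes a Notion quote block; `quote_block` and `chunk_blocks` are hypothetical helper names, not the PR's code.

```python
from typing import Any, Dict, Iterator, List, Sequence

# The Notion "append block children" endpoint accepts at most 100 children per request.
MAX_BLOCKS_PER_REQUEST = 100


def quote_block(text: str) -> Dict[str, Any]:
    """Build a Notion quote block holding one highlight."""
    return {
        "object": "block",
        "type": "quote",
        "quote": {"rich_text": [{"type": "text", "text": {"content": text}}]},
    }


def chunk_blocks(
    blocks: Sequence[Dict[str, Any]], size: int = MAX_BLOCKS_PER_REQUEST
) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive slices of at most `size` blocks, preserving order."""
    for start in range(0, len(blocks), size):
        yield list(blocks[start:start + size])
```

With the official notion-client package, each chunk would then be sent with `notion.blocks.children.append(block_id=page_id, children=chunk)`, so 250 highlights take 3 requests instead of 250.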

Added a feature that groups highlights under their corresponding chapter titles. This organizes the flow of the Notion page and makes it more readable.
Example page for reference: https://www.notion.so/Thinking-in-Bets-264cbb15a7c781f8971eed3b07a7a6e1?source=copy_link
- This requires the Kindle to be connected to the computer and the user to know its mount point (kindle_root).
- For each book to be processed, the script tries to find the corresponding .mobi file in the Kindle directory.
- It then extracts the .mobi file to HTML using the mobi library. The extraction also yields the table of contents (TOC) as an XML file.
- We parse the TOC and, for each chapter title, find its location in the HTML. Similarly, we try to find the location of each highlight in the HTML. This location information is then used to assemble a Notion page with highlights grouped under their chapter headings, as in the example page above.
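The chapter-assignment step described above can be sketched with a binary search: each highlight belongs to the last chapter that starts at or before its position in the HTML. This assumes chapter and highlight positions are character offsets into the same extracted HTML. `group_by_chapter` and the "Front matter" fallback label are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence, Tuple


def group_by_chapter(
    chapters: Sequence[Tuple[int, str]],
    highlights: Sequence[Tuple[int, str]],
) -> Dict[str, List[str]]:
    """Assign each highlight to the last chapter starting at or before it.

    chapters:   (start_offset, title) pairs, offsets into the extracted HTML.
    highlights: (offset, text) pairs, where offset is where the highlight
                text was found in the same HTML.
    Highlights that precede the first chapter go under "Front matter".
    """
    ordered = sorted(chapters)
    starts = [start for start, _ in ordered]
    grouped: Dict[str, List[str]] = {}
    # Sorting highlights keeps both chapters and highlights in reading order,
    # since dicts preserve insertion order.
    for offset, text in sorted(highlights):
        idx = bisect_right(starts, offset) - 1
        title = ordered[idx][1] if idx >= 0 else "Front matter"
        grouped.setdefault(title, []).append(text)
    return grouped
```

Each lookup is O(log n) in the number of chapters. Note that two chapters with the same title would be merged under one heading in this sketch.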

Kindle highlights often overlap. For example, if you select 2 sentences and then extend the selection to 4 sentences, Kindle makes 2 separate entries in My Clippings.txt: the first for the 2 sentences and the second for the 4 sentences. This PR coalesces such entries so that highlights that are a subset of another highlight are not uploaded.
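One way to prune contained highlights is a single sweep over location ranges. This sketch assumes each clipping carries a start and end location (as in "Location 120-125" in My Clippings.txt); the PR may compare highlight text instead, and `Highlight` and `prune_contained` are hypothetical names.

```python
from typing import List, NamedTuple


class Highlight(NamedTuple):
    start: int  # start location, e.g. 120 in "Location 120-125"
    end: int    # end location (inclusive)
    text: str


def prune_contained(highlights: List[Highlight]) -> List[Highlight]:
    """Drop every highlight whose location range lies inside another one's.

    Sorting by start ascending and end descending puts the widest range first
    among those sharing a start. Every earlier highlight starts at or before
    the current one, so if the current end does not exceed the furthest end
    seen so far, the current range is fully contained and can be skipped.
    Partially overlapping highlights are both kept.
    """
    kept: List[Highlight] = []
    max_end = -1
    for h in sorted(highlights, key=lambda h: (h.start, -h.end)):
        if h.end <= max_end:
            continue  # covered by an earlier, wider highlight
        kept.append(h)
        max_end = h.end
    return kept
```

For the example above, the 2-sentence entry (say locations 100-105) is dropped in favour of the 4-sentence entry (100-112).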

Technical improvements

- Used pydantic models to structure data throughout
- Added a rich logging interface with colors and error handling
- Integrated uv and direnv
- The Notion auth token and database reference UUID are now read from environment variables instead of being supplied on the command line
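A minimal sketch of reading the credentials from the environment, failing early with a clear message when either is missing. The variable names `NOTION_TOKEN` and `NOTION_DATABASE_ID` and the `load_settings` helper are hypothetical placeholders, not necessarily the names the PR uses.

```python
import os
from typing import Mapping, NamedTuple, Optional


class NotionSettings(NamedTuple):
    token: str
    database_id: str


REQUIRED = ("NOTION_TOKEN", "NOTION_DATABASE_ID")  # hypothetical variable names


def load_settings(env: Optional[Mapping[str, str]] = None) -> NotionSettings:
    """Read the Notion credentials from the environment (os.environ by default)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return NotionSettings(env["NOTION_TOKEN"], env["NOTION_DATABASE_ID"])
```

With direnv, an `.envrc` containing the `dotenv` directive loads a `.env` file into the shell, so the script itself only needs to look at `os.environ`.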

* The database has additional fields that store whether the last upload included BlockQuotes, Location and Highlight Date. We now check them to decide whether the page content needs to be refreshed.
* Optimize exporting as separate blocks by batching them in groups of 100, which is the Notion API limit.
* Page management is currently fully automatic, so manual changes to pages are not expected to persist.
* Add .envrc to work with direnv, and uv.lock for uv.

* Figure out the correct .mobi file on the Kindle device
* The MobiHandler class executes the following flow:
  * converts the .mobi file to HTML
  * parses the TOC from the extraction
  * fetches the start character offset of each TOC heading in the HTML document
* Before exporting, map every highlight to the heading that contains it, based on the position information provided by MobiHandler
* Improve error handling so that Notion is not left in an inconsistent state
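The "parse the TOC, then locate each heading" part of the MobiHandler flow can be sketched with the standard library, starting after the mobi library has already extracted the book. This assumes the TOC is an NCX file whose entries point at anchors like `book.html#filepos1234`, and that the HTML contains a matching `id="filepos1234"` anchor; both are assumptions about the extraction output. `parse_toc` and `heading_offsets` are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Tuple

NCX_NS = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}


def parse_toc(ncx_xml: str) -> List[Tuple[str, int]]:
    """Return (chapter title, filepos) pairs from an NCX table of contents.

    iter() walks the whole tree, so nested navPoints (sub-chapters) are
    included in document order. Entries without a filepos anchor are skipped.
    """
    root = ET.fromstring(ncx_xml)
    entries: List[Tuple[str, int]] = []
    for nav in root.iter(f"{{{NCX_NS['ncx']}}}navPoint"):
        label = nav.find("ncx:navLabel/ncx:text", NCX_NS)
        content = nav.find("ncx:content", NCX_NS)
        if label is None or content is None:
            continue
        match = re.search(r"#filepos(\d+)$", content.get("src", ""))
        if match:
            entries.append(((label.text or "").strip(), int(match.group(1))))
    return entries


def heading_offsets(html: str, toc: List[Tuple[str, int]]) -> List[Tuple[int, str]]:
    """Find the character offset of each TOC anchor in the extracted HTML.

    Returns (offset, title) pairs; headings whose anchor is not found are dropped.
    """
    result: List[Tuple[int, str]] = []
    for title, filepos in toc:
        idx = html.find(f'id="filepos{filepos}"')
        if idx != -1:
            result.append((idx, title))
    return result
```

The resulting (offset, title) pairs are what a highlight-to-chapter mapping needs: every highlight found at a later offset belongs to the nearest preceding heading.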

Harsh14901 added 5 commits September 2, 2025 17:36

Load notion auth token and db ref from .env file

bdd5ece

Improve model handling | Prune overlapping highlights

d1296d8

* Add pydantic models for better validation
* Simplify code throughout the codebase
* Add support for pruning overlapping highlights to remove the clutter

Add rich logging

daddf2b
