html2rss
diff --git a/about.md b/about.md
Lines changed: 44 additions & 0 deletions
diff --git a/api-reference.md b/api-reference.md
Lines changed: 55 additions & 0 deletions
diff --git a/components/html2rss-configs.md b/components/html2rss-configs.md
Lines changed: 0 additions & 23 deletions
diff --git a/components/html2rss-web.md b/components/html2rss-web.md
Lines changed: 0 additions & 28 deletions
diff --git a/components/html2rss.md b/components/html2rss.md
Lines changed: 0 additions & 32 deletions
diff --git a/components/index.md b/components/index.md
Lines changed: 0 additions & 7 deletions
diff --git a/configs/index.html b/configs/index.html
Lines changed: 2 additions & 2 deletions
diff --git a/configuration/auto_source.md b/configuration/auto_source.md
Lines changed: 68 additions & 0 deletions
diff --git a/configuration/channel.md b/configuration/channel.md
Lines changed: 35 additions & 0 deletions
diff --git a/configuration/headers.md b/configuration/headers.md
Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,44 @@
+---
+layout: default
+title: About html2rss
+# nav_order: 2
+---
+
+# About html2rss
+
+`html2rss` is an open-source project dedicated to empowering users to take control of their web content consumption. Many websites no longer offer traditional RSS feeds; `html2rss` bridges this gap by providing a flexible way to convert HTML content into structured RSS feeds.
+
+The project was started in 2018 and has since grown into a suite of tools that help users create and consume RSS feeds.
+
+---
+
+### Our Mission
+
+Our mission is to provide a simple, powerful, and accessible tool that enables individuals and developers to create custom RSS feeds from any web page. We believe in the power of open standards and the freedom to access information on your own terms.
+
+---
+
+### The html2rss Ecosystem
+
+The `html2rss` project is more than just a single tool. It's a collection of tools that work together to provide a complete RSS solution:
+
+- **[`html2rss`](https://github.com/html2rss/html2rss):** The core Ruby gem that provides the main functionality for converting HTML to RSS.
+- **[`html2rss-web`](https://github.com/html2rss/html2rss-web):** A web application that allows you to create and manage your RSS feeds through a user-friendly interface.
+- **[`html2rss-configs`](https://github.com/html2rss/html2rss-configs):** A collection of pre-built feed configs for popular websites, so you can get started quickly.
+
+---
+
+### Project Philosophy
+
+- **User Empowerment:** Give users the tools to customize their web experience.
+- **Simplicity & Power:** Offer an easy-to-use interface with powerful underlying capabilities.
+- **Open Source:** Foster a collaborative environment where the community can contribute and improve the project.
+- **Reliability:** Strive for a stable and dependable tool that consistently delivers.
+
+---
+
+### The Team
+
+`html2rss` is maintained by a dedicated group of volunteers and contributors from around the world. We are passionate about open source and committed to continuously improving the project.
+
+Want to join us? Check out our [Contributing Guide]({{ '/contributing/' | relative_url }})!
@@ -0,0 +1,55 @@
+---
+layout: default
+title: API Reference
+nav_order: 8
+---
+
+# API Reference
+
+This section provides a reference for the `html2rss` command-line interface (CLI).
+
+For detailed documentation on the Ruby API, please refer to the official YARD documentation.
+
+[**📚 View the Ruby API Docs on rubydoc.info**](https://www.rubydoc.info/gems/html2rss)
+
+---
+
+### Command-Line Interface (CLI)
+
+The `html2rss` executable provides the primary way to interact with the tool from your terminal.
+
+#### `html2rss auto <URL>`
+
+Automatically generates an RSS feed from the provided URL.
+
+- `<URL>` (Required): The URL of the website to generate a feed from.
+
+**Example:**
+
+```bash
+html2rss auto https://unmatchedstyle.com/
+```
+
+#### `html2rss feed <CONFIG_FILE>`
+
+Generates an RSS feed based on the provided YAML configuration file.
+
+- `<CONFIG_FILE>` (Required): Path to your YAML configuration file.
+
+**Examples:**
+
+```bash
+# Generate and print to console
+html2rss feed my_feed.yml
+
+# Generate and save to an XML file
+html2rss feed my_feed.yml > my_feed.xml
+```
+
+#### `html2rss help`
+
+Displays the help message with available commands and options.
+
+#### `html2rss --version`
+
+Displays the currently installed version of `html2rss`.
@@ -1,8 +1,8 @@
 ---
 layout: default
-title: All feeds
+title: Ready-to-use configs
 noindex: true
-nav_order: 1
+# nav_order: 1
 ---
 
 <noscript>
 
@@ -0,0 +1,68 @@
+---
+layout: default
+title: Auto Source
+nav_order: 4
+parent: Configuration
+---
+
+# `auto_source`
+
+The `auto_source` scraper is the easiest way to create a feed. It intelligently finds items on a page without requiring you to specify CSS selectors.
+
+You can enable it in your YAML config like this:
+
+```yaml
+channel:
+  url: https://example.com
+auto_source: {}
+```
+
+---
+
+## How it Works
+
+The `auto_source` scraper uses a series of strategies to find content:
+
+1.  **`schema`:** It looks for structured data in `<script type="application/ld+json">` tags. Many websites use this to provide machine-readable information about their content, often following the [Schema.org](https://schema.org/) standard.
+2.  **`semantic_html`:** It searches for semantic HTML5 tags like `<article>`, `<main>`, and `<section>`. These tags are often used to define the main content of a page.
+3.  **`html`:** As a last resort, it analyzes the entire HTML structure to find frequently occurring selectors that are likely to contain the main content.
+
+---
+
+## Fine-Tuning `auto_source`
+
+You can customize the behavior of the `auto_source` scraper to improve its accuracy.
+
+### Scraper Options
+
+You can enable or disable specific scrapers and adjust their settings.
+
+```yaml
+auto_source:
+  scraper:
+    schema:
+      enabled: false # default: true
+    semantic_html:
+      enabled: false # default: true
+    html:
+      enabled: true
+      minimum_selector_frequency: 3 # default: 2
+      use_top_selectors: 3 # default: 5
+```
+
+- `minimum_selector_frequency`: The minimum number of times a selector must appear to be considered a candidate for the main content.
+- `use_top_selectors`: The number of top candidate selectors to consider.
+
+### Cleanup Options
+
+You can also clean up the results to remove unwanted items.
+
+```yaml
+auto_source:
+  cleanup:
+    keep_different_domain: false # default: true
+    min_words_title: 4 # default: 3
+```
+
+- `keep_different_domain`: Whether to keep items that link to a different domain.
+- `min_words_title`: The minimum number of words a title must have to be included.
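+
+The scraper and cleanup options can be combined in a single config. The following is a minimal sketch, assuming `scraper` and `cleanup` sit side by side under `auto_source` as the separate examples above suggest. The values are illustrative, not recommendations.
+
+```yaml
+channel:
+  url: https://example.com
+auto_source:
+  scraper:
+    html:
+      enabled: true
+      minimum_selector_frequency: 3 # default: 2
+  cleanup:
+    keep_different_domain: false # default: true
+    min_words_title: 4 # default: 3
+```
+
+Options left out of the config (here `schema`, `semantic_html`, and `use_top_selectors`) should keep the defaults noted above.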
@@ -0,0 +1,35 @@
+---
+layout: default
+title: Channel
+nav_order: 1
+parent: Configuration
+---
+
+# `channel`
+
+The `channel` key contains information about the RSS feed itself, such as its title, URL, and description.
+
+```yaml
+channel:
+  url: https://example.com
+  title: "My Custom Feed"
+  description: "A feed of the latest news from Example.com"
+  author: "[email protected] (Jane Doe)"
+  ttl: 60
+  language: "en-us"
+  time_zone: "Europe/Berlin"
+```
+
+---
+
+## Channel Options
+
+| Attribute     | Required     | Type    | Default        | Remark                                                                                                                                  |
+| :------------ | :----------- | :------ | :------------- | :-------------------------------------------------------------------------------------------------------------------------------------- |
+| `url`         | **Required** | String  |                | The URL of the website to scrape.                                                                                                       |
+| `title`       | Optional     | String  | Auto-generated | The title of the RSS feed.                                                                                                              |
+| `description` | Optional     | String  | Auto-generated | Retrieved from meta description tags.                                                                                                   |
+| `author`      | Optional     | String  | Blank          | Format: `email (Name)`.                                                                                                                 |
+| `ttl`         | Optional     | Integer | Auto-generated | Time to live in minutes. `html2rss` will use the `max-age` from the response headers if available, otherwise it will default to `360`.  |
+| `language`    | Optional     | String  | Auto-generated | Determined by the `lang` attribute of the `<html>` tag.                                                                                 |
+| `time_zone`   | Optional     | String  | `'UTC'`        | The time zone to use for parsing dates. See a [list of valid time zones](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones). |
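+
+Because only `url` is required, a `channel` block can stay short. The sketch below sets the required `url` plus one optional attribute and leaves the rest to the defaults listed in the table. The URL and time zone are placeholder values.
+
+```yaml
+channel:
+  url: https://example.com # required
+  time_zone: "Europe/Berlin" # optional; defaults to 'UTC'
+  # title, description, ttl, and language are omitted and auto-generated
+```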
@@ -0,0 +1,51 @@
+---
+layout: default
+title: Headers
+nav_order: 1
+parent: Configuration
+---
+
+# `headers`
+
+The `headers` key allows you to set custom HTTP headers for your requests. This is useful for accessing protected content or interacting with APIs.
+
+```yaml
+headers:
+  User-Agent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
+  Authorization: "Bearer YOUR_TOKEN"
+```
+
+Dynamic parameters can be used in headers to pass values at runtime. See [Advanced Topics](/configuration/advanced-topics/) for more details.
+
+## Example Configuration
+
+This example demonstrates how to add custom HTTP headers to your feed request:
+
+```yaml
+channel:
+  url: https://example.com/protected-content
+  headers:
+    User-Agent: "Mozilla/5.0 (compatible; html2rss/1.0)"
+    Authorization: "Bearer your_api_token_here"
+selectors:
+  items:
+    selector: ".article"
+  title:
+    selector: "h2.title"
+  url:
+    selector: "h2.title a"
+    extractor: "href"
+  description:
+    selector: ".summary"
+```
+
+### Explanation
+
+- **`channel.headers`**: Defines custom HTTP headers to include in the request.
+- **`User-Agent`**: Some websites require a specific user agent string.
+- **`Authorization`**: Sends an API token (here as a Bearer token) to access protected content.
+- The rest of the configuration extracts articles as usual.
+
+Use this configuration to access content that requires authentication or specific headers.