docs: add opendataloader_pdf integration page

hnc-hyunheejo · hnc-hyunheejo · commit 91e312db97ba · 2025-10-02T10:21:21.000+09:00
Adds a complete integration page for the **OpenDataLoader PDF** (`langchain-opendataloader-pdf`) document loader, including installation, initialization, quick start, and parameters documentation.

**Type:** New documentation page

- GitHub issue: https://github.com/opendataloader-project/opendataloader-pdf/issues
- Feature PR: https://github.com/opendataloader-project/opendataloader-pdf/pulls
- Linear issue: -
- Slack thread: -

- [x] I have read the [contributing guidelines](README.md)
- [x] I have tested my changes locally using `docs dev`
- [x] All code examples have been tested and work correctly
- [x] I have used **root relative** paths for internal links
- [ ] I have updated navigation in `src/docs.json` if needed
- [ ] I have gotten approval from the relevant reviewers
- [ ] (Internal team members only / optional) I have created a preview deployment using the [Create Preview Branch workflow](https://github.com/langchain-ai/docs/actions/workflows/create-preview-branch.yml)

- New files: `docs/src/oss/python/integrations/document_loaders/opendataloader_pdf.mdx` and `docs/src/oss/python/integrations/providers/opendataloader_pdf.mdx`
- Also appended `OpenDataLoader PDF` to the loader index (`index.mdx`) and `all_providers.mdx` for discoverability.
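
For anyone trying the page locally, the four parameters it documents (`file_path`, `format`, `quiet`, `content_safety_off`) can be sanity-checked before a loader is constructed. The sketch below is only an illustration: `build_loader_kwargs` is a hypothetical helper, not part of the package, and it treats the example values from the Parameters table as a closed set, which the real loader may not do.

```python
from typing import Any, Dict, List, Optional

# Values copied from the "e.g." lists in the Parameters table of the new page.
# Assumption: the real package may accept more values than these.
DOCUMENTED_FORMATS = {"json", "html", "markdown", "text"}
DOCUMENTED_SAFETY_FILTERS = {"all", "hidden-text", "off-page", "tiny", "hidden-ocg"}


def build_loader_kwargs(
    file_path: List[str],
    format: Optional[str] = None,
    quiet: bool = False,
    content_safety_off: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: check arguments against the documented values."""
    if not file_path:
        raise ValueError("file_path needs at least one PDF file or directory")
    if format is not None and format not in DOCUMENTED_FORMATS:
        raise ValueError(f"undocumented format: {format!r}")
    unknown = set(content_safety_off or []) - DOCUMENTED_SAFETY_FILTERS
    if unknown:
        raise ValueError(f"undocumented safety filters: {sorted(unknown)}")
    return {
        "file_path": list(file_path),
        "format": format,
        "quiet": quiet,
        "content_safety_off": content_safety_off,
    }


kwargs = build_loader_kwargs(
    ["path/to/document.pdf"],
    format="markdown",
    quiet=True,
    content_safety_off=["hidden-text"],
)
print(kwargs)

# With the package installed and Java on PATH, the kwargs would be passed on:
# from langchain_opendataloader_pdf import OpenDataLoaderPDFLoader
# documents = OpenDataLoaderPDFLoader(**kwargs).load()
```

Keeping the validation separate from the loader call means the check runs without Java or the package installed; the commented lines show where the documented constructor would take over.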
diff --git a/src/oss/python/integrations/document_loaders/index.mdx b/src/oss/python/integrations/document_loaders/index.mdx
@@ -67,6 +67,7 @@ The below document loaders allow you to load PDF documents.
 | [Upstage Document Parse Loader](/oss/integrations/document_loaders/upstage) | Load PDF files using UpstageDocumentParseLoader | Package |
 | [Docling](/oss/integrations/document_loaders/docling) | Load PDF files using Docling | Package |
 | [UnDatasIO](/oss/integrations/document_loaders/undatasio) | Load PDF files using UnDatasIO | Package |
+| [OpenDataLoader PDF](/oss/integrations/document_loaders/opendataloader_pdf) | Load PDF files using OpenDataLoader PDF | Package |
 
 
 ### Cloud Providers
@@ -258,6 +259,7 @@ The below document loaders allow you to load data from common data formats.
 <Card title="Notion DB" icon="link" href="/oss/integrations/document_loaders/notion" arrow="true" cta="View guide" />
 <Card title="Nuclia" icon="link" href="/oss/integrations/document_loaders/nuclia" arrow="true" cta="View guide" />
 <Card title="Obsidian" icon="link" href="/oss/integrations/document_loaders/obsidian" arrow="true" cta="View guide" />
+<Card title="OpenDataLoader PDF" icon="link" href="/oss/integrations/document_loaders/opendataloader_pdf" arrow="true" cta="View guide" />
 <Card title="Open Document Format (ODT)" icon="link" href="/oss/integrations/document_loaders/odt" arrow="true" cta="View guide" />
 <Card title="Open City Data" icon="link" href="/oss/integrations/document_loaders/open_city_data" arrow="true" cta="View guide" />
 <Card title="Oracle Autonomous Database" icon="link" href="/oss/integrations/document_loaders/oracleadb_loader" arrow="true" cta="View guide" />
diff --git a/src/oss/python/integrations/document_loaders/opendataloader_pdf.mdx b/src/oss/python/integrations/document_loaders/opendataloader_pdf.mdx
@@ -0,0 +1,67 @@
+---
+title: OpenDataLoader PDF
+---
+
+**Safe, Open, High-Performance — PDF for AI**
+
+[OpenDataLoader PDF](https://github.com/opendataloader-project/opendataloader-pdf) converts PDFs into JSON, Markdown, or HTML, ready to feed into modern AI stacks (LLMs, vector search, and RAG).
+
+It reconstructs document layout (headings, lists, tables, and reading order) so the content is easier to chunk, index, and query.
+Powered by fast, heuristic, rule-based inference, it runs entirely on your local machine and delivers high-throughput processing for large document sets.
+AI-safety is enabled by default and automatically filters likely prompt-injection content embedded in PDFs to reduce downstream risk.
+
+## Overview
+
+### Integration details
+
+| Class | Package | Local | Serializable | JS support |
+| :--- | :--- | :---: | :---: |  :---: |
+| [OpenDataLoader PDF](https://github.com/opendataloader-project/opendataloader-pdf) | [langchain-opendataloader-pdf](https://pypi.org/project/langchain-opendataloader-pdf/) | ✅ | ❌ | ❌ |
+
+### Loader features
+
+| Source | Document Lazy Loading | Native Async Support |
+| :---: | :---: | :---: |
+| OpenDataLoaderPDFLoader | ✅ | ❌ |
+
+The `OpenDataLoaderPDFLoader` component enables you to parse PDFs into structured `Document` objects.
+
+## Requirements
+- Python >= 3.9
+- Java 11 or newer available on the system `PATH`
+- opendataloader-pdf >= 1.1.1
+
+## Installation
+```bash
+pip install -U langchain-opendataloader-pdf
+```
+
+## Quick start
+```python
+from langchain_opendataloader_pdf import OpenDataLoaderPDFLoader
+
+loader = OpenDataLoaderPDFLoader(
+    file_path=["path/to/document.pdf", "path/to/folder"],  # PDF files and/or directories
+    format="text",  # e.g. "json", "html", "markdown", or "text"
+)
+documents = loader.load()
+
+for doc in documents:
+    print(doc.metadata, doc.page_content[:80])
+```
+
+## Parameters
+
+| Parameter                | Type                  | Required   | Default      | Description                                                                                                        |
+|--------------------------|-----------------------| ---------- |--------------|--------------------------------------------------------------------------------------------------------------------|
+| `file_path`              | `List[str]`           | ✅ Yes     | —            | One or more PDF file paths or directories to process.                                                              |
+| `format`                 | `str`                 | No         | `None`       | Output format (e.g. `"json"`, `"html"`, `"markdown"`, `"text"`).                                                   |
+| `quiet`                  | `bool`                | No         | `False`      | Suppresses CLI logging output when `True`.                                                                         |
+| `content_safety_off`     | `Optional[List[str]]` | No         | `None`       | List of content safety filters to disable (e.g. `"all"`, `"hidden-text"`, `"off-page"`, `"tiny"`, `"hidden-ocg"`). |
+
+## Additional resources
+
+- [LangChain OpenDataLoader PDF integration GitHub](https://github.com/opendataloader-project/langchain-opendataloader-pdf)
+- [LangChain OpenDataLoader PDF integration PyPI package](https://pypi.org/project/langchain-opendataloader-pdf/)
+- [OpenDataLoader PDF GitHub](https://github.com/opendataloader-project/opendataloader-pdf)
+- [OpenDataLoader PDF Homepage](https://opendataloader.org/)
diff --git a/src/oss/python/integrations/providers/all_providers.mdx b/src/oss/python/integrations/providers/all_providers.mdx
@@ -1990,6 +1990,14 @@ Browse the complete collection of integrations available for Python. LangChain P
   >
     GPT models and comprehensive AI platform.
   </Card>
+  
+  <Card
+    title="OpenDataLoader PDF"
+    href="/oss/integrations/providers/opendataloader_pdf"
+    icon="link"
+  >
+    Safe, Open, High-Performance — PDF for AI
+  </Card>
 
   <Card
     title="OpenGradient"
diff --git a/src/oss/python/integrations/providers/opendataloader_pdf.mdx b/src/oss/python/integrations/providers/opendataloader_pdf.mdx
@@ -0,0 +1,51 @@
+---
+title: OpenDataLoader PDF
+---
+
+> **Safe, Open, High-Performance — PDF for AI**
+
+> [OpenDataLoader PDF](https://github.com/opendataloader-project/opendataloader-pdf) converts PDFs into JSON, Markdown, or HTML, ready to feed into modern AI stacks (LLMs, vector search, and RAG).
+> 
+> It reconstructs document layout (headings, lists, tables, and reading order) so the content is easier to chunk, index, and query.
+> Powered by fast, heuristic, rule-based inference, it runs entirely on your local machine and delivers high-throughput processing for large document sets.
+> AI-safety is enabled by default and automatically filters likely prompt-injection content embedded in PDFs to reduce downstream risk.
+
+## Requirements
+- Python >= 3.9
+- Java 11 or newer available on the system `PATH`
+- opendataloader-pdf >= 1.1.1
+
+## Installation
+```bash
+pip install -U langchain-opendataloader-pdf
+```
+
+## Quick start
+```python
+from langchain_opendataloader_pdf import OpenDataLoaderPDFLoader
+
+loader = OpenDataLoaderPDFLoader(
+    file_path=["path/to/document.pdf", "path/to/folder"],  # PDF files and/or directories
+    format="text",  # e.g. "json", "html", "markdown", or "text"
+)
+documents = loader.load()
+
+for doc in documents:
+    print(doc.metadata, doc.page_content[:80])
+```
+
+## Parameters
+
+| Parameter                | Type                  | Required   | Default      | Description                                                                                                        |
+|--------------------------|-----------------------| ---------- |--------------|--------------------------------------------------------------------------------------------------------------------|
+| `file_path`              | `List[str]`           | ✅ Yes     | —            | One or more PDF file paths or directories to process.                                                              |
+| `format`                 | `str`                 | No         | `None`       | Output format (e.g. `"json"`, `"html"`, `"markdown"`, `"text"`).                                                   |
+| `quiet`                  | `bool`                | No         | `False`      | Suppresses CLI logging output when `True`.                                                                         |
+| `content_safety_off`     | `Optional[List[str]]` | No         | `None`       | List of content safety filters to disable (e.g. `"all"`, `"hidden-text"`, `"off-page"`, `"tiny"`, `"hidden-ocg"`). |
+
+## Additional resources
+
+- [LangChain OpenDataLoader PDF integration GitHub](https://github.com/opendataloader-project/langchain-opendataloader-pdf)
+- [LangChain OpenDataLoader PDF integration PyPI package](https://pypi.org/project/langchain-opendataloader-pdf/)
+- [OpenDataLoader PDF GitHub](https://github.com/opendataloader-project/opendataloader-pdf)
+- [OpenDataLoader PDF Homepage](https://opendataloader.org/)
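
Both pages list "Java 11 or newer available on the system `PATH`" as a requirement. A minimal preflight sketch of that check is shown below, using only the Python standard library. The helper names `parse_java_major` and `java_is_ready` are hypothetical and are not provided by `langchain-opendataloader-pdf`; the parsing assumes the usual `version "..."` line that `java -version` prints.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_java_major(version_output: str) -> Optional[int]:
    """Extract the major version from `java -version` output, if present."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Java 8 and older report themselves as "1.8.0_281", so use the second field.
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def java_is_ready(minimum_major: int = 11) -> bool:
    """Return True if a `java` binary on PATH reports at least `minimum_major`."""
    java = shutil.which("java")
    if java is None:
        return False
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    # `java -version` usually writes to stderr; check both streams to be safe.
    major = parse_java_major(result.stderr + result.stdout)
    return major is not None and major >= minimum_major


print(parse_java_major('openjdk version "17.0.9" 2023-10-17'))  # 17
print(parse_java_major('java version "1.8.0_281"'))  # 8
print("Java 11+ available:", java_is_ready())
```

Running a check like this before constructing the loader turns a missing or outdated Java install into an explicit `False` rather than a failure later in PDF processing.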