PdfItDown is a Python package that relies on markitdown by Microsoft, markdown_pdf and img2pdf. Visit us on our documentation website!
PdfItDown is applicable to the following file formats:
- Markdown
- PowerPoint
- Word
- Excel
- HTML
- Text-based formats (CSV, XML, JSON)
- ZIP files (iterates over contents)
- Image files (PNG, JPG)
The format-specific support needs to be evaluated for the specific reader you are using.
PdfItDown works in a very simple way:
- From markdown to PDF (default)
graph LR
2(Input File) --> 3[Markdown content]
3[Markdown content] --> 4[markdown-pdf]
4[markdown-pdf] --> 5(PDF file)
- From image to PDF (default)
graph LR
2(Input File) --> 3[Bytes]
3[Bytes] --> 4[img2pdf]
4[img2pdf] --> 5(PDF file)
- From other text-based file formats or unstructured file formats to PDF (default)
graph LR
2(Input File) --> 3[MarkitDown]
3[MarkitDown] --> 4[Markdown content]
4[Markdown content] --> 5[markdown-pdf]
5[markdown-pdf] --> 6(PDF file)
- Using a custom conversion callback
graph LR
2(Input File) --> 3[Conversion Callback]
3[Conversion Callback] --> 4(PDF file)
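The default routes above amount to a dispatch on the type of the input file. The following is a minimal sketch of that idea, assuming the decision is made on the file extension. `choose_route` and the two extension sets are hypothetical names used only for illustration; they are not PdfItDown's actual internals.

```python
from pathlib import Path

# Hypothetical extension sets, chosen only for illustration.
MARKDOWN_EXTENSIONS = {".md", ".markdown"}
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def choose_route(file_path: str) -> list[str]:
    """Return the chain of tools an input file would pass through."""
    suffix = Path(file_path).suffix.lower()
    if suffix in MARKDOWN_EXTENSIONS:
        # Markdown goes straight to markdown-pdf
        return ["markdown-pdf"]
    if suffix in IMAGE_EXTENSIONS:
        # Images are handed to img2pdf as bytes
        return ["img2pdf"]
    # Everything else is first turned into Markdown by MarkItDown
    return ["markitdown", "markdown-pdf"]


for name in ("README.md", "logo.PNG", "users.xlsx"):
    print(name, "->", " -> ".join(choose_route(name)))
```

A custom conversion callback would bypass this dispatch entirely, since it takes the input file and produces the PDF on its own.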
To install PdfItDown, just run:
pip install pdfitdown
You can now use the command line tool:
Usage: pdfitdown [OPTIONS]
  Convert (almost) everything to PDF
Options:
  -i, --inputfile TEXT   Path to the input file(s) that need to be converted
                         to PDF. Can be used multiple times.
  -o, --outputfile TEXT  Path to the output PDF file(s). If more than one
                         input file is provided, you should provide an equal
                         number of output files.
  -t, --title TEXT       Title to include in the PDF metadata. Default: 'File
                         Converted with PdfItDown'. If more than one file is
                         provided, it will be ignored.
  -d, --directory TEXT   Directory whose files you want to bulk-convert to
                         PDF. If `--inputfile` is also provided, this option
                         will be ignored. Defaults to None.
  --help                 Show this message and exit.
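The interplay between `--inputfile`, `--outputfile` and `--directory` described in the help text can be sketched in a few lines of Python. This is only an illustration: `resolve_paths` is a hypothetical helper, and the rule used to infer a missing output path (swap the extension for `.pdf`) is an assumption that the help text does not confirm.

```python
from pathlib import Path


def resolve_paths(inputs, outputs=None, directory=None):
    """Hypothetical helper pairing each input file with an output path."""
    if not inputs and directory is not None:
        # The directory is only considered when no input file is given
        inputs = sorted(str(p) for p in Path(directory).iterdir() if p.is_file())
    if not inputs:
        raise ValueError("provide at least one input file or a directory")
    if outputs:
        # Explicit outputs must match the inputs one-to-one
        if len(outputs) != len(inputs):
            raise ValueError("need exactly one output path per input file")
        return list(zip(inputs, outputs))
    # Assumed inference rule: same path, extension replaced by .pdf
    return [(i, str(Path(i).with_suffix(".pdf"))) for i in inputs]


print(resolve_paths(["test0.png", "test1.csv"]))
```

Checking the counts up front means a mismatch is reported before any file is converted, rather than halfway through a batch.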
An example usage can be:
pdfitdown -i README.md -o README.pdf -t "README"
Or you can use it inside your Python scripts:
from pdfitdown.pdfconversion import Converter
converter = Converter()
converter.convert(file_path = "business_growth.md", output_path = "business_growth.pdf", title="Business Growth for Q3 in 2024")
converter.convert(file_path = "logo.png", output_path = "logo.pdf")
converter.convert(file_path = "users.xlsx", output_path = "users.pdf")
You can also convert multiple files at once:
- In the CLI:
# with custom output paths
pdfitdown -i test0.png -i test1.md -o testoutput0.pdf -o testoutput1.pdf
# with inferred output paths
pdfitdown -i test0.png -i test1.csv
- In the Python API:
from pdfitdown.pdfconversion import Converter
converter = Converter()
# with custom output paths
converter.multiple_convert(file_paths = ["business_growth.md", "logo.png"], output_paths = ["business_growth.pdf", "logo.pdf"])
# with inferred output paths
converter.multiple_convert(file_paths = ["business_growth.md", "logo.png"])
You can bulk-convert all the files in a directory:
- In the CLI:
pdfitdown -d tests/data/testdir
- In the Python API:
from pdfitdown.pdfconversion import Converter
converter = Converter()
output_paths = converter.convert_directory(directory_path = "tests/data/testdir")
print(output_paths)
In the Python API you can also define a custom callback for the conversion. In this example, we use Google Gemini to summarize a file and save its content as a PDF:
from pathlib import Path
from pdfitdown.pdfconversion import Converter
from markdown_pdf import MarkdownPdf, Section
from google import genai
client = genai.Client()
def conversion_callback(input_file: str, output_file: str, title: str | None = None, overwrite: bool = True):
    # Upload the input file so that Gemini can read it
    uploaded_file = client.files.upload(file=Path(input_file))
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=["Give me a summary of this file.", uploaded_file],
    )
    content = response.text
    # Render the summary (Markdown text) to PDF
    pdf = MarkdownPdf(toc_level=0)
    pdf.add_section(Section(content))
    pdf.meta["title"] = title or "Summary by Gemini"
    pdf.save(output_file)
    return output_file
converter = Converter(conversion_callback=conversion_callback)
converter.convert(file_path = "business_growth.md", output_path = "business_growth.pdf", title="Business Growth for Q3 in 2024")
Moreover, the Python API lets you mount PdfItDown conversion features into a backend server built with Starlette or a Starlette-compatible framework (such as FastAPI):
from starlette.applications import Starlette
from starlette.requests import Request
from starlette.responses import PlainTextResponse
from starlette.routing import Route
from pdfitdown.pdfconversion import Converter
from pdfitdown.server import mount
async def hello_world(request: Request) -> PlainTextResponse:
    return PlainTextResponse(content="hello world!")
# Starlette expects a list of routes, not a single Route
routes = [Route("/helloworld", hello_world)]
app = Starlette(routes=routes)
app = mount(app, converter=Converter(), path="/conversions/pdf", name="pdfitdown")
Now you can send file payloads to the /conversions/pdf endpoint through POST requests and get the content of the converted file back in the response content:
import httpx
with open("file.txt", "rb") as f:
    content = f.read()
files = {"file_upload": ("file.txt", content, "text/plain")}
with httpx.Client() as client:
    response = client.post("http://localhost:80/conversions/pdf", files=files)
    assert response.status_code == 200
    with open("file.pdf", "wb") as f:
        f.write(response.content)
Contributions are always welcome!
Find contribution guidelines at CONTRIBUTING.md
This project is open-source and is provided under an MIT License.
If you found it useful, please consider funding it.
