Document HTML sanitation policy #1543

Draft
wants to merge 3 commits into
base: master
2 changes: 2 additions & 0 deletions .spell-dict
@@ -111,6 +111,7 @@ rST
ryneeverett
sanitizer
sanitizers
sanitization
Sauder
schemeless
setuptools
@@ -168,6 +169,7 @@ workflow
Xanthakis
XHTML
xhtml
XSS
YAML
Yunusov
inline
82 changes: 78 additions & 4 deletions docs/reference.md
@@ -25,7 +25,33 @@ instance of the `markdown.Markdown` class and pass multiple documents through
it. If you do use a single instance though, make sure to call the `reset`
method appropriately ([see below](#convert)).

### markdown.markdown(text [, **kwargs]) {: #markdown data-toc-label='markdown.markdown' }
### `markdown.markdown(text [, **kwargs])` {: #markdown data-toc-label='markdown.markdown' }

!!! warning

The Python-Markdown library does ***not*** sanitize its HTML output. If
you are processing Markdown input from an untrusted source, it is your
responsibility to ensure that the HTML output is properly sanitized. See
[Markdown and XSS] for an overview of some of the dangers and [Improper markup
sanitization in popular software] for notes on best practices to ensure
HTML is properly sanitized.

The developers of Python-Markdown recommend using [`nh3`][nh3] or [`bleach`][bleach][^1]
as a sanitizer on the output of `markdown.markdown`. However, be
aware that those libraries may not be sufficient in themselves and will
likely require customization. Some useful lists of allowed tags and
attributes can be found in the [`bleach-allowlist`][bleach-allowlist] library, which should
work with either sanitizer.


[Markdown and XSS]: https://michelf.ca/blog/2010/markdown-and-xss/
[Improper markup sanitization in popular software]: https://github.com/ChALkeR/notes/blob/master/Improper-markup-sanitization.md
[nh3]: https://nh3.readthedocs.io/en/latest/
[bleach]: http://bleach.readthedocs.org/en/latest/
[bleach-allowlist]: https://github.com/yourcelf/bleach-allowlist
[^1]: The [bleach] project has been [deprecated](https://github.com/mozilla/bleach/issues/698).
However, it is the only pure-Python HTML sanitization library we are aware of, so it may be the only option for
those who cannot use [`nh3`][nh3] (Python bindings to a Rust library).
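As a rough illustration of the warning above, the sketch below converts untrusted Markdown and then passes the resulting HTML through a sanitizer. The helper name `render_untrusted` and its `to_html` and `sanitize` parameters are hypothetical and exist only for this example. By default it assumes that `markdown` and `nh3` are installed and uses `nh3.clean` with its default allowlist, which will likely need customizing.

```python
def render_untrusted(text, to_html=None, sanitize=None):
    """Convert untrusted Markdown to HTML, then sanitize the result.

    `to_html` and `sanitize` are hypothetical hooks for this sketch; they
    default to `markdown.markdown` and `nh3.clean` respectively.
    """
    if to_html is None:
        import markdown  # Python-Markdown (third-party)
        to_html = markdown.markdown
    if sanitize is None:
        import nh3  # third-party; default allowlist, likely needs tuning
        sanitize = nh3.clean
    # Sanitize the HTML *output*; the Markdown source is not sanitized.
    return sanitize(to_html(text))


if __name__ == "__main__":
    # Requires the `markdown` and `nh3` packages to be installed.
    print(render_untrusted("Hello <script>alert('xss')</script> *world*"))
```

The sanitizer runs on the converted HTML rather than on the Markdown source, because Markdown syntax itself (for example, links) can produce markup that a filter applied to the source text would not see.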

The following options are available on the `markdown.markdown` function:

@@ -216,7 +242,23 @@ __encoding__{: #encoding }
meet your specific needs, it is suggested that you write your own code
to handle your encoding/decoding needs.

### markdown.Markdown([**kwargs]) {: #Markdown data-toc-label='markdown.Markdown' }
!!! warning

The Python-Markdown library does ***not*** sanitize its HTML output. If
you are processing Markdown input from an untrusted source, it is your
responsibility to ensure that the HTML output is properly sanitized. See
[Markdown and XSS] for an overview of some of the dangers and [Improper markup
sanitization in popular software] for notes on best practices to ensure
HTML is properly sanitized.

The developers of Python-Markdown recommend using [`nh3`][nh3] or [`bleach`][bleach][^1]
as a sanitizer on the output of `markdown.markdownFromFile`.
However, be aware that those libraries may not be sufficient in
themselves and will likely require customization. Some useful lists of
allowed tags and attributes can be found in the
[`bleach-allowlist`][bleach-allowlist] library, which should work with either sanitizer.

### `markdown.Markdown([**kwargs])` {: #Markdown data-toc-label='markdown.Markdown' }

The same options are available when initializing the `markdown.Markdown` class
as on the [`markdown.markdown`](#markdown) function, except that the class does
@@ -229,7 +271,7 @@ string must be passed to one of two instance methods.
the thread they were created in. A single instance should not be accessed
from multiple threads.

#### Markdown.convert(source) {: #convert data-toc-label='Markdown.convert' }
#### `Markdown.convert(source)` {: #convert data-toc-label='Markdown.convert' }

The `source` text must meet the same requirements as the [`text`](#text)
argument of the [`markdown.markdown`](#markdown) function.
@@ -258,7 +300,23 @@ To make this easier, you can also chain calls to `reset` together:
html3 = md.reset().convert(text3)
```

#### Markdown.convertFile(**kwargs) {: #convertFile data-toc-label='Markdown.convertFile' }
!!! warning

The Python-Markdown library does ***not*** sanitize its HTML output. If
you are processing Markdown input from an untrusted source, it is your
responsibility to ensure that the HTML output is properly sanitized. See
[Markdown and XSS] for an overview of some of the dangers and [Improper markup
sanitization in popular software] for notes on best practices to ensure
HTML is properly sanitized.

The developers of Python-Markdown recommend using [`nh3`][nh3] or [`bleach`][bleach][^1]
as a sanitizer on the output of `Markdown.convert`. However, be
aware that those libraries may not be sufficient in themselves and will
likely require customization. Some useful lists of allowed tags and
attributes can be found in the [`bleach-allowlist`][bleach-allowlist] library, which should
work with either sanitizer.

#### `Markdown.convertFile(**kwargs)` {: #convertFile data-toc-label='Markdown.convertFile' }

The arguments of this method are identical to the arguments of the same
name on the `markdown.markdownFromFile` function ([`input`](#input),
@@ -267,3 +325,19 @@ name on the `markdown.markdownFromFile` function ([`input`](#input),
process multiple files without creating a new instance of the class for
each document. State may need to be `reset` between each call to
`convertFile` as is the case with `convert`.

!!! warning

The Python-Markdown library does ***not*** sanitize its HTML output. If
you are processing Markdown input from an untrusted source, it is your
responsibility to ensure that the HTML output is properly sanitized. See
[Markdown and XSS] for an overview of some of the dangers and [Improper markup
sanitization in popular software] for notes on best practices to ensure
HTML is properly sanitized.

The developers of Python-Markdown recommend using [`nh3`][nh3] or [`bleach`][bleach][^1]
as a sanitizer on the output of `Markdown.convertFile`. However, be
aware that those libraries may not be sufficient in themselves and will
likely require customization. Some useful lists of allowed tags and
attributes can be found in the [`bleach-allowlist`][bleach-allowlist] library, which should
work with either sanitizer.
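
The customization mentioned in these warnings might look something like the sketch below, which passes an explicit allowlist to `nh3.clean` through its `tags`, `attributes`, and `url_schemes` arguments. The specific tags, attributes, and URL schemes listed are assumptions chosen for illustration, not a vetted policy, and the `sanitize` helper with its `clean` parameter is hypothetical.

```python
# Hypothetical allowlist for illustration only; review it against your own
# requirements before relying on it.
ALLOWED_TAGS = {
    "p", "a", "em", "strong", "code", "pre", "blockquote",
    "ul", "ol", "li", "h1", "h2", "h3",
}
ALLOWED_ATTRIBUTES = {"a": {"href", "title"}}
ALLOWED_URL_SCHEMES = {"http", "https", "mailto"}


def sanitize(html, clean=None):
    """Sanitize converted HTML using the explicit allowlist above."""
    if clean is None:
        import nh3  # third-party; assumed to be installed
        clean = nh3.clean
    return clean(
        html,
        tags=ALLOWED_TAGS,
        attributes=ALLOWED_ATTRIBUTES,
        url_schemes=ALLOWED_URL_SCHEMES,
    )
```

Anything not named in the allowlist is dropped by the sanitizer, so Markdown extensions that emit additional elements (tables, for instance) would need matching entries.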