1515This module defines a class :class:`HTMLParser` which serves as the basis for
1616parsing text files formatted in HTML (HyperText Mark-up Language) and XHTML.
1717
18- .. class:: HTMLParser(*, convert_charrefs=True)
18+ .. class:: HTMLParser(*, convert_charrefs=True, scripting=False)
1919
2020 Create a parser instance able to parse invalid markup.
2121
22- If *convert_charrefs* is ``True`` (the default), all character
23- references (except the ones in ``script``/``style`` elements) are
22+ If *convert_charrefs* is true (the default), all character
23+ references (except the ones in elements like ``script`` and ``style``) are
2424 automatically converted to the corresponding Unicode characters.
2525
26+ If *scripting* is false (the default), the content of the ``noscript``
27+ element is parsed normally; if it's true, it's returned as is without
28+ being parsed.
29+
2630 An :class:`.HTMLParser` instance is fed HTML data and calls handler methods
2731 when start tags, end tags, text, comments, and other markup elements are
2832 encountered. The user should subclass :class:`.HTMLParser` and override its
@@ -37,6 +41,9 @@ parsing text files formatted in HTML (HyperText Mark-up Language) and XHTML.
3741 .. versionchanged:: 3.5
3842 The default value for argument *convert_charrefs* is now ``True``.
3943
44+ .. versionchanged:: 3.11.15
45+ Added the *scripting* parameter.
46+
4047
4148Example HTML Parser Application
4249-------------------------------
@@ -159,24 +166,24 @@ implementations do nothing (except for :meth:`~HTMLParser.handle_startendtag`):
159166.. method:: HTMLParser.handle_data(data)
160167
161168 This method is called to process arbitrary data (e.g. text nodes and the
162- content of ``<script>...</script>`` and ``<style>...</style>``).
169+ content of elements like ``script`` and ``style``).
163170
164171
165172.. method:: HTMLParser.handle_entityref(name)
166173
167174 This method is called to process a named character reference of the form
168175 ``&name;`` (e.g. ``&gt;``), where *name* is a general entity reference
169- (e.g. ``'gt'``). This method is never called if *convert_charrefs* is
170- ``True``.
176+ (e.g. ``'gt'``).
177+ This method is only called if *convert_charrefs* is false.
171178
172179
173180.. method:: HTMLParser.handle_charref(name)
174181
175182 This method is called to process decimal and hexadecimal numeric character
176183 references of the form :samp:`&#{NNN};` and :samp:`&#x{NNN};`. For example, the decimal
177184 equivalent for ``&gt;`` is ``&#62;``, whereas the hexadecimal is ``&#x3E;``;
178- in this case the method will receive ``'62'`` or ``'x3E'``. This method
179- is never called if *convert_charrefs* is ``True``.
185+ in this case the method will receive ``'62'`` or ``'x3E'``.
186+ This method is only called if *convert_charrefs* is false.
180187
181188
182189.. method:: HTMLParser.handle_comment(data)
@@ -284,8 +291,8 @@ Parsing an element with a few attributes and a title::
284291 Data : Python
285292 End tag : h1
286293
287- The content of ``script`` and ``style`` elements is returned as is, without
288- further parsing::
294+ The content of elements like ``script`` and ``style`` is returned as is,
295+ without further parsing::
289296
290297 >>> parser.feed('<style type="text/css">#python { color: green }</style>')
291298 Start tag: style
@@ -294,10 +301,10 @@ further parsing::
294301 End tag : style
295302
296303 >>> parser.feed('<script type="text/javascript">'
297- ... 'alert("<strong>hello!</strong>");</script>')
304+ ... 'alert("<strong>hello! &#9786;</strong>");</script>')
298305 Start tag: script
299306 attr: ('type', 'text/javascript')
300- Data : alert("<strong>hello!</strong>");
307+ Data : alert("<strong>hello! &#9786;</strong>");
301308 End tag : script
302309
303310Parsing comments::
@@ -317,7 +324,7 @@ correct char (note: these 3 references are all equivalent to ``'>'``)::
317324
318325Feeding incomplete chunks to :meth:`~HTMLParser.feed` works, but
319326:meth:`~HTMLParser.handle_data` might be called more than once
320- (unless *convert_charrefs* is set to ``True``)::
327+ if *convert_charrefs* is false::
321328
322329 >>> for chunk in ['<sp', 'an>buff', 'ered ', 'text</s', 'pan>']:
323330 ... parser.feed(chunk)
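For readers checking the reworded ``handle_entityref`` and ``handle_charref`` notes, the following is a minimal sketch of how *convert_charrefs* changes which handlers fire. ``EventRecorder`` and ``record`` are hypothetical helper names used only for illustration; they are not part of the patch or of :mod:`html.parser`. The sketch does not use the new *scripting* parameter, since that is only available in versions that include this change.

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Hypothetical helper: records each handler call as a tuple."""

    def __init__(self, *, convert_charrefs=True):
        super().__init__(convert_charrefs=convert_charrefs)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_entityref(self, name):
        # Only reached when convert_charrefs is false.
        self.events.append(("entityref", name))

    def handle_charref(self, name):
        # Only reached when convert_charrefs is false.
        self.events.append(("charref", name))


def record(markup, *, convert_charrefs):
    """Parse markup in one go and return the recorded handler calls."""
    parser = EventRecorder(convert_charrefs=convert_charrefs)
    parser.feed(markup)
    parser.close()
    return parser.events


# Three spellings of the same character: named, decimal, hexadecimal.
markup = "<p>&gt;&#62;&#x3E;</p>"
converted = record(markup, convert_charrefs=True)
raw = record(markup, convert_charrefs=False)
print(converted)
print(raw)
```

With *convert_charrefs* true, the three references reach ``handle_data`` as plain text; with it false, each reference is reported through its own handler and receives the name exactly as the documentation describes (``'gt'``, ``'62'``, ``'x3E'``).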