diff --git a/AUTHORS.rst b/AUTHORS.rst
index 3dde015f..af0a7a65 100644
--- a/AUTHORS.rst
+++ b/AUTHORS.rst
@@ -45,3 +45,4 @@ Patches and suggestions
 - Jon Dufresne
 - Ville Skyttä
 - Jonathan Vanasco
+- Tom Most
diff --git a/CHANGES.rst b/CHANGES.rst
index 047a7545..8690d749 100644
--- a/CHANGES.rst
+++ b/CHANGES.rst
@@ -32,7 +32,7 @@ Released on July 14, 2016
 
 * Cease supporting DATrie under PyPy.
 
-* **Remove ``PullDOM`` support, as this hasn't ever been properly
+* **Remove PullDOM support, as this hasn't ever been properly
   tested, doesn't entirely work, and as far as I can tell is
   completely unused by anyone.**
 
@@ -70,7 +70,7 @@ Released on July 14, 2016
   to clarify their status as public.**
 
 * **Get rid of the sanitizer package. Merge sanitizer.sanitize into the
-  sanitizer.htmlsanitizer module and move that to saniziter. This means
+  sanitizer.htmlsanitizer module and move that to sanitizer. This means
   anyone who used sanitizer.sanitize or sanitizer.HTMLSanitizer needs no
   code changes.**
 
diff --git a/doc/html5lib.rst b/doc/html5lib.rst
index f0646aac..2a0b150f 100644
--- a/doc/html5lib.rst
+++ b/doc/html5lib.rst
@@ -1,13 +1,8 @@
 html5lib Package
 ================
 
-:mod:`html5lib` Package
------------------------
-
-.. automodule:: html5lib.__init__
-    :members:
-    :undoc-members:
-    :show-inheritance:
+.. automodule:: html5lib
+    :members: __version__
 
 :mod:`constants` Module
 -----------------------
@@ -26,7 +21,7 @@ html5lib Package
     :show-inheritance:
 
 :mod:`serializer` Module
------------------------
+------------------------
 
 .. automodule:: html5lib.serializer
     :members:
@@ -41,4 +36,5 @@ Subpackages
     html5lib.filters
     html5lib.treebuilders
     html5lib.treewalkers
+    html5lib.treeadapters
 
diff --git a/doc/html5lib.treeadapters.rst b/doc/html5lib.treeadapters.rst
new file mode 100644
index 00000000..6b2dc78d
--- /dev/null
+++ b/doc/html5lib.treeadapters.rst
@@ -0,0 +1,20 @@
+treebuilders Package
+====================
+
+:mod:`~html5lib.treeadapters` Package
+-------------------------------------
+
+.. automodule:: html5lib.treeadapters
+    :members:
+    :undoc-members:
+    :show-inheritance:
+
+.. automodule:: html5lib.treeadapters.genshi
+    :members:
+    :undoc-members:
+    :show-inheritance:
+
+.. automodule:: html5lib.treeadapters.sax
+    :members:
+    :undoc-members:
+    :show-inheritance:
diff --git a/doc/html5lib.treewalkers.rst b/doc/html5lib.treewalkers.rst
index 46501258..085d8a98 100644
--- a/doc/html5lib.treewalkers.rst
+++ b/doc/html5lib.treewalkers.rst
@@ -10,7 +10,7 @@ treewalkers Package
     :show-inheritance:
 
 :mod:`base` Module
--------------------
+------------------
 
 .. automodule:: html5lib.treewalkers.base
     :members:
@@ -34,7 +34,7 @@ treewalkers Package
     :show-inheritance:
 
 :mod:`etree_lxml` Module
------------------------
+------------------------
 
 .. automodule:: html5lib.treewalkers.etree_lxml
     :members:
@@ -43,9 +43,9 @@ treewalkers Package
 
 
 :mod:`genshi` Module
---------------------------
+--------------------
 
 .. automodule:: html5lib.treewalkers.genshi
     :members:
     :undoc-members:
-    :show-inheritance:
\ No newline at end of file
+    :show-inheritance:
diff --git a/doc/movingparts.rst b/doc/movingparts.rst
index 80ee2ad1..6ba367a2 100644
--- a/doc/movingparts.rst
+++ b/doc/movingparts.rst
@@ -4,22 +4,25 @@ The moving parts
 html5lib consists of a number of components, which are responsible for
 handling its features.
 
+Parsing uses a *tree builder* to generate a *tree*, the in-memory representation of the document.
+Several tree representations are supported, as are translations to other formats via *tree adapters*.
+The tree may be translated to a token stream with a *tree walker*, from which :class:`~html5lib.serializer.HTMLSerializer` produces a stream of bytes.
+The token stream may also be transformed by use of *filters* to accomplish tasks like sanitization.
 
 Tree builders
 -------------
 
 The parser reads HTML by tokenizing the content and building a tree that
-the user can later access. There are three main types of trees that
-html5lib can build:
+the user can later access. html5lib can build three types of trees:
 
-* ``etree`` - this is the default; builds a tree based on ``xml.etree``,
+* ``etree`` - this is the default; builds a tree based on :mod:`xml.etree`,
   which can be found in the standard library. Whenever possible, the
   accelerated ``ElementTree`` implementation (i.e.
   ``xml.etree.cElementTree`` on Python 2.x) is used.
 
-* ``dom`` - builds a tree based on ``xml.dom.minidom``.
+* ``dom`` - builds a tree based on :mod:`xml.dom.minidom`.
 
-* ``lxml.etree`` - uses lxml's implementation of the ``ElementTree``
+* ``lxml`` - uses the :mod:`lxml.etree` implementation of the ``ElementTree``
   API. The performance gains are relatively small compared to using the
   accelerated ``ElementTree`` module.
 
@@ -31,21 +34,15 @@ You can specify the builder by name when using the shorthand API:
   with open("mydocument.html", "rb") as f:
       lxml_etree_document = html5lib.parse(f, treebuilder="lxml")
 
-When instantiating a parser object, you have to pass a tree builder
-class in the ``tree`` keyword attribute:
+To get a builder class by name, use the :func:`~html5lib.treebuilders.getTreeBuilder` function.
 
-.. code-block:: python
-
-  import html5lib
-  parser = html5lib.HTMLParser(tree=SomeTreeBuilder)
-  document = parser.parse("Hello World!")
-
-To get a builder class by name, use the ``getTreeBuilder`` function:
+When instantiating a :class:`~html5lib.html5parser.HTMLParser` object, you must pass a tree builder class via the ``tree`` keyword attribute:
 
 .. code-block:: python
 
   import html5lib
-  parser = html5lib.HTMLParser(tree=html5lib.getTreeBuilder("dom"))
+  TreeBuilder = html5lib.getTreeBuilder("dom")
+  parser = html5lib.HTMLParser(tree=TreeBuilder)
   minidom_document = parser.parse("Hello World!")
 
The implementation of builders can be found in `html5lib/treebuilders/
@@ -55,17 +52,13 @@ The implementation of builders can be found in `html5lib/treebuilders/
Tree walkers
------------
-Once a tree is ready, you can work on it either manually, or using
-a tree walker, which provides a streaming view of the tree. html5lib
-provides walkers for all three supported types of trees (``etree``,
-``dom`` and ``lxml``).
+In addition to manipulating a tree directly, you can use a tree walker to generate a streaming view of it.
+html5lib provides walkers for ``etree``, ``dom``, and ``lxml`` trees, as well as ``genshi`` `markup streams Surprise!")
-
-HTMLTokenizer
-~~~~~~~~~~~~~
-
-This is the default tokenizer, the heart of html5lib. The implementation
-can be found in `html5lib/tokenizer.py