Commit 0a2d779

update override docs to showcase url-matcher patterns
1 parent 75593ed commit 0a2d779

1 file changed

+23
-15
lines changed

docs/intro/overrides.rst

Lines changed: 23 additions & 15 deletions
@@ -70,21 +70,28 @@ Let's take a look at how the following code is structured:
         def to_item(self):
             ... # more specific parsing
 
-    @handle_urls(["dualexample.com", "dualexample.net"], overrides=GenericProductPage)
+    @handle_urls(["dualexample.com/shop/?product=*", "dualexample.net/store/?pid=*"], overrides=GenericProductPage)
     class DualExampleProductPage(ItemWebPage):
         def to_item(self):
             ... # more specific parsing
 
 The code above declares that:
 
-- For sites that matches the ``example.com`` pattern, ``ExampleProductPage``
+- For sites that match the ``example.com`` pattern, ``ExampleProductPage``
   would be used instead of ``GenericProductPage``.
-- The same is true for ``YetAnotherExampleProductPage`` where it is used
-  instead of ``GenericProductPage`` for two URLs: ``dualexample.com`` and
-  ``dualexample.net``.
-- However, ``AnotherExampleProductPage`` is only used instead of ``GenericProductPage``
-  when we're parsing pages from ``anotherexample.com`` which doesn't contain
-  ``/digital-goods/`` in its URL path.
+- The same is true for ``DualExampleProductPage`` where it is used
+  instead of ``GenericProductPage`` for two URL patterns which works as:
+
+  - **(match)** https://www.dualexample.com/shop/electronics/?product=123
+  - **(match)** https://www.dualexample.com/shop/books/paperback/?product=849
+  - (NO match) https://www.dualexample.com/on-sale/books/?product=923
+  - **(match)** https://www.dualexample.net/store/kitchen/?pid=776
+  - **(match)** https://www.dualexample.net/store/?pid=892
+  - (NO match) https://www.dualexample.net/new-offers/fitness/?pid=892
+
+- On the other hand, ``AnotherExampleProductPage`` is only used instead of
+  ``GenericProductPage`` when we're parsing pages from ``anotherexample.com``
+  which doesn't contain ``/digital-goods/`` in its URL path.
 
 The override mechanism that ``web-poet`` offers could still be further
 customized. You can read some of the specific parameters and alternative ways
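
Note on the hunk above: as a rough illustration of its match / no-match list, the sketch below approximates how the two ``dualexample`` patterns behave, using only the Python standard library. It is not the url-matcher implementation. The names ``simple_match``, ``matches_any`` and ``PATTERNS`` are hypothetical, and the rules assumed here (the domain also covers subdomains, the pattern path is treated as a prefix, every query parameter named in the pattern must be present, ``*`` is a wildcard value) are inferred only from the six example URLs in the diff.

```python
from fnmatch import fnmatchcase
from urllib.parse import parse_qs, urlsplit

# The two patterns introduced by this commit.
PATTERNS = [
    "dualexample.com/shop/?product=*",
    "dualexample.net/store/?pid=*",
]


def simple_match(pattern: str, url: str) -> bool:
    """Approximate check of one pattern against one URL (assumed rules, see note)."""
    # Prefix "//" so urlsplit treats the leading part of the pattern as the host.
    pat = urlsplit("//" + pattern)
    target = urlsplit(url)

    # Assumed rule 1: the pattern domain matches itself and any subdomain.
    host = target.hostname or ""
    if host != pat.hostname and not host.endswith("." + (pat.hostname or "")):
        return False

    # Assumed rule 2: the pattern path is a prefix of the URL path.
    if not target.path.startswith(pat.path):
        return False

    # Assumed rule 3: each query parameter in the pattern must be present,
    # with "*" acting as a wildcard for its value.
    wanted = parse_qs(pat.query, keep_blank_values=True)
    actual = parse_qs(target.query, keep_blank_values=True)
    for name, wanted_values in wanted.items():
        if name not in actual:
            return False
        if not any(
            fnmatchcase(value, glob)
            for glob in wanted_values
            for value in actual[name]
        ):
            return False
    return True


def matches_any(url: str) -> bool:
    """True if the URL satisfies at least one of the PATTERNS."""
    return any(simple_match(pattern, url) for pattern in PATTERNS)


if __name__ == "__main__":
    for url in [
        "https://www.dualexample.com/shop/electronics/?product=123",
        "https://www.dualexample.com/on-sale/books/?product=923",
    ]:
        print(url, matches_any(url))
```

Under these assumptions the sketch agrees with all six URLs listed in the hunk; the real matcher may differ in cases the diff does not show.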
@@ -115,10 +122,11 @@ code example below:
         def to_item(self):
             ... # more specific parsing
 
-    @primary_registry.handle_urls(["dualexample.com", "dualexample.net"], overrides=GenericProductPage)
-    @secondary_registry.handle_urls(["dualexample.com", "dualexample.net"], overrides=GenericProductPage)
+    @primary_registry.handle_urls(["dualexample.com/shop/?product=*", "dualexample.net/store/?pid=*"], overrides=GenericProductPage)
+    @secondary_registry.handle_urls(["dualexample.com/shop/?product=*", "dualexample.net/store/?pid=*"], overrides=GenericProductPage)
     class DualExampleProductPage(ItemWebPage):
         def to_item(self):
+            ... # more specific parsing
 
 If you need more control over the Registry, you could instantiate your very
 own :class:`~.PageObjectRegistry` and use its ``@handle_urls`` to annotate and
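
Note on the hunk above: the stacked decorators record the same override rule in two registries. A minimal sketch of that idea follows, assuming a hypothetical ``ToyRegistry`` class that only stores rules. It is a stand-in for illustration and not web-poet's ``PageObjectRegistry``, whose internals are not shown in this diff; the plain classes below likewise stand in for ``ItemWebPage`` subclasses.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRegistry:
    """Hypothetical stand-in that only records override rules."""

    rules: list = field(default_factory=list)

    def handle_urls(self, include, overrides):
        # Accept a single pattern string or a list of patterns.
        patterns = [include] if isinstance(include, str) else list(include)

        def decorator(cls):
            self.rules.append(
                {"use": cls, "instead_of": overrides, "patterns": patterns}
            )
            # Return the class unchanged so several decorators can be stacked.
            return cls

        return decorator


primary_registry = ToyRegistry()
secondary_registry = ToyRegistry()

PATTERNS = ["dualexample.com/shop/?product=*", "dualexample.net/store/?pid=*"]


class GenericProductPage:
    def to_item(self):
        ...  # generic parsing


@primary_registry.handle_urls(PATTERNS, overrides=GenericProductPage)
@secondary_registry.handle_urls(PATTERNS, overrides=GenericProductPage)
class DualExampleProductPage:
    def to_item(self):
        ...  # more specific parsing
```

Because each decorator returns the class untouched, the order of stacking does not change what either toy registry stores; both end up with one identical rule.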
@@ -159,11 +167,11 @@ like ``web_poet my_project.page_objects`` would produce the following:
 
 .. code-block::
 
-    Use this instead of for the URL patterns except for the patterns with priority meta
-    ---------------------------------------------------- ------------------------------------------ -------------------------------------- ------------------------- --------------- ------
-    my_project.page_objects.ExampleProductPage my_project.page_objects.GenericProductPage ['example.com'] [] 500 {}
-    my_project.page_objects.AnotherExampleProductPage my_project.page_objects.GenericProductPage ['anotherexample.com'] ['/digital-goods/'] 500 {}
-    my_project.page_objects.DualExampleProductPage my_project.page_objects.GenericProductPage ['dualexample.com', 'dualexample.net'] [] 500 {}
+    Use this instead of for the URL patterns except for the patterns with priority meta
+    ---------------------------------------------------- ------------------------------------------ -------------------------------------- ------------------------- --------------- ------
+    my_project.page_objects.ExampleProductPage my_project.page_objects.GenericProductPage ['example.com'] [] 500 {}
+    my_project.page_objects.AnotherExampleProductPage my_project.page_objects.GenericProductPage ['anotherexample.com'] ['/digital-goods/'] 500 {}
+    my_project.page_objects.DualExampleProductPage my_project.page_objects.GenericProductPage ['dualexample.com/shop/?product=*', 'dualexample.net/store/?pid=*'] [] 500 {}
 
 Organizing Page Object Overrides
 --------------------------------