@@ -316,7 +316,7 @@ instances of the :class:`~.PageObjectRegistry` instead:
    cool_gadget_fr_registry = PageObjectRegistry()
    furniture_shop_registry = PageObjectRegistry()

- After declaring the :class:`~.PageObjectRegistry` instances, they can be imported
+ After declaring the :class:`~.PageObjectRegistry` instances, they can be used
in each of the Page Object packages like so:

.. code-block:: python
@@ -432,3 +432,173 @@ Retrieving all of the Product Listing Override rules would simply be:

    # We can also filter it down further on a per site basis if needed.
    rules = product_listings_registry.get_overrides_from("my_page_obj_project.cool_gadget_site")
+
+ Using Overrides from External Packages
+ --------------------------------------
+
+ Developers have the option to import existing Page Objects alongside the Override
+ Rules attached to them. This section showcases different ways you can work with
+ the Registries to manipulate the Override Rules according to your needs.
+
+ Let's suppose we have the following use case:
+
+ - An external Python package named ``ecommerce_page_objects`` is available
+   which contains Page Objects for common websites. It uses the
+   ``default_registry`` from **web-poet**.
+ - Another similar package named ``gadget_sites_page_objects`` is available
+   for more specific websites. It uses its own registry named
+   ``gadget_registry``.
+ - Your project's objective is to handle as many eCommerce websites as you
+   can. Thus, you'd want to use the already available packages above and
+   perhaps improve on them or create new Page Objects for new websites.
+
+ Assuming that you'd want to **use all existing Override rules from the external
+ packages** in your project, you can do so like this:
+
+ .. code-block:: python
+
+     import ecommerce_page_objects
+     import gadget_sites_page_objects
+     from web_poet import PageObjectRegistry, consume_modules, default_registry
+
+     # Import every submodule of both packages so that their Override Rules
+     # get registered.
+     consume_modules("ecommerce_page_objects", "gadget_sites_page_objects")
+
+     combined_registry = PageObjectRegistry()
+     combined_registry.data = {
+         # Since ecommerce_page_objects uses web_poet.default_registry, it
+         # functions like a global registry which we can access as:
+         **default_registry.data,
+
+         **gadget_sites_page_objects.gadget_registry.data,
+     }
+
+     combined_rules = combined_registry.get_overrides()
+
+     # The combined_rules would be as follows:
+     # 1. OverrideRule(for_patterns=Patterns(include=['site_1.com'], exclude=[], priority=500), use=<class 'ecommerce_page_objects.site_1.EcomSite1'>, instead_of=<class 'ecommerce_page_objects.EcomGenericPage'>, meta={})
+     # 2. OverrideRule(for_patterns=Patterns(include=['site_2.com'], exclude=[], priority=500), use=<class 'ecommerce_page_objects.site_2.EcomSite2'>, instead_of=<class 'ecommerce_page_objects.EcomGenericPage'>, meta={})
+     # 3. OverrideRule(for_patterns=Patterns(include=['site_2.com'], exclude=[], priority=500), use=<class 'gadget_sites_page_objects.site_2.GadgetSite2'>, instead_of=<class 'gadget_sites_page_objects.GadgetGenericPage'>, meta={})
+     # 4. OverrideRule(for_patterns=Patterns(include=['site_3.com'], exclude=[], priority=500), use=<class 'gadget_sites_page_objects.site_3.GadgetSite3'>, instead_of=<class 'gadget_sites_page_objects.GadgetGenericPage'>, meta={})
+
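The merge in the example above is plain ``dict`` unpacking. A minimal sketch that runs without web-poet or the two external packages (which are hypothetical in this scenario anyway): the ``Rule`` dataclass and the Page Object classes below are stand-ins we made up to mirror the shape of ``OverrideRule`` and of a registry's ``data`` attribute.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# Hypothetical stand-in for web-poet's OverrideRule: only the fields used here.
@dataclass
class Rule:
    for_patterns: List[str]
    use: Callable
    instead_of: Callable


# Hypothetical Page Object classes standing in for the external packages.
class EcomGenericPage: ...
class EcomSite1(EcomGenericPage): ...
class EcomSite2(EcomGenericPage): ...
class GadgetGenericPage: ...
class GadgetSite2(GadgetGenericPage): ...
class GadgetSite3(GadgetGenericPage): ...


# Each registry-like dict maps the Page Object class (``use``) to its rule.
ecommerce_data: Dict[Callable, Rule] = {
    EcomSite1: Rule(["site_1.com"], EcomSite1, EcomGenericPage),
    EcomSite2: Rule(["site_2.com"], EcomSite2, EcomGenericPage),
}
gadget_data: Dict[Callable, Rule] = {
    GadgetSite2: Rule(["site_2.com"], GadgetSite2, GadgetGenericPage),
    GadgetSite3: Rule(["site_3.com"], GadgetSite3, GadgetGenericPage),
}

# Unpacking both dicts yields one mapping; the class keys are distinct
# objects, so no entry overwrites another.
combined_data = {**ecommerce_data, **gadget_data}

# The rules themselves, in insertion order.
combined_rules = list(combined_data.values())

for i, rule in enumerate(combined_rules, start=1):
    print(i, rule.use.__name__, rule.for_patterns)
```

Taking ``list(combined_data.values())`` mirrors the ``get_overrides() == list(registry.data.values())`` relationship described in this section.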
483
+ .. note ::
484
+
485
+ Note that ``registry.get_overrides() == list(registry.data.values()) ``. We're
486
+ using ``registry.data `` for these cases so that we can easily look up specific
487
+ Page Objects using the ``dict ``'s key. Otherwise, it may become a problem on
488
+ large cases with lots of Override rules.
489
+
490
+ .. note ::
491
+
492
+ If you don't need the entire data contents of Registries, then you can opt
493
+ to use :meth: `~.PageObjectRegistry.data_from ` to easily filter them out
494
+ per package/module.
495
+
496
+ Here's an example:
497
+
498
+ .. code-block :: python
499
+
500
+ default_registry.data_from(" ecommerce_page_objects.site_1" , " ecommerce_page_objects.site_2" )
501
+
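Conceptually, narrowing a registry down per package/module comes down to checking the module each Page Object class was defined in. The sketch below is a hypothetical helper, ``filter_by_module``, written for illustration; it is not web-poet's ``data_from`` implementation. The classes are built with ``type()`` so that their ``__module__`` mimics the external package layout.

```python
from typing import Callable, Dict


def filter_by_module(data: Dict[Callable, object], *prefixes: str) -> Dict[Callable, object]:
    """Keep the entries whose key class lives in one of the given modules/packages."""

    def matches(module: str) -> bool:
        # Match the module itself or any of its submodules, e.g. "pkg" matches
        # "pkg.site_1" but not an unrelated "pkg_extra".
        return any(module == p or module.startswith(p + ".") for p in prefixes)

    return {cls: rule for cls, rule in data.items() if matches(cls.__module__)}


# Hypothetical Page Objects whose __module__ mimics the external packages.
EcomSite1 = type("EcomSite1", (), {"__module__": "ecommerce_page_objects.site_1"})
EcomSite2 = type("EcomSite2", (), {"__module__": "ecommerce_page_objects.site_2"})
GadgetSite3 = type("GadgetSite3", (), {"__module__": "gadget_sites_page_objects.site_3"})

# In a real registry the values would be OverrideRule objects; strings keep
# the sketch short.
data = {EcomSite1: "rule-1", EcomSite2: "rule-2", GadgetSite3: "rule-3"}

print(filter_by_module(data, "ecommerce_page_objects.site_1", "ecommerce_page_objects.site_2"))
```

Matching on ``prefix + "."`` rather than a bare ``startswith(prefix)`` is a deliberate choice here, so that a package name never accidentally matches a longer, unrelated one.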
+ As you can see in the example above, we can easily combine the data from multiple
+ different registries since it simply follows a ``Dict[Callable, OverrideRule]``
+ structure. There won't be any duplication or clashes of ``dict`` keys between
+ registries of different external packages since the keys are the Page Object
+ classes intended to be used, and classes defined in different packages are
+ distinct objects. From our example above, the keys of ``combined_registry.data``
+ would be:
+
+ 1. ``<class 'ecommerce_page_objects.site_1.EcomSite1'>``
+ 2. ``<class 'ecommerce_page_objects.site_2.EcomSite2'>``
+ 3. ``<class 'gadget_sites_page_objects.site_2.GadgetSite2'>``
+ 4. ``<class 'gadget_sites_page_objects.site_3.GadgetSite3'>``
+
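The no-clash guarantee rests on the keys being distinct class objects. If two dicts ever did contain the same class (say, two registries registering the same Page Object), ``{**a, **b}`` would silently keep the entry from the right-most dict. A minimal sketch of a hypothetical ``merge_without_overwrite`` helper that surfaces such overlaps instead; the classes and string values are placeholders.

```python
from typing import Callable, Dict


def merge_without_overwrite(*datas: Dict[Callable, object]) -> Dict[Callable, object]:
    """Merge registry-like dicts, refusing to silently replace an existing key."""
    merged: Dict[Callable, object] = {}
    for data in datas:
        overlap = merged.keys() & data.keys()
        if overlap:
            names = sorted(cls.__name__ for cls in overlap)
            raise ValueError(f"Page Objects registered more than once: {names}")
        merged.update(data)
    return merged


class SharedPage: ...
class OnlyInA: ...


a = {SharedPage: "rule from package A", OnlyInA: "rule A2"}
b = {SharedPage: "rule from package B"}

# Plain unpacking: the right-most dict wins without any warning.
print({**a, **b}[SharedPage])  # rule from package B

try:
    merge_without_overwrite(a, b)
except ValueError as exc:
    print(exc)  # Page Objects registered more than once: ['SharedPage']
```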
+ As you might've observed, combining the two Registries above may result in a
+ conflict between the Override rules **#2** and **#3**:
+
+ .. code-block:: python
+
+     # 2. OverrideRule(for_patterns=Patterns(include=['site_2.com'], exclude=[], priority=500), use=<class 'ecommerce_page_objects.site_2.EcomSite2'>, instead_of=<class 'ecommerce_page_objects.EcomGenericPage'>, meta={})
+     # 3. OverrideRule(for_patterns=Patterns(include=['site_2.com'], exclude=[], priority=500), use=<class 'gadget_sites_page_objects.site_2.GadgetSite2'>, instead_of=<class 'gadget_sites_page_objects.GadgetGenericPage'>, meta={})
+
+ The `url-matcher`_ library is responsible for resolving such conflicts. It's
+ specifically discussed in this section: `rules-conflict-resolution
+ <https://url-matcher.readthedocs.io/en/stable/intro.html#rules-conflict-resolution>`_.
+
+ However, it's technically **NOT** a conflict, **yet**, since:
+
+ - ``ecommerce_page_objects.site_2.EcomSite2`` would only be used in **site_2.com**
+   if ``ecommerce_page_objects.EcomGenericPage`` is to be replaced.
+ - The same goes for ``gadget_sites_page_objects.site_2.GadgetSite2``, which is
+   only going to be utilized for **site_2.com** if
+   ``gadget_sites_page_objects.GadgetGenericPage`` is to be replaced.
+
+ It would only become a conflict if the **#2** and **#3** Override Rules for
+ **site_2.com** both intend to replace the same Page Object. In fact, none of the
+ Override Rules above would ever be used if your project never uses the
+ following Page Objects *(since there's nothing to override)*. You can import
+ these Page Objects into your project and use them so they can be overridden:
+
+ - ``ecommerce_page_objects.EcomGenericPage``
+ - ``gadget_sites_page_objects.GadgetGenericPage``
+
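The reasoning above (rules only compete when they target the same URL *and* replace the same Page Object) can be checked mechanically. A minimal sketch with a hypothetical ``find_conflicts`` helper and a stand-in ``Rule`` dataclass; it compares URL patterns as exact strings, which is a simplification of how url-matcher actually matches URLs, so overlapping but non-identical patterns are not detected.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


# Hypothetical stand-in for OverrideRule, keeping only the fields used here.
@dataclass
class Rule:
    for_patterns: List[str]
    use: Callable
    instead_of: Callable


def find_conflicts(rules: List[Rule]) -> Dict[Tuple[str, Callable], List[Callable]]:
    """Group rules by (URL pattern, replaced Page Object); keep groups with 2+ candidates."""
    groups: Dict[Tuple[str, Callable], List[Callable]] = defaultdict(list)
    for rule in rules:
        for pattern in rule.for_patterns:
            groups[(pattern, rule.instead_of)].append(rule.use)
    return {key: uses for key, uses in groups.items() if len(uses) > 1}


class EcomGenericPage: ...
class GadgetGenericPage: ...
class EcomSite2(EcomGenericPage): ...
class GadgetSite2(GadgetGenericPage): ...


rules = [
    Rule(["site_2.com"], EcomSite2, EcomGenericPage),
    Rule(["site_2.com"], GadgetSite2, GadgetGenericPage),
]

# Same URL pattern, but different Page Objects are being replaced: no conflict yet.
print(find_conflicts(rules))  # {}
```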
+ However, let's assume that you want to create your own generic Page Object and
+ only intend to use it in place of the ones above. We can easily replace them like this:
+
+ .. code-block:: python
+
+     class ImprovedEcommerceGenericPage:
+         def to_item(self):
+             ...  # different type of generic parsers
+
+     # Make every rule override our own generic Page Object instead.
+     for rule in combined_registry.data.values():
+         rule.instead_of = ImprovedEcommerceGenericPage
+
+     updated_rules = combined_registry.get_overrides()
+
+     # The updated_rules would be as follows:
+     # 1. OverrideRule(for_patterns=Patterns(include=['site_1.com'], exclude=[], priority=500), use=<class 'ecommerce_page_objects.site_1.EcomSite1'>, instead_of=<class 'my_project.ImprovedEcommerceGenericPage'>, meta={})
+     # 2. OverrideRule(for_patterns=Patterns(include=['site_2.com'], exclude=[], priority=500), use=<class 'ecommerce_page_objects.site_2.EcomSite2'>, instead_of=<class 'my_project.ImprovedEcommerceGenericPage'>, meta={})
+     # 3. OverrideRule(for_patterns=Patterns(include=['site_2.com'], exclude=[], priority=500), use=<class 'gadget_sites_page_objects.site_2.GadgetSite2'>, instead_of=<class 'my_project.ImprovedEcommerceGenericPage'>, meta={})
+     # 4. OverrideRule(for_patterns=Patterns(include=['site_3.com'], exclude=[], priority=500), use=<class 'gadget_sites_page_objects.site_3.GadgetSite3'>, instead_of=<class 'my_project.ImprovedEcommerceGenericPage'>, meta={})
+
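The loop above assigns to ``rule.instead_of`` directly, which works as long as the rule objects are mutable. If the ``OverrideRule`` in the web-poet version you use happens to be immutable (for example, a frozen dataclass; we have not checked this across releases), that assignment would raise, and you'd build updated copies instead. A minimal sketch with a hypothetical frozen stand-in and the standard library's ``dataclasses.replace``:

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Callable, Dict, List


# Hypothetical frozen stand-in for OverrideRule.
@dataclass(frozen=True)
class Rule:
    for_patterns: List[str]
    use: Callable
    instead_of: Callable


class EcomGenericPage: ...
class EcomSite1(EcomGenericPage): ...
class ImprovedEcommerceGenericPage: ...


data: Dict[Callable, Rule] = {
    EcomSite1: Rule(["site_1.com"], EcomSite1, EcomGenericPage),
}

try:
    data[EcomSite1].instead_of = ImprovedEcommerceGenericPage
except FrozenInstanceError:
    # Frozen rules can't be changed in place; build updated copies instead.
    data = {
        cls: replace(rule, instead_of=ImprovedEcommerceGenericPage)
        for cls, rule in data.items()
    }

print(data[EcomSite1].instead_of.__name__)  # ImprovedEcommerceGenericPage
```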
+ Now, **#2** and **#3** have a conflict since they both intend to replace
+ ``ImprovedEcommerceGenericPage``. As mentioned earlier, `url-matcher`_
+ would be the one to resolve such conflicts.
+
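To get a feel for what "resolving" means, here's a toy resolver that, among the rules for a domain, picks the one with the highest ``priority`` (the field visible in the ``Patterns(...)`` output above). It assumes, as we read the url-matcher documentation, that a higher priority wins; beyond that it is a simplified stand-in, not url-matcher's algorithm, which has its own tie-breaking rules described in the linked section. This toy version simply refuses to guess on a tie.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical, simplified rule: a domain, a priority, and the Page Object name.
@dataclass
class Rule:
    domain: str
    priority: int
    use: str


def resolve(rules: List[Rule], domain: str) -> Optional[str]:
    """Return the ``use`` of the highest-priority rule for ``domain``; error on ties."""
    candidates = [r for r in rules if r.domain == domain]
    if not candidates:
        return None
    best = max(r.priority for r in candidates)
    winners = [r for r in candidates if r.priority == best]
    if len(winners) > 1:
        raise ValueError(f"Ambiguous rules for {domain}: {[w.use for w in winners]}")
    return winners[0].use


rules = [
    Rule("site_2.com", 500, "EcomSite2"),
    Rule("site_2.com", 500, "GadgetSite2"),
]

# Bumping one rule's priority makes the outcome explicit instead of ambiguous.
rules[1].priority = 600
print(resolve(rules, "site_2.com"))  # GadgetSite2
```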
+ However, it would help prevent future confusion if we could remove the source of
+ ambiguity in our Override Rules.
+
+ Suppose we prefer ``gadget_sites_page_objects.site_2.GadgetSite2`` over
+ ``ecommerce_page_objects.site_2.EcomSite2``. In that case, we could remove the latter:
+
+ .. code-block:: python
+
+     del combined_registry.data[ecommerce_page_objects.site_2.EcomSite2]
+
+     updated_rules = combined_registry.get_overrides()
+
+     # The newly updated_rules would be as follows:
+     # 1. OverrideRule(for_patterns=Patterns(include=['site_1.com'], exclude=[], priority=500), use=<class 'ecommerce_page_objects.site_1.EcomSite1'>, instead_of=<class 'my_project.ImprovedEcommerceGenericPage'>, meta={})
+     # 2. OverrideRule(for_patterns=Patterns(include=['site_2.com'], exclude=[], priority=500), use=<class 'gadget_sites_page_objects.site_2.GadgetSite2'>, instead_of=<class 'my_project.ImprovedEcommerceGenericPage'>, meta={})
+     # 3. OverrideRule(for_patterns=Patterns(include=['site_3.com'], exclude=[], priority=500), use=<class 'gadget_sites_page_objects.site_3.GadgetSite3'>, instead_of=<class 'my_project.ImprovedEcommerceGenericPage'>, meta={})
+
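Note that ``del`` raises ``KeyError`` if the Page Object isn't in the registry (for instance, if a later release of the external package drops it). A more forgiving variant uses ``dict.pop`` with a default. A minimal sketch; the classes and the string values standing in for ``OverrideRule`` objects are placeholders.

```python
from typing import Callable, Dict


class EcomSite1: ...
class EcomSite2: ...
class GadgetSite2: ...


# Placeholder values; in the registry these would be OverrideRule objects.
data: Dict[Callable, str] = {
    EcomSite1: "site_1.com rule",
    EcomSite2: "site_2.com rule (ecommerce)",
    GadgetSite2: "site_2.com rule (gadget)",
}

# pop() with a default returns None instead of raising when the key is missing.
removed = data.pop(EcomSite2, None)
again = data.pop(EcomSite2, None)  # no KeyError on a second attempt

print(removed)  # site_2.com rule (ecommerce)
print(again)  # None
print([cls.__name__ for cls in data])  # ['EcomSite1', 'GadgetSite2']
```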
+ As discussed before, the Registry's data is structured simply as
+ ``Dict[Callable, OverrideRule]``, which we can easily manipulate via ``dict``
+ operations.
+
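Because it's just a ``dict``, bulk edits can also be written as ordinary comprehensions. For example, a sketch that keeps only the rules whose URL patterns target domains your project cares about. The ``Patterns`` and ``Rule`` dataclasses are hypothetical stand-ins mirroring the ``Patterns(include=..., exclude=..., priority=...)`` repr shown above, and ``wanted_domains`` is an illustrative name.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Patterns:  # hypothetical stand-in mirroring the repr shown above
    include: List[str]
    exclude: List[str] = field(default_factory=list)
    priority: int = 500


@dataclass
class Rule:  # hypothetical stand-in for OverrideRule
    for_patterns: Patterns
    use: Callable


class EcomSite1: ...
class GadgetSite2: ...
class GadgetSite3: ...


data: Dict[Callable, Rule] = {
    EcomSite1: Rule(Patterns(["site_1.com"]), EcomSite1),
    GadgetSite2: Rule(Patterns(["site_2.com"]), GadgetSite2),
    GadgetSite3: Rule(Patterns(["site_3.com"]), GadgetSite3),
}

wanted_domains = {"site_1.com", "site_3.com"}

# Keep a rule if any of its include patterns is one of the wanted domains.
filtered = {
    cls: rule
    for cls, rule in data.items()
    if wanted_domains.intersection(rule.for_patterns.include)
}

print([cls.__name__ for cls in filtered])  # ['EcomSite1', 'GadgetSite3']
```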
+ Now, suppose we want to improve ``ecommerce_page_objects.site_1.EcomSite1``
+ from **#1** above, for example by adding or fixing fields. We can do that like this:
+
+ .. code-block:: python
+
+     class ImprovedEcomSite1(ecommerce_page_objects.site_1.EcomSite1):
+         def to_item(self):
+             ...  # replace and improve some of the parsers here
+
+     combined_registry.data[ecommerce_page_objects.site_1.EcomSite1].use = ImprovedEcomSite1
+
+     updated_rules = combined_registry.get_overrides()
+
+     # The newly updated_rules would be as follows:
+     # 1. OverrideRule(for_patterns=Patterns(include=['site_1.com'], exclude=[], priority=500), use=<class 'my_project.ImprovedEcomSite1'>, instead_of=<class 'my_project.ImprovedEcommerceGenericPage'>, meta={})
+     # 2. OverrideRule(for_patterns=Patterns(include=['site_2.com'], exclude=[], priority=500), use=<class 'gadget_sites_page_objects.site_2.GadgetSite2'>, instead_of=<class 'my_project.ImprovedEcommerceGenericPage'>, meta={})
+     # 3. OverrideRule(for_patterns=Patterns(include=['site_3.com'], exclude=[], priority=500), use=<class 'gadget_sites_page_objects.site_3.GadgetSite3'>, instead_of=<class 'my_project.ImprovedEcommerceGenericPage'>, meta={})
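One subtlety of swapping ``use`` in place: the ``dict`` key is still the original ``EcomSite1`` class, while the rule now points at ``ImprovedEcomSite1``. If the rest of your code relies on each key matching its rule's ``use``, you may want to re-key the entry. A minimal sketch with a hypothetical ``Rule`` stand-in and made-up ``to_item`` bodies that show the subclass extending the original output:

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Rule:  # hypothetical stand-in for OverrideRule
    use: Callable
    instead_of: Callable


class ImprovedEcommerceGenericPage: ...


class EcomSite1:
    def to_item(self):
        return {"name": "from EcomSite1"}


class ImprovedEcomSite1(EcomSite1):
    def to_item(self):
        item = super().to_item()
        item["price"] = "fixed by ImprovedEcomSite1"  # e.g. add a missing field
        return item


data: Dict[Callable, Rule] = {EcomSite1: Rule(EcomSite1, ImprovedEcommerceGenericPage)}

# Swap the Page Object used by the rule, as in the example above.
data[EcomSite1].use = ImprovedEcomSite1

# The key is still the old class; re-key so the key and rule.use agree again.
data[ImprovedEcomSite1] = data.pop(EcomSite1)

rule = data[ImprovedEcomSite1]
print(rule.use().to_item())  # {'name': 'from EcomSite1', 'price': 'fixed by ImprovedEcomSite1'}
```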