BrowserHtml #43

kmike · 2022-05-22T18:38:58Z

This PR adds a URL-less BrowserHtml. Unlike HttpResponseBody, its type is str, not bytes, so selectors can be supported directly.
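To illustrate the str vs. bytes point, here is a minimal sketch using only the standard library. `BrowserHtmlSketch`, `_TitleParser` and `extract_title` are hypothetical names made up for this example, not web-poet API, and `html.parser` stands in for a real selector library. Text-based parsing can consume a str subclass as-is, while a bytes body has to be decoded with some encoding first.

```python
from html.parser import HTMLParser


class BrowserHtmlSketch(str):
    """Hypothetical stand-in: browser-rendered HTML kept as text."""


class _TitleParser(HTMLParser):
    """Collects the text found inside the <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html: str) -> str:
    # HTMLParser.feed() expects text, so a str subclass needs no conversion.
    parser = _TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


rendered = BrowserHtmlSketch(
    "<html><head><title>Hello</title></head><body></body></html>"
)
raw_body = b"<html><head><title>Hello</title></head><body></body></html>"

# The str-based input is used directly.
title_from_text = extract_title(rendered)
# The bytes-based input needs an explicit decoding step (encoding assumed here).
title_from_bytes = extract_title(raw_body.decode("utf-8"))
```

With bytes, the caller has to know or guess the encoding; with text coming from a browser, that step has already happened.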

In autoextract-poet we had AutoextractHtml (https://github.com/scrapinghub/autoextract-poet/blob/aac08746c7ca9bc0baf07cfbf7773d616c26b1fb/autoextract_poet/page_inputs.py#L23), which is more similar to HttpResponse, as it contains a URL.

I think we can add a similar class later, if needed. A design challenge would be to figure out what the URL class should be: the same as ResponseURL, or a separate one (BrowserURL), assuming we're moving forward with #42.


codecov · 2022-05-22T18:39:51Z

Codecov Report

Merging #43 (df68a24) into master (de6e24c) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##            master       #43   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           14        15    +1     
  Lines          320       330   +10     
=========================================
+ Hits           320       330   +10

| Impacted Files | Coverage Δ |
|---|---|
| web_poet/__init__.py | `100.00% <ø> (ø)` |
| web_poet/mixins.py | `100.00% <100.00%> (ø)` |
| web_poet/page_inputs/__init__.py | `100.00% <100.00%> (ø)` |
| web_poet/page_inputs/browser.py | `100.00% <100.00%> (ø)` |
| web_poet/page_inputs/http.py | `100.00% <100.00%> (ø)` |
| web_poet/overrides.py | `100.00% <0.00%> (ø)` |


BurnzZ

LGTM @kmike !

BurnzZ · 2022-05-26T08:55:34Z

web_poet/mixins.py

+        # non-hashable classes, where memoizemethod_noargs doesn't work
+        if self.__cached_selector is not None:
+            return self.__cached_selector
+        # XXX: should we pass base_url=self.url, as Scrapy does?


I think we can remove this comment, since this code is used by BrowserHtml, which doesn't rely on a URL. Or do you foresee a need for it later on, @kmike?

If that's the right thing to do (which I'm not sure about; it seems it's not needed), we'd need a URL for selectors to work properly. In that case, a class like BrowserResponse, which contains both the URL and the HTML (similar to what we had with AutoextractHtml), might be better.

That said, it won't be a part of BrowserHtml, so it does make sense to remove the comment, thanks @BurnzZ!
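For context on why a URL would matter at all: one thing a base URL makes possible is resolving relative links found in the HTML. The sketch below uses only `urllib.parse.urljoin` from the standard library; `resolve_link` is a hypothetical helper, and whether selectors would actually use `base_url` this way is an assumption, not something this PR establishes.

```python
from typing import Optional
from urllib.parse import urljoin


def resolve_link(href: str, base_url: Optional[str] = None) -> str:
    """Return an absolute URL when a base URL is known, else the href as-is."""
    if base_url is None:
        # URL-less input: the relative link cannot be resolved.
        return href
    return urljoin(base_url, href)


relative = "../items/42?ref=list"

# Without a URL, the link stays relative.
unresolved = resolve_link(relative)

# With a URL (example.com is a placeholder), it can be made absolute.
resolved = resolve_link(relative, "https://example.com/catalog/page/1")
```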

Actually, I think it might be better to keep the comment, as it's part of SelectableMixin, not of the BrowserHtml class.
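For readers following the thread, here is a rough sketch of the pattern under discussion: a mixin that requires subclasses to implement `_selector_input`, builds the selector once, and caches it in an instance attribute. All class names ending in `Sketch` are hypothetical, and `xml.etree.ElementTree` is only a stand-in for a real HTML selector (it handles well-formed markup and a limited XPath subset); this is not the code from web_poet/mixins.py.

```python
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod


class SelectableMixinSketch(ABC):
    """Illustrative mixin: builds a selector-like object once and reuses it."""

    # Class-level default, so the cache attribute exists even for classes
    # (such as str subclasses) whose construction the mixin does not control.
    __cached_selector = None

    @abstractmethod
    def _selector_input(self) -> str:
        """Return the text the selector should be built from."""

    @property
    def selector(self) -> ET.Element:
        # Plain instance-attribute cache: no hashing of self is involved.
        if self.__cached_selector is not None:
            return self.__cached_selector
        # Stand-in parser; a real implementation would build an HTML selector.
        sel = ET.fromstring(self._selector_input())
        self.__cached_selector = sel
        return sel

    def xpath(self, query: str):
        # Delegates to ElementTree's limited XPath support.
        return self.selector.findall(query)


class BrowserHtmlSketch(SelectableMixinSketch, str):
    """str subclass: the object itself is the selector input."""

    def _selector_input(self) -> str:
        return str(self)


class TextHolderSketch(SelectableMixinSketch):
    """Non-str class exposing its content under a different public name."""

    def __init__(self, text: str):
        self.text = text

    def _selector_input(self) -> str:
        return self.text


page = BrowserHtmlSketch("<html><body><ul><li>a</li><li>b</li></ul></body></html>")
items = [li.text for li in page.xpath(".//li")]

# The second access returns the cached object instead of re-parsing.
same_object = page.selector is page.selector

holder = TextHolderSketch("<p><b>x</b></p>")
bold = holder.xpath("b")[0].text
```

Note that nothing in the sketch needs a URL, which matches the point that the `base_url` question belongs to the mixin rather than to a URL-less class.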

web_poet/page_inputs/browser.py

Co-authored-by: Adrián Chaves <[email protected]>

gatufo

LGTM!

kmike added 3 commits May 22, 2022 22:53

SelectableMixin with .selector property and .css/.xpath methods

c2cb35a

SelectorMixin: require explicit _selector_input function to be implemented

10b1570

This allows picking a public-facing name which fits better in different cases (e.g. .html or .text)

BrowserHtml

d6c3231

kmike marked this pull request as ready for review May 22, 2022 18:39

skipping type check is no longer needed for .selector attribute

e4ace93

It was required before because property was applied to a decorated method.

BurnzZ approved these changes May 26, 2022


expose web_poet.BrowserHtml

ed3af66

Gallaecio approved these changes May 26, 2022


kmike and others added 2 commits May 26, 2022 23:53

switch mixin position

0010b35

Co-authored-by: Adrián Chaves <[email protected]>

Update web_poet/page_inputs/browser.py

df68a24

Co-authored-by: Adrián Chaves <[email protected]>

gatufo approved these changes May 27, 2022


kmike merged commit 2b7dd00 into master May 27, 2022

kmike deleted the browser-html branch July 22, 2022 09:57
