kmike
Member

@kmike kmike commented May 22, 2022

This PR adds a URL-less BrowserHtml. Unlike HttpResponseBody, its type is str, not bytes, so selectors can be supported directly without a decoding step.
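To illustrate the str vs. bytes point, here is a minimal sketch. The two classes are simplified stand-ins rather than the actual web-poet implementations, `title_of` is a hypothetical helper standing in for a real selector query, and the UTF-8 encoding is an assumption made for the example.

```python
import re


class HttpResponseBody(bytes):
    """Simplified stand-in: raw bytes whose encoding is not yet known."""


class BrowserHtml(str):
    """Simplified stand-in: already-decoded HTML as rendered by a browser."""


def title_of(html: str) -> str:
    """Hypothetical helper standing in for a real selector query on text."""
    match = re.search(r"<title>(.*?)</title>", html, re.S)
    return match.group(1) if match else ""


body = HttpResponseBody(b"<html><title>Caf\xc3\xa9</title></html>")
html = BrowserHtml("<html><title>Café</title></html>")

# bytes have to be decoded before any text query (UTF-8 is assumed here).
title_from_body = title_of(body.decode("utf-8"))

# str can be queried directly, with no decoding step.
title_from_html = title_of(html)
```

Because the text is already decoded, a selector property can probably be built straight from the BrowserHtml value, while a bytes body needs an encoding decision first.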

In autoextract-poet we had AutoextractHtml (https://github.com/scrapinghub/autoextract-poet/blob/aac08746c7ca9bc0baf07cfbf7773d616c26b1fb/autoextract_poet/page_inputs.py#L23), which is more similar to HttpResponse, as it contains URL.

I think we can add a similar class later, if needed. A design challenge would be to figure out what the URL class should be - is it the same as ResponseURL, or is it separate (BrowserURL) - assuming we're moving forward with #42.

kmike added 3 commits May 22, 2022 22:53
…ented

This allows picking a public-facing name which fits better
in different cases (e.g. .html or .text)
@kmike kmike marked this pull request as ready for review May 22, 2022 18:39

codecov bot commented May 22, 2022

Codecov Report

Merging #43 (df68a24) into master (de6e24c) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##            master       #43   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           14        15    +1     
  Lines          320       330   +10     
=========================================
+ Hits           320       330   +10     

Impacted Files                      Coverage Δ
web_poet/__init__.py                100.00% <ø> (ø)
web_poet/mixins.py                  100.00% <100.00%> (ø)
web_poet/page_inputs/__init__.py    100.00% <100.00%> (ø)
web_poet/page_inputs/browser.py     100.00% <100.00%> (ø)
web_poet/page_inputs/http.py        100.00% <100.00%> (ø)
web_poet/overrides.py               100.00% <0.00%> (ø)

It was required before because property was applied to a decorated method.
Contributor

@BurnzZ BurnzZ left a comment

LGTM @kmike !

# non-hashable classes, where memoizemethod_noargs doesn't work
if self.__cached_selector is not None:
    return self.__cached_selector
# XXX: should we pass base_url=self.url, as Scrapy does?
Contributor

I think we can remove this comment since it's being used by BrowserHtml which doesn't rely on a url. Or do you foresee a need for it later on @kmike ?

Member Author

If that's the right thing to do (which I'm not sure about - it seems it's not needed), we'd need to have a URL for selectors to work properly. In this case, having a class like BrowserResponse, which contains both URL and HTML (similar to what we had with AutoextractHtml), might be better.

That said, it won't be a part of BrowserHtml, so it does make sense to remove the comment, thanks @BurnzZ!

Member Author

Actually I think it might be better to keep the comment, as it's a part of SelectableMixin, not of BrowserHtml class.
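
For context, the caching pattern in the snippet under review can be sketched roughly as follows. This is an illustration under assumptions, not the merged code: `FakeSelector` is a hypothetical stand-in for a real selector class, `_selector_input` is a hypothetical hook name, and `UnhashablePage` is an invented example of a class that cannot be used with a memoizer keyed on the instance.

```python
class FakeSelector:
    """Hypothetical stand-in for a real selector class."""

    built = 0  # counts how many selectors have been constructed

    def __init__(self, text: str) -> None:
        FakeSelector.built += 1
        self.text = text


class SelectableMixin:
    """Sketch: cache the selector on the instance itself, so the cache
    works even when instances are not hashable."""

    _cached_selector = None

    def _selector_input(self) -> str:
        # Hypothetical hook: subclasses return the text to build from.
        raise NotImplementedError

    @property
    def selector(self) -> FakeSelector:
        if self._cached_selector is not None:
            return self._cached_selector
        sel = FakeSelector(self._selector_input())
        self._cached_selector = sel
        return sel


class UnhashablePage(SelectableMixin):
    """Defining __eq__ without __hash__ makes instances unhashable."""

    def __init__(self, html: str) -> None:
        self.html = html

    def __eq__(self, other: object) -> bool:
        return isinstance(other, UnhashablePage) and self.html == other.html

    def _selector_input(self) -> str:
        return self.html


page = UnhashablePage("<p>hi</p>")
first = page.selector   # builds the selector
second = page.selector  # returns the cached instance
```

Since the mixin only knows about the text it is given, a `base_url` argument would have to come from whichever class mixes it in, which is why the XXX comment is a question for the mixin rather than for BrowserHtml.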

kmike and others added 2 commits May 26, 2022 23:53
Co-authored-by: Adrián Chaves <[email protected]>
Member

@gatufo gatufo left a comment

LGTM!

@kmike kmike merged commit 2b7dd00 into master May 27, 2022
@kmike kmike deleted the browser-html branch July 22, 2022 09:57

4 participants