-
Notifications
You must be signed in to change notification settings - Fork 16
BrowserHtml #43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BrowserHtml #43
Conversation
…ented This allows to pick a public-facing name which fits better in different cases (e.g. .html or .text)
Codecov Report
@@ Coverage Diff @@
## master #43 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 14 15 +1
Lines 320 330 +10
=========================================
+ Hits 320 330 +10
|
It was required before because property was applied to a decorated method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM @kmike !
# non-hashable classes, where memoizemethod_noargs doesn't work | ||
if self.__cached_selector is not None: | ||
return self.__cached_selector | ||
# XXX: should we pass base_url=self.url, as Scrapy does? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove this comment since it's being used by BrowserHtml
which doesn't rely on a url. Or do you foresee a need for it later on @kmike ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that's a right thing to do (which I'm not sure about - it seems it's not needed), we'd need to have URL for selectors to work properly. In this case, having a class like BrowserResponse, which contains both URL and HTML (similar to what we had with AutoextractHtml), might be better.
That said, it won't be a part of BrowserHtml, so it does make sense to remove the comment, thanks @BurnzZ!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I think it might be better to keep the comment, as it's a part of SelectableMixin, not of BrowserHtml class.
Co-authored-by: Adrián Chaves <[email protected]>
Co-authored-by: Adrián Chaves <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGMT!
In this PR url-less BrowserHtml is added. Unlike HttpResponseBody, its type is str, not bytes; this means selectors can be supported directly.
In autoextract-poet we had AutoextractHtml (https://github.com/scrapinghub/autoextract-poet/blob/aac08746c7ca9bc0baf07cfbf7773d616c26b1fb/autoextract_poet/page_inputs.py#L23), which is more similar to HttpResponse, as it contains URL.
I think we can add a similar class later, if needed. A design challenge would be to figure out wht should be a URL class - is the same as ResponseURL, or is it separate (BrowserURL) - assuming we're moving forward with #42.