-
Notifications
You must be signed in to change notification settings - Fork 176
Added oEmbed support to the Web plugin to improve title fetching #1613
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice, thanks.
The design looks fine, but i have nitpick belows
conf.registerGlobalValue(Web, 'useOembedRegistry', | ||
registry.Boolean(False, _("""Determines whether the bot will use the | ||
oembed.com providers registry."""))) | ||
|
||
conf.registerGlobalValue(Web, 'useOembedDiscovery', | ||
registry.Boolean(False, _("""Determines whether the bot will use HTML | ||
discovery to find oEmbed endpoints."""))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Name the config variables plugins.Web.oembed.registry
and plugins.Web.oembed.discovery
.
Could you also make them channel-specific, but not op-settable?
response = utils.web.getUrl(url, timeout=timeout) | ||
text = response.decode('utf8', errors='replace') | ||
match = re.search( | ||
r'<link[^>]+?type="application/json\+oembed"[^>]+?href="([^"]+)"', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's insufficient, it does not cover different attribute orders or single quotes.
re.IGNORECASE) | ||
if match: | ||
endpoint = match.group(1) | ||
endpoint = endpoint.split('?')[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why?
oembed_endpoint = self._getOEmbedEndpoint(url) | ||
if not oembed_endpoint: | ||
return None | ||
oembed_url = f"{oembed_endpoint}?format=json&url={url}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't you need to escape the URL? (use urllib.parse.urlunparse
)
def testtitleOembedRegistry(self): | ||
try: | ||
conf.supybot.plugins.Web.useOembedRegistry.setValue(True) | ||
self.assertResponse( | ||
'title https://www.flickr.com/photos/bees/2362225867/', | ||
'Bacon Lollys') | ||
finally: | ||
conf.supybot.plugins.Web.useOembedRegistry.setValue(False) | ||
|
||
def testtitleOembedDiscovery(self): | ||
try: | ||
conf.supybot.plugins.Web.useOembedDiscovery.setValue(True) | ||
self.assertResponse( | ||
'title https://flickr.com/photos/bees/2362225867/', | ||
'Bacon Lollys') | ||
finally: | ||
conf.supybot.plugins.Web.useOembedDiscovery.setValue(False) | ||
|
||
def testtitleOembedError(self): | ||
try: | ||
conf.supybot.plugins.Web.useOembedDiscovery.setValue(True) | ||
self.assertError('title https://nonexistent.example.com/post/123') | ||
finally: | ||
conf.supybot.plugins.Web.useOembedDiscovery.setValue(False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def testtitleOembedRegistry(self): | |
try: | |
conf.supybot.plugins.Web.useOembedRegistry.setValue(True) | |
self.assertResponse( | |
'title https://www.flickr.com/photos/bees/2362225867/', | |
'Bacon Lollys') | |
finally: | |
conf.supybot.plugins.Web.useOembedRegistry.setValue(False) | |
def testtitleOembedDiscovery(self): | |
try: | |
conf.supybot.plugins.Web.useOembedDiscovery.setValue(True) | |
self.assertResponse( | |
'title https://flickr.com/photos/bees/2362225867/', | |
'Bacon Lollys') | |
finally: | |
conf.supybot.plugins.Web.useOembedDiscovery.setValue(False) | |
def testtitleOembedError(self): | |
try: | |
conf.supybot.plugins.Web.useOembedDiscovery.setValue(True) | |
self.assertError('title https://nonexistent.example.com/post/123') | |
finally: | |
conf.supybot.plugins.Web.useOembedDiscovery.setValue(False) | |
def testtitleOembedRegistry(self): | |
with conf.supybot.plugins.Web.useOembedRegistry.context(True): | |
self.assertResponse( | |
'title https://www.flickr.com/photos/bees/2362225867/', | |
'Bacon Lollys') | |
def testtitleOembedDiscovery(self): | |
with conf.supybot.plugins.Web.useOembedDiscovery.context(True): | |
self.assertResponse( | |
'title https://flickr.com/photos/bees/2362225867/', | |
'Bacon Lollys') | |
def testtitleOembedError(self): | |
with conf.supybot.plugins.Web.useOembedDiscovery.context(True): | |
self.assertError('title https://nonexistent.example.com/post/123') |
use the new style, it's shorter
Two methods are supported and can be configured:
Registry is tried first if enabled, then discovery if enabled,
finally falling back to HTML parsing.
I think it would be nicer to add the calls to
getTitle()
.I was trying to avoid using
url_workaround()
, since it would break support for reddit - for exsample, old.reddit.com isn't in the registry. However, following redirects could actually be beneficial, especially for handling v.reddit.com.For discovery, it makes sense to place this logic close to the title parser, as that would save an extra request. That said, I wanted to open the pull request first to gather some feedback before making further adjustments.