Tweaking ED pipeline #121

@abhinavkulkarni

Hi,

Thanks again for the great work!

I am currently evaluating REL for ED purposes and comparing it against other ED techniques, chiefly BLINK from Facebook AI Research. Both take into account the context in which a mention occurs, use a two-stage pipeline, and rely on neural approaches. BLINK does well, but it can be slow and requires a GPU to run, which is a limitation for me.

Although REL is fast and lightweight, I find that it often misses a few obvious cases. I am looking for some guidance on how I can tweak the internal workings of REL to achieve more accurate results.

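The following results were obtained by running REL on a podcast description and a particular episode description, separated by a newline.

That is, using the following code:

import requests

# API_URL is the REL API endpoint; podcast_summary and episode_summary hold the two descriptions.
text_doc = podcast_summary + '\n' + episode_summary
# An empty "spans" list asks REL to detect mentions itself before disambiguating them.
el_result = requests.post(API_URL, json={
    "text": text_doc,
    "spans": []
}).json()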
  • For this episode, the mention Shadi Hamid is linked to Brookings_Institution with score 0.9991938769817352 and NER tag PER. This is particularly egregious: the predicted entity is an organization even though the mention is tagged as a person, and Shadi Hamid's own Wikipedia page is not returned as the top candidate.

  • For this episode, the mention Lauren Bonner from the podcast description is linked to Lauren_Samuels with score 0.9993583559989929, even though the last names are quite different, while the mention Ray J is (correctly) linked to Ray_J, albeit with a lower score of 0.8136761486530304.

  • For this episode, the mention Charlamagne Tha God from the podcast description gets a score of only 0.7140538295110067, even though words like comedians, outspoken celebrities, and thought-leaders appear in the context. These should make it easy to match his embedding, which was learned from a Wikipedia page containing similar words.

  • For this episode, the mention Dave Smith is always linked to Dave_Smith_(engineer) with very high confidence, even though Dave_Smith_(comedian), the correct answer, appears in the candidate set. The context even contains phrases such as government, foreign policy, and all things Libertarian, which should match the comedian's Wikipedia description more closely.

The last point is particularly important since Dave Smith is quite a common name and there are at least four Dave Smiths on Wikipedia, each with a very different description.
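
To make the first two cases above easier to spot in bulk, here is a minimal sketch of a post-hoc sanity check. It is not part of REL: it assumes the linking results have already been reduced to hypothetical (mention, entity title, ED score) tuples. The helper names and both thresholds are illustrative assumptions. It flags links that score highly even though the mention text barely resembles the predicted Wikipedia title.

```python
import re
from difflib import SequenceMatcher


def title_to_name(entity_title):
    """Normalize a Wikipedia-style title: drop a trailing '_(qualifier)' and underscores."""
    name = re.sub(r"_\([^)]*\)$", "", entity_title)
    return name.replace("_", " ").lower()


def surface_similarity(mention, entity_title):
    """Character-level similarity in [0, 1] between a mention and an entity title."""
    return SequenceMatcher(None, mention.lower(), title_to_name(entity_title)).ratio()


def flag_suspicious(links, min_similarity=0.7, min_confidence=0.9):
    """Return links that are confident but whose title does not resemble the mention.

    `links` is an iterable of (mention, entity_title, ed_score) tuples; this layout
    and both thresholds are assumptions made for illustration.
    """
    return [
        (mention, entity, score)
        for mention, entity, score in links
        if score >= min_confidence
        and surface_similarity(mention, entity) < min_similarity
    ]


# The three links above for which both the predicted entity and its score are known.
links = [
    ("Shadi Hamid", "Brookings_Institution", 0.9991938769817352),
    ("Lauren Bonner", "Lauren_Samuels", 0.9993583559989929),
    ("Ray J", "Ray_J", 0.8136761486530304),
]

for mention, entity, score in flag_suspicious(links):
    print(f"check: {mention!r} -> {entity} ({score:.4f})")
```

A check like this only catches links whose surface forms differ. It cannot help with the Dave Smith case, where every candidate shares the mention's name and the choice depends entirely on how the context is scored.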

Thanks!
