
[Feature] Show similar mods on mod page #424

@HebaruSan

Motivation

Currently users can only find mods via the featured list, creation/update time, overall popularity, and a text search that is currently rather poor. These features are only available on the mod listing pages made specifically for them. If a user opens a mod page from off-site, there is no easy path to finding more mods they might like.

image

Suggestion

We could add a Similar Mods list at the bottom of the mod page that would show a few (6? 12? unlimited paginated?) mods ranked by how similar they are to the main mod. Visually, it should be a pretty simple matter to re-use the existing mod box styling and functionality, kind of like:

image

Data model

I imagine implementing this with a new ModsSimilarity table to store similarities:

| Column | Purpose |
| --- | --- |
| main_mod_id | Stores Mod.id of one of the mods being compared |
| other_mod_id | Stores Mod.id of the other mod being compared |
| similarity | A number that is larger for more similar mods and smaller for less similar mods |

An index of (main_mod_id, similarity DESC) would allow us to quickly get the mods most similar to a given mod from other_mod_id of the rows returned. We would have to create two rows per pair of mods under this model, with the id values swapped in the two *_mod_id columns, but I think that may be the least bad approach anyway.
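As a rough sketch of how that table, index, and lookup could fit together, here is a self-contained example using Python's built-in sqlite3 module rather than the site's real database layer. The table name `mods_similarity`, the index name, and the helper functions are all made up for illustration, and the limit of 6 is just one of the candidate sizes mentioned above.

```python
import sqlite3

# In-memory database purely for illustration; the real table would live in
# the site's own database and be created by a migration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mods_similarity (
        main_mod_id  INTEGER NOT NULL,
        other_mod_id INTEGER NOT NULL,
        similarity   REAL    NOT NULL,
        PRIMARY KEY (main_mod_id, other_mod_id)
    );
    CREATE INDEX ix_mods_similarity_main_sim
        ON mods_similarity (main_mod_id, similarity DESC);
""")


def store_similarity(conn, mod_a, mod_b, similarity):
    """Write both directions of a pair so either mod can be the main one."""
    conn.executemany(
        "INSERT OR REPLACE INTO mods_similarity VALUES (?, ?, ?)",
        [(mod_a, mod_b, similarity), (mod_b, mod_a, similarity)],
    )


def most_similar(conn, mod_id, limit=6):
    """Return other_mod_id values for mod_id, most similar first."""
    rows = conn.execute(
        "SELECT other_mod_id FROM mods_similarity "
        "WHERE main_mod_id = ? ORDER BY similarity DESC LIMIT ?",
        (mod_id, limit),
    )
    return [row[0] for row in rows]


store_similarity(conn, 1, 2, 0.9)
store_similarity(conn, 1, 3, 0.4)
store_similarity(conn, 2, 3, 0.7)
print(most_similar(conn, 1))
print(most_similar(conn, 3))
```

Writing both directions in one helper keeps the two rows of a pair from drifting apart, at the cost of doubling the row count.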

With 2913 mods currently in the db (counting deleted ones because I don't have an easy way to exclude them), there would be 2913² = 8,485,569 rows in the table.
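For reference, the row count can be checked with a few lines of Python. The figure above is N², which includes one row per mod comparing it to itself; if we skipped those self-pairs, two rows per pair would give N × (N − 1) instead. The function name and its flag are only illustrative.

```python
def similarity_row_count(n_mods, include_self_pairs=False):
    """Rows needed when every pair is stored in both directions."""
    ordered_pairs = n_mods * (n_mods - 1)
    if include_self_pairs:
        # One extra row per mod for the (mod, mod) comparison.
        return ordered_pairs + n_mods
    return ordered_pairs


print(similarity_row_count(2913, include_self_pairs=True))  # 8485569
print(similarity_row_count(2913))                           # 8482656
```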

Calculating similarity values

We would probably base the similarity on a weighted sum of comparisons of these columns:

  • Mod.game_id - 1 if same, 0 if different
  • [Mod.user_id, SharedAuthor.user_id] (the authors) - 1 if all authors are the same, 0 if all authors are different, fractions for partial matches
  • Mod.name
  • Mod.short_description
  • Mod.description
  • Mod.default_version.changelog (most recent changelog, maybe)
  • Mod.background (image files, maybe)
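A minimal sketch of the weighted sum, assuming we already have some string comparison function to plug in. The `ModInfo` class, the weight values, and the choice of shared authors divided by all authors (Jaccard) for partial author matches are all assumptions for illustration, and only a subset of the columns is covered.

```python
from dataclasses import dataclass, field


@dataclass
class ModInfo:
    """Hypothetical stand-in for the Mod columns used in the comparison."""
    game_id: int
    author_ids: set = field(default_factory=set)  # Mod.user_id plus shared authors
    name: str = ""
    short_description: str = ""


# Hypothetical weights; the real values would need tuning.
WEIGHTS = {"game": 3.0, "authors": 2.0, "name": 1.0, "short_description": 1.0}


def author_overlap(a, b):
    """1.0 if the author sets match, 0.0 if disjoint, a fraction otherwise."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mod_similarity(m1, m2, string_similarity):
    """Weighted sum of the per-column comparison scores."""
    scores = {
        "game": 1.0 if m1.game_id == m2.game_id else 0.0,
        "authors": author_overlap(m1.author_ids, m2.author_ids),
        "name": string_similarity(m1.name, m2.name),
        "short_description": string_similarity(
            m1.short_description, m2.short_description
        ),
    }
    return sum(WEIGHTS[key] * scores[key] for key in WEIGHTS)
```

Passing the string comparison in as a parameter means the per-column logic can be swapped out later without touching the weighting.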

Ideally we would delegate the comparison of the string columns to a machine learning library with an interface like:

def get_string_similarity(s1: str, s2: str) -> float:
    """Compare the strings with AI; larger return values mean more similar."""

There are many such open source libraries, including for Python, but so far I have not found one that would make it that easy. They generally would require us to:

  • Maintain a lexicon of known words, which would probably have to be stored in its own new table to keep it consistent between runs
  • Tokenize the input strings into words and then into numbers using the lexicon
  • Provide training data, effectively a long list of pairs of strings and our interpretation of the "correct" similarity values
  • Store the trained neural network weights somewhere
  • Load the trained data when we want to compare strings

So rather than having "an AI" do the hard work for us, we would have to tell it that "probe" and "satellite" are similar but "future" and "SPH" are not, etc., and then micromanage its memory for it and fiddle with it until its comparisons looked acceptable. At that point we might be better off writing our own simpler ad hoc heuristic logic.
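To give an idea of what a simpler ad hoc heuristic could look like, here is one possible sketch that fills in the same `get_string_similarity` interface with plain word overlap (Jaccard similarity on sets of words) instead of machine learning. The stopword list and the tokenizing regex are arbitrary assumptions, not a proposal for final values.

```python
import re

# Hypothetical list of words too common to say anything about a mod.
STOPWORDS = frozenset({"a", "an", "and", "the", "of", "for", "to", "in", "with"})


def tokenize(text):
    """Lowercase the text and return its set of non-stopword words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {word for word in words if word not in STOPWORDS}


def get_string_similarity(s1: str, s2: str) -> float:
    """Shared words divided by all words: 0.0 for no overlap, 1.0 for identical sets."""
    t1, t2 = tokenize(s1), tokenize(s2)
    if not t1 or not t2:
        return 0.0
    return len(t1 & t2) / len(t1 | t2)


print(get_string_similarity("Probe control for satellites", "Satellite probe parts"))
```

This only matches exact words, so "satellite" and "satellites" count as different and "probe" and "satellite" are not seen as related at all. Those are exactly the kinds of gaps that would need more hand-written rules.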

It would be nice if we could detect when the user clicks a similar mod link and use that to update the comparison of the mods, since in that case a human is confirming the similarity. I'm not sure how we would do that.

Batching the calculations

To get started, we would need to compare every mod with every other mod (O(N²) in the number of mods). Then as mods were created and edited and updated, we would have to re-compare the changed mod with all the other mods (O(N)). This probably isn't something we could run in the foreground on any page. Ideally we would add mods that need re-comparison to a queue and then have a background task perform the comparisons and update the db.
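The two passes could be sketched roughly like this, using the standard library's queue module as a stand-in for whatever task queue we would really use. All function names here are hypothetical, `compare` stands for whatever similarity function we end up with, and a real background worker would keep running rather than stop when the queue is empty.

```python
import itertools
import queue


def pair_key(a, b):
    """Order the ids so each unordered pair has exactly one key."""
    return (min(a, b), max(a, b))


def compare_all(mod_ids, compare):
    """Initial pass: compare every unordered pair once, O(N^2)."""
    return {
        pair_key(a, b): compare(a, b)
        for a, b in itertools.combinations(mod_ids, 2)
    }


def recompare_one(changed_id, mod_ids, compare, results):
    """Refresh one changed mod against all the others, O(N)."""
    for other in mod_ids:
        if other != changed_id:
            results[pair_key(changed_id, other)] = compare(changed_id, other)


def drain(pending, mod_ids, compare, results):
    """Process queued mod ids until the queue is empty."""
    while True:
        try:
            changed_id = pending.get_nowait()
        except queue.Empty:
            return
        recompare_one(changed_id, mod_ids, compare, results)
```

Page handlers would only need to call `pending.put(mod.id)` when a mod is created or edited, which keeps the expensive work out of the request.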

Labels

  • Area: Backend - Related to the Python code that runs inside gunicorn
  • Area: Frontend - Related to HTML, JS, CSS, or other browser things
  • Area: Infrastructure - Related to server stuff outside gunicorn, especially ATS
  • Area: Migration - Related to Alembic database migrations
  • Priority: Low
  • Type: Feature
