
Fuzzy Search Engine (Levenshtein Distance + Cosine TF-IDF): Implementing a hybrid fuzzy search mechanism that combines lexical similarity (Levenshtein) with semantic relevance (Cosine similarity over TF-IDF vectors) to deliver accurate, intelligent search results across diverse datasets and user inputs.


lets-build-an-ocean/Fuzzy-Search

 
 


Fuzzy-Search-Service

A generic, reusable C# service library for fuzzy string matching over arbitrary data models. It combines FuzzySharp's Ratio score with Levenshtein distance to filter and rank results, and caches the source data in memory so repeated searches do not reload it.

🌟 Features

  • Generic Implementation: The FuzzySearchService<TModel> abstract class allows it to be used with any data model (TModel).
  • Hybrid Scoring: Results are filtered and ranked based on two criteria:
    1. Fuzzy Score (Ratio): Measures the similarity percentage (e.g., using FuzzySharp).
    2. Levenshtein Distance: Measures the minimum number of single-character edits required to change one word into the other.
  • Customizable Thresholds: Easily configure the minimum acceptable Fuzzy Score (FuzzyThreshold) and the maximum Levenshtein Distance (LevenshteinThreshold).
  • Built-in Caching: Utilizes IMemoryCache to cache the source data, ensuring fast repeated searches without redundant data loading.
  • Asynchronous Data Loading: Designed for scalability with async/await data loading.
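
To make the Levenshtein criterion above concrete, here is a minimal, self-contained sketch of the edit-distance calculation. The class name LevenshteinSketch is hypothetical and this is not necessarily how the library computes the distance; it compares UTF-16 code units, is case-sensitive, and does no normalization.

```csharp
using System;

public static class LevenshteinSketch
{
    // Classic dynamic-programming edit distance using two rolling rows.
    // Counts insertions, deletions, and substitutions, each with cost 1.
    public static int Distance(string source, string target)
    {
        if (source is null) throw new ArgumentNullException(nameof(source));
        if (target is null) throw new ArgumentNullException(nameof(target));

        // previous[j] = distance between the first (i - 1) chars of source
        // and the first j chars of target.
        var previous = new int[target.Length + 1];
        var current = new int[target.Length + 1];

        for (int j = 0; j <= target.Length; j++)
            previous[j] = j; // empty source: j insertions

        for (int i = 1; i <= source.Length; i++)
        {
            current[0] = i; // empty target: i deletions
            for (int j = 1; j <= target.Length; j++)
            {
                int substitutionCost = source[i - 1] == target[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(current[j - 1] + 1, previous[j] + 1),
                    previous[j - 1] + substitutionCost);
            }
            // The row just computed becomes the "previous" row for the next pass.
            (previous, current) = (current, previous);
        }

        return previous[target.Length];
    }
}
```

For example, LevenshteinSketch.Distance("nwyork", "new york") is 2, because turning the query into the target takes two insertions ("e" and the space).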

🛠️ Usage

To integrate the fuzzy search functionality, you need to implement three main components: your data model, your data repository, and the concrete search service.

1. Define Your Model (Example: City.cs)

Your model is the data structure you wish to search.

public class City
{
    public int Id { get; set; }
    public string Name { get; set; }       // For Persian names like "توکیو"
    public string EnglishName { get; set; }
    public string CountryName { get; set; }
    public long Population { get; set; }
}

2. Define Data Loading (ICityRepository)

Create an interface and a concrete class to handle data retrieval.

public interface ICityRepository
{
    Task<IEnumerable<City>> GetAllCity();
}

public class CityRepository : ICityRepository
{
    public Task<IEnumerable<City>> GetAllCity()
    {
        // Load your city list from a database, file, or static data.
        // YourStaticCityList is a placeholder for your own full list of City objects.
        return Task.FromResult<IEnumerable<City>>(YourStaticCityList);
    }
}

3. Implement the Search Service (CityFuzzySearch.cs)

Derive from FuzzySearchService<City> and override the abstract properties and methods.

using Fuzzy_Search.CitySearchLib.Services;
using Microsoft.Extensions.Caching.Memory; // IMemoryCache

public class CityFuzzySearch : FuzzySearchService<City>
{
    private readonly ICityRepository cityRepo;

    public CityFuzzySearch(ICityRepository cityRepo, IMemoryCache? cache = null) : base(cache)
    {
        this.cityRepo = cityRepo;
    }

    // --- Configuration Overrides ---

    protected override string CacheKey => "city_fuzzy_cache";
    protected override TimeSpan CacheExpireTime => TimeSpan.FromMinutes(30);

    // Defaults: minimum fuzzy ratio of 60 and maximum Levenshtein distance of 3.
    // Uncomment and edit these lines to override them:
    // protected override int FuzzyThreshold { get; } = 60;
    // protected override int LevenshteinThreshold { get; } = 3;


    // --- Core Logic Implementations ---

    public override async Task<IEnumerable<City>> LoadData()
    {
        return await cityRepo.GetAllCity();
    }

    protected override string[] GetSearchableField(City model)
    {
        // Specify which fields to perform the fuzzy search against
        return new[] { model.EnglishName, model.Name, model.CountryName };
    }
}

4. Running the Search

After setting up your dependency injection (or manual instantiation):

// Example usage in an application service or controller
public async Task<List<City>> GetFuzzyCityResults(string query)
{
    // Assuming citySearchService is injected/instantiated
    var results = await citySearchService.Search(query, limit: 10);

    // The result is a list of tuples, ordered by the best match:
    // (Model, FuzzyScore, Distance)
    
    foreach (var result in results)
    {
        Console.WriteLine($"City: {result.Model.EnglishName} ({result.Model.Name})");
        Console.WriteLine($"  Fuzzy Score: {result.FuzzyScore}, Distance: {result.Distance}");
    }
    
    return results.Select(r => r.Model).ToList();
}

// Example Search Calls:

// 1. Searching by a misspelled English name:
// Query: "nwyork" -> Result: New York (High Score, Distance 2 when compared case-insensitively)

// 2. Searching by a misspelled Persian name:
// Query: "قاهر" (Qahr) -> Result: قاهره / Cairo (High Score, Low Distance)
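
The dependency injection setup mentioned in step 4 is not shown in this README. One possible wiring, assuming the Microsoft.Extensions.DependencyInjection and Microsoft.Extensions.Caching.Memory packages and the City, ICityRepository, CityRepository, and CityFuzzySearch types from steps 1 to 3, is sketched below. The singleton lifetimes are an assumption; choose whatever fits your application.

```csharp
using Microsoft.Extensions.DependencyInjection;

// Assumes the types from steps 1-3 are defined in this project.
var services = new ServiceCollection();

services.AddMemoryCache();                                // registers IMemoryCache
services.AddSingleton<ICityRepository, CityRepository>(); // data source
services.AddSingleton<CityFuzzySearch>();                 // concrete search service

using var provider = services.BuildServiceProvider();
var citySearchService = provider.GetRequiredService<CityFuzzySearch>();
```

In an ASP.NET Core application the same three registrations would go on builder.Services instead of a hand-built ServiceCollection.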

⚙️ Customization

You can adjust the search sensitivity by overriding the following protected properties in your concrete service class (CityFuzzySearch):

| Property | Default Value | Description |
|---|---|---|
| FuzzyThreshold | 60 | The minimum Fuzz.Ratio (0-100) required for a model to be included in the results. |
| LevenshteinThreshold | 3 | The maximum Levenshtein distance allowed between the query and a searchable field. |
| CacheExpireTime | TimeSpan.FromMinutes(30) | How long the loaded data remains in memory before the service attempts to call LoadData() again. |
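
As a rough illustration of how the two thresholds interact, the sketch below filters and ranks candidates whose scores have already been computed. HybridFilterSketch and FilterAndRank are hypothetical names. Requiring both thresholds to pass, and ordering by score (descending) and then distance (ascending), are assumptions about the library's behavior rather than confirmed details.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HybridFilterSketch
{
    // Keeps candidates that satisfy both thresholds, then orders and truncates them.
    // Assumption: a candidate must pass BOTH checks to be included.
    // Assumption: best match = highest fuzzy score, ties broken by lowest distance.
    public static List<(string Text, int FuzzyScore, int Distance)> FilterAndRank(
        IEnumerable<(string Text, int FuzzyScore, int Distance)> candidates,
        int fuzzyThreshold = 60,
        int levenshteinThreshold = 3,
        int limit = 10)
    {
        return candidates
            .Where(c => c.FuzzyScore >= fuzzyThreshold
                     && c.Distance <= levenshteinThreshold)
            .OrderByDescending(c => c.FuzzyScore)
            .ThenBy(c => c.Distance)
            .Take(limit)
            .ToList();
    }
}
```

Raising fuzzyThreshold or lowering levenshteinThreshold makes the search stricter; moving them the other way admits looser matches.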

To manually clear the cached data at any time, simply call:

citySearchService.ClearCache();
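
The caching behavior described above (load once, reuse until the expiry time passes, reload after a clear) can be pictured with the following self-contained sketch. It stands in for IMemoryCache with a single field and is not the library's implementation. CachedLoaderSketch and its members are hypothetical names, and the class is not thread-safe.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public sealed class CachedLoaderSketch<T>
{
    private readonly Func<Task<IReadOnlyList<T>>> load;
    private readonly TimeSpan expireTime;
    private readonly Func<DateTimeOffset> clock;
    private IReadOnlyList<T>? cached;
    private DateTimeOffset expiresAt;

    // clock is injectable so that expiry can be exercised without waiting.
    public CachedLoaderSketch(
        Func<Task<IReadOnlyList<T>>> load,
        TimeSpan expireTime,
        Func<DateTimeOffset>? clock = null)
    {
        this.load = load;
        this.expireTime = expireTime;
        this.clock = clock ?? (() => DateTimeOffset.UtcNow);
    }

    // Returns the cached data while it is fresh; otherwise reloads it.
    public async Task<IReadOnlyList<T>> GetAsync()
    {
        var now = clock();
        if (cached is not null && now < expiresAt)
            return cached;

        cached = await load();
        expiresAt = now + expireTime;
        return cached;
    }

    // Drops the cached data so the next GetAsync call reloads it.
    public void ClearCache() => cached = null;
}
```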
