A lightweight Natural Language Probabilistic Intent Engine designed to match free-form user queries to predefined intents using a hybrid of similarity metrics and Bayesian scoring. Ideal for chatbots, virtual assistants, and any system needing a simple yet robust intent recognition layer.
## Features

**Combined Similarity & Bayesian Scoring**

- Word-level F1 overlap for lexical matching
- Character n-gram (trigram) Jaccard for fuzzy matching
- Length penalty to normalize short vs. long inputs
- Bayesian log-likelihood with configurable match / non-match priors and example-frequency priors
- Final score is a 50/50 blend of similarity and Bayesian scores

**Clean Preprocessing**

- Lowercasing and removal of punctuation (preserves math operators)
- Stopword filtering to focus on keywords

**Simple API**

- `train_intent(intent_name, example_query)` to register labeled example phrases
- `match(query) -> (intent_name, score)` to infer the best intent and its confidence

**No External Dependencies**

- Pure Python standard library (`re`, `math`, `collections`)

## Installation

Clone or download, then include `intentmatcher.py` in your project. No additional packages are required.

If you prefer a package structure, copy it into your module and import:

```python
# Your project structure:
# myapp/
#   intentmatcher.py
#   main.py

from intentmatcher import IntentMatcher
```

## Quick Start

```python
# 1. Instantiate the engine
matcher = IntentMatcher(p_match=0.9, p_nomatch=0.1)

# 2. Train with example phrases
matcher.train_intent("get_balance", "What is my current account balance?")
matcher.train_intent("get_balance", "Show me my balance.")
matcher.train_intent("transfer", "Send $100 to Alice.")
matcher.train_intent("transfer", "Transfer funds to Bob.")

# 3. Match new queries
intent, score = matcher.match("Could you please show my balance now?")
print(intent, score)  # => "get_balance", e.g. 0.82

intent, score = matcher.match("I want to transfer funds.")
print(intent, score)  # => "transfer", e.g. 0.76
```

## How It Works

**Preprocessing**

- `clean_text`: lowercase, remove non-word characters, filter stopwords.
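
A minimal sketch of what this step could look like in plain Python. The stopword list and exact regex below are illustrative assumptions, not the module's actual choices:

```python
import re

# Illustrative stopword subset; the real list in intentmatcher.py may differ.
STOPWORDS = {"a", "an", "the", "is", "my", "to", "me", "please", "you", "could"}

def clean_text(text: str) -> list[str]:
    """Lowercase, strip punctuation (keeping math operators), drop stopwords."""
    text = text.lower()
    # Keep word characters, whitespace, and basic math operators (+ - * / = %).
    text = re.sub(r"[^\w\s+\-*/=%]", " ", text)
    return [tok for tok in text.split() if tok not in STOPWORDS]

print(clean_text("Could you please show my balance now?"))
# => ['show', 'balance', 'now']
```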

**Similarity Calculation**

- `word_f1`: F1 overlap of token sets.
- `letter_ngram_sim`: Jaccard similarity of 3-gram character shingles.
- `length_penalty`: penalizes extreme length mismatches.
- Combined (see the sketch below): `(0.5 * word_f1 + 0.3 * letter_ngram_sim) * length_penalty`
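
Under the standard definitions of F1 and Jaccard, these pieces could be sketched as follows. The length penalty shown is one plausible form; the actual formula in `intentmatcher.py` may differ:

```python
def word_f1(tokens_a: set[str], tokens_b: set[str]) -> float:
    """F1 overlap of two token sets: harmonic mean of precision and recall."""
    overlap = len(tokens_a & tokens_b)
    if overlap == 0:
        return 0.0
    precision = overlap / len(tokens_a)
    recall = overlap / len(tokens_b)
    return 2 * precision * recall / (precision + recall)

def trigrams(text: str) -> set[str]:
    """Character 3-gram shingles of a string."""
    return {text[i:i + 3] for i in range(len(text) - 2)}

def letter_ngram_sim(a: str, b: str) -> float:
    """Jaccard similarity of character trigram sets."""
    ga, gb = trigrams(a), trigrams(b)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)

def length_penalty(len_a: int, len_b: int) -> float:
    """One plausible penalty: ratio of shorter to longer token count."""
    if len_a == 0 or len_b == 0:
        return 0.0
    return min(len_a, len_b) / max(len_a, len_b)

def combined_similarity(query_tokens: list[str], example_tokens: list[str]) -> float:
    # Weights from the README: 0.5 word-level, 0.3 character-level.
    sim = (0.5 * word_f1(set(query_tokens), set(example_tokens))
           + 0.3 * letter_ngram_sim(" ".join(query_tokens), " ".join(example_tokens)))
    return sim * length_penalty(len(query_tokens), len(example_tokens))
```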

**Bayesian Scoring**

- Log-sum of `p_match` vs. `p_nomatch` per token match.
- Plus a log prior based on how often each example was seen during training.
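
In sketch form, assuming `train_intent` keeps a frequency count per example (`example_count` and `total_examples` are hypothetical names for those counts):

```python
import math

def bayesian_score(query_tokens: list[str], example_tokens: list[str],
                   example_count: int, total_examples: int,
                   p_match: float = 0.9, p_nomatch: float = 0.1) -> float:
    """Per-token log-likelihood plus a log frequency prior (illustrative)."""
    example_set = set(example_tokens)
    # Each query token contributes log(p_match) if it appears in the
    # example, log(p_nomatch) otherwise.
    log_likelihood = sum(
        math.log(p_match if tok in example_set else p_nomatch)
        for tok in query_tokens
    )
    # Example-frequency prior: examples registered more often get a boost.
    log_prior = math.log(example_count / total_examples)
    return log_likelihood + log_prior
```

The raw result is a negative log value, so it presumably gets mapped into [0, 1] (for example by exponentiating a per-token average) before entering the 50/50 blend with the similarity score.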

**Intent Matching**

- For each intent, score the query against each of its examples.
- Take the maximum example score as the intent score.
- Return the intent with the highest overall score (see the sketch below).
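
Putting it together, matching reduces to a max over examples per intent, roughly as follows. This assumes the sketches above, a hypothetical `normalized_bayes` helper that maps the log score into [0, 1], and a `self.intents` dict of intent name to example token lists:

```python
def match(self, query: str) -> tuple[str | None, float]:
    """Return (best_intent, best_score) across all trained intents."""
    query_tokens = clean_text(query)
    best_intent, best_score = None, float("-inf")
    for intent_name, examples in self.intents.items():
        # An intent scores as well as its best-matching example.
        intent_score = max(
            0.5 * combined_similarity(query_tokens, ex_tokens)
            + 0.5 * normalized_bayes(query_tokens, ex_tokens)  # hypothetical helper
            for ex_tokens in examples
        )
        if intent_score > best_score:
            best_intent, best_score = intent_name, intent_score
    return best_intent, best_score
```

Taking the max (rather than the mean) over examples means a single strong paraphrase is enough for an intent to win.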

## Configuration

- `p_match` and `p_nomatch` determine the strength of the Bayesian likelihood when tokens match or don't.
- Tweak the weights inside `combined_similarity` to adjust word-level vs. n-gram importance.
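
For instance, sharper priors make matched tokens count for more and missing tokens cost more (the values below are illustrative):

```python
# Stricter: matched tokens weigh heavily, missing tokens are punished hard.
strict = IntentMatcher(p_match=0.95, p_nomatch=0.05)

# More forgiving: missing tokens hurt less, useful for noisy user input.
lenient = IntentMatcher(p_match=0.8, p_nomatch=0.2)
```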

## Tips

- Use `match()` in your bot's message handler to route queries to handlers:

  ```python
  intent, score = matcher.match(user_input)
  if intent == "get_balance" and score > 0.5:
      handle_balance()
  else:
      fallback()
  ```

- Preload your intents at application startup for minimal latency.

## License

MIT License. See `LICENSE` for details.

Built and maintained by Tommy Muga, a Software Engineer and Computational Modeler.