Chatbot Usage Analysis

The notebook analysis/explore.ipynb performs an in-depth analysis of user interactions with an AI chatbot. It generates concise summaries of the general intentions and key themes of the questions users ask, runs exploratory analysis of the messages (e.g. word clouds, language distributions, and user engagement statistics), and takes a focused look at negative chatbot responses.


analysis/explore_with_results.html contains an exported version of the notebook including generated plots and results.

Here is an outline of the different parts of the notebook.

📦 Dataset Loading and Preprocessing

  • Load usage data from an Excel file.
  • Requests are stored as serialized Python dicts (not JSON). We:
    • Attempt to parse them with ast.literal_eval.
    • Fall back to repairing the string with json_repair when that fails (see the sketch after this list).
  • Extract history from each request and isolate only user messages.
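
A minimal sketch of that parsing fallback; the helper name and the structure of the messages inside history (role/content keys) are assumptions:

    import ast
    import json
    from json_repair import repair_json

    def parse_request(raw: str) -> dict:
        """Parse a serialized Python dict, repairing it as JSON if needed."""
        try:
            return ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            return json.loads(repair_json(raw))

    # request = parse_request(raw)
    # user_messages = [m["content"] for m in request["history"] if m.get("role") == "user"]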

🧠 LLM-Powered Summarization

We use deepseek-ai/DeepSeek-R1-Distill-Qwen-14B and a local vLLM server to generate insights from user messages.

Steps:

  1. Tokenize user messages with the model's HuggingFace tokenizer to estimate the total number of tokens.
  2. Chunk messages into manageable sizes (≤50k tokens); a sketch of this step follows the list.
  3. Construct prompts and generate summaries per chunk.
  4. Save each chunk summary to summary_results.json.
  5. Combine all summaries to generate a final, global summary of user intents and topics, saved in global_summary.txt.
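
A rough sketch of the tokenize-and-chunk step (the exact batching logic in the notebook may differ):

    from transformers import AutoTokenizer

    MODEL = "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B"
    MAX_CHUNK_TOKENS = 50_000

    tokenizer = AutoTokenizer.from_pretrained(MODEL)

    def chunk_messages(messages):
        """Group messages into chunks of at most MAX_CHUNK_TOKENS tokens."""
        chunks, current, current_tokens = [], [], 0
        for msg in messages:
            n = len(tokenizer.encode(msg))
            if current and current_tokens + n > MAX_CHUNK_TOKENS:
                chunks.append(current)
                current, current_tokens = [], 0
            current.append(msg)
            current_tokens += n
        if current:
            chunks.append(current)
        return chunks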

🔍 NLP Exploratory Analysis

🔡 Word Cloud of User Messages

  • Compute TF-IDF scores and select the top terms (sketch below).
  • Visualize with wordcloud.
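
A condensed sketch of how such a word cloud can be built (vectorizer settings are illustrative; user_messages is the list of message strings):

    import matplotlib.pyplot as plt
    from sklearn.feature_extraction.text import TfidfVectorizer
    from wordcloud import WordCloud

    vectorizer = TfidfVectorizer(stop_words="english", max_features=200)
    tfidf = vectorizer.fit_transform(user_messages)

    # Aggregate TF-IDF weight per term across all messages
    weights = dict(zip(vectorizer.get_feature_names_out(), tfidf.sum(axis=0).A1))

    wc = WordCloud(width=800, height=400, background_color="white")
    wc.generate_from_frequencies(weights)
    plt.imshow(wc, interpolation="bilinear")
    plt.axis("off")
    plt.show()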

🌍 Language Distribution

  • Detect the language of each user message using langdetect.
  • Plot frequency of detected languages.
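
A minimal sketch, again assuming user_messages holds the message strings:

    import pandas as pd
    from langdetect import detect, LangDetectException

    def safe_detect(text):
        """Return the detected language code, or 'unknown' for undetectable text."""
        try:
            return detect(text)
        except LangDetectException:
            return "unknown"

    langs = pd.Series([safe_detect(m) for m in user_messages])
    langs.value_counts().plot(kind="bar", title="Detected languages")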

🧠 Topic Modeling (WIP)

  • Use BERTopic to extract topics from user messages.
  • Visualize the top 10 topics (currently experimental; may need more tuning).
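
A minimal BERTopic sketch (model settings are illustrative):

    from bertopic import BERTopic

    topic_model = BERTopic(language="multilingual", min_topic_size=10)
    topics, probs = topic_model.fit_transform(user_messages)

    topic_model.get_topic_info().head(10)          # top topics by size
    topic_model.visualize_barchart(top_n_topics=10)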

📈 User Engagement Analysis

⏰ Message Timing

  • Plot the number of messages by time of day, using 6-hour bins.
  • Plot the number of messages by day of the week.
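
A sketch of the binning, assuming one row per message with a timestamp column (column name is illustrative):

    import pandas as pd
    import matplotlib.pyplot as plt

    hours = pd.to_datetime(df["timestamp"]).dt.hour
    bins = pd.cut(hours, bins=[0, 6, 12, 18, 24], right=False,
                  labels=["00-06", "06-12", "12-18", "18-24"])
    bins.value_counts().sort_index().plot(kind="bar", title="Messages by time of day")
    plt.show()

    order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
    days = pd.to_datetime(df["timestamp"]).dt.day_name()
    days.value_counts().reindex(order).plot(kind="bar", title="Messages by day of week")
    plt.show()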

💬 Session Length

  • Plot number of messages per chat session.

👤 User Activity

  • Bucket users by number of messages sent.
  • Display distribution via pie chart.
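
A sketch of the bucketing, assuming one row per message with a user_id column (column name and bucket boundaries are illustrative):

    import pandas as pd
    import matplotlib.pyplot as plt

    msgs_per_user = df.groupby("user_id").size()
    buckets = pd.cut(msgs_per_user, bins=[0, 1, 5, 20, float("inf")],
                     labels=["1", "2-5", "6-20", "21+"])
    buckets.value_counts().plot(kind="pie", autopct="%1.0f%%", ylabel="",
                                title="Users by number of messages sent")
    plt.show()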

🚨 Negative Response Flow Analysis

Analyze how the chatbot handles negative responses and how users react:

Methodology:

  • The current process is based on lexical rules; it could be made more intelligent with LLMs if needed (see the sketch after this list).
  • Check whether the bot says "I could not find" or "unable to find".
  • Detect whether suggestions (wrapped in <<...>>) are offered.
  • Analyze follow-up behavior:
    • The user follows a suggestion.
    • The user asks something else.
    • The bot fails to answer even when the user follows one of the suggestions (this ideally should not happen, but in practice occurs 66% of the time).
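
A simplified sketch of these lexical rules (helper names are illustrative; the notebook's exact rules may differ):

    import re

    NEGATIVE_PATTERNS = ("i could not find", "unable to find")
    SUGGESTION_RE = re.compile(r"<<(.+?)>>")

    def is_negative(bot_reply: str) -> bool:
        """Detect a negative bot response via simple phrase matching."""
        reply = bot_reply.lower()
        return any(p in reply for p in NEGATIVE_PATTERNS)

    def extract_suggestions(bot_reply: str) -> list[str]:
        return SUGGESTION_RE.findall(bot_reply)

    def classify_follow_up(bot_reply: str, next_user_msg: str) -> str:
        """Classify what the user did after a negative bot response."""
        suggestions = extract_suggestions(bot_reply)
        if any(s.strip().lower() in next_user_msg.lower() for s in suggestions):
            return "followed_suggestion"
        return "asked_something_else"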

Visualizations:

  • Histogram: Negative response count per session
  • Sankey Diagram: Flow from negative response to user action
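
A sketch of the Sankey diagram with plotly, assuming flow_counts maps (source label, target label) pairs to counts (name is illustrative):

    import plotly.graph_objects as go

    labels = sorted({label for pair in flow_counts for label in pair})
    index = {label: i for i, label in enumerate(labels)}

    fig = go.Figure(go.Sankey(
        node=dict(label=labels),
        link=dict(
            source=[index[src] for src, _ in flow_counts],
            target=[index[dst] for _, dst in flow_counts],
            value=list(flow_counts.values()),
        ),
    ))
    fig.show()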

🛠️ Setup & Requirements

  1. Copy Usage_Data.xlsx to the data/ directory.
  2. Install required dependencies:
pip install -r requirements.txt

To run the summarization section, you also need:

  • A local vLLM instance serving the model (feel free to swap in any model supported by vLLM):
    VLLM_USE_V1=0 vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-14B --max-log-len 10 --max-model-len 100000 --enable-reasoning --reasoning-parser deepseek_r1
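
The notebook reaches the server through vLLM's OpenAI-compatible API (wrapped in analysis/utils.py); a minimal sketch, assuming the default port 8000:

    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
    response = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-R1-Distill-Qwen-14B",
        messages=[{"role": "user", "content": "Summarize the key themes in these user questions: ..."}],
    )
    print(response.choices[0].message.content)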

📁 File Structure

.
├── requirements.txt                   # Python dependencies for running the project
├── data/
│   ├── global_summary.txt             # LLM-generated global summary of all user messages
│   ├── Usage_Data.xlsx                # Main dataset used for analysis (not included in the repo)
│   └── summary_results.json           # LLM-generated summaries of chunks of user messages
├── analysis/
│   ├── utils.py                       # OpenAI wrapper for vLLM
│   ├── explore.ipynb                  # Main exploratory data analysis notebook
│   └── explore_with_results.html      # Static HTML export of the results notebook
├── README.md                          # This project overview and documentation
└── LICENSE                            # License for usage and distribution
