The notebook `analysis/explore.ipynb` performs an in-depth analysis of user interactions with an AI chatbot. The analysis includes generating concise summaries of the general intentions and key themes of the questions users ask, exploratory analysis of the messages (e.g. word clouds, language distributions, and user engagement statistics), and a focused study of negative chatbot responses.

`analysis/explore_with_results.html` contains an exported version of the notebook, including the generated plots and results.
Here is an outline of the different parts of the notebook.
- Load usage data from an Excel file.
- Requests are in serialized Python dict format (not JSON). We:
  - Attempt to parse with `ast.literal_eval`.
  - Fall back to repairing the string with `json_repair` when needed.
- Extract `history` from each request and isolate only user messages.
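The parsing fallback above can be sketched as follows. `parse_request` is a hypothetical helper name, and the `json_repair` import is deferred so the common `ast.literal_eval` path works without it:

```python
import ast

def parse_request(raw: str) -> dict:
    """Parse a request serialized as a Python dict literal, repairing it if needed."""
    try:
        # Most requests are valid Python literals (single quotes, True/False/None).
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        # Fall back to json_repair for malformed or truncated strings.
        from json_repair import loads as repair_loads
        return repair_loads(raw)

# Isolate only the user messages from the request's history.
request = parse_request("{'history': [{'role': 'user', 'content': 'hi'}]}")
user_messages = [t["content"] for t in request["history"] if t["role"] == "user"]
```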
We use `deepseek-ai/DeepSeek-R1-Distill-Qwen-14B` and a local vLLM server to generate insights from user messages.
- Tokenize user messages with a HuggingFace tokenizer to estimate the total number of tokens.
- Chunk messages into manageable sizes (≤50k tokens).
- Construct prompts and generate summaries per chunk.
- Save each chunk summary to `summary_results.json`.
- Combine all summaries to generate a final, global summary of user intents and topics, saved in `global_summary.txt`.
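The chunking step might look like this minimal sketch; the whitespace-based `count_tokens` default is a stand-in for the HuggingFace tokenizer the notebook actually uses (e.g. `len(tokenizer.encode(text))`):

```python
def chunk_messages(messages, max_tokens=50_000, count_tokens=lambda s: len(s.split())):
    """Greedily pack messages into chunks of at most max_tokens tokens each."""
    chunks, current, current_tokens = [], [], 0
    for msg in messages:
        n = count_tokens(msg)
        # Flush the current chunk if adding this message would exceed the budget.
        if current and current_tokens + n > max_tokens:
            chunks.append(current)
            current, current_tokens = [], 0
        current.append(msg)
        current_tokens += n
    if current:
        chunks.append(current)
    return chunks

# Tiny budget for illustration only.
chunks = chunk_messages(["a b c", "d e", "f"], max_tokens=3)
```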
- Compute TF-IDF scores of the top terms.
- Visualize with `wordcloud`.
- Detect the languages of user messages using `langdetect`.
- Plot the frequency of detected languages.
- Use `BERTopic` to extract topics from user messages.
- Visualize the top 10 topics (currently experimental; may need more tweaking).
- Plot number of messages by hour of the day with 6-hour bins.
- Plot number of messages by day of the week.
- Plot number of messages per chat session.
- Bucket users by number of messages sent.
- Display distribution via pie chart.
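The user-bucketing step could be sketched with pandas; the bucket edges and the toy message log below are illustrative, not the ones the notebook uses:

```python
import pandas as pd

# Toy message log: one row per message (user_id comes from the usage data).
df = pd.DataFrame({"user_id": ["u1"] * 12 + ["u2"] * 3 + ["u3"]})

# Messages sent per user, bucketed into ranges.
counts = df["user_id"].value_counts()
buckets = pd.cut(
    counts,
    bins=[0, 1, 5, 10, float("inf")],
    labels=["1", "2-5", "6-10", "11+"],
)
distribution = buckets.value_counts().sort_index()
# distribution can then be plotted with distribution.plot.pie()
```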
Analyze how the chatbot handles negative responses and how users react:
- The current process is based on lexical rules; it could be made more intelligent with LLMs if needed.
- Check whether the bot says "I could not find" or "unable to find".
- Detect whether suggestions (`<<...>>`) are offered.
- Analyze follow-up behavior:
  - The user follows a suggestion
  - The user asks something else
  - The bot fails to answer even when the user follows one of the suggestions (ideally this should not happen, but in practice it does 66% of the time)
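The lexical rules above can be sketched as follows; the phrases and the `<<...>>` suggestion markers come from the description, while the classification of follow-up behavior would additionally compare the next user message against the extracted suggestions:

```python
import re

NEGATIVE_PHRASES = ("i could not find", "unable to find")
SUGGESTION_RE = re.compile(r"<<(.*?)>>")

def is_negative(bot_reply: str) -> bool:
    """Lexical rule: the reply admits it found nothing."""
    text = bot_reply.lower()
    return any(phrase in text for phrase in NEGATIVE_PHRASES)

def extract_suggestions(bot_reply: str) -> list[str]:
    """Suggestions are offered between << and >> markers."""
    return SUGGESTION_RE.findall(bot_reply)

reply = "I could not find that page. Try: <<Pricing>> <<Refund policy>>"
```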
- Histogram: Negative response count per session
- Sankey Diagram: Flow from negative response to user action
- Copy `Usage_Data.xlsx` to the `data/` directory.
- Install the required dependencies: `pip install -r requirements.txt`
To run the summarization section, you also need:
- A local vLLM instance running an LLM (feel free to change it to any model supported by vLLM):

  ```shell
  VLLM_USE_V1=0 vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-14B --max-log-len 10 --max-model-len 100000 --enable-reasoning --reasoning-parser deepseek_r1
  ```
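vLLM exposes an OpenAI-compatible API (by default on `http://localhost:8000/v1`), which is what the `utils.py` wrapper talks to. This sketch only builds a chat-completions request body without sending it; the system prompt is a made-up example, not the one the notebook uses:

```python
import json

def build_chat_request(user_content: str,
                       model: str = "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B") -> str:
    """Build a JSON body for POST http://localhost:8000/v1/chat/completions."""
    return json.dumps({
        "model": model,
        "messages": [
            # Hypothetical system prompt for the summarization step.
            {"role": "system", "content": "Summarize the users' intents and topics."},
            {"role": "user", "content": user_content},
        ],
        "temperature": 0.6,
    })

body = json.loads(build_chat_request("msg1\nmsg2"))
```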
```
.
├── requirements.txt               # Python dependencies for running the project
├── data/
│   ├── global_summary.txt         # LLM-generated global summary of all user messages
│   ├── Usage_Data.xlsx            # Main dataset used for analysis (not included in the repo)
│   └── summary_results.json       # LLM-generated summaries of chunks of user messages
├── analysis/
│   ├── utils.py                   # OpenAI wrapper for vLLM
│   ├── explore.ipynb              # Main exploratory data analysis notebook
│   └── explore_with_results.html  # Static HTML export of the results notebook
├── README.md                      # This project overview and documentation
└── LICENSE                        # License for usage and distribution
```