Jannis25
Collaborator

Rebased #6 and updated dependencies.

We integrated several visualizations that give deeper insight into the system's mechanisms.

These visualizations cover:

  • 3D grid visualizing the dimension-reduced embeddings of the best guesses of all documents (NuggetListWidget)
  • Data Insights section displaying the effects of the user's latest feedback (NuggetListWidget)
  • 3D grid visualizing the dimension-reduced embeddings of all nuggets of the currently opened document (DocumentWidget)
  • Bar chart displaying the cosine similarity of all nuggets of the currently opened document (DocumentWidget)
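
To make the two computations behind these views concrete, here is a minimal sketch, not the code from this PR. It assumes embeddings are rows of a NumPy array, uses a plain SVD-based PCA in place of the project's own dimension reduction, and assumes the bar chart compares each nugget against a single reference vector (the description does not say what the similarity is measured against). The names `reduce_to_3d`, `cosine_similarities`, `nugget_embeddings`, and `attribute_embedding` are hypothetical.

```python
import numpy as np


def reduce_to_3d(embeddings: np.ndarray) -> np.ndarray:
    """Project row-vector embeddings onto their first three principal components."""
    centered = embeddings - embeddings.mean(axis=0)
    # SVD of the centered data: the rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:3].T


def cosine_similarities(embeddings: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Cosine similarity between every row of `embeddings` and `reference`."""
    norms = np.linalg.norm(embeddings, axis=1) * np.linalg.norm(reference)
    return (embeddings @ reference) / norms


# Hypothetical data: 10 nuggets with 64-dimensional embeddings.
rng = np.random.default_rng(0)
nugget_embeddings = rng.normal(size=(10, 64))
attribute_embedding = rng.normal(size=64)  # assumed reference vector

points_3d = reduce_to_3d(nugget_embeddings)  # coordinates for a 3D grid
similarities = cosine_similarities(nugget_embeddings, attribute_embedding)  # bar heights
```

Each row of `points_3d` would be one point in the 3D grid, and each entry of `similarities` one bar in the chart.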

nils-bz and others added 30 commits January 20, 2025 10:09
add scatterplot to display cosine distance between points, click on one point to display corresponding text+distance
Display only the relevant information nuggets on the scatter plot/bar chart for the currently selected document, without accumulating data from previously viewed documents (irrelevant information nuggets)
adjust plot layout to prevent annotation boxes from getting out of the plot window in fullscreen mode
change colormap of scatterplot to represent distances (same color for same distance)
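
One way to get "same color for same distance" is to normalize against a fixed range rather than against the minimum and maximum of the points currently shown. The sketch below illustrates that idea only; it is not taken from the commits above. The function name `distance_to_rgb`, the two endpoint colors, and the assumed maximum distance of 2.0 (the upper bound of cosine distance) are all hypothetical choices.

```python
def distance_to_rgb(distance, max_distance=2.0,
                    near=(68, 1, 84), far=(253, 231, 37)):
    """Map a distance to an RGB color by linear interpolation on a fixed scale."""
    # Clamp to [0, 1] so out-of-range distances still get a valid color.
    t = min(max(distance / max_distance, 0.0), 1.0)
    return tuple(round(a + t * (b - a)) for a, b in zip(near, far))


# Hypothetical distances; the two equal values receive the same color.
distances = [0.0, 0.25, 0.25, 2.0]
colors = [distance_to_rgb(d) for d in distances]
```

Because the scale does not depend on which points are plotted, a given distance keeps its color when the user switches documents.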
@Jannis25 Jannis25 self-assigned this Jan 30, 2025
@bhaettasch bhaettasch requested review from bhaettasch and Copilot July 23, 2025 13:21
@Copilot Copilot AI left a comment

Pull Request Overview

This PR introduces comprehensive visualization capabilities to WannaDB, rebasing PR #6 with updated dependencies. The visualizations provide deeper insights into the system's mechanisms by displaying dimension-reduced embeddings, data insights, and interactive charts to help users understand how the matching process works.

Key changes include:

  • Implementation of 3D grids for visualizing dimension-reduced embeddings
  • Data insights section showing effects of user feedback
  • Bar charts displaying cosine similarity values for nuggets
  • User interaction tracking and accessibility features

Reviewed Changes

Copilot reviewed 23 out of 30 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| wannadb_ui/wannadb_api.py | Adds PCA dimension reduction to preprocessing pipeline |
| wannadb_ui/visualizations.py | New comprehensive visualization module with 3D grids, bar charts, and scatter plots |
| wannadb_ui/study.py | New tracking system for user interaction monitoring |
| wannadb_ui/main_window.py | Integrates visualization controls and information popups into main UI |
| wannadb_ui/interactive_matching.py | Updates UI components to support visualization features |
| wannadb_ui/data_insights.py | New data insights area showing feedback effects |
| wannadb_ui/common.py | Adds visualization-related enums, classes, and information popup dialogs |
| wannadb/utils.py | New utility functions for duplicate detection and accessible colors |
| wannadb/preprocessing/dimension_reduction.py | New PCA and t-SNE dimension reduction implementations |
| wannadb/preprocessing/other_processing.py | Adds duplicate nugget cleaning functionality |
| wannadb/matching/matching.py | Enhanced matching with change tracking and visualization support |
| wannadb/data/signals.py | New signals for dimension-reduced embeddings and current threshold |
| wannadb/data/data.py | Adds duplicate detection and confirmed matches tracking |
| wannadb/change_captor.py | New change tracking system for user feedback effects |

…, interactive matching, main window, and visualizations (Copilot PR Review)
5 participants