feat(rows): exclude heavy binary columns from /rows response to avoid UI lag (#3052) #3209

ArjunJagdale · 2025-07-04T09:57:37Z

This PR improves the responsiveness of the Dataset Viewer by omitting binary-heavy columns (e.g., t5_prompt_embeds, vae_latents) from the /rows endpoint payload.

These columns typically contain thousands of bytes per row and are not meaningful in the UI. The change introduces a hardcoded exclusion list (EXCLUDED_COLUMNS) and drops those columns from the final pyarrow.Table before response generation.
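
As a rough illustration of the exclusion step described above, here is a minimal sketch that is not taken from the PR diff. `EXCLUDED_COLUMNS` is the name used in this description; the helper `columns_to_keep` and the example column names `id` and `title` are hypothetical. The sketch only filters column names, so it runs without pyarrow; the comment shows how the result could be applied to a `pyarrow.Table`.

```python
from typing import Iterable, List

# Mirrors the hardcoded exclusion list described in this PR.
# The exact contents in the real change may differ (assumption).
EXCLUDED_COLUMNS = frozenset({"t5_prompt_embeds", "vae_latents"})


def columns_to_keep(
    column_names: Iterable[str],
    excluded: frozenset = EXCLUDED_COLUMNS,
) -> List[str]:
    """Return the column names that are not excluded, in their original order."""
    return [name for name in column_names if name not in excluded]


# With a pyarrow.Table named `table`, the filtered table could be built as:
#     table = table.select(columns_to_keep(table.column_names))
kept = columns_to_keep(["id", "title", "t5_prompt_embeds", "vae_latents"])
print(kept)
```

Selecting the columns to keep, rather than dropping the excluded ones, avoids an error when a dataset does not contain any of the listed columns.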

Tested manually using datasets with large binary columns and confirmed a reduction in payload size and frontend lag.

Future improvements could include:

- Auto-detecting such columns by dtype or size
- Allowing dataset creators to opt columns out explicitly via config or metadata
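
One possible shape for the size-based auto-detection idea, sketched under assumptions: rows are available as plain dicts, binary values are `bytes` or `bytearray`, and the 1024-byte threshold is an arbitrary placeholder. The function name `detect_heavy_binary_columns` and the sample column `thumb` are hypothetical and not part of this PR.

```python
from typing import Any, Dict, Iterable, List


def detect_heavy_binary_columns(
    rows: Iterable[Dict[str, Any]],
    max_avg_bytes: int = 1024,  # placeholder threshold, not from the PR
) -> List[str]:
    """Return names of columns whose binary values average more than max_avg_bytes."""
    total_bytes: Dict[str, int] = {}
    value_count: Dict[str, int] = {}
    for row in rows:
        for name, value in row.items():
            # Only raw binary values are measured; other types are ignored.
            if isinstance(value, (bytes, bytearray)):
                total_bytes[name] = total_bytes.get(name, 0) + len(value)
                value_count[name] = value_count.get(name, 0) + 1
    return sorted(
        name
        for name, total in total_bytes.items()
        if total / value_count[name] > max_avg_bytes
    )


sample_rows = [
    {"id": 1, "vae_latents": b"\x00" * 5000, "thumb": b"\x00" * 10},
    {"id": 2, "vae_latents": b"\x00" * 5200, "thumb": b"\x00" * 12},
]
print(detect_heavy_binary_columns(sample_rows))
```

Running the check on a small sample of rows, rather than the full dataset, would keep the detection cheap, at the cost of missing columns whose size varies widely between rows.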

feat(rows): exclude heavy binary columns like t5_prompt_embeds and vae_latents from /rows output

This change introduces a safeguard against rendering performance issues in the Dataset Viewer by skipping certain heavy binary columns (e.g., ~5KB per row) that are not useful for display.

Currently hardcoded to drop columns like "t5_prompt_embeds" and "vae_latents", which caused UI freezing in datasets like `frutiemax/themoviedb_posters`. This is handled right before response construction in the /rows endpoint.
