
@zoroglucihat commented Aug 22, 2025

Add StreamingParser for memory-efficient CSV parsing of large files

What?

This PR introduces a new StreamingParser class for the k6/experimental/csv module that enables memory-efficient parsing of large CSV files without loading them entirely into memory.

Key additions:

  • StreamingParser: New parser class that takes file paths instead of File objects
  • StreamingReader: Underlying reader with 64KB buffer for streaming file processing
  • Direct OS filesystem access: Bypasses k6's internal file caching that loads entire files
  • Identical API: Same interface as existing Parser for seamless migration
  • Comprehensive tests: Full test suite covering all streaming functionality

Why?

Problem: The current CSV parser causes Out-of-Memory (OOM) errors when processing large files because:

  1. fs.open() loads entire files into memory using io.ReadAll()
  2. k6's file system cache stores complete file content as []byte
  3. A 12GB CSV file immediately consumes 12GB+ RAM during initialization

Root Cause: The file system module (internal/js/modules/k6/experimental/fs/cache.go:102) uses io.ReadAll() which is unsuitable for large file processing.

Impact: Users cannot process large CSV files (>RAM size) for load testing scenarios with extensive test data.

Before (OOM error):

import fs from 'k6/experimental/fs';
import csv from 'k6/experimental/csv';

const csvFile = await fs.open("12GB-file.csv");  // Loads entire file!
const parser = new csv.Parser(csvFile, { skipFirstLine: true });

After (memory-efficient):

import csv from 'k6/experimental/csv';

const parser = new csv.StreamingParser("12GB-file.csv", { skipFirstLine: true });

Performance improvement:

  • Memory usage: 12GB+ → ~64KB constant
  • Initialization: instant, instead of loading the entire file upfront
  • Scalability: Handles any file size without OOM

Checklist

  • I have performed a self-review of my code.
  • I have commented on my code, particularly in hard-to-understand areas.
  • I have added tests for my changes.
  • I have run linter and tests locally (make check) and all pass.

Checklist: Documentation (only for k6 maintainers and if relevant)

Please do not merge this PR until the following items are filled out.

  • I have added the correct milestone and labels to the PR.
  • I have updated the release notes: link
  • I have updated or added an issue to the k6-documentation: grafana/k6-docs#NUMBER if applicable
  • I have updated or added an issue to the TypeScript definitions: grafana/k6-DefinitelyTyped#NUMBER if applicable

Related PR(s)/Issue(s)

Closes #5080

Testing

All tests pass, including new streaming-specific tests:

- Add StreamingReader with 64KB buffer for large file processing
- Add StreamingParser class with same API as regular Parser
- Bypass k6 file caching to avoid loading entire files into memory
- Support all existing CSV parser options (skipFirstLine, asObjects, etc.)
- Add comprehensive test suite for streaming functionality
- Add usage example for large CSV files

Fixes grafana#5080
@zoroglucihat zoroglucihat requested a review from a team as a code owner August 22, 2025 23:12
@zoroglucihat zoroglucihat requested review from oleiade and codebien and removed request for a team August 22, 2025 23:12
@CLAassistant commented Aug 22, 2025

CLA assistant check
All committers have signed the CLA.

@codebien (Contributor) commented Sep 1, 2025

Hey @zoroglucihat, is a new API truly necessary?

The proposed solution doesn't appear to be ideal. Have you verified that the reported issue isn't primarily a bug? It is always recommended to discuss the introduction of a new API in the dedicated issue before submitting a new pull request.

In addition, this solution seems to be largely LLM-generated. If a clear explanation of the solution isn't provided, we will close this PR, as we have done previously.

For context, I am quoting the comment from #5066:

As a general note for future contributions (from anyone in the community), it is vital that all submitted code is thoroughly reviewed and tested by its author. When using AI tools to assist in development, strong human supervision is essential to ensure the final result is robust, idiomatic, and truly solves the problem at hand. This ensures the review process is productive and respectful of everyone's time.

Linked issue: CSV Parser causes OOM when loading big files (#5080)