README.md: 130 additions & 4 deletions

@@ -1,22 +1,148 @@
# CSV Tools

by [Nicholas C. Zakas](https://humanwhocodes.com)

If you find this useful, please consider supporting my work with a [donation](https://humanwhocodes.com/donate).

## Description

A collection of tools for processing CSV files using streams. This package provides functions to count data rows and split CSV data into smaller chunks while preserving headers.

## Installation

```shell
npm install @humanwhocodes/csv-tools
```
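
The `jsr.json` change further down suggests the package is also published to [JSR](https://jsr.io); if so, it can presumably be installed with:

```shell
npx jsr add @humanwhocodes/csv-tools
```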

## Usage

This package exports two main functions for working with CSV data via `ReadableStream` objects:

### `countRows(stream, options)`

Counts rows in a CSV file with configurable options.

```javascript
import { countRows } from "@humanwhocodes/csv-tools";

// From a file in Node.js
import { createReadStream } from "node:fs";
import { ReadableStream } from "node:stream/web";

const fileStream = createReadStream("data.csv");
const webStream = ReadableStream.from(fileStream);

// Count only data rows (exclude header)
const dataRowCount = await countRows(webStream);
console.log(`Found ${dataRowCount} data rows`);

// Count all rows including header
const fileStream2 = createReadStream("data.csv");
const webStream2 = ReadableStream.from(fileStream2);
const totalRowCount = await countRows(webStream2, { countHeaderRow: true });
console.log(`Found ${totalRowCount} total rows`);
```
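
Note that a `ReadableStream` can only be consumed once, which is why the example constructs a fresh stream before the second `countRows()` call.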

**Parameters:**

- `stream` (`ReadableStream<Uint8Array>`) - A readable stream containing CSV data
- `options` (`Object`, optional) - Configuration options
    - `countHeaderRow` (`boolean`, default: `false`) - Whether to count the header row
    - `countEmptyRows` (`boolean`, default: `false`) - Whether to count empty rows

**Returns:** `Promise<number>` - The count of rows in the CSV file
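
The `countEmptyRows` option is not exercised in the example above; here is a minimal sketch combining it with the default header handling, assuming the same `data.csv` as before:

```javascript
import { countRows } from "@humanwhocodes/csv-tools";
import { createReadStream } from "node:fs";
import { ReadableStream } from "node:stream/web";

// Count data rows plus blank lines (the header row is still excluded by default)
const stream = ReadableStream.from(createReadStream("data.csv"));
const count = await countRows(stream, { countEmptyRows: true });
console.log(`Found ${count} rows, including empty ones`);
```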

### `chunk(stream, options)`

An async generator function that yields smaller, self-contained CSV strings. Each chunk contains the header row followed by up to `chunkSize` data rows.

```javascript
import { chunk } from "@humanwhocodes/csv-tools";

// From a file in Node.js
import { createReadStream } from "node:fs";
import { ReadableStream } from "node:stream/web";

const fileStream = createReadStream("data.csv");
const webStream = ReadableStream.from(fileStream);

// Process CSV in chunks of 50 rows
for await (const csvChunk of chunk(webStream, { chunkSize: 50 })) {
	// Each csvChunk is a string with header + up to 50 data rows
	console.log("Processing chunk:");
	console.log(csvChunk);
	// Process the chunk...
}
```

**Parameters:**

- `stream` (`ReadableStream<Uint8Array>`) - A readable stream containing CSV data
- `options` (`Object`) - Configuration options
    - `chunkSize` (`number`, default: `100`) - Number of data rows per chunk
    - `includeEmptyRows` (`boolean`, default: `false`) - Whether to include empty rows

**Returns:** `AsyncGenerator<string>` - An async generator yielding CSV chunks as strings
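
For instance, each chunk could be written out as its own standalone CSV file; a sketch assuming Node.js, the same `data.csv`, and write access to the working directory:

```javascript
import { chunk } from "@humanwhocodes/csv-tools";
import { createReadStream } from "node:fs";
import { writeFile } from "node:fs/promises";
import { ReadableStream } from "node:stream/web";

const source = ReadableStream.from(createReadStream("data.csv"));
let index = 0;

for await (const csvChunk of chunk(source, { chunkSize: 100 })) {
	// Every file is a valid CSV: header row plus up to 100 data rows.
	await writeFile(`chunk-${index++}.csv`, csvChunk);
}
```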

### Example: Browser Usage

> **Review comment (Contributor):** @copilot in the browser we need to do `response.body.getReader()`
>
> **Reply (Contributor Author):** Updated browser example in commit 5fcc0f5 to use `response.body.getReader()` and properly construct a `ReadableStream` from the reader.

```javascript
import { countRows, chunk } from "@humanwhocodes/csv-tools";

// Fetch CSV from URL
const response = await fetch("https://example.com/data.csv");
const reader = response.body.getReader();
const stream = new ReadableStream({
	start(controller) {
		return pump();
		function pump() {
			return reader.read().then(({ done, value }) => {
				if (done) {
					controller.close();
					return;
				}
				controller.enqueue(value);
				return pump();
			});
		}
	},
});

// Count rows
const rowCount = await countRows(stream);
console.log(`Total rows: ${rowCount}`);

// Or process in chunks
const response2 = await fetch("https://example.com/data.csv");
const reader2 = response2.body.getReader();
const stream2 = new ReadableStream({
	start(controller) {
		return pump();
		function pump() {
			return reader2.read().then(({ done, value }) => {
				if (done) {
					controller.close();
					return;
				}
				controller.enqueue(value);
				return pump();
			});
		}
	},
});
for await (const csvChunk of chunk(stream2, { chunkSize: 100 })) {
	// Process each chunk with your own handler (processData is a placeholder)
	await processData(csvChunk);
}
```
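
Because the pump logic above is duplicated for each fetch, it could be factored into a small helper. A sketch, where `readerToStream` is a hypothetical name and not something the package exports:

```javascript
import { countRows } from "@humanwhocodes/csv-tools";

// Hypothetical helper: wrap a reader back into a ReadableStream
// so the pump logic only has to be written once.
function readerToStream(reader) {
	return new ReadableStream({
		start(controller) {
			return pump();
			function pump() {
				return reader.read().then(({ done, value }) => {
					if (done) {
						controller.close();
						return;
					}
					controller.enqueue(value);
					return pump();
				});
			}
		},
	});
}

const response = await fetch("https://example.com/data.csv");
const rowCount = await countRows(readerToStream(response.body.getReader()));
console.log(`Total rows: ${rowCount}`);
```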

## Features

- **Stream-based processing** - Memory efficient handling of large CSV files
- **Preserves headers** - Each chunk includes the CSV header row
- **Handles edge cases** - Properly skips empty lines and ignores trailing newlines
- **TypeScript support** - Full type definitions included
- **Cross-platform** - Works in Node.js, Deno, and browsers

## License

eslint.config.js: 3 additions & 0 deletions

@@ -11,6 +11,8 @@ export default [
 				process: false,
 				URL: false,
 				console: false,
+				TextDecoder: false,
+				ReadableStream: false,
 			},
 		},
 	},
@@ -31,6 +33,7 @@ export default [
 				setTimeout: false,
 				AbortSignal: false,
 				AbortController: false,
+				TextEncoder: false,
 			},
 		},
 	},
jsr.json: 1 addition & 1 deletion

@@ -1,5 +1,5 @@
 {
-	"name": "@humanwhocodes/replace-me",
+	"name": "@humanwhocodes/csv-tools",
 	"version": "0.0.0",
 	"exports": "./dist/index.js",
 	"publish": {