@@ -10,7 +10,14 @@ Csv parsing library written in pure Mojo
### Usage
- Add the Modular community channel (https://repo.prefix.dev/modular-community ) to your mojoproject.toml file or pixi.toml file in the channels section.
+ Add the Modular community channel (https://repo.prefix.dev/modular-community) to the `channels` section of your `pixi.toml` file:
+
+ ```toml
+ channels = ["conda-forge", "https://conda.modular.com/max", "https://repo.prefix.dev/modular-community"]
+ ```
+
+ Then run `pixi add mojo_csv` to add the package to your project.
+
##### Basic Usage
@@ -38,6 +45,47 @@ fn main():
print(reader[i])
```
+ #### ThreadedCsvReader (BETA)
+
+ A multi-threaded reader. `num_threads` controls how many threads parse the file.
+
+ ```mojo
+ ThreadedCsvReader(
+     file_path: Path,
+     delimiter: String = ",",
+     quotation_mark: String = '"',
+     num_threads: Int = 0  # 0 = use all available cores
+ )
+ ```
+
+ ##### Example 1: Default (all cores)
+
+ ```mojo
+ var reader = ThreadedCsvReader(Path("large_file.csv"))
+ # Uses all available cores, e.g. all 16 on a 16-core system
+ ```
+
+ ##### Example 2: Custom thread count
+
+ ```mojo
+ var reader = ThreadedCsvReader(Path("data.csv"), num_threads=4)
+ # Uses exactly 4 threads
+ ```
+
+ ##### Example 3: Single-threaded
+
+ ```mojo
+ var reader = ThreadedCsvReader(Path("data.csv"), num_threads=1)
+ # Forces single-threaded execution (same as CsvReader)
+ ```
+
+ ##### Example 4: Custom delimiter
+
+ ```mojo
+ var reader = ThreadedCsvReader(
+     Path("pipe_separated.csv"),
+     delimiter="|",
+     num_threads=8
+ )
+ ```
+
### Attributes
```mojo
@@ -48,6 +96,7 @@ reader.row_count : Int # total number of rows T->B
reader.column_count : Int # total number of columns L->R
reader.elements : List[String] # all delimited elements
reader.length : Int # total number of elements
+
```
##### Indexing
@@ -57,3 +106,57 @@ currently the array is only 1D, so indexing is fairly manual.
```Mojo
reader[0] # first element
```
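The elements live in one flat list, so reaching a specific cell means converting a (row, column) pair into a flat index. The sketch below is minimal and rests on two assumptions: the reader stores elements row by row (row-major), and every row has exactly `column_count` fields. `cell_index` is a hypothetical helper, not part of mojo_csv.

```mojo
# Hypothetical helper (not part of mojo_csv). Assumes elements are stored
# row-major: all fields of row 0, then all fields of row 1, and so on.
fn cell_index(row: Int, col: Int, column_count: Int) -> Int:
    return row * column_count + col


fn main():
    # In a 3-column file, rows 0 and 1 occupy positions 0-5, so
    # row 2 / column 1 sits at flat position 7.
    print(cell_index(2, 1, 3))
```

Under the same assumptions, a cell would be read with `reader[cell_index(row, col, reader.column_count)]`. For a rectangular file, `reader.length` would equal `reader.row_count * reader.column_count`.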
+
+
+ ### Performance
+
+ - Average times over 1,000 iterations
+ - Not compiled ahead of time (run with `mojo bench.mojo`)
+ - Single-threaded
+
+ Benchmark files:
+
+ - micro: 3 rows
+ - mini: 100 rows
+ - small: 1,000 rows
+ - medium: 100,000 rows
+ - large: 2 million rows
+
+ ```log
+ ✨ Pixi task (bench): mojo bench.mojo
+ running benchmark for micro csv:
+ average time in ms for micro file:
+ 0.007699
+ -------------------------
+ running benchmark for mini csv:
+ average time in ms for mini file:
+ 0.241136
+ -------------------------
+ running benchmark for small csv:
+ average time in ms for small file:
+ 1.388513
+ -------------------------
+ running benchmark for medium csv:
+ average time in ms for medium file:
+ 121.217188
+ -------------------------
+ running benchmark for large csv:
+ average time in ms for large file:
+ 3582.876541
+ ```
+
+ Single-threaded vs. multi-threaded timings by file size (average of multiple runs):
+
+ | File size    | Single-threaded | Multi-threaded | Speedup |
+ | ------------ | --------------- | -------------- | ------- |
+ | 1,000 rows   | 1.42 ms         | 1.30 ms        | 1.09x   |
+ | 100,000 rows | 125 ms          | 105 ms         | 1.19x   |
+
+ _Tested on an AMD Ryzen 9 7950X (16 cores) @ 5.8 GHz_
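The Speedup column is the single-threaded time divided by the multi-threaded time. Checking both rows of the table:

```math
\text{speedup} = \frac{t_{\text{single}}}{t_{\text{multi}}}, \qquad
\frac{1.42\,\text{ms}}{1.30\,\text{ms}} \approx 1.09, \qquad
\frac{125\,\text{ms}}{105\,\text{ms}} \approx 1.19
```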
+
+ ## Future Improvements
+
+ - [ ] SIMD optimization within each thread
+ - [ ] Async chunking
+ - [ ] Streaming support for very large files
+ - [ ] Memory pool to reduce allocations
+ - [ ] Progress callbacks for long-running operations