Commit 2c67344

Phelsong, Josh S Wilkinson, and carolinefrasca authored
update mojo_csv to 1.3 (#148)
* mojo_csv
* Update recipe.yaml for CI
* unquote for ci
* update test
* Update source git URL
* change tests syntax
* verify 25.3.0 and update test
* update test
* update csv path
* update test to cwd
* update build and versioning
* add logo
* update readme
* update mojo_csv to 1.3

---------

Co-authored-by: Josh S Wilkinson <[email protected]>
Co-authored-by: Caroline Frasca <[email protected]>
1 parent aa7c307 commit 2c67344

2 files changed: +108 -3 lines changed


recipes/mojo_csv/README.md

Lines changed: 104 additions & 1 deletion
@@ -10,7 +10,14 @@ Csv parsing library written in pure Mojo
 
 ### usage
 
-Add the Modular community channel (https://repo.prefix.dev/modular-community) to your mojoproject.toml file or pixi.toml file in the channels section.
+Add the Modular community channel (https://repo.prefix.dev/modular-community) to your pixi.toml file in the channels section.
+
+```title:pixi.toml
+channels = ["conda-forge", "https://conda.modular.com/max", "https://repo.prefix.dev/modular-community"]
+```
+
+`pixi add mojo_csv`
+
 
 ##### Basic Usage
 
@@ -38,6 +45,47 @@ fn main():
         print(reader[i])
 ```
 
+#### BETA
+```mojo
+ThreadedCsvReader(
+    file_path: Path,
+    delimiter: String = ",",
+    quotation_mark: String = '"',
+    num_threads: Int = 0  # 0 = use all available cores
+)
+```
+
+### Example 1: Default (All Cores)
+
+```mojo
+var reader = ThreadedCsvReader(Path("large_file.csv"))
+# Uses all 16 cores on a 16-core system
+```
+
+### Example 2: Custom Thread Count
+
+```mojo
+var reader = ThreadedCsvReader(Path("data.csv"), num_threads=4)
+# Uses exactly 4 threads
+```
+
+### Example 3: Single-threaded
+
+```mojo
+var reader = ThreadedCsvReader(Path("data.csv"), num_threads=1)
+# Forces single-threaded execution (same as CsvReader)
+```
+
+### Example 4: Custom Delimiter
+
+```mojo
+var reader = ThreadedCsvReader(
+    Path("pipe_separated.csv"),
+    delimiter="|",
+    num_threads=8
+)
+```
+
 ### Attributes
 
 ```mojo
@@ -48,6 +96,7 @@ reader.row_count : Int # total number of rows T->B
 reader.column_count : Int # total number of columns L->R
 reader.elements : List[String] # all delimited elements
 reader.length : Int # total number of elements
+
 ```
 
 ##### Indexing
@@ -57,3 +106,57 @@ currently the array is only 1D, so indexing is fairly manual.
 ```Mojo
 reader[0] # first element
 ```
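
The indexing note says the array is 1D, so reaching a particular cell means computing its flat position by hand. The following is a minimal sketch of that arithmetic, assuming elements are stored row by row (row-major) and every row has `column_count` fields; the README states neither, and it does not say whether the header row counts as row 0. `flat_index` is a hypothetical helper, not part of mojo_csv.

```mojo
fn flat_index(row: Int, col: Int, column_count: Int) -> Int:
    """Map a zero-based (row, col) pair to a position in a row-major 1D list.

    Hypothetical helper for illustration; not part of mojo_csv.
    """
    return row * column_count + col


fn main():
    # Assumed 3-column file: row 1, column 2 sits at flat position 1 * 3 + 2.
    print(flat_index(1, 2, 3))  # prints 5
```

Under the same assumptions, a cell lookup on a reader would read `reader[flat_index(row, col, reader.column_count)]`.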
+
+
+### Performance
+
+- average times over 1k iterations
+- AMD 7950x @ 5.8GHz (peak clock)
+- uncompiled
+- single-threaded
+
+- micro file benchmark (3 rows)
+- mini file benchmark (100 rows)
+- small file benchmark (1k rows)
+- medium file benchmark (100k rows)
+- large file benchmark (2m rows)
+
+```log
+✨ Pixi task (bench): mojo bench.mojo
+running benchmark for micro csv:
+average time in ms for micro file:
+0.007699
+-------------------------
+running benchmark for mini csv:
+average time in ms for mini file:
+0.241136
+-------------------------
+running benchmark for small csv:
+average time in ms for small file:
+1.388513
+-------------------------
+running benchmark for medium csv:
+average time in ms for medium file:
+121.217188
+-------------------------
+running benchmark for large csv:
+average time in ms for large file:
+3582.876541
+```
+
+Performance comparison on various file sizes (average of multiple runs):
+
+| File Size    | Single-threaded | Multi-threaded | Speedup |
+| ------------ | --------------- | -------------- | ------- |
+| 1,000 rows   | 1.42ms          | 1.30ms         | 1.09x   |
+| 100,000 rows | 125ms           | 105ms          | 1.19x   |
+
+_Tested on AMD 7950x (16 cores) @ 5.8GHz_
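
The Speedup column is consistent with dividing the single-threaded time by the multi-threaded time. The sketch below reproduces that arithmetic from the table's own numbers; `speedup` is a hypothetical helper for illustration, not part of mojo_csv or its benchmark script.

```mojo
fn speedup(single_ms: Float64, multi_ms: Float64) -> Float64:
    """Ratio of single-threaded to multi-threaded time; above 1.0 means faster.

    Hypothetical helper for illustration; not part of mojo_csv.
    """
    return single_ms / multi_ms


fn main():
    # Timings in milliseconds, copied from the comparison table.
    print(speedup(1.42, 1.30))    # about 1.09 (1,000 rows)
    print(speedup(125.0, 105.0))  # about 1.19 (100,000 rows)
```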
+
+## Future Improvements
+
+- [ ] SIMD optimization within each thread
+- [ ] Async chunking
+- [ ] Streaming support for very large files
+- [ ] Memory pool for reduced allocations
+- [ ] Progress callbacks for long-running operations

recipes/mojo_csv/recipe.yaml

Lines changed: 4 additions & 2 deletions
@@ -1,13 +1,15 @@
 context:
-  version: 1.2.1
+  version: 1.3.0
+
 
 package:
   name: "mojo_csv"
   version: ${{ version }}
 
 source:
   - git: https://github.com/Phelsong/mojo_csv.git
-    rev: c0c5b4b1fd7d7c4db1c504d1104b38f10ef9ff70
+    rev: 4d6643f27fe15e86263cdc66fd383cf345811228
+
 
 build:
   number: 0
