
Commit 8623080

Update the FileSource and FileSink docs (#612)
1 parent 56d34af commit 8623080

4 files changed: +19 -11 lines changed

docs/connectors/sinks/file-sink.md

Lines changed: 2 additions & 1 deletion
@@ -9,12 +9,13 @@
 This sink writes batches of data to files on disk in various formats.
 By default, the data will include the kafka message key, value, and timestamp.

-Currently supports the following formats:
+Currently, it supports the following formats:

 - JSON
 - Parquet

 ## How the File Sink Works
+
 `FileSink` is a batching sink.

 It batches processed records in memory per topic partition and writes them to files in a specified directory structure. Files are organized by topic and partition, with each batch being written to a separate file named by its starting offset.
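
As a rough illustration of the batching behavior described in this hunk, here is a minimal sketch of wiring `FileSink` into an application. The import path and the `output_dir`/`format` arguments are assumptions for illustration only; check the File Sink API reference for the actual signature.

```python
# Hypothetical sketch -- the import path and constructor arguments are
# assumptions, not the confirmed FileSink API.
from quixstreams import Application
from quixstreams.sinks.community.file import FileSink

app = Application(broker_address="localhost:9092")
topic = app.topic("my-topic")

# Records are batched in memory per topic partition; each batch is flushed to
# its own file, named by its starting offset, under <output_dir>/<topic>/<partition>/.
file_sink = FileSink(output_dir="./output", format="json")

sdf = app.dataframe(topic=topic)
sdf.sink(file_sink)

if __name__ == "__main__":
    app.run()  # older library versions may require app.run(sdf)
```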

docs/connectors/sources/file-source.md

Lines changed: 14 additions & 6 deletions
@@ -1,13 +1,19 @@
-# Quix File Source Connector
+# File Source
+
+!!! info
+
+    This is a **Community** connector. Test it before using in production.
+
+    To learn more about differences between Core and Community connectors, see the [Community and Core Connectors](../community-and-core.md) page.

 This source enables reading from a localized file source, such as a JSONlines or Parquet
 file. It also supports file (de)compression.

 The resulting messages can be produced in "replay" mode, where the time between record
 producing is matched as close as possible to the original. (per topic partition only).

-The Quix File Source Connector is generally intended to be used alongside the related
-Quix File Sink Connector (in terms of expected file and data formatting).
+The File Source connector is generally intended to be used alongside the related
+[File Sink](../sinks/file-sink.md) (in terms of expected file and data formatting).

 ## How to use CSV Source

@@ -17,6 +23,8 @@ and pass it to the `app.dataframe()` method.
 One important thing to note is that you should in general point to a single topic folder
 (rather than a root folder with many topics) otherwise topic partitions may not line up correctly.

+For the full description of expected parameters, see the [File Source API](../../api-reference/sources.md#filesource) page.
+
 ```python
 from quixstreams import Application
 from quixstreams.sources.community.file import FileSource

@@ -36,7 +44,7 @@ if __name__ == "__main__":

 ## File hierarchy/structure

-The Quix File Source Connector expects a folder structure like so:
+The File Source expects a folder structure like so:

 ```
 my_sinked_topics/

@@ -51,13 +59,13 @@ The Quix File Source Connector expects a folder structure like so:
 └── etc...
 ```

-This is the default structure generated by the Quix File Sink Connector.
+This is the default structure generated by the File Sink.

 ## File data format/schema

 The expected data schema is largely dependent on the file format chosen.

-For easiest use with the Quix File Sink Connector, you can follow these patterns:
+For easiest use with the [File Sink](../sinks/file-sink.md), you can follow these patterns:

 - for row-based formats (like JSON), the expected data should have records
 with the following fields, where value is the entirety of the message value,
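
The usage example above appears only as partial hunk context; a fuller sketch along the same lines would look roughly like this. The `filepath` argument name and the bare `app.run()` call are assumptions, so verify both against the File Source API page linked in the diff.

```python
# Hypothetical sketch -- parameter names are assumptions, not the confirmed
# FileSource API.
from quixstreams import Application
from quixstreams.sources.community.file import FileSource

app = Application(broker_address="localhost:9092")

# Point at a single topic folder (e.g. "my_sinked_topics/topic_a"), not a root
# folder containing many topics, so that partitions line up correctly.
source = FileSource(filepath="my_sinked_topics/topic_a")

sdf = app.dataframe(source=source)
sdf.print(metadata=True)

if __name__ == "__main__":
    app.run()  # older library versions may require app.run(sdf)
```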

mkdocs.yml

Lines changed: 2 additions & 2 deletions
@@ -50,17 +50,17 @@ nav:
   - Sinks:
     - 'connectors/sinks/README.md'
     - Apache Iceberg Sink: connectors/sinks/apache-iceberg-sink.md
-    - File Sink: connectors/sinks/file-sink.md
     - CSV Sink: connectors/sinks/csv-sink.md
     - InfluxDB v3 Sink: connectors/sinks/influxdb3-sink.md
+    - File Sink: connectors/sinks/file-sink.md
     - Creating a Custom Sink: connectors/sinks/custom-sinks.md
   - Sources:
     - 'connectors/sources/README.md'
     - CSV Source: connectors/sources/csv-source.md
+    - File Source: connectors/source/file-source.md
     - Kafka Replicator Source: connectors/sources/kafka-source.md
     - Quix Source: connectors/sources/quix-source.md
     - Creating a Custom Source: connectors/sources/custom-sources.md
-    - Local File Source: connectors/source/file-source.md
   - Contribution Guide: 'connectors/contribution-guide.md'
   - Community and Core Connectors: 'connectors/community-and-core.md'
   - Upgrading Guide:

quixstreams/sources/community/file/file.py

Lines changed: 1 addition & 2 deletions
@@ -19,8 +19,7 @@ class FileSource(Source):
     Ingest a set of local files into kafka by iterating through the provided folder and
     processing all nested files within it.

-    Expects folder and file structures as generated by the related Quix Streams File
-    Sink Connector:
+    Expects folder and file structures as generated by the related FileSink connector:

     my_topics/
     ├── topic_a/
