# InfluxDB v3 Sink

InfluxDB is an open source time series database for metrics, events, and real-time analytics.

Quix Streams provides a sink to write processed data to InfluxDB v3.

>***NOTE***: This sink only supports InfluxDB v3. Versions 1 and 2 are not supported.

## How To Use the InfluxDB Sink

To sink data to InfluxDB, you need to create an instance of `InfluxDB3Sink` and pass
it to the `StreamingDataFrame.sink()` method:

```python
from quixstreams import Application
from quixstreams.sinks.influxdb3 import InfluxDB3Sink

app = Application(broker_address="localhost:9092")
topic = app.topic("numbers-topic")

# Initialize InfluxDB3Sink
influx_sink = InfluxDB3Sink(
    token="<influxdb-access-token>",
    host="<influxdb-host>",
    organization_id="<influxdb-org>",
    database="<influxdb-database>",
    measurement="numbers",
    fields_keys=["number"],
    tags_keys=["tag"]
)

sdf = app.dataframe(topic)
# Do some processing here ...
# Sink data to InfluxDB
sdf.sink(influx_sink)
```

## How the InfluxDB Sink Works
`InfluxDB3Sink` is a batching sink.
It batches processed records in memory per topic partition and writes them to the InfluxDB instance when a checkpoint is committed.

Under the hood, it transforms data into the InfluxDB line protocol format and writes processed records in batches.

### What data can be sent to InfluxDB?

`InfluxDB3Sink` accepts only dictionary values.

If the record values are not dicts, you need to convert them to dicts using `StreamingDataFrame.apply()` before sinking.
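
For example, if a topic carries plain numeric values, a small conversion function can wrap each value into a dict before it reaches the sink (the `"number"` and `"tag"` names below are illustrative, not part of the API):

```python
# Illustrative helper: wrap a plain numeric value into a dict so it can be
# written as an InfluxDB field; the key names are examples only
def to_dict(value):
    return {"number": value, "tag": "sensor-1"}

# In a pipeline, this would run before sinking:
# sdf = sdf.apply(to_dict)
# sdf.sink(influx_sink)
print(to_dict(42))  # {'number': 42, 'tag': 'sensor-1'}
```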

The structure of the written data is defined by the `fields_keys` and `tags_keys` parameters provided to the sink class.

- `fields_keys` - a list of keys to be used as "fields" when writing to InfluxDB.
If present, its keys cannot overlap with any in `tags_keys`.
If empty, the whole record value will be used.
The fields' values can only be strings, floats, integers, or booleans.

- `tags_keys` - a list of keys to be used as "tags" when writing to InfluxDB.
If present, its keys cannot overlap with any in `fields_keys`.
These keys will be popped from the value dictionary automatically because InfluxDB doesn't allow the same keys to be present in both tags and fields.
If empty, no tags will be sent.
>***NOTE***: The InfluxDB client always converts tag values to strings.
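
As a rough sketch of these rules (not the sink's actual implementation), a record value could be split into fields and tags like this:

```python
# Sketch of the fields/tags split described above; not the sink's real code.
def split_fields_tags(value, fields_keys, tags_keys):
    value = dict(value)  # work on a copy
    # Tag keys are popped from the value; the client stringifies tag values
    tags = {k: str(value.pop(k)) for k in tags_keys if k in value}
    # With fields_keys set, only those keys become fields;
    # with an empty fields_keys, the whole remaining value is used
    fields = {k: value[k] for k in fields_keys if k in value} if fields_keys else value
    return fields, tags

fields, tags = split_fields_tags({"number": 10, "tag": "room-1"}, ["number"], ["tag"])
print(fields, tags)  # {'number': 10} {'tag': 'room-1'}
```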

To learn more about schema design and data types in InfluxDB, please read [InfluxDB schema design recommendations](https://docs.influxdata.com/influxdb/cloud-serverless/write-data/best-practices/schema-design/).

## Delivery Guarantees
`InfluxDB3Sink` provides at-least-once guarantees: the same records may be written multiple times if errors occur during processing.

## Backpressure Handling
The InfluxDB sink automatically handles events when the database cannot accept new data due to write limits.

When this happens, the application drops the accumulated in-memory batch and pauses the corresponding topic partition for the timeout duration returned by the InfluxDB API (it responds with an HTTP 429 status code and a `Retry-After` header containing the timeout).
When the timeout expires, the app automatically resumes the partition to re-process the data and sink it again.
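
The backoff decision can be pictured roughly like this (an illustrative sketch, not the sink's actual code; the fallback value is an assumption):

```python
# Illustrative sketch of the 429 backoff decision described above;
# not the sink's actual implementation.
def pause_seconds(status_code, headers, default_seconds=30.0):
    # On a 429 response, honor the Retry-After header if present
    if status_code == 429 and "Retry-After" in headers:
        return float(headers["Retry-After"])
    return default_seconds

print(pause_seconds(429, {"Retry-After": "15"}))  # 15.0
```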

## Configuration
`InfluxDB3Sink` accepts the following configuration parameters:

- `token` - InfluxDB access token.

- `host` - InfluxDB host in the format "https://<host>".

- `organization_id` - InfluxDB organization ID.

- `database` - a database name.

- `measurement` - a measurement name, required.

- `fields_keys` - a list of keys to be used as "fields" when writing to InfluxDB.
See [What data can be sent to InfluxDB](#what-data-can-be-sent-to-influxdb) for more info.

- `tags_keys` - a list of keys to be used as "tags" when writing to InfluxDB.
See [What data can be sent to InfluxDB](#what-data-can-be-sent-to-influxdb) for more info.

- `time_key` - a key to be used as "time" when writing to InfluxDB.
By default, the record timestamp will be used with millisecond time precision.
When using a custom key, you may need to adjust the `time_precision` setting to match.

- `time_precision` - a time precision to use when writing to InfluxDB.
Default - `ms`.
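
For example, if records carry their own event timestamp in seconds under a `"ts"` key (a hypothetical field name), the two settings can be aligned like this:

```python
# Hypothetical configuration: records carry an event timestamp in seconds
# under a "ts" key, so time_key and time_precision are set to match
influx_sink = InfluxDB3Sink(
    token="<influxdb-access-token>",
    host="<influxdb-host>",
    organization_id="<influxdb-org>",
    database="<influxdb-database>",
    measurement="numbers",
    time_key="ts",
    time_precision="s",
)
```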

- `include_metadata_tags` - if True, includes the record's key, topic, and partition as tags.
Default - `False`.

- `batch_size` - the number of records to write to InfluxDB in one request.
Note that it only affects the size of one write request, not the number of records flushed on each checkpoint.
Default - `1000`.

- `enable_gzip` - if True, enables gzip compression for writes.
Default - `True`.

- `request_timeout_ms` - an HTTP request timeout in milliseconds.
Default - `10000`.

- `debug` - if True, prints debug logs from the InfluxDB client.
Default - `False`.