Releases · quixio/quix-streams

23 Sep 16:10

v3.23.1

15f8f50

v3.23.1 Latest

Latest

What's Changed

🦠 Bugfixes

Fix NoSuchKey errors in S3FileSource by @gwaramadze in #1037
dynamic configuration lookup: wait until the configs are fully read from the configuration topic by @daniil-quix in #1036
conda/meta.yaml: fix syntax by @daniil-quix in #1035

Full Changelog: v3.23.0...v3.23.1

Contributors

gwaramadze

Assets 2

18 Sep 15:45

v3.23.0

574c1e7

v3.23.0

What's Changed

🦠 Bugfixes

sink influxdb3: fix getting version by @antalf in #1030
Fix jsonschema serialization test by @gwaramadze in #1019
TopicManager: handle missing "retention.bytes" in derive_topic_config() by @daniil-quix in #1031

🔌 Connectors

Connectors: Fix Kinesis dependency by @tim-quix in #1013
Connectors: fix FileSink/Source-based dependencies by @tim-quix in #1012
Connector: FileSink refactor by @tim-quix in #1004
Add MQTT Sink by @SteveRosam in #659 #1032

🛠️ Internal

Add retry mechanism for Quix Portal API connection errors by @gwaramadze in #1029
feat(lookups.quix_configuration): add support for Quix SDK token auth by @ovv in #1007
Migrate from requests to httpx by @daniil-quix in #1020

Dependencies

Update pre-commit requirement from <4.3,>=3.4 to >=3.4,<4.4 by @dependabot[bot] in #1002
Bump types-protobuf from 6.30.2.20250703 to 6.30.2.20250822 by @dependabot[bot] in #1015
Bump types-jsonschema from 4.25.0.20250720 to 4.25.1.20250822 by @dependabot[bot] in #1016
Bump types-protobuf from 6.30.2.20250822 to 6.30.2.20250914 by @dependabot[bot] in #1025
Bump testcontainers[postgres] from 4.12.0 to 4.13.0 by @dependabot[bot] in #1026
Bump mypy from 1.17.1 to 1.18.1 by @dependabot[bot] in #1024

New Contributors

@antalf made their first contribution in #1030

Full Changelog: v3.22.0...v3.23.0

Contributors

gwaramadze, ovv, and 4 other contributors

Assets 2

08 Aug 14:04

v3.22.0

8b9df35

v3.22.0

What's Changed

[BREAKING] Remove default value for quix_portal_api URL.
See the Connecting to Quix Cloud to learn how to get the Quix Portal API URL and connect to the Quix broker locally.
By @gwaramadze in #991
Improvement: TDengine sink adds empty value checking and supports string-formatted timestamps by @huskar-t in #979
Bump mypy from 1.17.0 to 1.17.1 by @dependabot[bot] in #992

New Contributors

@huskar-t made their first contribution in #979

Full Changelog: v3.21.0...v3.22.0

Contributors

gwaramadze, huskar-t, and dependabot

Assets 2

24 Jul 13:25

v3.21.0

da3142f

v3.21.0

What's Changed

💎 Split data into multiple topics with `StreamingDataFrame.to_topic()`

To dynamically route messages to different topics based on the message content, you can now provide a callable that returns a Topic object to the StreamingDataFrame.to_topic() method:

from quixstreams import Application


app = Application(...)

# Declare topics
input_topic = app.topic('sensor-data', value_deserializer='json')
normal_topic = app.topic('normal-readings', value_serializer='json')
alert_topic = app.topic('high-temp-alerts', value_serializer='json')

sdf = app.dataframe(input_topic)

def route_by_temperature(value, key, timestamp: int, headers):
    """
    Send messages to different topics based on the temperature sensor value.
    """
    if value.get('temperature', 0) > 80:
        return alert_topic
    else:
        return normal_topic

sdf.to_topic(topic=route_by_temperature)

See more in the "Splitting data into multiple topics" section in the docs.

By @gwaramadze in #976

Dependencies

Update confluent-kafka[avro,json,protobuf,schemaregistry] requirement from <2.10,>=2.8.2 to >=2.8.2,<2.12 by @dependabot[bot] in #849
Bump testcontainers[postgres] from 4.10.0 to 4.12.0 by @dependabot[bot] in #984
Bump types-jsonschema from 4.24.0.20250708 to 4.25.0.20250720 by @dependabot[bot] in #981

Full Changelog: v3.20.0...v3.21.0

Contributors

gwaramadze and dependabot

Assets 2

18 Jul 10:58

v3.20.0

9f95914

v3.20.0

What's Changed

Prefix transactional.id by the Quix workspace id when connecting to Quix brokers by @gwaramadze in #974
Add transactional:bool parameter to Application.get_producer() to enable Kafka Transactions API in raw Producer by @gwaramadze in #974
lookup.quix_configuration_service: add support for binary and JSON fields by @ovv in #971
lookup.quix_configuration_service: Handle valid_from=None in configuration version selection by @ovv in #946
PostgreSQLSink: add support for ON CONFLICT DO UPDATE on primary keys by @tim-quix in #966

Dependencies

Update pydantic-settings requirement from <2.10,>=2.3 to >=2.3,<2.11 by @dependabot[bot] in #951
Bump types-jsonschema from 4.24.0.20250528 to 4.24.0.20250708 by @dependabot[bot] in #973
Bump mypy from 1.16.1 to 1.17.0 by @dependabot[bot] in #972

Full Changelog: v3.19.0...v3.20.0

Contributors

gwaramadze, ovv, and 2 other contributors

Assets 2

09 Jul 14:05

v3.19.0

698275b

v3.19.0

What's Changed

🔌 Connectors

A new sink for TDengine - an open source time series database optimized for IoT, connected vehicles, and industrial applications.
See TDengineSink docs to learn more.
By @jbrass in #931

🛠️ Internal

Update the Application.run() logs by @daniil-quix in #964
TDengine sink updates by @daniil-quix in #968

Dependencies

Bump types-protobuf from 6.30.2.20250516 to 6.30.2.20250703 by @dependabot in #965

New Contributors

@jbrass made their first contribution in #931

Full Changelog: v3.18.1...v3.19.0

Contributors

jbrass and dependabot

Assets 2

04 Jul 13:21

v3.18.1

0dddac3

v3.18.1

What's Changed

Fix PostgersLookup imports by @daniil-quix in #962
Readme: convert the Roadmap block to Features by @daniil-quix in #961

Full Changelog: v3.18.0...v3.18.1

Assets 2

03 Jul 11:11

v3.18.0

cad77e0

v3.18.0

What's Changed

💎 Join Lookup: PostgreSQL

Added a Lookup join implementation for enriching streaming data with data from a Postgres database.

The new PostgresLookup allows querying a Postgres database for each field, using a persistent connection and per-field caching based on a configurable TTL.
The cache is a "Least Recently Used" (LRU) cache with a configurable maximum size.

See PostgresLookup API docs for more info.

from quixstreams import Application
from quixstreams.dataframe.joins.lookups.postgresql import PostgresLookup

app = Application(...)
sdf = app.dataframe(...)

# Initialize PostgresLookup with Postgres credentials
lookup = PostgresLookup(
    host="<host>",
    port=5432,
    dbname="<db>",
    user="<user>",
    password="<password>",
    cache_size=1000,
)

# Add columns "table_column1" and "table_column2" from "my_table" to the Kafka record as a new field "joined".
# Match by comparing "my_record_field" on the left and "table_column1" on the right.
fields = {
    "joined": lookup.field(
        table="my_table", columns=["table_column1", "table_column2"], on="table_column1"
    ),
}
sdf = sdf.join_lookup(lookup, fields, on="my_record_field")


app.run()

By @tim-quix in #954 #959

🦠 Bugfixes

Fix typo in Producer by @gwaramadze in #953

Full Changelog: v3.17.0...v3.18.0

Contributors

gwaramadze and tim-quix

Assets 2

23 Jun 14:16

v3.17.0

de7f249

v3.17.0

What's Changed

💎 Interval joins: `StreamingDataFrame.join_interval()`

Use StreamingDataFrame.join_interval() to join two topics into a new stream where each record is merged with records from the other topic that fall within a specified time interval.

This join is useful for cases where you need to match records that occur within a specific time window of each other, rather than just the latest record (as in as-of join).

from datetime import timedelta

from quixstreams import Application

app = Application(...)

sdf_measurements = app.dataframe(app.topic("measurements"))
sdf_events = app.dataframe(app.topic("events"))

# Join records from the topic "measurements"
# with records from "events" that occur within a 5-minute window
# before and after each measurement
sdf_joined = sdf_measurements.join_interval(
    right=sdf_events,
    how="inner",                 # Emit updates only if matches are found
    on_merge="keep-left",        # Prefer the columns from the left dataframe if they overlap
    grace_ms=timedelta(days=7),  # Keep the state for 7 days
    backward_ms=timedelta(minutes=5),  # Look for events up to 5 minutes before
    forward_ms=timedelta(minutes=5),   # Look for events up to 5 minutes after
)

if __name__ == '__main__':
    app.run()

Please take a look at the Interval Join docs for more examples.

By @gwaramadze in #924

[breaking] 💥 Updated `Application.run()` behavior with `count` and `timestamp` parameters

1. Changed the meaning of the `count` parameter.

Previously, when calling Application.run(count=...) or Application.run(count=..., timeout=...), the count parameter meant
"Number of messages to process before stopping the app".

In this update, we're changing the meaning of the count parameter to "number of outputs".

A simple way to think about outputs is "how many messages my application would send to an output topic."
This behavior is more intuitive instead of counting input messages.

Note that operations like filtering or aggregations reduce the number of outputs, and StreamingDataFrame.apply(..., expand=True) may output more data than it receives.

2. `Application.run()` can now collect and return outputs when `collect=True` is passed.

You can now test and debug the applications more easily using count and/or timeout parameters:

from quixstreams import Application

app = Application(broker_address="localhost:9092")

topic = app.topic("some-topic")
# Assume the topic has one partition and three JSON messages:
#  {"temperature": 30} 
#  {"temperature": 40}
#  {"temperature": 50}

sdf = app.dataframe(topic=topic)

# Process one output and collect the values (stops if no messages for 10s)
result_values_only = app.run(count=1, timeout=10, collect=True)  
# >>> result_values_only = [
#   {"temperature": 30}
# ]


# Process one output and collect the values with metadata (stops if no messages for 10s)
result_values_and_metadata = app.run(count=1, timeout=10, collect=True, metadata=True)
# >>> result_values_and_metadata = [
#   {"temperature": 40, "_key": "<message_key>", "_timestamp": 123, "_offset": 1, "_topic": "some-topic", "_partition": 1, "_headers": None},
# ]


# Process one output and without collecting (stops if no messages for 10s)
result_empty = app.run(count=1, timeout=10, collect=False)
# >>> result_empty = []

See more details in Inspecting Data and Debugging section.

By @daniil-quix in #932

💎 Log recovery progress

The application now logs the recovery progress every 10s to simplify the monitoring:

...
[INFO] [quixstreams] : Recovery progress for <RecoveryPartition "changelog__state-store[0]">: 100 / 1000
[INFO] [quixstreams] : Recovery progress for <RecoveryPartition "changelog__state-store[0]">: 199 / 1000
...

By @gwaramadze in #941

📖 Docs

Add docs for InfluxDB1Sink by @daniil-quix in #947 #949

🛠️ Internal

Use generic WindowType by @gwaramadze in #942
Remove the window expiration logs by @daniil-quix in #945

🔌 Connectors

PostgresqlSink: correctly handle jsonb values by @tim-quix in #935
BigQuerySink: use quix-streams user agent in requests by @gwaramadze in #943
InfluxDB1Sink: new by @tim-quix in #936

Dependencies

Bump mypy from 1.16.0 to 1.16.1 by @dependabot in #934
Bump types-requests from 2.32.0.20250602 to 2.32.4.20250611 by @dependabot in #933

Full Changelog: v3.16.1...v3.17.0

Contributors

gwaramadze, dependabot, and tim-quix

Assets 2

11 Jun 09:46

v3.16.1

8da5912

v3.16.1

What's Changed

🦠 Bugfixes

StreamingDataFrame: retain a custom stream_id across operations by @daniil-quix in #925

⚠️ Note: This fix may affect the existing stateful operations on the DataFrames with single-partition topics called after .group_by() due to optimizations introduced in v3.14.0.

The stream_id is used as part of the State stores' names, and it wasn't propagated correctly, leading to incorrect store names in some cases.
The fix in #925 corrects that, but the state stores created after .filter() or .apply() operations on the grouped DataFrame won't be accessible anymore because of the corrected stream ids.
See #925 for more examples of affected code.

Other fixes

Fix conda requirements by @daniil-quix in #920
sinks/InfluxDB3: adjust check to work with v2 and v3 by @tim-quix in #922

Full Changelog: v3.16.0...v3.16.1

Contributors

tim-quix

Assets 2

Releases: quixio/quix-streams

v3.23.1

What's Changed

🦠 Bugfixes

Contributors

Uh oh!

v3.23.0

What's Changed

🦠 Bugfixes

🔌 Connectors

🛠️ Internal

Dependencies

New Contributors

Contributors

Uh oh!

v3.22.0

What's Changed

New Contributors

Contributors

Uh oh!

v3.21.0

What's Changed

💎 Split data into multiple topics with StreamingDataFrame.to_topic()

Dependencies

Contributors

Uh oh!

v3.20.0

What's Changed

Dependencies

Contributors

Uh oh!

v3.19.0

What's Changed

🔌 Connectors

🛠️ Internal

Dependencies

New Contributors

Contributors

Uh oh!

v3.18.1

What's Changed

Uh oh!

v3.18.0

What's Changed

💎 Join Lookup: PostgreSQL

🦠 Bugfixes

Contributors

Uh oh!

v3.17.0

What's Changed

💎 Interval joins: StreamingDataFrame.join_interval()

[breaking] 💥 Updated Application.run() behavior with count and timestamp parameters

1. Changed the meaning of the count parameter.

2. Application.run() can now collect and return outputs when collect=True is passed.

💎 Log recovery progress

📖 Docs

🛠️ Internal

🔌 Connectors

Dependencies

Contributors

Uh oh!

v3.16.1

What's Changed

🦠 Bugfixes

StreamingDataFrame: retain a custom stream_id across operations by @daniil-quix in #925

Other fixes

Contributors

Uh oh!

💎 Split data into multiple topics with `StreamingDataFrame.to_topic()`

💎 Interval joins: `StreamingDataFrame.join_interval()`

[breaking] 💥 Updated `Application.run()` behavior with `count` and `timestamp` parameters

1. Changed the meaning of the `count` parameter.

2. `Application.run()` can now collect and return outputs when `collect=True` is passed.