Releases · quixio/quix-streams

13 Feb 14:03

daniil-quix

v3.8.1

4150fbf

v3.8.1

What's Changed

New PandasDataFrameSource connector to stream data from pandas.DataFrames during development and debugging by @JotaBlanco and @daniil-quix in #748
Made logging of common Kafka ACL issues more helpful by providing potentially missing ACLs and topic names by @tim-quix in
#742

Fix docs for MongoDBSink by @tim-quix in #746
Bump mypy from 1.13.0 to 1.15.0 by @dependabot in #744

Full Changelog: v3.8.0...v3.8.1

Contributors

dependabot, JotaBlanco, and 2 other contributors

Assets 2

07 Feb 11:13

daniil-quix

v3.8.0

cc2ba06

v3.8.0

What's Changed

💎 Count-based windows

Count-based windows allow aggregating events based on their number instead of time.
They can be helpful when time is not relevant to the particular aggregation or when a large number of out-of-order events are expected in the data stream.
Count-based windows support the same aggregations as time-based windows, including .reduce() and .collect().
Supported window types:

tumbling_count_window() - slice incoming stream into fixed-sized batches
hopping_count_window() - slice incoming stream into overlapping batches of a fixed size with a fixed step.
sliding_count_window() - same as to count-based hopping windows with a step of 1 (e.g., last 10 events in the stream)

Example:

from quixstreams import Application

app = Application(...)
sdf = app.dataframe(...)


sdf = (
    # Define a count-based tumbling window of size 3
    sdf.tumbling_count_window(count=3)

    # Specify the "collect" aggregate function
    .collect()

    # Emit updates once the window is closed
    .final()
)

# Expected output:
# {
#    "value": [<event1>, <event2>, <event3>], 
#    "start": <min timestamp in the batch>, 
#    "end": <max timestamp in the batch>
# }

See the "Windowed Aggregations" docs page for more info.

By @quentin-quix in #736 #739

💎 New Connectors

By @tim-quix in #733 #727

💎 A callback to react to late messages in Windows

Time-based windows can now accept on_late callbacks to react to late messages in the windows.
You can use this callback to customize the logging of such messages or to send them to some dead-letter queue, for example.

Example:

from typing import Any

from datetime import timedelta
from quixstreams import Application

app = Application(...)
sdf = app.dataframe(...)


def on_late(
    value: Any,         # Record value
    key: Any,           # Record key
    timestamp_ms: int,  # Record timestamp
    late_by_ms: int,    # How late the record is in milliseconds
    start: int,         # Start of the target window
    end: int,           # End of the target window
    name: str,          # Name of the window state store
    topic: str,         # Topic name
    partition: int,     # Topic partition
    offset: int,        # Message offset
) -> bool:
    """
    Define a callback to react on late records coming into windowed aggregations.
    Return `False` to suppress the default logging behavior.
    """
    print(f"Late message is detected at the window {(start, end)}")
    return False

# Define a 1-hour tumbling window and provide the "on_late" callback to it
sdf.tumbling_window(timedelta(hours=1), on_late=on_late)


# Start the application
if __name__ == '__main__':
    app.run()

See more in the docs

by @daniil-quix in #701 #732

🦠 Bugfixes

Do not process late messages in sliding windows by @gwaramadze in #728

Other Changes

StreamingDataFrame.merge(): prep work by @daniil-quix in #725
windows: extract base class for windows and window definitions by @quentin-quix in #730
state: refactor collection store to not rely on timestamp by @quentin-quix in #734

Full Changelog: v3.7.0...v3.8.0

Contributors

gwaramadze, daniil-quix, and tim-quix

Assets 2

23 Jan 09:00

daniil-quix

v3.7.0

a14cbf9

v3.7.0

What's Changed

[NEW] 💎 Collection-based windowed aggregations

A new window operation was added to gather all events in the window into batches - collect().
You can use it to perform aggregations requiring collections that cannot be expressed via the reduce() approach, such as calculating medians.

This operation is optimized for collecting values and performs significantly better than using reduce() to accumulate batches of data.

Example:

### Collect all events over a 10-minute tumbling window into a list. ###

from datetime import timedelta
from quixstreams import Application

app = Application(...)
sdf = app.dataframe(...)

sdf = (
    # Define a tumbling window of 10 minutes
    sdf.tumbling_window(timedelta(minutes=10))

    # Collect events in the window into a list
    .collect()

    # Emit results only for closed windows
    .final()
)
# Output:
# {
#   'start': <window start>, 
#   'end': <window end>, 
#   'value': [event1, event2, event3, ...] - list of all events in the window
# }

Docs - https://quix.io/docs/quix-streams/windowing.html#collect

By @gwaramadze in #688

Full Changelog: v3.6.1...v3.7.0

Contributors

gwaramadze

Assets 2

17 Jan 14:29

daniil-quix

v3.6.1

a624bf3

v3.6.1

What's Changed

⚠️ Fix the bug when creating a changelog topic set the `cleanup.policy` for the source topic to `compact`

Only topics created on the fly and repartition topics were affected. The configuration of existing topics is intact.

Please check the cleanup.policy for the topics used in the applications and adjust if necessary.

Introduced in v3.4.0.

Fixed by @quentin-quix in #716

Other changes

Influxdb3 Sink: add some functionality and QoL improvements by @tim-quix in #689
Bump types-protobuf from 5.28.3.20241030 to 5.29.1.20241207 by @dependabot in #683

Full Changelog: v3.6.0...v3.6.1

Contributors

dependabot and tim-quix

Assets 2

15 Jan 14:40

daniil-quix

v3.6.0

1c43924

v3.6.0

What's Changed

Main Changes

⚠️ Switch to `"range"` assignor strategy from `"cooperative-sticky"`

Due to discovered issues with the "cooperative-sticky" assignment strategy, commits made during the rebalancing phase were failing.
To avoid that, we changed the partition assignor to "range" which doesn't have such issues.
Note that "range" assignor is enforced for consumers used by Application, but it can be overridden for consumers created via app.get_consumer() API.

❗How to update:
Since "cooperative-sticky" and "range" strategies must not be mixed, all consumers in the group must first leave the group, and then rejoin it after upgrading the application to Quix Streams v3.6.0.

For more details, see #705 and #712

Other Changes

Source: background file downloads for FileSource by @tim-quix in #670
Fix lateness warnings in Windows by @daniil-quix in #700
mypy: make quixstreams.core.* pass type checks by @quentin-quix in #685
mypy: ensure default are set in overloaded methods by @quentin-quix in #698
mypy: make quixstreams.dataframe.* pass type checks by @quentin-quix in #695

Docs

Update mkdocs.yml by @gwaramadze in #703
Update Documentation by @github-actions in #696
Update Documentation by @github-actions in #699
Bump version to 3.6.0 by @daniil-quix in #711

Full Changelog: v3.5.0...v3.6.0

Contributors

gwaramadze, daniil-quix, and tim-quix

Assets 2

19 Dec 15:17

daniil-quix

v3.5.0

39ec91b

v3.5.0

What's Changed

Features

Added Azure File Source and Azure File Sink by @tim-quix in #669 and #671
Pydantic ImportString for oauth_cb in ConnectionConfig by @mkp-jansen in #680

Fixes

Re-raise the exceptions from the platform API by @daniil-quix in #686
mypy: make quixstreams.platforms.* pass type checks by @quentin-quix in #678
BigQuery Sink: fix bug around dataset and table ids by @tim-quix in #691

Docs

Cleanup Examples and Tutorials by @tim-quix in #675
Rename docs files by @daniil-quix in #674
mypy: make quixstreams.models.* pass type checks by @quentin-quix in #673
fix broken doc refs by @tim-quix in #677

New Contributors

@mkp-jansen made their first contribution in #680

Full Changelog: v3.4.0...v3.5.0

Contributors

daniil-quix, mkp-jansen, and tim-quix

Assets 2

04 Dec 15:39

daniil-quix

v3.4.0

01de03e

v3.4.0

What's Changed

Breaking changes💥

Prefix topic names with `source__` for auto-generated source topics

By default, each Source provides a default topic by implementing the default_topic() method.
⚠️Since v3.4.0, the names of default topics are always prefixed with "source__" for better visibility across other topics in the cluster.
This doesn't apply when the topic is passed explicitly via app.dataframe(source, topic) or app.add_source(source, topic).

After upgrading to 3.4.0, the existing Sources using default topics will look for the topic with the new name on restart and create it if
doesn't exist.
To keep using the existing topics, pass the pre-configured Topic instance with the existing name and serialization config:

from quixstreams import Application

app = Application(...)
# Configure the topic instance to use it together with the Source
topic = app.topic("<existing topic name>", value_serializer=..., value_deserializer=..., key_serializer=..., key_deserializer=...)
source = SomeSource(...)

# To run Sources together with a StreamingDataFrame:
sdf = app.dataframe(source=source, topic=topic)

# or for running Sources stand-alone:
app.add_source(source=source, topic=topic)

by @daniil-quix in #651 #662

Features 🌱

Amazon Kinesis Sink by @gwaramadze in #642 #649
Amazon Kinesis Source by @tim-quix in #646
Amazon S3 Sink by @gwaramadze in #654
Amazon S3 Source by @tim-quix in #653
PostgreSQL Sink by @tomas-quix in #641
Redis Sink by @daniil-quix in #655
Stateful sources API implementation by @quentin-quix in #615 #631

Improvements 💎

On app.stop(), commit checkpoint before closing the consumer by @daniil-quix in #638
Trigger AdminClient.poll on initialization by @daniil-quix in #661

Docs 📄

Remove the list of supported connectors from the Connectors docs. by @daniil-quix in #664

Other

CI: Implement mypy pre-commit check by @quentin-quix in #643
Update pydantic requirement from <2.10,>=2.7 to >=2.7,<2.11 by @dependabot in #652
mypy: make quixstreams.state.* pass type checks by @quentin-quix in #657

Full Changelog: v3.3.0...v3.4.0

Contributors

gwaramadze, dependabot, and 3 other contributors

Assets 2

19 Nov 11:22

daniil-quix

v3.3.0

9494d8d

v3.3.0

What's Changed

New Connectors for Google Cloud

In this release, 3 new connectors have been added:

Google Cloud Pub/Sub Source by @tim-quix in #622
Google Cloud Pub/Sub Sink by @gwaramadze in #616 , #626
Google Cloud BigQuery Sink by @daniil-quix in #621, #627

To learn more about them, see the respective docs pages.

Other updates

Conda drop Python 3.8 support by @gwaramadze in #629
Remove connectors docs from the nav by @daniil-quix in #630
Update Documentation by @github-actions in #617
Update connectors docs by @daniil-quix in #625

Full Changelog: v3.2.1...v3.3.0

Contributors

gwaramadze, daniil-quix, and tim-quix

Assets 2

08 Nov 15:21

daniil-quix

v3.2.1

308c197

v3.2.1

What's Changed

This is a bugfix release downgrading confluent-kafka to 2.4.0 because of the authentication issue introduced in 2.6.0.

Full Changelog: v3.2.0...v3.2.1

Assets 2

07 Nov 12:51

daniil-quix

v3.2.0

ec72f97

v3.2.0

What's Changed

[new] Sliding Windows

Sliding windows are overlapping time-based windows that advance with each incoming message rather than at fixed intervals like hopping windows.
They have a fixed 1 ms resolution, perform better, and are less resource-intensive than hopping windows with a 1 ms step.
Read more in Sliding Windows docs.

PR by @gwaramadze - #515

[new] FileSink and FileSource connectors

FileSink allows to batches of data to files on disk in JSON and Parquet formats.

FileSource enables processing data streams from JSON or Parquet files.
The resulting messages can be produced in "replay" mode, where the time between record producing is matched as close as possible to the original.

Learn more on File Sink and FileSource pages.

PRs:

local file sink by @tomas-quix in #560
local file source by @tim-quix in #601

[upd] Updated time tracking in windowed aggregations

In previous versions, Windowed aggregations were tracking time in the streams per topic-partition, but kept expiring them per keys.
It was not a fully consistent behavior, and it also created problems when processing data from misaligned producers.

For example, IoT and other physical devices may produce data at certain frequency, which results in misaligned data streams within one topic-partition, and more data is considered "late" and dropped from the processing.

To make the processing of such data more complete, Quix Streams now tracks event time per each message key in the windows.

PRs:

#591 by @daniil-quix
#607 by @daniil-quix

[upd] Updated CSVSource

Some breaking changes were made to CSVSource to make it easier to use:

It now accepts CSV files in arbitrary formats and produces each row as a message value, making it less opinionated about the data format.
It now requires the name to be passed directly. Previously, it was using the file name as a name of the source.
Message keys and timestamps can be extracted from the rows via key_extractor and timestamp_extractor params
Removed params key_serializer and value_serializer

PR by @daniil-quix in #602

Bug fixes

Fix invalid mapping for oauth_cb in BaseSettings by @daniil-quix in #606

Dependencies

Update confluent-kafka requirement from <2.5,>=2.2 to >=2.6,<2.7 by @dependabot in #578

Docs

Update README by @gwaramadze in #604
Update sinks.md by @SteveRosam in #610

Full Changelog: v3.1.1...v3.2.0

Contributors

gwaramadze, dependabot, and 4 other contributors

Assets 2

Releases: quixio/quix-streams

v3.8.1

What's Changed

Contributors

Uh oh!

v3.8.0

What's Changed

💎 Count-based windows

💎 New Connectors

💎 A callback to react to late messages in Windows

🦠 Bugfixes

Other Changes

Contributors

Uh oh!

v3.7.0

What's Changed

[NEW] 💎 Collection-based windowed aggregations

Contributors

Uh oh!

v3.6.1

What's Changed

⚠️ Fix the bug when creating a changelog topic set the cleanup.policy for the source topic to compact

Other changes

Contributors

Uh oh!

v3.6.0

What's Changed

Main Changes

⚠️ Switch to "range" assignor strategy from "cooperative-sticky"

Other Changes

Docs

Contributors

Uh oh!

v3.5.0

What's Changed

Features

Fixes

Docs

New Contributors

Contributors

Uh oh!

v3.4.0

What's Changed

Breaking changes💥

Prefix topic names with source__ for auto-generated source topics

Features 🌱

Improvements 💎

Docs 📄

Other

Contributors

Uh oh!

v3.3.0

What's Changed

New Connectors for Google Cloud

Other updates

Contributors

Uh oh!

v3.2.1

What's Changed

Uh oh!

v3.2.0

What's Changed

[new] Sliding Windows

[new] FileSink and FileSource connectors

[upd] Updated time tracking in windowed aggregations

[upd] Updated CSVSource

Bug fixes

Dependencies

Docs

Contributors

Uh oh!

⚠️ Fix the bug when creating a changelog topic set the `cleanup.policy` for the source topic to `compact`

⚠️ Switch to `"range"` assignor strategy from `"cooperative-sticky"`

Prefix topic names with `source__` for auto-generated source topics