Releases: quixio/quix-streams

v3.1.1

30 Oct 15:24
fafc2f0

What's Changed

Fixes

  • Fix topics management for apps connecting to the Quix brokers by @tim-quix in #594

Full Changelog: v3.1.0...v3.1.1

v3.1.0

22 Oct 18:07
5aedce7

What's Changed

[NEW] Apache Iceberg sink

A new sink that writes batches of data to an Apache Iceberg table.

It serializes incoming data batches into Parquet format and appends them to the
Iceberg table, updating the table schema as necessary.

Currently, it supports Apache Iceberg tables hosted in AWS, using the AWS Glue data catalog.

To learn more about the Iceberg sink, see the docs.

Added by @tomas-quix in #555
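
A minimal sketch of wiring the sink into an application. The import path and the IcebergSink/AWSIcebergConfig parameter names below are assumptions based on the description above, so check the docs for the exact API:

from quixstreams import Application
# Assumed import path for the new sink
from quixstreams.sinks.community.iceberg import IcebergSink, AWSIcebergConfig

# Hypothetical AWS configuration for a table registered in an AWS Glue data catalog
iceberg_config = AWSIcebergConfig(
    aws_s3_uri="s3://my-bucket/warehouse",
    aws_region="us-east-1",
    aws_access_key_id="...",
    aws_secret_access_key="...",
)

iceberg_sink = IcebergSink(
    table_name="glue.sensor_data",  # hypothetical table name
    config=iceberg_config,
    data_catalog_spec="aws_glue",
)

app = Application(broker_address="localhost:9092")
sdf = app.dataframe(app.topic("sensor-readings"))
sdf.sink(iceberg_sink)  # batches are serialized to Parquet and appended to the Iceberg table

app.run()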

Dependencies

  • Update pydantic-settings requirement from <2.6,>=2.3 to >=2.3,<2.7 by @dependabot in #583
  • Bump testcontainers from 4.8.1 to 4.8.2 by @dependabot in #579

Misc

  • Abstract away the state update cache by @daniil-quix in #576
  • Add Conda release script by @gwaramadze in #571
  • app: Add option to select store backend by @quentin-quix in #544
  • Refactor WindowedRocksDBPartitionTransaction.get_windows by @gwaramadze in #558

Full Changelog: v3.0.0...v3.1.0

v3.0.0

10 Oct 11:01

Quix Streams v3.0.0

Why the "major" version bump (v2.X --> v3.0)?

Quix Streams v3.0 brings branching and multiple topic consumption support, which changed some functionality under the hood. We want users to be mindful when upgrading to v3.0.

❗ Potential breaking change ❗ - Dropping Python v3.8 support:

Python v3.8 reaches end of life in October 2024, so we are dropping support for it as well.

We currently support Python v3.9 through v3.12.

❗ Potential breaking change ❗ - keyword arguments only for Application:

While not really a functional change (and most people do this already), v3.0 enforces that all arguments to Application are passed as keyword arguments rather than positionally, so be sure to check this during your upgrade!

Previously (v2.X):
app = Application("localhost:9092")

Now (v3.0):
app = Application(broker_address="localhost:9092")

❗ Potential "data-altering" change ❗ - changelog topic name adjustment for "named" windows:

This change is primarily for accommodating windowing with branching.

If you have a windowed operation where the name parameter was provided (ex: sdf.tumbling_window(name=<NAME>)), the topic naming scheme has changed: a new changelog topic will be created, and the window will temporarily be inaccurate since it starts from scratch.

Note that this change will not cause an exception to be raised, so be aware!

❗ Existing Sources and Sinks have been moved ❗

To accommodate the new connectors structure, we moved the existing Sinks and Sources to new modules.
To use them, update the import paths:

  • InfluxDB3Sink -> quixstreams.sinks.core.influxdb3.InfluxDB3Sink
  • CSVSink -> quixstreams.sinks.core.csv.CSVSink
  • KafkaReplicatorSource -> quixstreams.sources.core.kafka.KafkaReplicatorSource
  • CSVSource -> quixstreams.sources.core.csv.CSVSource
  • QuixEnvironmentSource -> quixstreams.sources.core.kafka.QuixEnvironmentSource
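
For example, an existing InfluxDB3Sink import (as shown in the v2.9.0 notes below) would be updated like this:

# Before (v2.X)
from quixstreams.sinks.influxdb3 import InfluxDB3Sink

# After (v3.0)
from quixstreams.sinks.core.influxdb3 import InfluxDB3Sink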

v3.0 General Backwards compatibility with v2.X

v3.0 should otherwise be fully backwards compatible with any code working with 2.X (assuming no other breaking changes between 2.X versions you upgraded from) and should produce the same results. However, pay close attention to your apps after upgrading, just in case!

To learn more about the specifics of the underlying StreamingDataFrame assignment pattern adjustments, along with some additional supplemental clarifications, check out the new assignment rules docs section, which also highlights the differences between v2.X and v3.0 (in short: always re-assign your SDFs and you'll be good).

❗ Potential Breaking Changes (summarized) ❗

  • Dropping Support for Python v3.8
  • Topic naming change for explicitly named StreamingDataFrame Window operations.
  • Enforcement of keyword argument usage only for Application
  • Removal of deprecated Application.Quix() (can just use Application now)
  • Moved Sinks and Sources

🌱 New Features 🌱

  1. StreamingDataFrame Branching
  2. Consuming multiple topics per Application ("multiple StreamingDataFrames")
  3. Automatic StreamingDataFrame tracking (no arguments needed for Application.run())

1. StreamingDataFrame (SDF) Branching

SDFs now support "branching" (or forking) into multiple independent operations, with no limit on the number of branches.

Previously (v2.X), only linear operations were possible:

sdf
└── apply()
    └── apply()
        └── apply()
            └── apply()

But now (v3.0), things like this are possible:

sdf
└── apply()
    └── apply()
        ├── apply()
        │   └── apply()
        └── filter()  - (following operations apply only to this filtered subset)
            ├── apply()
            ├── apply()
            └── apply()

Or, as a simple (unrelated) code snippet:

sdf_0 = app.dataframe().apply(func_a)
sdf_0 = sdf_0.apply(func_b)  # sdf_0 -> sdf_0: NOT a (new) branch
sdf_1 = sdf_0.apply(func_c)  # sdf_0 -> sdf_1: generates new branch off sdf_0
sdf_2 = sdf_0.apply(func_d)  # sdf_0 -> sdf_2: generates new branch off sdf_0

app.run()

What Branches enable:

  • Handle Multiple data formats/transformations in one Application
  • Conditional operations
    • ex: producing to different topics based on different criteria (see the sketch after this list)
  • Consolidating Applications that originally spanned multiple due to previous limitations
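
For instance, a sketch of routing records to different topics based on a value threshold (the topic names, the "temperature" field, and the threshold are hypothetical):

from quixstreams import Application

app = Application(broker_address="localhost:9092")
input_topic = app.topic("sensor-readings")
alerts_topic = app.topic("alerts")
archive_topic = app.topic("archive")

sdf = app.dataframe(input_topic)

# Branch 1: only records above the threshold go to the alerts topic
sdf_alerts = sdf.filter(lambda value: value["temperature"] > 90)
sdf_alerts = sdf_alerts.to_topic(alerts_topic)

# Branch 2: every record is archived unchanged
sdf = sdf.to_topic(archive_topic)

app.run()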

Limitations of Branching:

  • Cannot filter or assign columns using two different branches at once (see docs for more info)
  • Copies data for each branch, which can have performance implications (but may be better compared to running another Application).

To learn more, check out the in-depth branching docs.

2. Multiple Topic Consumption (multiple StreamingDataFrames)

Applications now support consuming multiple topics by initializing multiple StreamingDataFrames (SDFs) with a single Application:

from quixstreams import Application

app = Application(broker_address="localhost:9092")
input_topic_a = app.topic("input_a")
input_topic_b = app.topic("input_b")
output_topic = app.topic("output")

sdf_a = app.dataframe(input_topic_a)
sdf_a = sdf_a.apply(func_x).to_topic(output_topic)

sdf_b = app.dataframe(input_topic_b)
sdf_b.update(func_y).to_topic(output_topic)

app.run()

Each SDF can then perform any operation you could normally perform, including branching (but each SDF should be treated as if the others do not exist).

Also, note that they run concurrently (one consumer subscribed to multiple topics), NOT in parallel.

3. Automatic StreamingDataFrame tracking

As a result of branching and multiple SDFs, it was necessary to automate the tracking of SDFs, so you no longer need to pass an SDF to Application.run():

Previously (v2.X):

app = Application("localhost:9092")
sdf = app.dataframe(topic)
app.run(sdf)

Now (v3.0):

app = Application(broker_address="localhost:9092")
sdf = app.dataframe(topic)
app.run()

💎 Enhancements 💎

  • Extensive Documentation improvements and additions

🦠 Bugfixes 🦠

  • Fix issue with handling of Quix Cloud topics where the topic was created with the workspace ID appended twice.
  • Overlapping window names now print a clear message explaining how to resolve them.

Full Changelog: v2.11.1...v3.0.0

v2.11.1

25 Sep 15:45
faf6f3f

What's Changed

Fixes

  • Fix QuixEnvironmentSource behavior when streaming data from one Quix environment to another by @quentin-quix in #520
  • Fix consumers not fetching the data when connecting to the Quix broker by temporarily downgrading confluent-kafka to 2.4.0 by @daniil-quix in #522

Full Changelog: v2.11.0...v2.11.1

v2.11.0

20 Sep 11:08
20bf7b5

What's Changed

[New] Source API and built-in Sources

With the new Sources API, you can stream data from any data source to a Kafka topic and process it with Streaming DataFrames in the same application.

You can either use one of the built-in sources (e.g. KafkaReplicatorSource, CSVSource, QuixEnvironmentSource) or create a custom one.

To learn more about Sources, please see the Sources documentation.

PRs: #420, #448, #490, #494, #495, #498, #506
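
A minimal sketch of a custom source; the Source base class, its run/serialize/produce methods, and the source= wiring below are recalled from memory rather than taken from these notes, so treat the exact names and signatures as assumptions and check the Sources documentation:

import time

from quixstreams import Application
from quixstreams.sources import Source  # assumed import path


class CounterSource(Source):
    # Hypothetical source that emits an incrementing counter once per second
    def run(self):
        count = 0
        while self.running:  # assumed flag provided by the base class
            msg = self.serialize(key="counter", value={"count": count})
            self.produce(key=msg.key, value=msg.value)
            count += 1
            time.sleep(1)


app = Application(broker_address="localhost:9092")
sdf = app.dataframe(source=CounterSource(name="counter-source"))  # assumed wiring
sdf.print()
app.run(sdf)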

Dependencies updates

  • Bump testcontainers from 4.8.0 to 4.8.1 by @dependabot in #492
  • Update pydantic requirement from <2.9,>=2.7 to >=2.7,<2.10 by @dependabot in #491
  • Update pydantic-settings requirement from <2.5,>=2.3 to >=2.3,<2.6 by @dependabot in #500


Full Changelog: v2.10.0...v2.11.0

v2.10.0

30 Aug 14:12
18f8733

What's Changed

Schema Registry Support

Introduced Schema Registry support for JSONSchema, Avro, and Protobuf formats.

To learn how to use Schema Registry, please follow the docs on the Schema Registry page.

PRs: #447, #449, #451, #454, #458, #472, #476.
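
A minimal sketch of enabling Schema Registry for a topic's value serializer; the import paths, class names, and the schema_registry_client_config parameter are assumptions, so check the Schema Registry docs for the exact API:

from quixstreams import Application
# Assumed import paths
from quixstreams.models.serializers.avro import AvroSerializer
from quixstreams.models.serializers.schema_registry import SchemaRegistryClientConfig

# Hypothetical Avro schema and registry address
schema = {
    "type": "record",
    "name": "SensorReading",
    "fields": [{"name": "temperature", "type": "double"}],
}
registry_config = SchemaRegistryClientConfig(url="http://localhost:8081")

app = Application(broker_address="localhost:9092")
output_topic = app.topic(
    "sensor-readings",
    value_serializer=AvroSerializer(
        schema=schema,
        schema_registry_client_config=registry_config,  # assumed parameter name
    ),
)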

Dependencies updates

  • Support confluent-kafka versions 2.5.x by @gwaramadze in #459
  • Bump testcontainers from 4.5.1 to 4.8.0 by @dependabot in #462
  • Update pydantic requirement from <2.8,>=2.7 to >=2.7,<2.9 by @dependabot in #463
  • Update pydantic-settings requirement from <2.4,>=2.3 to >=2.3,<2.5 by @dependabot in #464
  • Update pre-commit requirement from <3.5,>=3.4 to >=3.4,<3.9 by @dependabot in #465
  • Update black requirement from <24.4,>=24.3.0 to >=24.3.0,<24.9 by @dependabot in #466

Other changes

  • Application config API by @quentin-quix in #470

Full Changelog: v2.9.0...v2.10.0

v2.9.0

08 Aug 20:49
9d397b2

What's Changed

NEW: Optional installs (extras)

With this release, we have introduced optional requirements (extras) for various features. These requirements are outlined alongside their corresponding features.

To install one, run pip install quixstreams[{extra}] (or a comma-separated list like extra1,extra2).

You can also install all extras at once with pip install quixstreams[all].

Features

More Message Serialization Options

Additional serialization options have been added:

  • JSON Schema (the original plain JSON option is still supported)
  • Avro (requires the avro extra)
  • Protobuf (requires the protobuf extra)

For more details on their usage, see the Serialization docs.
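
A minimal sketch of defining topics with Avro serialization (after installing the avro extra); the AvroSerializer/AvroDeserializer import path and schema parameter are assumptions, so check the Serialization docs for the exact API:

from quixstreams import Application
from quixstreams.models.serializers.avro import AvroSerializer, AvroDeserializer  # assumed path

# Hypothetical Avro schema for the topic values
schema = {
    "type": "record",
    "name": "Measurement",
    "fields": [{"name": "value", "type": "double"}],
}

app = Application(broker_address="localhost:9092")
input_topic = app.topic("measurements", value_deserializer=AvroDeserializer(schema=schema))
output_topic = app.topic("measurements-out", value_serializer=AvroSerializer(schema=schema))

sdf = app.dataframe(input_topic).to_topic(output_topic)
app.run(sdf)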

Sinks (beta)

NOTE: This feature is in beta; functionality may change at any time!

We have introduced a new Sink API/framework for robustly sending data from Kafka to an external destination. It also includes a base class/template that users can extend to implement their own sinks!

We have also included two fully implemented sinks for use out of the box:

  • InfluxDB v3
  • CSV

Example usage with InfluxDB v3:

from quixstreams import Application
from quixstreams.sinks.influxdb3 import InfluxDB3Sink

app = Application(broker_address="localhost:9092")
topic = app.topic("numbers-topic")

# Initialize InfluxDB3Sink
influx_sink = InfluxDB3Sink(...params...)

sdf = app.dataframe(topic)
# Do some processing here ...
# Sink data to InfluxDB
sdf.sink(influx_sink)

For more details on their usage, see the Sinks docs

commit_every option for Applications

Applications can now commit every M consumed messages in addition to every N seconds (whichever occurs first for that checkpoint).

By default, it is 0, which means no limit (the behavior before this setting was introduced).

For more details, see the Checkpoint docs

app = Application(commit_every=10000)

errors option for StreamingDataFrame.drop()

You can now suppress the default behavior of raising an exception when the specified column(s) are missing by passing errors="ignore".

app = Application()
sdf = app.dataframe()
sdf = sdf.drop(["col_a", "col_b"], errors="ignore")

Enhancements

  • README updates
  • Various Documentation improvements

Full Changelog: v2.8.1...v2.9.0

v2.8.1

22 Jul 10:48
8f126ab

What's Changed

Bugfixes

  • fix Topic.deserialize not using the correct value deserializer by @tim-quix in #413

Full Changelog: v2.8.0...v2.8.1

v2.8.0

18 Jul 11:01
8b20c0c

What's Changed

.update() and .to_topic() now modify the StreamingDataFrame objects in-place.

In previous versions, methods like StreamingDataFrame.update() and StreamingDataFrame.to_topic() always returned a new SDF object.
We were getting feedback that this behavior is not always obvious, and it's easy to forget to re-assign the result of .update() or .to_topic() calls to the variable.
Now, both of these methods modify the SDF object in place and return the same object, so the previous usage still works:

sdf = app.dataframe(...)

# The SDF is now modified in place and it's not necessary to re-assign it to the variable
sdf.update(...)
sdf.to_topic(...)

# This code will keep working as before
sdf = sdf.update(...)
sdf = sdf.to_topic(...)
# or 
sdf = sdf.update(...).to_topic(...)

⚠️ Note: If there's an sdf.update() or sdf.to_topic() call in the code that is not assigned back to a variable, it will now modify the SDF instance.


New method StreamingDataFrame.print()

Users can now easily debug the value and metadata of the current record in the stream with StreamingDataFrame.print().
By default, it will print the current record value wrapped into a dictionary.
If called with metadata=True, it will also print the record's key, timestamp, and headers.

It's intended as a shortcut for the previous workaround, StreamingDataFrame.update(print).

Example:

sdf = app.dataframe(...)

# Print only the value of the current record
sdf.print()

>>> {'value': {'number': 163937}}
{'value': {'number': 163938}}


# Print value and metadata of the current record
sdf.print(metadata=True)

>>> { 'value': {'number': 12175},
  'key': b'd22d884a-eb88-44de-b22f-abfdc0b215f6',
  'timestamp': 1721129697926,
  'headers': [('header', b'123'), ('header', b'1234')]}

New method StreamingDataFrame.drop()

With StreamingDataFrame.drop(), users can easily drop the unnecessary keys from the dictionary-like values.

StreamingDataFrame.drop() mutates data in place by deleting the keys from the original dictionary.

sdf = app.dataframe(...)

sdf.drop('A')  # Drop a key "A" from the value assuming it's a dict

sdf.drop(['A', 'B'])  # Drop multiple keys from the value

Other changes

  • Fix doc formatting for processing guarantees by @stereosky in #396
  • Update doc reference to exactly once delivery guarantees by @stereosky in #397
  • Optimize tree() method of Stream class to reduce time complexity by @aparnadotk in #400
  • Consumer docstring cleanup by @tim-quix in #394

Full Changelog: v2.7.0...v2.8.0

v2.7.0

04 Jul 16:58
892535d

Release 2.7.0

What's changed

[New] Support for Exactly-once processing guarantees using Kafka transactions.

With exactly-once processing guarantees enabled, each Kafka message is processed only once, with no duplicated outputs.

It is especially helpful when consistency of data in the output topics is crucial, and the downstream consumers don't handle duplicated data gracefully.

To learn more about the exactly-once processing and configuration, see the "Processing Guarantees" section here.
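
Exactly-once processing is enabled via the Application's processing_guarantee setting; the parameter name and value below are a sketch, so confirm them against the Processing Guarantees docs:

from quixstreams import Application

# Enable exactly-once processing (the default is at-least-once)
app = Application(
    broker_address="localhost:9092",
    processing_guarantee="exactly-once",
)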

Other Improvements

  • Removed column_name parameter from the Deserializer class by @tim-quix in #392
  • Update quickstart code and copy to match by @stereosky in #389