Feat: Add Doris adapter #5033
base: main
Conversation
Hey @xinge-ji, let us know when you'd like us to review. We typically ignore a PR still in 'Draft' state, and also PRs with failing tests and merge conflicts with main (with the assumption that the requester is still working on the PR).
There are still some CI tests failing at the moment, and it seems to be due to
        IntegrationTestEngine,
    )

    pytestmark = [pytest.mark.doris, pytest.mark.engine, pytest.mark.slow]
I think this pytestmark is why the "style_and_cicd_tests" task is failing - because it's causing the tests in this file to get included when they shouldn't be; they should only be included in the "engine_doris" task.
If you remove it, "style_and_cicd_tests" should stop failing on this test. See the other test_integration_<engine>.py files for an example.
Thank you! This is now ready for review.
You can adjust the CI environment by editing
Thanks for this very complete PR @xinge-ji , nice work!
I've completed a first pass reviewing it, let me know if anything seems off.
I've never used Doris before, so I'm approaching it from the perspective of how the other engine adapters tend to work and the general concepts established within the SQLMesh codebase.
docs/integrations/engines/doris.md
Outdated
    ## Table Properties

    The Doris adapter supports a comprehensive set of table properties that can be configured in the `physical_properties` section of your model. These properties are processed by the `_build_table_properties_exp` method.
_build_table_properties_exp is an internal method of EngineAdapter and doesn't really belong in user-facing docs
I've removed the reference to _build_table_properties_exp from the doc.
docs/integrations/engines/doris.md
Outdated
    kind INCREMENTAL_BY_TIME_RANGE(time_column (event_date, '%Y-%m-%d')),
    partitioned_by event_date
    physical_properties (
      partitioned_by_expr = 'FROM ("2000-11-14") TO ("2099-11-14") INTERVAL 2 YEAR',
SQLMesh does support expressions in the existing partitioned_by property, eg:

    partitioned_by date_trunc(event_timestamp)

or eg for Iceberg transforms on Athena/Trino:

    partitioned_by (day(cola), truncate(colb, 8))

Is there a reason why we couldn't make this more ergonomic for Doris? eg something like:

    MODEL (
      name my_partitioned_model,
      ...
      partitioned_by RANGE(event_date), -- or LIST(event_date) for list partitioning
      physical_properties (
        partitions = (
          `default` FROM ("2000-11-14") TO ("2099-11-14") INTERVAL 2 YEAR,
          `p2025` VALUES [("2025-01-01"), ("2026-01-01")),
          `other` VALUES LESS THAN MAXVALUE
        )
      )
    )

to define the partition type (RANGE, LIST, etc.) in partitioned_by and then the partitions themselves in physical_properties.
I appreciate this might require some tweaks to the parser to achieve, let me know what you think
I have adjusted the parser to support partitioned_by RANGE(event_date) / LIST(values) syntax and defined the partition ranges/bounds in physical_properties as partitions, as you suggested.
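The split the reviewer describes can be sketched in plain Python. `parse_partitioned_by` below is a hypothetical helper for illustration, not the actual SQLMesh parser change; it separates a spec like `RANGE(event_date)` into the Doris partition type and its columns:

```python
import re

def parse_partitioned_by(spec: str):
    # Hypothetical helper, not the real SQLMesh parser change: split a
    # `partitioned_by` spec into (partition_type, columns).
    m = re.fullmatch(r"(RANGE|LIST)\s*\((.*)\)", spec.strip(), re.IGNORECASE)
    if m:
        kind = m.group(1).upper()
        columns = [c.strip() for c in m.group(2).split(",")]
        return kind, columns
    # Plain column expression, e.g. `partitioned_by event_date`
    return "COLUMN", [spec.strip()]

print(parse_partitioned_by("RANGE(event_date)"))   # ('RANGE', ['event_date'])
print(parse_partitioned_by("LIST(region, city)"))  # ('LIST', ['region', 'city'])
```

The partition bounds themselves then stay in `physical_properties` as opaque values, so only the type and key columns need parser support.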
    @@ -50,6 +50,8 @@
         "trino",
         # Nullable types are problematic
         "clickhouse",
    +    # Do not support table name starts with "_"
Wow, that's quite a random limitation :)
sqlmesh/core/engine_adapter/doris.py
Outdated
    df = self.fetchdf(query)

    result = []
    for row in df.itertuples(index=False, name=None):
self.fetchall() might be better here because it already returns a list of tuples. Something like:

    for schema_name, table_name, table_type in self.fetchall(query):
        ... rest of logic ...
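The difference can be shown with plain tuples. `classify_rows` is a hypothetical stand-in for the adapter loop, illustrating that `fetchall`-style rows unpack directly without a DataFrame round-trip:

```python
def classify_rows(rows):
    # Hypothetical stand-in for the adapter logic: rows from fetchall() are
    # already tuples, so they unpack directly without building a DataFrame.
    result = []
    for schema_name, table_name, table_type in rows:
        kind = "VIEW" if table_type == "VIEW" else "TABLE"
        result.append((schema_name, table_name, kind))
    return result

rows = [("db1", "orders", "BASE TABLE"), ("db1", "orders_v", "VIEW")]
print(classify_rows(rows))
# [('db1', 'orders', 'TABLE'), ('db1', 'orders_v', 'VIEW')]
```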
Thanks for the suggestion, I have updated the code.
sqlmesh/core/engine_adapter/doris.py
Outdated
    for k, v in properties.items():
        v_value = v.this if isinstance(v, exp.Literal) else str(v)
        props.append(f"'{k}'='{v_value}'")
    doris_clauses.append(f"PROPERTIES ({', '.join(props)})")
This is getting into the realm of raw string concatenation, which we generally try to avoid. SQLGlot has exp.Properties for this, eg:

    >>> exp.Properties(expressions=[
    ...     exp.Property(this="foo", value=exp.Literal.string("bar")),
    ...     exp.Property(this="baz", value=exp.Literal.string("bing")),
    ... ]).sql(dialect="doris")
    "PROPERTIES ('foo'='bar', 'baz'='bing')"
I've refactored the implementation to avoid raw string concatenation.
    @@ -493,6 +494,18 @@ def get_table_comment(
             """
         elif self.dialect == "clickhouse":
             query = f"SELECT name, comment FROM system.tables WHERE database = '{schema_name}' AND name = '{table_name}'"
    +    elif self.dialect == "doris":
    +        # Doris uses MySQL-compatible information_schema
If that's true, should doris be added to the elif self.dialect in ["mysql", "snowflake"] branch instead?
Good catch, I have moved it to the mysql branch.
    @@ -588,6 +601,18 @@ def get_column_comments(
         elif self.dialect in ["spark", "databricks", "clickhouse"]:
             query = f"DESCRIBE TABLE {schema_name}.{table_name}"
             comment_index = 2 if self.dialect in ["spark", "databricks"] else 4
    +    elif self.dialect == "doris":
    +        # Doris uses MySQL-compatible information_schema
Same for this, can it be added to the MySQL branch?
I have moved it to the mysql branch
    container_name: doris-fe-01
    hostname: fe-01
    environment:
      - FE_SERVERS=fe1:172.20.80.2:9030
FMI: why do we need to hardcode IP addresses like this? Does Doris not support DNS lookup?
The official Docker example uses hardcoded IPs. When I was setting up the development environment, I attempted to replace the IPs with localhost for simplicity, but the internal container networking did not resolve it correctly, and the setup failed.
    # Wait for the materialized view to be created
    import time

    time.sleep(1)
time.sleep()'s like this tend to lead to brittle tests. Are you able to use the tenacity module used elsewhere in the codebase instead? It can retry on a cadence and give up after a certain limit has passed.
Sure, I switched to tenacity.
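The retry pattern the reviewer asks for can be sketched with the stdlib alone (standing in for tenacity): poll a predicate on a cadence and give up after a deadline, instead of a single fixed sleep:

```python
import time

def wait_until(predicate, timeout: float = 30.0, interval: float = 0.5) -> bool:
    # Poll `predicate` on a cadence; give up once `timeout` seconds elapse.
    # A stdlib sketch of what tenacity's retry/stop/wait policies provide.
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)

# e.g. poll for the materialized view instead of a blind time.sleep(1)
print(wait_until(lambda: True, timeout=1.0, interval=0.01))    # True
print(wait_until(lambda: False, timeout=0.05, interval=0.01))  # False
```

The test then fails with a clear timeout after the limit rather than flaking when 1 second happens to be too short.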
    # Convert dates based on ds_type
    if ctx.dialect == "doris":
        # For Doris with DATE type, use pandas date objects
Why is this necessary vs what this test was doing before?
The old test used hardcoded dates from 2022-01-01. In a Doris table with dynamic partitioning enabled, there's a default limit (max_dynamic_partition_num) of 500 partitions. If the date in the test is too far in the past (like 2022), it can fall outside the range of partitions that Doris automatically manages, causing the insert to fail.
Engine Adapter Implementation
- Doris Adapter: Implemented DorisEngineAdapter to support Doris-specific SQL behavior (sqlmesh/core/engine_adapter/doris.py)

Connection Configuration
- Doris Connection Config: Added a new DorisConnectionConfig, including support for basic authentication and a configurable HTTP port (sqlmesh/core/config/connection.py)

Documentation Updates
- Doris Guide: Added a detailed guide for Doris integration, covering setup, model support, connection configuration, and limitations (docs/integrations/engines/doris.md)
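A connection config like the one summarized above might look roughly as follows. This is a hypothetical shape for illustration, not the actual DorisConnectionConfig; only the basic-auth and configurable-HTTP-port features come from the PR summary, and the default port numbers are Doris's documented defaults:

```python
from dataclasses import dataclass

@dataclass
class DorisConnectionSketch:
    # Hypothetical shape, not the real DorisConnectionConfig: Doris serves
    # queries over the MySQL protocol and bulk loads over a separate HTTP port.
    host: str
    user: str
    password: str = ""     # basic authentication
    port: int = 9030       # MySQL-protocol query port (Doris default)
    http_port: int = 8030  # HTTP port, e.g. for stream load (Doris default)

cfg = DorisConnectionSketch(host="localhost", user="root")
print(cfg.port, cfg.http_port)  # 9030 8030
```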