-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Description
Describe the bug
The core issue: Cube ignores the explicit join path specified in views when resolving dimension dependencies.
When a view explicitly specifies a join path (like A.B.C), Cube correctly uses that path initially. However, when resolving dimension dependencies, Cube loses this context and searches for ALL possible paths from the root to reach required tables, instead of staying within the specified path.
This causes two issues:
- Duplicate table aliases - When multiple paths exist, Cube joins the same table multiple times with identical aliases (causing an error in the generated SQL)
- False loop detection - Cube sees these multiple valid paths as circular dependencies / an "infinite loop" (refusing to generate any SQL)
The bug appears to be in the dependency resolution phase, where the original join path context isn’t preserved when recursively resolving what dimensions need.
To Reproduce
- Define the following Cube schema:
cubes:
# Base table A - will be joined multiple times
- name: table_a
sql: |
(
SELECT 'a1' AS id, 'x' AS value
UNION ALL
SELECT 'a2' AS id, 'y' AS value
)
public: false
dimensions:
- name: id
sql: "{CUBE}.id"
type: string
primary_key: true
- name: value
sql: "{CUBE}.value"
type: string
# Base table B - acts as intermediate dimension
- name: table_b
sql: |
(
SELECT 'b1' AS id, 'a1' AS a_id
UNION ALL
SELECT 'b2' AS id, 'a2' AS a_id
)
public: false
dimensions:
- name: id
sql: "{CUBE}.id"
type: string
primary_key: true
- name: a_id
sql: "{CUBE}.a_id"
type: string
# Table C - used for subquery dimension
- name: table_c
sql: |
(
SELECT 'a1' AS a_id, DATE '2023-01-01' AS date_value
UNION ALL
SELECT 'a2' AS a_id, DATE '2023-02-01' AS date_value
)
public: false
dimensions:
- name: a_id
sql: "{CUBE}.a_id"
type: string
primary_key: true
- name: date_value
sql: "{CUBE}.date_value"
type: time
measures:
- name: min_date
sql: "{CUBE}.date_value"
type: min
# Table D - used for comparison in dimensions
- name: table_d
sql: |
(
SELECT 'a1' AS a_id, DATE '2023-01-15' AS comparison_date
UNION ALL
SELECT 'a2' AS a_id, DATE '2023-02-15' AS comparison_date
)
public: false
dimensions:
- name: a_id
sql: "{CUBE}.a_id"
type: string
primary_key: true
- name: comparison_date
sql: "{CUBE}.comparison_date"
type: time
# Extended dimension with joins
- name: dimension_a
extends: table_a
joins:
- name: table_c
relationship: one_to_one
sql: "{CUBE}.id = {table_c}.a_id"
- name: table_d
relationship: one_to_one
sql: "{CUBE}.id = {table_d}.a_id"
dimensions:
# Subquery dimension - works correctly when queried directly
- name: subquery_date
sql: "{table_c.min_date}"
type: time
sub_query: true
# Dimension that references subquery - causes duplicate aliases
- name: comparison_result
sql: |
CASE
WHEN {table_d.comparison_date} >= {CUBE.subquery_date}
THEN 'after'
ELSE 'before'
END
type: string
# Alternative without subquery reference - triggers loop detection
- name: comparison_result_hardcoded
sql: |
CASE
WHEN {table_d.comparison_date} >= DATE '2023-01-01'
THEN 'after'
ELSE 'before'
END
type: string
# Intermediate dimension with join to dimension_a
- name: dimension_b
extends: table_b
joins:
- name: dimension_a
relationship: many_to_one
sql: "{CUBE}.a_id = {dimension_a}.id"
# Role-specific dimensions using extends
- name: dimension_b_role1
extends: dimension_b
- name: dimension_b_role2
extends: dimension_b
# Fact table with multiple join paths
- name: fact_table
sql: |
(
SELECT
DATE '2023-03-01' AS date_dim,
'b1' AS b1_ref,
'b2' AS b2_ref,
'a1' AS direct_a_ref,
100 AS measure_value
)
public: false
joins:
# Join to role-specific dimensions
- name: dimension_b_role1
relationship: many_to_one
sql: "{CUBE}.b1_ref = {dimension_b_role1}.id"
- name: dimension_b_role2
relationship: many_to_one
sql: "{CUBE}.b2_ref = {dimension_b_role2}.id"
# Direct join to table_c - creates multiple paths
- name: table_c
relationship: many_to_one
sql: "{CUBE}.direct_a_ref = {table_c}.a_id"
dimensions:
- name: date_dim
sql: "{CUBE}.date_dim"
type: time
primary_key: true
- name: b1_ref
sql: "{CUBE}.b1_ref"
type: string
primary_key: true
- name: b2_ref
sql: "{CUBE}.b2_ref"
type: string
primary_key: true
measures:
- name: test_measure
sql: "{table_c.a_id}"
type: count_distinct
filters:
- sql: "{CUBE}.measure_value > 0"
# View definition - explicitly defines the join path
views:
- name: test_view
cubes:
- join_path: fact_table
includes:
- name: date_dim
- name: test_measure
# NOTE: This explicitly says to use dimension_b_role2, NOT dimension_b_role1
- join_path: fact_table.dimension_b_role2.dimension_a
includes:
- name: comparison_result
- name: comparison_result_hardcoded
- name: subquery_date
- Execute queries to reproduce both issues:
Issue 1: Duplicate Table Aliases
Query with dimension that references a subquery dimension:
select date_dim, comparison_result, measure(test_measure) from test_view group by 1, 2
Translated SQL result: https://pastebin.com/HqRycQ18. "line 127:52: Column 'dimension_a.id' is ambiguous"
Issue 2: False Loop Detection
Query with dimension using hardcoded date:
select date_dim, comparison_result_hardcoded, measure(test_measure) from test_view group by 1, 2
Working Case (for comparison)
Query with just the subquery dimension:
select date_dim, subquery_date, measure(test_measure) from test_view group by 1, 2
Actual Results
The Fundamental Problem
The view explicitly specifies: fact_table.dimension_b_role2.dimension_a
But Cube ignores this and activates BOTH:
fact_table → dimension_b_role1 → dimension_a
fact_table → dimension_b_role2 → dimension_a
Issue 1: Duplicate Table Aliases
The query produces an SQL error: column 'dimension_a.id' is ambiguous
Generated SQL shows Cube joining through BOTH role1 and role2:
FROM fact_table
LEFT JOIN table_b AS dimension_b_role2 ON fact_table.b2_ref = dimension_b_role2.id
LEFT JOIN table_b AS dimension_b_role1 ON fact_table.b1_ref = dimension_b_role1.id -- WRONG! Not in view path!
LEFT JOIN table_c ON fact_table.direct_a_ref = table_c.a_id
LEFT JOIN table_a AS dimension_a ON dimension_b_role2.a_id = dimension_a.id -- Correct path
LEFT JOIN (subquery) AS dimension_a_subquery_date_subquery ON ...
LEFT JOIN table_a AS dimension_a ON dimension_b_role1.a_id = dimension_a.id -- WRONG! Duplicate alias!
LEFT JOIN (subquery) AS dimension_a_subquery_date_subquery ON ... -- WRONG! Also duplicate!
LEFT JOIN table_d ON dimension_a.id = table_d.a_id -- Now ambiguous!
Issue 2: False Loop Detection
The query fails with error:
Error: Can not construct joins for the query, potential loop detected:
fact_table->table_c->dimension_b_role2->dimension_a->dimension_b_role1->table_d
vs
fact_table->dimension_b_role2->dimension_b_role1->dimension_a->table_c->table_d
This shows Cube is trying to use both dimension_b_role1
and dimension_b_role2
even though the view only specifies dimension_b_role2
.
Working Case
When querying just subquery_date
, Cube correctly uses ONLY the specified path:
FROM fact_table
LEFT JOIN table_b AS dimension_b_role2 ON fact_table.b2_ref = dimension_b_role2.id -- Only role2!
LEFT JOIN table_a AS dimension_a ON dimension_b_role2.a_id = dimension_a.id
LEFT JOIN (subquery) AS dimension_a_subquery_date_subquery ON ...
LEFT JOIN table_c ON fact_table.direct_a_ref = table_c.a_id
Expected Behavior
Cube must respect the explicit join path specified in the view.
When the view says fact_table.dimension_b_role2.dimension_a
, Cube should:
- ONLY join through
dimension_b_role2
- NEVER activate
dimension_b_role1
unless explicitly referenced - Generate unambiguous SQL using only the specified path
Expected SQL for both cases:
FROM fact_table
LEFT JOIN table_b AS dimension_b_role2 ON fact_table.b2_ref = dimension_b_role2.id
LEFT JOIN table_a AS dimension_a ON dimension_b_role2.a_id = dimension_a.id
LEFT JOIN (subquery) AS dimension_a_subquery_date_subquery ON ...
LEFT JOIN table_d ON dimension_a.id = table_d.a_id
LEFT JOIN table_c ON fact_table.direct_a_ref = table_c.a_id
Root Cause
The bug: Cube's join resolution algorithm doesn't preserve the context of which extended cube variant is being used when resolving dimension dependencies.
When resolving comparison_result
in the context of fact_table.dimension_b_role2.dimension_a
:
- Cube sees that
comparison_result
needsdimension_a
and its dependencies - Instead of using the already-established path through
dimension_b_role2
- It searches for ALL possible ways to reach
dimension_a
fromfact_table
- It finds both
dimension_b_role1
anddimension_b_role2
can reachdimension_a
- It incorrectly activates BOTH paths
This is a flaw in how Cube tracks join context during dependency resolution.
Version
Cube.js v1.3.39