Skip to content

Join Path Resolution Ignores Explicit View Paths #9852

@nandresen-stripe

Description

@nandresen-stripe

Describe the bug

The core issue: Cube ignores the explicit join path specified in views when resolving dimension dependencies.

When a view explicitly specifies a join path (like A.B.C), Cube correctly uses that path initially. However, when resolving dimension dependencies, Cube loses this context and searches for ALL possible paths from the root to reach required tables, instead of staying within the specified path.
This causes two issues:

  1. Duplicate table aliases - When multiple paths exist, Cube joins the same table multiple times with identical aliases (causing an error in the generated SQL)
  2. False loop detection - Cube sees these multiple valid paths as circular dependencies / an "infinite loop" (refusing to generate any SQL)

The bug appears to be in the dependency resolution phase, where the original join path context isn’t preserved when recursively resolving what dimensions need.

To Reproduce

  1. Define the following Cube schema:
cubes:
  # Base table A - will be joined multiple times
  - name: table_a
    sql: |
      (
        SELECT 'a1' AS id, 'x' AS value
        UNION ALL
        SELECT 'a2' AS id, 'y' AS value
      )
    public: false
    dimensions:
      - name: id
        sql: "{CUBE}.id"
        type: string
        primary_key: true
      - name: value
        sql: "{CUBE}.value"
        type: string

  # Base table B - acts as intermediate dimension
  - name: table_b
    sql: |
      (
        SELECT 'b1' AS id, 'a1' AS a_id
        UNION ALL
        SELECT 'b2' AS id, 'a2' AS a_id
      )
    public: false
    dimensions:
      - name: id
        sql: "{CUBE}.id"
        type: string
        primary_key: true
      - name: a_id
        sql: "{CUBE}.a_id"
        type: string

  # Table C - used for subquery dimension
  - name: table_c
    sql: |
      (
        SELECT 'a1' AS a_id, DATE '2023-01-01' AS date_value
        UNION ALL
        SELECT 'a2' AS a_id, DATE '2023-02-01' AS date_value
      )
    public: false
    dimensions:
      - name: a_id
        sql: "{CUBE}.a_id"
        type: string
        primary_key: true
      - name: date_value
        sql: "{CUBE}.date_value"
        type: time
    measures:
      - name: min_date
        sql: "{CUBE}.date_value"
        type: min

  # Table D - used for comparison in dimensions
  - name: table_d
    sql: |
      (
        SELECT 'a1' AS a_id, DATE '2023-01-15' AS comparison_date
        UNION ALL
        SELECT 'a2' AS a_id, DATE '2023-02-15' AS comparison_date
      )
    public: false
    dimensions:
      - name: a_id
        sql: "{CUBE}.a_id"
        type: string
        primary_key: true
      - name: comparison_date
        sql: "{CUBE}.comparison_date"
        type: time

  # Extended dimension with joins
  - name: dimension_a
    extends: table_a
    joins:
      - name: table_c
        relationship: one_to_one
        sql: "{CUBE}.id = {table_c}.a_id"
      - name: table_d
        relationship: one_to_one
        sql: "{CUBE}.id = {table_d}.a_id"
    dimensions:
      # Subquery dimension - works correctly when queried directly
      - name: subquery_date
        sql: "{table_c.min_date}"
        type: time
        sub_query: true
      # Dimension that references subquery - causes duplicate aliases
      - name: comparison_result
        sql: |
          CASE 
            WHEN {table_d.comparison_date} >= {CUBE.subquery_date}
            THEN 'after' 
            ELSE 'before' 
          END
        type: string
      # Alternative without subquery reference - triggers loop detection
      - name: comparison_result_hardcoded
        sql: |
          CASE 
            WHEN {table_d.comparison_date} >= DATE '2023-01-01'
            THEN 'after' 
            ELSE 'before' 
          END
        type: string

  # Intermediate dimension with join to dimension_a
  - name: dimension_b
    extends: table_b
    joins:
      - name: dimension_a
        relationship: many_to_one
        sql: "{CUBE}.a_id = {dimension_a}.id"

  # Role-specific dimensions using extends
  - name: dimension_b_role1
    extends: dimension_b
    
  - name: dimension_b_role2
    extends: dimension_b

  # Fact table with multiple join paths
  - name: fact_table
    sql: |
      (
        SELECT 
          DATE '2023-03-01' AS date_dim,
          'b1' AS b1_ref,
          'b2' AS b2_ref,
          'a1' AS direct_a_ref,
          100 AS measure_value
      )
    public: false
    joins:
      # Join to role-specific dimensions
      - name: dimension_b_role1
        relationship: many_to_one
        sql: "{CUBE}.b1_ref = {dimension_b_role1}.id"
      - name: dimension_b_role2
        relationship: many_to_one
        sql: "{CUBE}.b2_ref = {dimension_b_role2}.id"
      # Direct join to table_c - creates multiple paths
      - name: table_c
        relationship: many_to_one
        sql: "{CUBE}.direct_a_ref = {table_c}.a_id"
    dimensions:
      - name: date_dim
        sql: "{CUBE}.date_dim"
        type: time
        primary_key: true
      - name: b1_ref
        sql: "{CUBE}.b1_ref"
        type: string
        primary_key: true
      - name: b2_ref
        sql: "{CUBE}.b2_ref"
        type: string
        primary_key: true
    measures:
      - name: test_measure
        sql: "{table_c.a_id}"
        type: count_distinct
        filters:
          - sql: "{CUBE}.measure_value > 0"

# View definition - explicitly defines the join path
views:
  - name: test_view
    cubes:
      - join_path: fact_table
        includes:
          - name: date_dim
          - name: test_measure
      # NOTE: This explicitly says to use dimension_b_role2, NOT dimension_b_role1
      - join_path: fact_table.dimension_b_role2.dimension_a
        includes:
          - name: comparison_result
          - name: comparison_result_hardcoded
          - name: subquery_date
  1. Execute queries to reproduce both issues:

Issue 1: Duplicate Table Aliases

Query with dimension that references a subquery dimension:

select date_dim, comparison_result, measure(test_measure) from test_view group by 1, 2

Translated SQL result: https://pastebin.com/HqRycQ18. "line 127:52: Column 'dimension_a.id' is ambiguous"

Issue 2: False Loop Detection

Query with dimension using hardcoded date:

select date_dim, comparison_result_hardcoded, measure(test_measure) from test_view group by 1, 2

Working Case (for comparison)

Query with just the subquery dimension:

select date_dim, subquery_date, measure(test_measure) from test_view group by 1, 2

Actual Results

The Fundamental Problem

The view explicitly specifies: fact_table.dimension_b_role2.dimension_a

But Cube ignores this and activates BOTH:

  • fact_table → dimension_b_role1 → dimension_a
  • fact_table → dimension_b_role2 → dimension_a

Issue 1: Duplicate Table Aliases

The query produces an SQL error: column 'dimension_a.id' is ambiguous

Generated SQL shows Cube joining through BOTH role1 and role2:

FROM fact_table
  LEFT JOIN table_b AS dimension_b_role2 ON fact_table.b2_ref = dimension_b_role2.id
  LEFT JOIN table_b AS dimension_b_role1 ON fact_table.b1_ref = dimension_b_role1.id  -- WRONG! Not in view path!
  LEFT JOIN table_c ON fact_table.direct_a_ref = table_c.a_id
  LEFT JOIN table_a AS dimension_a ON dimension_b_role2.a_id = dimension_a.id         -- Correct path
  LEFT JOIN (subquery) AS dimension_a_subquery_date_subquery ON ...
  LEFT JOIN table_a AS dimension_a ON dimension_b_role1.a_id = dimension_a.id         -- WRONG! Duplicate alias!
  LEFT JOIN (subquery) AS dimension_a_subquery_date_subquery ON ...                   -- WRONG! Also duplicate!
  LEFT JOIN table_d ON dimension_a.id = table_d.a_id                                  -- Now ambiguous!

Issue 2: False Loop Detection

The query fails with error:

Error: Can not construct joins for the query, potential loop detected: 
fact_table->table_c->dimension_b_role2->dimension_a->dimension_b_role1->table_d 
vs 
fact_table->dimension_b_role2->dimension_b_role1->dimension_a->table_c->table_d

This shows Cube is trying to use both dimension_b_role1 and dimension_b_role2 even though the view only specifies dimension_b_role2.

Working Case

When querying just subquery_date, Cube correctly uses ONLY the specified path:

FROM fact_table
  LEFT JOIN table_b AS dimension_b_role2 ON fact_table.b2_ref = dimension_b_role2.id  -- Only role2!
  LEFT JOIN table_a AS dimension_a ON dimension_b_role2.a_id = dimension_a.id
  LEFT JOIN (subquery) AS dimension_a_subquery_date_subquery ON ...
  LEFT JOIN table_c ON fact_table.direct_a_ref = table_c.a_id

Expected Behavior

Cube must respect the explicit join path specified in the view.

When the view says fact_table.dimension_b_role2.dimension_a, Cube should:

  1. ONLY join through dimension_b_role2
  2. NEVER activate dimension_b_role1 unless explicitly referenced
  3. Generate unambiguous SQL using only the specified path

Expected SQL for both cases:

FROM fact_table
  LEFT JOIN table_b AS dimension_b_role2 ON fact_table.b2_ref = dimension_b_role2.id
  LEFT JOIN table_a AS dimension_a ON dimension_b_role2.a_id = dimension_a.id
  LEFT JOIN (subquery) AS dimension_a_subquery_date_subquery ON ...
  LEFT JOIN table_d ON dimension_a.id = table_d.a_id
  LEFT JOIN table_c ON fact_table.direct_a_ref = table_c.a_id

Root Cause

The bug: Cube's join resolution algorithm doesn't preserve the context of which extended cube variant is being used when resolving dimension dependencies.

When resolving comparison_result in the context of fact_table.dimension_b_role2.dimension_a:

  1. Cube sees that comparison_result needs dimension_a and its dependencies
  2. Instead of using the already-established path through dimension_b_role2
  3. It searches for ALL possible ways to reach dimension_a from fact_table
  4. It finds both dimension_b_role1 and dimension_b_role2 can reach dimension_a
  5. It incorrectly activates BOTH paths

This is a flaw in how Cube tracks join context during dependency resolution.

Version

Cube.js v1.3.39

Metadata

Metadata

Assignees

Type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions