Skip to content

Athena to_iceberg - allow to distinguish between columns that should be merged when matched and columns that should be merged when not matched #3176

@inbar-beehero

Description

@inbar-beehero

Is your feature request related to a problem? Please describe.
I want to add a 'created' column to all my tables, which should only be set when the row is 'NOT MATCHED' in a 'MERGE INTO' query. Right now the generated SQL uses the same df.columns for both cases ('WHEN MATCHED THEN' and 'WHEN NOT MATCHED THEN')

Describe the solution you'd like
Add optional arguments to the to_iceberg function to specify which columns should be used for 'match' / 'not matched', defaulting to None in which case the code will use all the dataframe columns. E.g:

match_update_columns: list[str] | None,
no_match_insert_columns: list[str] | None,

And then something along the lines of:

_match_update_columns = match_update_columns or df.columns
_no_match_insert_columns = no_match_insert_columns or df.column
sql_statement = f"""
                MERGE INTO "{database}"."{table}" target
                USING "{database}"."{temp_table}" source
                ON {' AND '.join([f'target."{x}" = source."{x}"' for x in merge_cols])}
                WHEN MATCHED THEN
                    UPDATE SET {', '.join([f'"{x}" = source."{x}"' for x in df.columns if x in _match_update_columns ])}
                WHEN NOT MATCHED THEN
                    INSERT ({', '.join([f'"{x}"' for x in df.columns if x in _no_match_insert_columns ])})
                    VALUES ({', '.join([f'source."{x}"' for x in df.columns if x in _no_match_insert_columns ])})

Describe alternatives you've considered
No current alternative exists, except not using the package

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions