@v-shahzad
Contributor

Description

This change introduces a pre-flight validation step within the $import data pipeline to proactively detect and report database constraint violations on a per-resource basis.

Related issues

Addresses AB# 151709

Testing

For now, this PR contains only a sample change for one constraint, to get approval for the approach.

FHIR Team Checklist

  • Update the title of the PR to be succinct and less than 65 characters
  • Add a milestone to the PR for the sprint that it is merged (i.e. add S47)
  • Tag the PR with the type of update: Bug, Build, Dependencies, Enhancement, New-Feature or Documentation
  • Tag the PR with Open source, Azure API for FHIR (CosmosDB or common code) or Azure Healthcare APIs (SQL or common code) to specify where this change is intended to be released.
  • Tag the PR with Schema Version backward compatible or Schema Version backward incompatible or Schema Version unchanged if this adds or updates Sql script which is/is not backward compatible with the code.
  • When changing or adding behavior, if your code modifies the system design or changes design assumptions, please create and include an ADR.
  • CI is green before merge
  • Review squash-merge requirements

Semver Change (docs)

Patch|Skip|Feature|Breaking (reason)

@v-shahzad v-shahzad added this to the CY25Q3/2Wk07 milestone Jul 16, 2025
@v-shahzad v-shahzad self-assigned this Jul 16, 2025
@v-shahzad v-shahzad added the Open source This change is only relevant to the OSS code or release. label Jul 16, 2025
Contributor

@SergeyGaluzo SergeyGaluzo left a comment

Adding checks before calling merge adds extra load on app compute for every normal merge call. I suggest running this check only if the merge fails with a constraint violation. That way there should be zero effect on performance when there are no constraint errors.

I do not like having any code duplicated (constraint logic in both C# and the database). Can we get away with reporting some details of the SQL error and attributing it to the resource lines in the input file for the batch we processed? This might not be as precise, but we would not have any code duplication.

We should also abort the import on the first constraint error and not spend app compute on analyzing all the input resources.

Please add a test for this functionality.
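
The flow suggested above (merge first, diagnose only on failure, abort on the first constraint error) can be sketched roughly as follows. This is an illustration in Python, not the PR's C# code; `ConstraintViolation`, `ImportAborted`, `Batch`, and `import_batches` are hypothetical names, and the `merge` callable stands in for the real database merge.

```python
from dataclasses import dataclass


class ConstraintViolation(Exception):
    """Hypothetical stand-in for a constraint error surfaced by the database."""


class ImportAborted(Exception):
    """Hypothetical exception that stops the import on the first constraint error."""


@dataclass
class Batch:
    offset: int      # position of the batch in the input file (assumed meaning)
    first_row: int   # first input row covered by the batch
    last_row: int    # last input row covered by the batch
    resources: list


def import_batches(batches, merge):
    """Merge each batch; do diagnostic work only when the merge itself fails."""
    merged = 0
    for batch in batches:
        try:
            # Normal path: no pre-flight validation, so no extra compute cost.
            merge(batch.resources)
        except ConstraintViolation as exc:
            # Failure path: attribute the database error to the batch's row
            # range and abort instead of analyzing the remaining resources.
            raise ImportAborted(
                f"Error on batch with offset {batch.offset} "
                f"rows from {batch.first_row} to {batch.last_row}. {exc}"
            ) from exc
        merged += len(batch.resources)
    return merged
```

Because the database stays the only place where the constraint is defined, nothing is duplicated in application code; the trade-off is that the error is attributed to a row range rather than to a single resource.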

@v-shahzad
Contributor Author

Hi @SergeyGaluzo, thanks for the detailed suggestions. I’ll start implementing these changes accordingly.

@SergeyGaluzo
Contributor

@v-shahzad I would consider the "no code duplication" approach if possible. It sounds very attractive.

@SergeyGaluzo
Contributor

@v-shahzad Please remove any constraint validation from C#.

@v-shahzad v-shahzad added the Bug Bug bug bug. label Dec 5, 2025
@v-shahzad
Contributor Author

Hi @SergeyGaluzo, I have added the constraint details to the first resource of the batch in the following format:

Error on batch with offset 0 rows from 1 to 3. Insert failed due to the constraint 'CHK_TokenSearchParam_CodeOverflow' on 'TokenSearchParam'. Details: (len([Code])=(256) OR [CodeOverflow] IS NULL).

Please let me know whether this matches your expectations or whether you would like any changes.
Thank you

@SergeyGaluzo
Contributor

SergeyGaluzo commented Dec 12, 2025

@v-shahzad This is much better.

  1. The problem I see with adding the error to the first resource in a batch is potential confusion for the customer. They might think that the error is on that specific resource. Do we need to add any resource-level errors at all? Can't we just raise an exception and not record errors in the import error log?
  2. We should avoid putting constraint and table names into the error message. In this case we know that it is a search parameter of type token, so something like this: "Insert failed due to the constraint violation for token search parameter. Details: (len(Code)=(256) OR CodeOverflow IS NULL)". You can remove [] to simplify reading.
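
As a rough illustration of the message shape requested in point 2, the sketch below maps a table name to a customer-facing description and strips the square brackets from the constraint definition. It is written in Python for brevity and is not the PR's implementation; `FRIENDLY_NAMES` and `friendly_constraint_message` are hypothetical names, and the fallback wording for unknown tables is an assumption.

```python
# Hypothetical mapping from internal table names to customer-facing wording.
FRIENDLY_NAMES = {"TokenSearchParam": "token search parameter"}


def friendly_constraint_message(table: str, definition: str) -> str:
    """Build an error message that hides constraint and table names."""
    # Fall back to a generic description for tables without a mapping (assumed).
    subject = FRIENDLY_NAMES.get(table, "resource data")
    # Remove the [] quoting around column names to simplify reading.
    details = definition.replace("[", "").replace("]", "")
    return (
        f"Insert failed due to the constraint violation for {subject}. "
        f"Details: {details}"
    )


print(friendly_constraint_message(
    "TokenSearchParam", "(len([Code])=(256) OR [CodeOverflow] IS NULL)"
))
```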

@v-shahzad
Contributor Author

Thanks for the feedback, @SergeyGaluzo . I’ve made the suggested changes. Please let me know if any further updates are needed.

Contributor

@SergeyGaluzo SergeyGaluzo Dec 20, 2025

Please make sure that only added lines are marked as changed in diff.

@SergeyGaluzo
Contributor

SergeyGaluzo commented Dec 20, 2025

Please add import tests for all expected types of constraint violations.

@SergeyGaluzo
Contributor

We do not plan to have foreign keys. Please remove this section from the SQL query.

@SergeyGaluzo
Contributor

SergeyGaluzo commented Dec 20, 2025

Please evaluate whether violations of PK/unique constraints are possible in import. If not, remove the corresponding section from the SQL query.

@v-shahzad
Contributor Author

Hi @SergeyGaluzo , I’ve removed the checks for primary key and foreign key constraints since we’re only catching exceptions related to overflow constraints. I’ll create a separate user story and PR for the test cases.
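
For context, SQL Server reports CHECK (and foreign key) conflicts as error 547, while primary key and unique violations surface as errors 2627 and 2601. A minimal sketch of narrowing the handling to CHECK constraints only could look like the following. It is in Python for illustration; whether the PR filters on error numbers or message text is an assumption, and `check_constraint_name` is a hypothetical helper.

```python
import re

# SQL Server error number for a CHECK or FOREIGN KEY constraint conflict.
CONSTRAINT_CONFLICT = 547

# Error 547 text names the constraint kind and quotes the constraint name.
CHECK_PATTERN = re.compile(r'conflicted with the CHECK constraint "(?P<name>[^"]+)"')


def check_constraint_name(error_number: int, message: str):
    """Return the violated CHECK constraint name, or None for any other error."""
    if error_number != CONSTRAINT_CONFLICT:
        # PK/unique violations (2627, 2601) and everything else are ignored here.
        return None
    match = CHECK_PATTERN.search(message)
    return match.group("name") if match else None
```

The returned name would only be used internally, for example to look up the constraint definition, since the review asks that constraint and table names stay out of the customer-facing message.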

@v-shahzad v-shahzad marked this pull request as ready for review December 22, 2025 18:52
@v-shahzad v-shahzad requested a review from a team as a code owner December 22, 2025 18:52