-
Notifications
You must be signed in to change notification settings - Fork 566
UN-2301 [FEAT] Capture connector error logs in ETL and display in frontend #1650
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…Error-is-not-captured-and-added-in-logs-if-a-connector-stopped-working-suddenly
Summary by CodeRabbitRelease Notes
✏️ Tip: You can customize this high-level summary in your review settings. WalkthroughThe PR adds comprehensive logging and error handling enhancements across workflow management and connector processing. It introduces a new Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
workers/shared/workflow/source_connector.py (1)
189-225: Do not swallow source listing failuresBy catching every
Exceptionand returning[], we now hide fatal issues such as bad credentials or connector outages—the workflow will think “no files” and continue, so operators never see the failure despite the PR’s goal of surfacing errors. We need to re-raise after logging so the failure bubbles up and is captured by the worker log stream.- except Exception as e: - error_msg = f"Failed to list files from source connector directory '{input_directory}': {str(e)}" - if self.workflow_log: - self.workflow_log.log_error(logger, error_msg) - logger.error(error_msg) - return [] + except Exception as e: + error_msg = ( + f"Failed to list files from source connector directory " + f"'{input_directory}': {str(e)}" + ) + if self.workflow_log: + self.workflow_log.log_error(logger, error_msg) + logger.error(error_msg, exc_info=True) + raise
🧹 Nitpick comments (1)
backend/workflow_manager/endpoint_v2/destination.py (1)
169-178: Avoid double-logging the same database error
workflow_log.log_error(...)already publishes the websocket entry and callslogger.error. The additionallogger.error(error_msg)lines duplicate every failure message in the backend logs. Please drop the second call and rely onlog_error(or passexc_info=Truevia kwargs if you need the traceback).- self.workflow_log.log_error(logger, error_msg) - logger.error(error_msg) + self.workflow_log.log_error(logger, error_msg)- self.workflow_log.log_error(logger, error_msg) - logger.error(error_msg) + self.workflow_log.log_error(logger, error_msg)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to Reviews > Disable Cache setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (5)
backend/workflow_manager/endpoint_v2/destination.py(2 hunks)backend/workflow_manager/utils/workflow_log.py(2 hunks)backend/workflow_manager/workflow_v2/execution_log_utils.py(3 hunks)unstract/core/src/unstract/core/pubsub_helper.py(1 hunks)workers/shared/workflow/source_connector.py(3 hunks)
🧰 Additional context used
🪛 Ruff (0.14.3)
workers/shared/workflow/source_connector.py
132-132: Abstract raise to an inner function
(TRY301)
132-132: Create your own exception
(TRY002)
149-149: Consider moving this statement to an else block
(TRY300)
152-152: Use explicit conversion flag
Replace with conversion flag
(RUF010)
155-155: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
219-219: Consider moving this statement to an else block
(TRY300)
220-220: Do not catch blind exception: Exception
(BLE001)
221-221: Use explicit conversion flag
Replace with conversion flag
(RUF010)
224-224: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
242-242: Create your own exception
(TRY002)
249-249: Create your own exception
(TRY002)
backend/workflow_manager/endpoint_v2/destination.py
170-170: Use explicit conversion flag
Replace with conversion flag
(RUF010)
172-172: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
176-176: Use explicit conversion flag
Replace with conversion flag
(RUF010)
178-178: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (3)
unstract/core/src/unstract/core/pubsub_helper.py (1)
156-162: Error-level publish logging is helpfulThanks for surfacing ERROR payloads at error level while keeping everything else at debug; this will make troubleshooting failed connectors much easier without changing publish behavior.
backend/workflow_manager/utils/workflow_log.py (1)
20-22: Fallback channel keeps log routing robustThe staged fallback to
pipeline_idand thenexecution_idguarantees we never publish to aNonechannel, which should prevent silent log drops when the state store is empty.backend/workflow_manager/workflow_v2/execution_log_utils.py (1)
66-169: Richer log processing visibility looks greatPrioritizing ERROR entries, echoing each log with the right level, and emitting per-org summaries will make diagnosing connector failures far easier while keeping the storage path unchanged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
backend/workflow_manager/endpoint_v2/destination.py (1)
170-179: Uselogger.exceptionto preserve traceback contextRight now we log the formatted
error_msgwithlogger.errorand immediately re-raise. That gives the worker log the text but discards the traceback, which is exactly what we need when chasing connector failures. Switching these twologger.error(...)calls tologger.exception(...)will capture the stack automatically without changing control flow. This keeps the new workflow-log entry for the UI while still leaving full context in the server logs.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to Reviews > Disable Cache setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (2)
backend/workflow_manager/endpoint_v2/destination.py(2 hunks)backend/workflow_manager/endpoint_v2/source.py(1 hunks)
🧰 Additional context used
🪛 Ruff (0.14.3)
backend/workflow_manager/endpoint_v2/destination.py
171-171: Use explicit conversion flag
Replace with conversion flag
(RUF010)
173-173: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
177-177: Use explicit conversion flag
Replace with conversion flag
(RUF010)
179-179: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
…in-logs-if-a-connector-stopped-working-suddenly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to Reviews > Disable Cache setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (2)
backend/workflow_manager/endpoint_v2/destination.py(2 hunks)backend/workflow_manager/endpoint_v2/source.py(1 hunks)
🧰 Additional context used
🪛 Ruff (0.14.5)
backend/workflow_manager/endpoint_v2/destination.py
165-165: Use explicit conversion flag
Replace with conversion flag
(RUF010)
167-167: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
171-171: Use explicit conversion flag
Replace with conversion flag
(RUF010)
173-173: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (3)
backend/workflow_manager/endpoint_v2/source.py (1)
124-149: LGTM! Logging additions improve error visibility.The logging additions follow a consistent pattern: logging at entry, before each exception, and at successful completion. This will help with debugging and make connector validation failures visible in the frontend as intended by the PR objectives.
backend/workflow_manager/endpoint_v2/destination.py (2)
127-149: Validation logging looks good with a minor consideration.The logging additions follow the same pattern as in
source.pyand improve error visibility. The refactoredrequires_connectorcheck (lines 142-145) is cleaner than checking multiple conditions inline.Note: A previous review suggested considering DEBUG level for start/end logs to reduce noise. Monitor the log volume in practice and adjust if needed.
Based on past review comments.
176-178: LGTM! Completes the validation logging lifecycle.The success log at the end provides clear confirmation that validation completed without errors, consistent with the pattern in
source.py.
…Error-is-not-captured-and-added-in-logs-if-a-connector-stopped-working-suddenly
…in-logs-if-a-connector-stopped-working-suddenly
…a-connector-stopped-working-suddenly' of github.com:Zipstack/unstract into UN-2301-ETL-Error-is-not-captured-and-added-in-logs-if-a-connector-stopped-working-suddenly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
workers/shared/workflow/source_connector.py (2)
115-147: Consider consolidating the double logging pattern.The error handling is comprehensive, but lines 145-146 log the same error message twice - once through
logger_helper.log_error(which may be a no-op ifworkflow_logis None) and once throughlogger.error.Consider using
logger.exceptioninstead oflogger.errorto capture the full stack trace, which would be more helpful for debugging connector initialization failures.Apply this diff to improve error logging:
except Exception as e: error_msg = f"Failed to initialize source connector filesystem: {str(e)}" self.logger_helper.log_error(logger, error_msg) - logger.error(error_msg) + logger.exception(error_msg) raise
201-205: Consider usinglogger.exceptionfor better debugging.The error handling logs the failure but returns an empty list, which might hide connector issues from callers. Additionally, using
logger.exceptioninstead oflogger.errorwould capture the full stack trace, making it easier to diagnose connector problems.Apply this diff:
except Exception as e: error_msg = f"Failed to list files from source connector directory '{input_directory}': {str(e)}" self.logger_helper.log_error(logger, error_msg) - logger.error(error_msg) + logger.exception(error_msg) return []workers/shared/workflow/destination_connector.py (1)
932-937: Standardize conditional logging forcurrent_file_execution_idparameter tolog_file_info()calls.The inconsistency is confirmed: Lines 932-937, 1180-1185, 1194-1199, 1207-1212, and 1219-1224 guard
log_file_info()calls withhasattr(self, "current_file_execution_id"), while lines 502-506, 528-532, 551-555, 581-585, 616-620, 1568-1572, and 1648-1652 calllog_file_info()without guards.The root cause:
self.current_file_execution_idis an instance attribute set conditionally (lines 798, 975), whereasexec_ctx.file_execution_idis always available in those execution contexts. Sincelog_file_info()acceptsfile_execution_id: str | Noneand performs null checking internally, the hasattr guards skip logging when the attribute is missing, whereas the other calls always log (with potentially None values).Choose one approach and apply consistently:
- Option 1: Remove hasattr checks and ensure
self.current_file_execution_idis always initialized before these methods are called.- Option 2: Add hasattr checks to all
log_file_info()calls that useself.current_file_execution_idand document when logging is intentionally skipped.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to Reviews > Disable Cache setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (5)
backend/workflow_manager/endpoint_v2/destination.py(2 hunks)backend/workflow_manager/endpoint_v2/source.py(1 hunks)workers/shared/workflow/destination_connector.py(7 hunks)workers/shared/workflow/logger_helper.py(1 hunks)workers/shared/workflow/source_connector.py(5 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- backend/workflow_manager/endpoint_v2/destination.py
- backend/workflow_manager/endpoint_v2/source.py
🧰 Additional context used
🪛 Ruff (0.14.5)
workers/shared/workflow/source_connector.py
128-128: Abstract raise to an inner function
(TRY301)
128-128: Create your own exception
(TRY002)
141-141: Consider moving this statement to an else block
(TRY300)
144-144: Use explicit conversion flag
Replace with conversion flag
(RUF010)
146-146: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
202-202: Use explicit conversion flag
Replace with conversion flag
(RUF010)
204-204: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
218-218: Create your own exception
(TRY002)
224-224: Create your own exception
(TRY002)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (4)
workers/shared/workflow/logger_helper.py (1)
13-48: LGTM! Clean abstraction for optional workflow logging.The
WorkflowLoggerHelperclass effectively encapsulates the conditional logging pattern, eliminating repetitive checks throughout the codebase. The implementation is straightforward, well-documented, and follows good design principles.workers/shared/workflow/source_connector.py (2)
16-17: LGTM! Proper integration of the logger helper.The import and initialization of
WorkflowLoggerHelperare correctly placed, enabling safe logging operations throughout the connector class.Also applies to: 98-98
207-224: LGTM! Validation logic enhanced with proper logging.The validation method now logs errors before raising exceptions, which will help users understand connector configuration issues. The error messages are clear and contextual.
workers/shared/workflow/destination_connector.py (1)
200-201: LGTM! Consistent logger helper initialization.The logger helper is properly initialized in the constructor, following the same pattern as
source_connector.py.
Test ResultsSummary
Runner Tests - Full Report
SDK1 Tests - Full Report
|
|



What
Why
How
Can this PR break any existing features. If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)
Database Migrations
Env Config
Relevant Docs
Related Issues or PRs
Dependencies Versions
Notes on Testing
unstract-worker-loggingandfrontendScreenshots
Checklist
I have read and understood the Contribution Guidelines.