new ad create enhanced tool to validate and create multiple detectors #676
base: main
Conversation
Signed-off-by: Amit Galitzky <[email protected]>
    if (index.startsWith(".")) {
        throw new IllegalArgumentException("System indices not supported: " + index);
    }
I think we have better detection of System indices than the leading dot. Example usage.
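The linked example is not reproduced in this thread. As a purely illustrative sketch of the idea (the pattern list below is made up, not OpenSearch's real system-index registry), the check should consult registered system-index patterns rather than a leading dot alone:

```java
import java.util.Set;

// Illustrative only: a registered-pattern check instead of a bare leading-dot
// check. Real code should reuse the existing system-index detection helper
// rather than a hand-rolled list like this one.
public class SystemIndexCheck {
    // Hypothetical subset of registered system index prefixes.
    private static final Set<String> SYSTEM_PREFIXES = Set.of(".opendistro", ".opensearch", ".plugins-ml");

    public static boolean isSystemIndex(String index) {
        return SYSTEM_PREFIXES.stream().anyMatch(index::startsWith);
    }

    public static void main(String[] args) {
        System.out.println(isSystemIndex(".plugins-ml-config")); // a registered system prefix
        System.out.println(isSystemIndex(".my-user-index"));     // leading dot alone is not enough
    }
}
```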
    Set<String> result = new HashSet<>();
    for (Map.Entry<String, String> entry : fieldsToType.entrySet()) {
        String value = entry.getValue();
        if (value.equals("date") || value.equals("date_nanos")) {
            result.add(entry.getKey());
        }
    }
    return result;
My knee-jerk reaction on seeing a loop that can be streamified:
Suggested change:

    Set<String> dateTypes = Set.of("date", "date_nanos");
    return fieldsToType.entrySet().stream()
        .filter(e -> dateTypes.contains(e.getValue()))
        .map(Entry::getKey)
        .collect(Collectors.toSet());
But since you're just moving an existing method probably ok to leave as is.
        if (dataAsMap.containsKey("response")) {
            return (String) dataAsMap.get("response");
        } else if (dataAsMap.containsKey("output")) {
            // Parse Bedrock format: output.message.content[0].text
            try {
                Map<String, Object> output = (Map<String, Object>) dataAsMap.get("output");
                Map<String, Object> message = (Map<String, Object>) output.get("message");
                List<Map<String, Object>> content = (List<Map<String, Object>>) message.get("content");
                return (String) content.getFirst().get("text");
            } catch (Exception e) {
                log.error("Failed to parse Bedrock response format", e);
                return null;
            }
        } else if (dataAsMap.containsKey("content")) {
            // Parse Claude format: content field as array
            try {
                List<Map<String, Object>> content = (List<Map<String, Object>>) dataAsMap.get("content");
                return (String) content.getFirst().get("text");
            } catch (Exception e) {
                log.error("Failed to parse Claude content format", e);
                return null;
            }
        } else {
            log.error("Unknown response format. Available keys: {}", dataAsMap.keySet());
            return null;
        }
    }
This hard-coded parsing and relying on exception handling seems like it could be improved. At least add a comment showing the expected format, so if this breaks in the future, the next maintainer (or future-you) can quickly identify the format change without walking through the code.
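One way to act on this comment is to document the expected shapes up front and check structure explicitly instead of catching ClassCastException. A minimal sketch (the shapes are taken from the code above; the class and method names are illustrative):

```java
import java.util.List;
import java.util.Map;

// Expected response shapes (from the parsing code above):
//   1. {"response": "<text>"}
//   2. Bedrock: {"output": {"message": {"content": [{"text": "<text>"}]}}}
//   3. Claude:  {"content": [{"text": "<text>"}]}
public class LlmResponseParser {

    public static String extractText(Map<String, ?> dataAsMap) {
        Object response = dataAsMap.get("response");
        if (response instanceof String s) {
            return s;                                   // format 1
        }
        Object output = dataAsMap.get("output");
        if (output instanceof Map<?, ?> outputMap) {    // format 2 (Bedrock)
            Object message = outputMap.get("message");
            return message instanceof Map<?, ?> messageMap ? firstText(messageMap.get("content")) : null;
        }
        return firstText(dataAsMap.get("content"));     // format 3 (Claude) or unknown
    }

    // content is expected to be a non-empty list of {"text": "<text>"} maps
    private static String firstText(Object content) {
        if (content instanceof List<?> list && !list.isEmpty() && list.get(0) instanceof Map<?, ?> first) {
            return first.get("text") instanceof String s ? s : null;
        }
        return null;
    }
}
```

With explicit instanceof checks, a format change returns null in one identifiable place rather than surfacing as a caught ClassCastException.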
    ModelTensors tensors = output.getMlModelOutputs().get(0);
    ModelTensor tensor = tensors.getMlModelTensors().get(0);
No exception handling for empty outputs/tensors.
    String fixPrompt = createFixPrompt(originalSuggestions, validationError);
    String fullPrompt = contextBuilder.toString() + fixPrompt;

    callLLM(fullPrompt, tenantId, ActionListener.wrap(fixedResponse -> {
Lots of duplicated code between retryDetectorValidation() and retryModelValidation(); consider a helper method for the common parts.
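A hypothetical shape for such a helper (all names here are illustrative, not taken from the PR): the retry bookkeeping lives in one place, and the parts that differ between the two methods are passed in as functions.

```java
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Consumer;

// Sketch of a shared retry helper. The callers supply how to build the fix
// prompt, how to call the LLM, and what to do with the fixed response.
public class RetryHelper {

    public static void retryWithFix(
        String originalSuggestions,
        String validationError,
        int currentRetry,
        int maxRetries,
        BinaryOperator<String> createFixPrompt,          // (suggestions, error) -> prompt
        BiConsumer<String, Consumer<String>> callLlm,    // (prompt, onResponse)
        Consumer<String> onFixedResponse,
        Consumer<Exception> onFailure
    ) {
        // fast-fail before doing any more LLM work
        if (currentRetry >= maxRetries) {
            onFailure.accept(new RuntimeException("Validation failed after " + maxRetries + " retries: " + validationError));
            return;
        }
        callLlm.accept(createFixPrompt.apply(originalSuggestions, validationError), onFixedResponse);
    }
}
```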
    private static final String EXTRACT_INFORMATION_REGEX =
        "(?s).*\\{category_field=([^|]*)\\|aggregation_field=([^|]*)\\|aggregation_method=([^|]*)\\|interval=([^}]*)}.*";
This seems to be a really strict regex that would fail on things like extra whitespace or the keys appearing in a different order (and cause retries, which lose the benefit of the optimal date field).
Given we're just extracting characters between known delimiters, there's likely a better way: iterate between the | characters, split on =, and trim the key/value pairs into a map.
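A minimal sketch of that split-and-trim approach, tolerant of whitespace and key order (class and method names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

// Parse "{category_field=...|aggregation_field=...|...}" into a map, without a
// strict whole-string regex. Tolerates surrounding text, whitespace, and keys
// appearing in any order.
public class SuggestionParser {

    public static Map<String, String> parse(String llmOutput) {
        Map<String, String> result = new HashMap<>();
        int start = llmOutput.indexOf('{');
        int end = llmOutput.lastIndexOf('}');
        if (start < 0 || end <= start) {
            return result; // no structured block found
        }
        for (String pair : llmOutput.substring(start + 1, end).split("\\|")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                result.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        return result;
    }
}
```

Missing keys then show up as absent map entries, which is easier to validate explicitly than a regex that silently fails to match.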
    } else {
        log.error("Max detector validation retries reached: {}", errorMessage);
        listener
            .onFailure(
                new RuntimeException(
                    "Detector validation failed after " + MAX_DETECTOR_VALIDATION_RETRIES + " retries: " + errorMessage
                )
            );
    }
This conditional (currentRetry >= MAX_DETECTOR_VALIDATION_RETRIES) should be a fast-fail check at the top of the validateDetectorPhase method; otherwise you spend a lot of time validating a detector that, even if validation succeeds, you can no longer use.
    if (response.getIssue() != null) {
        String issueAspect = response.getIssue().getAspect().toString();
        boolean isBlockingError = issueAspect != null && issueAspect.toLowerCase(Locale.ROOT).startsWith("detector");
Is starting with "detector" always blocking?
Is there some helper method somewhere which defines blocking errors that you can use?
        respondWithError(listener, detector.getIndices().get(0), "validation", errorMessage);
    } else {
        DetectorResult result = DetectorResult
            .failedValidation(detector.getIndices().get(0), "Non-blocking warning: " + errorMessage);
        listener.onResponse((T) result.toJson());
If detector.getIndices().get(0) throws an exception the listener will never be completed. Probably need some exception handling here.
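One common pattern for this: build the response inside a try/catch so the listener is always completed exactly once. A sketch, where the tiny Listener interface stands in for OpenSearch's ActionListener (this is illustrative, not the PR's code):

```java
import java.util.concurrent.Callable;

// Wrap response construction so that any thrown exception (e.g. getIndices()
// being empty) still completes the listener via onFailure.
public class SafeComplete {

    public interface Listener<T> {
        void onResponse(T value);
        void onFailure(Exception e);
    }

    public static <T> void complete(Listener<T> listener, Callable<T> buildResponse) {
        try {
            listener.onResponse(buildResponse.call());
        } catch (Exception e) {
            listener.onFailure(e); // the caller is never left hanging
        }
    }
}
```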
        && (validationError.contains("240")
            || validationError.contains("480")
            || validationError.contains("960")
            || validationError.contains("1440"));
These seem like magic numbers. Looks like minutes in a day?
At a minimum needs a comment.
Perhaps move the strings into a constant:

    // Time Series values from IntervalCalculation
    private static final Set<String> KNOWN_HIGH_INTERVAL_STRINGS = IntStream.of(4, 8, 12, 24)
        .map(hours -> hours * 60)
        .mapToObj(String::valueOf)
        .collect(Collectors.toUnmodifiableSet());

And then:

    boolean isUnreasonableInterval = validationError.contains("interval")
        && KNOWN_HIGH_INTERVAL_STRINGS.stream().anyMatch(validationError::contains);

Alternately, just parse the interval from the string and compare it >= 240.
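The parse-and-compare alternative could look like this minimal sketch. The assumption that the validation message contains "interval" followed by a minute count is mine, not verified against the actual error text:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Pull the suggested interval (in minutes) out of the validation message and
// compare numerically, so new ladder values need no hard-coded string list.
public class IntervalCheck {

    // Assumed message shape: "...interval... <minutes>..."
    private static final Pattern INTERVAL_PATTERN = Pattern.compile("interval[^0-9]*(\\d+)");
    private static final int HIGH_INTERVAL_MINUTES = 240; // 4 hours

    public static boolean isUnreasonableInterval(String validationError) {
        Matcher m = INTERVAL_PATTERN.matcher(validationError);
        return m.find() && Integer.parseInt(m.group(1)) >= HIGH_INTERVAL_MINUTES;
    }
}
```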
Also, highlighting the fact that magic numbers are generally bad, the INTERVAL_LADDER I linked to includes 720 and not 960. :)
kaituo left a comment:
partial review
    private void getMappingsAndFilterFields(String indexName, ActionListener<MappingContext> listener) {
        GetMappingsRequest getMappingsRequest = new GetMappingsRequest().indices(indexName);

        client.admin().indices().getMappings(getMappingsRequest, ActionListener.wrap(response -> {
Do you support remote index?
@@ -0,0 +1,4 @@
{
| "CLAUDE": "\n\nHuman: Analyze this index and suggest ONE anomaly detector for operational monitoring that generates clear, actionable alerts.\n\nIndex: ${indexInfo.indexName}\nMapping: ${indexInfo.indexMapping}\nAvailable date fields: ${dateFields}\n\nCORE PRINCIPLE:\nCreate detectors that will find anomalies on operational issues teams can immediately investigate and resolve.\nFocus on 1-2 RELATED fields that will be used for defining our features which are the important KPIs that we are looking for anomalies in.\n\nSTEP 1 - IDENTIFY MONITORING PRIORITY:\nOperational impact priority:\n\n1. Service Reliability: error_count, failed_requests, 5xx_status, exceptions, timeout_count, failures\n2. Performance Issues: response_time, latency, processing_time, duration, delay_minutes\n3. Resource Problems: cpu_usage, memory_usage, disk_usage, connection_count\n4. Traffic/Capacity: request_count, bytes_transferred, active_connections, queue_size, throughput\n5. Security Events: blocked_requests, authentication_failures, suspicious_activity\n6. 
Business Impact: revenue, conversion_rate, transaction_amount, operational_failures\n\nSTEP 2 - FEATURE SELECTION STRATEGY:\nDEFAULT TO 1 FEATURE unless multiple features provide complementary evidence of the SAME operational issue.\n\n✓ EXCELLENT 2-FEATURE COMBINATIONS (investigated together):\n\n* [error_count, timeout_count] → \"Service degradation: errors up 300%, timeouts up 150%\"\n* [response_time, error_rate] → \"Performance issue: response time 2x higher, error rate spiked\"\n* [cpu_usage, memory_usage] → \"Resource exhaustion: CPU at 90%, memory at 85%\"\n* [failed_requests, retry_count] → \"Service instability: failures up 400%, retries up 250%\"\n* [bytes_sent, bytes_received] → \"Network anomaly: traffic pattern changed significantly\"\n* [blocked_requests, failed_auth] → \"Security event: attack pattern detected\"\n* [request_count, error_count] → \"Service stress: high load with increasing failures\"\n\n✓ GOOD SINGLE FEATURES (clear, actionable alerts):\n\n* [error_count] → \"Error spike: 400% increase in errors\"\n* [response_time] → \"Performance issue: response time 3x normal\"\n* [bytes_transferred] → \"Traffic anomaly: data transfer volume unusual\"\n* [cpu_usage] → \"Resource issue: CPU utilization spiked\"\n\nSTEP 3 - AGGREGATION METHOD RULES:\nCRITICAL: OpenSearch Anomaly Detection supports ONLY these 5 aggregation methods:\n\n* avg() - Average value of numeric fields\n* sum() - Sum total of numeric fields\n* min() - Minimum value of numeric fields\n* max() - Maximum value of numeric fields\n* count() - Count of documents (works on any field type)\n\nField Type Constraints:\n\n* Numeric fields (long, integer, double, float): Can use avg, sum, min, max, count\n* Keyword fields: Can ONLY use count ('count' ONLY when meaningful - avoid mixed good/bad values)\n* NEVER use sum/avg/min/max on keyword fields - Will cause errors\n\nOperational Logic Rules:\n\n* Times/Durations/Delays: ALWAYS 'avg' (NEVER 'sum' - summing time is meaningless)\n* 
Errors/Failures/Counts: 'sum' for totals, 'avg' for rates/percentages\n* Bytes/Size fields: 'sum' for total volume (bytes, object_size, response.bytes)\n* Memory/Resource fields: 'avg' for percentages, 'max' for absolute values\n* Business metrics: 'avg' for per-transaction values, 'sum' for revenue totals\n* Keyword fields for traffic: 'count' (counts specific when there is an error or something specific like bad status codes, not just all status codes)\n\nWhen to AVOID 'count' on keyword fields:\n\n* Mixed success/error fields: status_code.keyword (contains 200, 404, 500 - becomes total traffic, not error detection vs error_code.keyword which would be count of docs with error)\n* High-cardinality descriptive: user_agent.keyword, message.keyword (not operational KPIs)\n* Static/descriptive fields: version.keyword, environment.keyword (don't change operationally)\n\nField Pattern Recognition:\n\n* *_count, *_errors, *_failures, *_requests → 'sum' (if numeric) OR 'count' (if keyword)\n* *_bytes, *_size, object_size → 'sum' (if numeric)\n* status_code, method, protocol → 'count' (if keyword)\n\nSTEP 4 - CATEGORY FIELD SELECTION:\nIMPORTANT: Category fields are OPTIONAL. 
If no field provides actionable segmentation, leave empty.\n\nCRITICAL CONSTRAINT: Category field MUST be a keyword or ip field type from the mapping above.\nCheck the field type - ONLY use fields marked as \"keyword\" or \"ip\".\nIf no keyword/ip fields exist, leave category_field empty.\n\nChoose ONE keyword field that provides actionable segmentation for operations teams:\n\n✓ EXCELLENT choices (actionable alerts):\n\n* service_name, endpoint, api_path → \"Error spike on /checkout endpoint\"\n* host, instance_id, server → \"CPU spike on web-server-01\"\n* region, datacenter, availability_zone → \"Network issues in us-west-2\"\n* status_code, error_type → \"500 errors spiking\"\n* method, protocol → \"POST requests failing\"\n\n✓ GOOD choices (moderate cardinality):\n\n* device_type, browser, user_agent for web analytics\n* database_name, table_name for DB monitoring\n* queue_name, topic for messaging systems\n* payment_method, transaction_type for financial monitoring\n\n✗ AVOID (too specific or not actionable for general monitoring):\n\n* Unique identifiers: transaction_id, session_id, request_id\n* High-cardinality user data: user_id, customer_id\n* Timestamp fields, hash fields, UUIDs\n* Fields ending in _key, _hash, _uuid\n\nCARDINALITY GUIDELINES:\n\n* Ideal: 5-50 unique values (most operational use cases)\n* Acceptable: 50-500 values (if actionable segmentation)\n* No category field: Perfectly fine if no field provides actionable insights\n\nSTEP 5 - DETECTION INTERVAL GUIDELINES:\nMatch interval to data frequency and operational needs:\n\n* Real-time systems: 10-15 minutes (APIs, web services, errors, response times)\n* Infrastructure monitoring: 15-30 minutes (servers, databases, resource usage)\n* Business processes: 30-60 minutes (transactions, conversions)\n* Security logs: 10-30 minutes (access logs, firewalls, authentication)\n* Batch/ETL processes: 60+ minutes (data pipeline monitoring)\n* Sparse data: 60+ minutes (avoid false positives from empty 
buckets)\n\nSTEP 6 - SPARSE DATA INTELLIGENCE:\nCRITICAL: If validation suggests intervals >120 minutes due to sparse data:\n\n* PREFERRED SOLUTION: Remove category field entirely (better no segmentation than 4+ hour intervals)\n* ALTERNATIVE: Choose lower-cardinality category field\n* NEVER ACCEPT: If intervals are over 24 hours then we should aim for lower interval with different categorical fields.\n* PRINCIPLE: Operational usefulness over perfect segmentation\n\nOUTPUT FORMAT:\nReturn ONLY this structured format, no explanation:\n{category_field=field_name_or_empty|aggregation_field=field1,field2|aggregation_method=method1,method2|interval=minutes}\n\nIf no suitable fields exist, use empty strings for aggregation_field and aggregation_method.\n\nAssistant:\"", | |||
operational monitoring
the same operational issue
It may not be just operational; for example, UBI data is for business/marketing.
Mixed success/error fields
What if users want to monitor total traffic instead of just errors?
e.g., user_agent.keyword, message.keyword
What if I want to monitor the message volume and the number of agents running?
e.g., version.keyword, environment.keyword
We are not sure whether count is good or not until we know the meaning of environment.
OpenSearch Anomaly Detection supports ONLY these 5 aggregation methods:
We could also add the cardinality aggregation, which is a unique count.
aggregation_field=field1,field2|aggregation_method=method1,method2
Should you be upfront about what happens for 1 feature (e.g., should it produce "aggregation_method=method1," with a trailing comma, or just "aggregation_method=method1")?
category_field=field_name_or_empty
Should we tell LLM what happens when there is no categorical field?
interval=minutes
Should we tell LLM what happens when there are no suitable interval found?
If validation suggests intervals >120 minutes
Where do you put the validation-suggested interval in the prompt? I only saw it in the error message of the fix prompt, not in the initial default prompt.
can immediately investigate and resolve
Can you define what is the criteria by which the team can investigate and resolve?
    }
    contextBuilder.append("Available date fields: ").append(String.join(", ", mappingContext.dateFields)).append("\n\n");

    String fixPrompt = createFixPrompt(originalSuggestions, validationError);
Does the agent have memory of what you have asked before? If yes, where is the previous conversation stored, and how does the agent decide which conversation to give to the LLM?
@@ -0,0 +1,13 @@
{
    "name": "Test_create_anomaly_detector_enhanced_flow_agent",
    "type": "flow",
The agent should be equipped with an index search tool as well, right? If so, can you point me to the code?
@@ -0,0 +1,4 @@
{
    "CLAUDE": "... (same CLAUDE prompt as shown above) ...",
    "OPENAI": "... (near-identical prompt without the Human:/Assistant: wrapper) ..."
It is hard to maintain two prompts with almost identical content. Can you try to generate the claude one programmatically by adding extra information?
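One way to do this (a sketch only — `PromptTemplates`, `BASE_PROMPT`, and `buildPrompt` are hypothetical names, not from this PR) is to keep a single shared prompt body and append the provider-specific extras at build time:

```java
// Sketch: derive the Claude variant from one shared base prompt instead of
// duplicating nearly identical prompt strings. All names are hypothetical.
public final class PromptTemplates {

    // Shared instructions used by every provider (abbreviated here).
    static final String BASE_PROMPT =
        "STEP 5 - DETECTION INTERVAL GUIDELINES: match interval to data frequency...";

    // Appends provider-specific extras to the shared base, if any.
    static String buildPrompt(String providerExtras) {
        if (providerExtras == null || providerExtras.isEmpty()) {
            return BASE_PROMPT;
        }
        return BASE_PROMPT + "\n\n" + providerExtras;
    }

    public static void main(String[] args) {
        String claudePrompt = buildPrompt("OUTPUT FORMAT: return only the structured map.");
        System.out.println(claudePrompt.startsWith(BASE_PROMPT)); // prints true
    }
}
```

Only the extras then need separate maintenance; any fix to the shared instructions applies to both prompts automatically.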
| } | ||
| Map<String, String> indexInfo = ImmutableMap | ||
| .of("indexName", indexName, "indexMapping", tableInfoJoiner.toString(), "dateFields", dateFields); |
Users may have huge index mapping. Do you have a max limit?
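One option for a guard here (a sketch under assumptions — the constant, marker text, and class name are hypothetical, not from this PR) is to cap the mapping text before it is interpolated into the prompt:

```java
// Sketch: cap the serialized index mapping before it reaches the LLM prompt.
// MAX_MAPPING_CHARS and the truncation marker are hypothetical choices.
public final class MappingLimiter {

    static final int MAX_MAPPING_CHARS = 20_000;
    static final String TRUNCATION_MARKER = "\n[mapping truncated]";

    // Returns the mapping unchanged if small enough, otherwise a truncated
    // copy with an explicit marker so the LLM knows the mapping is partial.
    static String capMapping(String mapping) {
        if (mapping.length() <= MAX_MAPPING_CHARS) {
            return mapping;
        }
        return mapping.substring(0, MAX_MAPPING_CHARS) + TRUNCATION_MARKER;
    }

    public static void main(String[] args) {
        System.out.println(capMapping("{\"properties\":{}}"));
    }
}
```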
| } | ||
| @SuppressWarnings("unchecked") | ||
| **/ |
@SuppressWarnings should be outside of comment block. This may cause CI compile failure.
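For illustration, the annotation belongs on the annotated element itself, after any Javadoc block (the class and method below are hypothetical examples, not the PR's code):

```java
// Correct placement: @SuppressWarnings sits directly on the annotated element,
// after the Javadoc comment — never inside the comment block.
public final class Example {

    /**
     * Casts the raw response object into the expected map shape.
     */
    @SuppressWarnings("unchecked")
    static java.util.Map<String, Object> asMap(Object raw) {
        return (java.util.Map<String, Object>) raw;
    }

    public static void main(String[] args) {
        System.out.println(asMap(new java.util.HashMap<String, Object>()).isEmpty()); // prints true
    }
}
```

Placing the annotation inside the `/** ... */` block makes it plain comment text, so the compiler never sees it and the unchecked-cast warning is not suppressed.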
| dependencies { | ||
| // 3P dependencies | ||
| compileOnly group: 'com.google.code.gson', name: 'gson', version: '2.10.1' | ||
| implementation group: 'org.owasp.encoder' , name: 'encoder', version: '1.3.1' |
Where do you use owasp?
| if (response.getInterval() != null) { | ||
| newInterval = response.getInterval(); | ||
| } |
Would this override the LLM's interval? If yes, is this on purpose? Why do we need the LLM's interval suggestion?
| TimeConfiguration newWindowDelay = originalDetector.getWindowDelay(); | ||
| Integer newHistoryIntervals = originalDetector.getHistoryIntervals(); | ||
| if (response.getInterval() != null) { | ||
| newInterval = response.getInterval(); |
Would this override the LLM's interval? If yes, is this on purpose? Why do we need the LLM's interval suggestion?
Description
This PR introduces CreateAnomalyDetectorToolEnhanced, a tool that leverages LLM capabilities to automatically generate, validate, and create anomaly detectors for a given set of indices.
Implementation Overview
The tool implements a step-by-step processing flow that handles batch operations on indices while maintaining robust error handling and validation mechanisms.
Code Flow
The implementation uses an LLM to generate detector configurations, and retries at multiple stages against the validate and suggest APIs to converge on a configuration that lets the detector initialize reliably.
The tool processes indices sequentially to prevent LLM throttling and implements retry logic with LLM-assisted error correction. Each index undergoes mapping analysis, LLM configuration generation, detector validation, suggest-API-based adjustments, and model validation before final deployment.
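The sequential-with-retries flow described above can be sketched as follows (a minimal sketch — `Step`, `processIndices`, and the retry bound are placeholders, not the PR's actual API):

```java
// Hedged sketch of the described per-index flow: sequential processing with
// bounded, LLM-assisted retries. All names here are hypothetical.
import java.util.ArrayList;
import java.util.List;

public final class BatchFlowSketch {

    static final int MAX_RETRIES = 3;

    // One attempt at the full pipeline for an index:
    // mapping analysis -> LLM config -> validate -> suggest -> create.
    interface Step {
        boolean attempt(String index, int attemptNo);
    }

    static List<String> processIndices(List<String> indices, Step step) {
        List<String> results = new ArrayList<>();
        for (String index : indices) {            // sequential: avoids LLM throttling
            boolean ok = false;
            for (int i = 1; i <= MAX_RETRIES && !ok; i++) {
                ok = step.attempt(index, i);      // retry with corrected config
            }
            results.add(index + (ok ? ": created" : ": failed"));
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> out = processIndices(
            List.of("logs-a", "logs-b"),
            (index, attempt) -> attempt >= 2);    // succeeds on the second try
        System.out.println(out); // prints [logs-a: created, logs-b: created]
    }
}
```

Collecting a per-index result string rather than failing fast keeps one bad index from aborting the whole batch.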
Usage Example
Input:
Output:
Processing Flow
The implementation follows a step-by-step flow for each index: extraction and validation, mapping retrieval, LLM configuration generation, detector structure validation, parameter optimization via the suggest API, model requirement validation, and finally detector creation and activation. The tool maintains state consistency across the batch operation and writes each index's outcome into a detector result object, which is returned as a string so that an LLM can consume it as the output of one of its tools.
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
Check List
Commits are signed per the DCO using --signoff.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.