- 
                Notifications
    You must be signed in to change notification settings 
- Fork 21
Initial Icinga Notifications Source #998
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
98939bd    to
    25677ca      
    Compare
  
    | I've changed quite a few things since your last push, so I'll summarize the changes here for you: 
 Also, all the individual commits are self-contained, and includes the relevant commit message with some details about Footnotes | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from the inline comments, there's also a more general question that still needs consideration: error handling. In the sense that what happens if submitting the event to Icinga Notifications doesn't work. With the current implementation, it just discarded and thereby lost. That would be solved if the submission was synchronous in the history pipeline, just as writing the history to the SQL database currently is. In that case, the event would only be deleted from Redis after it was both written to the database and sent to Icinga Notifications (on the other hand, if Icinga Notifications is unavailable, this would probably prevent history from being written).
This would also solve a related issue currently present in the implementation: once it's written to the Go channel, it's considered done. If Icinga DB is stopped, there may be still events in flight and there seems to be nothing that would wait for them. So unless I'm missing something, a clean restart of the Icinga DB process might result in lost events.
        
          
                pkg/icingadb/history/sync.go
              
                Outdated
          
        
      | // executed successfully. | ||
|  | ||
| if callback != nil { | ||
| pipeline = append(pipeline, makeCallbackStageFunc(callback)) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should always force a copy. pipeline is still a reference into what's stored in syncPipelines and if these still have capacity, it could be appended there in place. Not sure it this could become a problem, but just avoiding it sounds like the safer option to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Valid point. I have changed the pipeline logic, addressing your other comment #998 (comment).
Now, the variable is being shadowed and only this one gets altered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now, the variable is being shadowed and only this one gets altered.
That's not what I meant to say. As long as the backing array has still free capacity, append() will keep using the same backing array. So if you append multiple times to the same slice descriptor, you may replace elements in slice descriptors returned by previous calls to append.
I wanted to say this should do something like this which forces a copy:
pipeline = append(slices.Clip(pipeline), makeCallbackStageFunc(callback))There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed accordingly.
        
          
                pkg/icingadb/history/sync.go
              
                Outdated
          
        
      | switch key { // keep in sync with syncPipelines below | ||
| case "notification": | ||
| structPtr = (*v1.NotificationHistory)(nil) | ||
| case "state": | ||
| structPtr = (*v1.StateHistory)(nil) | ||
| case "downtime": | ||
| structPtr = (*v1.DowntimeHistory)(nil) | ||
| case "comment": | ||
| structPtr = (*v1.CommentHistory)(nil) | ||
| case "flapping": | ||
| structPtr = (*v1.FlappingHistory)(nil) | ||
| case "acknowledgement": | ||
| structPtr = (*v1.AcknowledgementHistory)(nil) | ||
| default: | ||
| return fmt.Errorf("unsupported key %q", key) | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how much nicer this will be in the end, but just an idea for now: What if instead of keeping syncPipelines as more or less a constant, why have a function instead that generated that map and takes the callback pointer and if it's set, directly inserts the extra stage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have reworked this and removed the duplicate type mapping switch case. Now there is a "consumer"-supplied map of requested pipeline keys and destination types.
So there are only callback calls for keys one is actually interested in and one is able to say what kind of type should be used. This became necessary for the DowntimeHistory/HistoryDowntime thingy.
        
          
                pkg/notifications/redis_fetch.go
              
                Outdated
          
        
      | redisHGet := func(typ, field string, out *redisLookupResult) error { | ||
| ctx, cancel := context.WithTimeout(ctx, 5*time.Second) | ||
| defer cancel() | ||
|  | ||
| err := retry.WithBackoff( | ||
| ctx, | ||
| func(ctx context.Context) error { return s.redisClient.HGet(ctx, "icinga:"+typ, field).Scan(out) }, | ||
| retry.Retryable, | ||
| backoff.DefaultBackoff, | ||
| retry.Settings{}, | ||
| ) | ||
| if err != nil { | ||
| if errors.Is(err, redis.Nil) { | ||
| return fmt.Errorf("%s with ID %s not found in Redis", typ, hostId) | ||
| } | ||
| return fmt.Errorf("failed to fetch %s with ID %s from Redis: %w", typ, field, err) | ||
| } | ||
| return nil | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't this be much simpler if this function just returned the name, i.e. the return type would be (string, error)? It seems to be used for nothing else than fetching the name and just returning would avoid the struct member with the "don't use it" comment and also avoid clearing that member below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't this be much simpler if this function just returned the name, i.e. the return type would be
(string, error)?
Which function are you referring to? I don't follow your suggestions either! The result fetched from Redis is a Json string, so I either had to parse that JSON manually or use the already supported Json unmarshalling mechanics from the Redis client. Though, that Json string will always contain name as a filed for both host and services, so I have to use the same result field for both of them, but as states in the comment, it shouldn't be used for anything else other than decoding the Json result. So, I've no idea what exactly would make this simpler or which part of this you're considered to be complicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which function are you referring to?
The one I've quoted, i.e. replace the out *redisLookupResult parameter of redisHGet with a string return value.
so I either had to parse that JSON manually or use the already supported Json unmarshalling mechanics from the Redis client
What are you referring to with already supported? Isn't the following function doing the JSON unmarshalling here, i.e. is something you implemented?
icingadb/pkg/notifications/redis_fetch.go
Lines 76 to 89 in 25677ca
| // UnmarshalBinary implements the [encoding.BinaryUnmarshaler] interface for redisLookupResult. | |
| // | |
| // It unmarshals the binary data of the Redis HGet result into the redisLookupResult struct. | |
| // This is required for the HGet().Scan() usage in the [Source.fetchHostServiceName] function to work correctly. | |
| func (rlr *redisLookupResult) UnmarshalBinary(data []byte) error { | |
| if len(data) == 0 { | |
| return errors.New("empty data received for redisLookupResult") | |
| } | |
| if err := json.Unmarshal(data, rlr); err != nil { | |
| return fmt.Errorf("failed to unmarshal redis result: %w", err) | |
| } | |
| return nil | |
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which function are you referring to?
The one I've quoted, i.e. replace the
out *redisLookupResultparameter of redisHGet with astringreturn value.
How would make this the don't use this for anything else struct member description obsolete? Ok, using an intermediate variable within that lambda would eliminate the need to clear the Name field after each Redis query but I don't see, how this would replace the field description/hints in anyway.
What are you referring to with already supported? Isn't the following function doing the JSON unmarshalling here, i.e. is something you implemented?
I'm referring to the Scan() call. Yes, I have to implement an interface to make this work but without the Scan() call the Json string was wrapped in another results field or something, and that's the JSON unmarshalling support I's referring to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's roughly what I had in mind (compiles but untested apart from that):
func (client *Client) fetchHostServiceName(ctx context.Context, hostId, serviceId types.Binary) (*redisLookupResult, error) {
	getNameFromRedis := func(typ, id string) (string, error) {
		ctx, cancel := context.WithTimeout(ctx, 5*time.Second)
		defer cancel()
		var data string
		err := retry.WithBackoff(
			ctx,
			func(ctx context.Context) (err error) {
				data, err = client.redisClient.HGet(ctx, "icinga:"+typ, id).Result()
				return
			},
			retry.Retryable,
			backoff.DefaultBackoff,
			retry.Settings{},
		)
		if err != nil {
			if errors.Is(err, redis.Nil) {
				return "", fmt.Errorf("%s with ID %s not found in Redis", typ, hostId)
			}
			return "", fmt.Errorf("failed to fetch %s with ID %s from Redis: %w", typ, id, err)
		}
		var result struct {
			Name string `json:"name"`
		}
		if err := json.Unmarshal([]byte(data), &result); err != nil {
			return "", fmt.Errorf("failed to unmarshal redis result: %w", err)
		}
		return result.Name, nil
	}
	var result redisLookupResult
	var err error
	result.HostName, err = getNameFromRedis("host", hostId.String())
	if err != nil {
		return nil, err
	}
	if serviceId != nil {
		result.ServiceName, err = getNameFromRedis("service", serviceId.String())
		if err != nil {
			return nil, err
		}
	}
	return &result, nil
}
// redisLookupResult defines the structure of the Redis message we're interested in.
type redisLookupResult struct {
	HostName    string `json:"-"` // Name of the host (never empty).
	ServiceName string `json:"-"` // Name of the service (only set in service context).
}Less readable diff with the same changes that can just be applied to the file
diff --git a/pkg/notifications/redis_fetch.go b/pkg/notifications/redis_fetch.go
index 3cacde1..c221400 100644
--- a/pkg/notifications/redis_fetch.go
+++ b/pkg/notifications/redis_fetch.go
@@ -24,40 +24,52 @@ import (
 // request and return an error indicating that the operation timed out. In case of the serviceId being set, the
 // maximum execution time of the Redis HGet commands is 10s (5s for each HGet call).
 func (client *Client) fetchHostServiceName(ctx context.Context, hostId, serviceId types.Binary) (*redisLookupResult, error) {
-	redisHGet := func(typ, field string, out *redisLookupResult) error {
+	getNameFromRedis := func(typ, id string) (string, error) {
 		ctx, cancel := context.WithTimeout(ctx, 5*time.Second)
 		defer cancel()
 
+		var data string
 		err := retry.WithBackoff(
 			ctx,
-			func(ctx context.Context) error { return client.redisClient.HGet(ctx, "icinga:"+typ, field).Scan(out) },
+			func(ctx context.Context) (err error) {
+				data, err = client.redisClient.HGet(ctx, "icinga:"+typ, id).Result()
+				return
+			},
 			retry.Retryable,
 			backoff.DefaultBackoff,
 			retry.Settings{},
 		)
 		if err != nil {
 			if errors.Is(err, redis.Nil) {
-				return fmt.Errorf("%s with ID %s not found in Redis", typ, hostId)
+				return "", fmt.Errorf("%s with ID %s not found in Redis", typ, hostId)
 			}
-			return fmt.Errorf("failed to fetch %s with ID %s from Redis: %w", typ, field, err)
+			return "", fmt.Errorf("failed to fetch %s with ID %s from Redis: %w", typ, id, err)
 		}
-		return nil
+
+		var result struct {
+			Name string `json:"name"`
+		}
+
+		if err := json.Unmarshal([]byte(data), &result); err != nil {
+			return "", fmt.Errorf("failed to unmarshal redis result: %w", err)
+		}
+
+		return result.Name, nil
 	}
 
 	var result redisLookupResult
-	if err := redisHGet("host", hostId.String(), &result); err != nil {
+	var err error
+
+	result.HostName, err = getNameFromRedis("host", hostId.String())
+	if err != nil {
 		return nil, err
 	}
 
-	result.HostName = result.Name
-	result.Name = "" // Clear the name field for the host, as we will fetch the service name next.
-
 	if serviceId != nil {
-		if err := redisHGet("service", serviceId.String(), &result); err != nil {
+		result.ServiceName, err = getNameFromRedis("service", serviceId.String())
+		if err != nil {
 			return nil, err
 		}
-		result.ServiceName = result.Name
-		result.Name = "" // It's not needed anymore, clear it!
 	}
 
 	return &result, nil
@@ -67,23 +79,4 @@ func (client *Client) fetchHostServiceName(ctx context.Context, hostId, serviceI
 type redisLookupResult struct {
 	HostName    string `json:"-"` // Name of the host (never empty).
 	ServiceName string `json:"-"` // Name of the service (only set in service context).
-
-	// Name is used to retrieve the host or service name from Redis.
-	// It should not be used for any other purpose apart from within the [Client.fetchHostServiceName] function.
-	Name string `json:"name"`
-}
-
-// UnmarshalBinary implements the [encoding.BinaryUnmarshaler] interface for redisLookupResult.
-//
-// It unmarshals the binary data of the Redis HGet result into the redisLookupResult struct.
-// This is required for the HGet().Scan() usage in the [Client.fetchHostServiceName] function to work correctly.
-func (rlr *redisLookupResult) UnmarshalBinary(data []byte) error {
-	if len(data) == 0 {
-		return errors.New("empty data received for redisLookupResult")
-	}
-
-	if err := json.Unmarshal(data, rlr); err != nil {
-		return fmt.Errorf("failed to unmarshal redis result: %w", err)
-	}
-	return nil
 }There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ping?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, what do you want me say? I'm not working on this, I was just commenting on why I did this, so if that look simpler to you, change it however you like, I don't mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any response at all would have been nice. If you need to figure out who is supposed to address the review comments, that would be a first step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed and applied diff.
| // The raw SQL query in the database is URL-encoded (mostly the space character is replaced by %20). | ||
| // So, we need to unescape it before passing it to the database. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds odd. I thought there was no support in Web for that yet. And independent of whether it currently exists in Web, I don't see why that should be URL-encoded. (Previously, URL-encoding was used to escape operators in tag names/values, but there's no surrounding structure anymore that would need something like this.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just needed to resolve a problem I was facing and instead of digging into it on the Web side, I just added a simple fix here. And as it states in the comments mostly only the space characters were URL-encoded, but whether this will still be the case after Web has a full support of this, I don't know 🤷!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either way, this has to be removed before this will get merged. And probably more importantly, this must not result in team web adding pointless URL encoding here because we expect it (as we currently with this state of the PR).
6dd9d96    to
    a199863      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General question on something I didn't think about before: the history is passed over multiple streams that are processed independently. What consideration did you put into this? At first glance, this sounds like a nasty source of race conditions to me if things happen out of order.
        
          
                pkg/icingaredis/telemetry/stats.go
              
                Outdated
          
        
      | "config_sync": &Stats.Config, | ||
| "state_sync": &Stats.State, | ||
| "history_sync": &Stats.History, | ||
| "callback_sync": &Stats.Callback, | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That name makes no sense, nobody will understand what that name is supposed to say unless Icinga 2 mapped it to a more meaningful name, i.e. make the association callback == notification submission, but then this can directly get a more meaningful name here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored to allow callback-specific names.
        
          
                cmd/icingadb/main.go
              
                Outdated
          
        
      | go func() { | ||
| var callback func(database.Entity) bool | ||
| var callbackKeyStructPtr map[string]any | ||
|  | ||
| if cfg := cmd.Config.NotificationsSource; cfg.ApiBaseUrl != "" { | ||
| logger.Info("Starting Icinga Notifications source") | ||
|  | ||
| notificationsSource := notifications.NewNotificationsClient( | ||
| ctx, | ||
| db, | ||
| rc, | ||
| logs.GetChildLogger("notifications-source"), | ||
| cfg) | ||
| callback = notificationsSource.Submit | ||
| callbackKeyStructPtr = notifications.SyncKeyStructPtrs | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't be done in this background routine given that notifications.NewNotificationsClient() can terminate the process:
icingadb/pkg/notifications/notifications.go
Lines 65 to 68 in ed7b0d1
| notificationsClient, err := source.NewClient(client.Config, fmt.Sprintf("Icinga DB %s", internal.Version.Version)) | |
| if err != nil { | |
| logger.Fatalw("Cannot create Icinga Notifications client", zap.Error(err)) | |
| } | 
Also, look at the number of calls to logger.Fatal*() in this function. So for consistency, that function shouldn't even terminate the process by itself but leave the job to main().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed.
        
          
                pkg/notifications/notifications.go
              
                Outdated
          
        
      | notificationsClient, err := source.NewClient(client.Config, fmt.Sprintf("Icinga DB %s", internal.Version.Version)) | ||
| if err != nil { | ||
| logger.Fatalw("Cannot create Icinga Notifications client", zap.Error(err)) | ||
| } | ||
| client.notificationsClient = notificationsClient | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider doing that first. Then you can just initialize the Client struct in one go (and just return &Client{...}).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
        
          
                pkg/notifications/notifications.go
              
                Outdated
          
        
      | ctx: ctx, | ||
| } | ||
|  | ||
| notificationsClient, err := source.NewClient(client.Config, fmt.Sprintf("Icinga DB %s", internal.Version.Version)) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could also just be a simple string concatenation using +:
| notificationsClient, err := source.NewClient(client.Config, fmt.Sprintf("Icinga DB %s", internal.Version.Version)) | |
| notificationsClient, err := source.NewClient(client.Config, "Icinga DB " + internal.Version.Version) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
        
          
                pkg/notifications/notifications.go
              
                Outdated
          
        
      | var ev *event.Event | ||
| var eventErr error | ||
|  | ||
| // Keep the type switch in sync with syncPipelines from pkg/icingadb/history/sync.go | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That comment seems to be outdated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
        
          
                pkg/notifications/notifications.go
              
                Outdated
          
        
      | // buildCommonEvent creates an event.Event based on Host and (optional) Service names. | ||
| // | ||
| // This function is used by all event builders to create a common event structure that includes the host and service | ||
| // names, the absolute URL to the Icinga Web 2 Icinga DB page for the host or service, and the tags for the event. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the absolute URL to the Icinga Web 2 Icinga DB page for the host or service
Outdated, it's no longer absolute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed.
        
          
                pkg/notifications/notifications.go
              
                Outdated
          
        
      | func (client *Client) buildCommonEvent(rlr *redisLookupResult) (*event.Event, error) { | ||
| var ( | ||
| objectName string | ||
| objectUrl url.URL | ||
| objectTags map[string]string | ||
| ) | ||
|  | ||
| if rlr.ServiceName != "" { | ||
| objectName = rlr.HostName + "!" + rlr.ServiceName | ||
|  | ||
| objectUrl.Path = "/icingadb/service" | ||
| objectUrl.RawQuery = "name=" + utils.RawUrlEncode(rlr.ServiceName) + "&host.name=" + utils.RawUrlEncode(rlr.HostName) | ||
|  | ||
| objectTags = map[string]string{ | ||
| "host": rlr.HostName, | ||
| "service": rlr.ServiceName, | ||
| } | ||
| } else { | ||
| objectName = rlr.HostName | ||
|  | ||
| objectUrl.Path = "/icingadb/host" | ||
| objectUrl.RawQuery = "name=" + utils.RawUrlEncode(rlr.HostName) | ||
|  | ||
| objectTags = map[string]string{ | ||
| "host": rlr.HostName, | ||
| } | ||
| } | ||
|  | ||
| return &event.Event{ | ||
| Name: objectName, | ||
| URL: objectUrl.String(), | ||
| Tags: objectTags, | ||
| }, nil | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think url.URL is still doing anything useful anymore, it just inserts a ? now. That could also be string concatenation.
Also, the variables provide little benefit here, both branches could just return &event.Event{...} directly.
Finally, there's no more case returning an error, so that could be removed from the function signature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to string and reworked method a bit.
| case "downtime_end": | ||
| if h.HasBeenCancelled.Valid && h.HasBeenCancelled.Bool { | ||
| ev.Type = event.TypeDowntimeRemoved | ||
| ev.Message = "Downtime was cancelled" | ||
|  | ||
| if h.CancelledBy.Valid { | ||
| ev.Username = h.CancelledBy.String | ||
| } | ||
| } else { | ||
| ev.Type = event.TypeDowntimeEnd | ||
| ev.Message = "Downtime expired" | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where's the unmute happening? And is there any replacement for the old isMuted() function that checks if the checkable is even to be unmuted?
| backlogLastId = msgs[1].ID | ||
| backlogTimerInterval = backlogTimerMinInterval | ||
| _ = backlogTimer.Reset(backlogTimerInterval) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wont this effectively do a sleep(10ms)? So if everything works fine, won't this artificially limit throughput?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reduced timeout.
        
          
                pkg/icingadb/history/sync.go
              
                Outdated
          
        
      | backlogLastId = "" | ||
| logger.Infow("Finished rolling back backlog of callback elements", zap.Int("delay", backlogMsgCounter)) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this in anyway synchronize with what's read from in? Can events be lost or processed twice because of this?
This is the first version to use Icinga DB as an event source for Icinga Notifications. If configured accordingly, Icinga DB forwards events crafted from the Redis pipeline to the Icinga Notifications API. This required a small refactoring of the history synchronization to allow hooking into the Redis stream. Afterwards, the newly introduced notifications package handles the rest. Note: As part of this architectural change, Icinga Notifications offers filters to be evaluated by Icinga DB. At the moment, these are SQL queries being executed on the Icinga DB relational database. Either consider both Icinga DB and Icinga Notifications to be part of the same trust domain or consider the security implications. Furthermore, this change requires a change on Icinga Notifications as well. This will not work with the current version 0.1.1.
Most of the notifications related code from here were outsourced to Icinga Go Library, thus removes all the now obsolte ones from here.
Instead allow them to reference any columns of the database entity as long as that entity provides it. It also removes the retry mechanism used to execute the queries as this would block the worker unnecessarily.
Otherwise, posting the entity in a `go s.Submit(entity)` manner in the background will mess up the order of events as there might be another even in the queue affecting the same entity. Apart from that, the log entry "submitted event ..." is also downgraded to debug level, as it creates too much noise at the info level without saying anything relevant to an end user.
Instead of retrieving the host and service names from the used RDBMs, this commit allows us to query them from Redis. This is done to avoid the overhead of database queries, especially when the host and service names are always to be found in Redis. The previous implementation simply perfomed two database queries with each received entity based on their IDs, but we can perform this operation more efficiently from Redis using the same filtering logic as before. Of course, we now have to maintain more code needed to handle the Redis operations, but this is a trade-off we should be willing to make for performance reasons.
There won't be any concurrent access to the rules, so we don't need to guard it with a mutex.
WIP because we might move the code elsewhere.
- Bump IGL to latest changes in Icinga/icinga-go-library#145. - Allow specifying which pipeline keys are relevant, ignore others. - Allow specifying which pipeline key should be parsed in which type. - Create history.DowntimeHistoryMeta as a chimera combining history.DowntimeHistory and history.HistoryDowntime to allow access event_type, distinguishing between downtime_start and downtime_end. - Trace times for submission steps in the worker. Turns out, the single threaded worker blocks roughly two seconds for each Client.ProcessEvent method call. This might sum up to minutes if lots of events are processed at once. My current theory is that the delay results in the expensive bcrypt hash comparison on Notifications.
The rules and rule version is now part of the Event. Also rename the Client method receiver variable.
Do not silently drop failing callback submissions - such as Icinga Notification during restarts or network disruptions -, but switch the internal makeCallbackStageFunc stageFunc into a backlog mode. This resulted in multiple changes, including removing the background worker for notifications.Client, as otherwise the event submission status could not be propagated back.
There is no need to let each Icinga Notifications source know the root URL of Icinga Web 2. Since the latest IGL and IN change, partly URLs relative to Icinga Web 2 are supported.
When a faulty - like syntactical incorrect - object filter expression was loaded, each evaluation fails. However, prior to this change, the submission logic was exited, making Icinga DB unable to recover. Now, the event will be considered as no rule has matched and new rule version can be loaded.
The RulesInfo type was simplified. Rules are no longer a custom struct, but just represented by the map key and a filter expression string.
Briefly describe the required configuration for Icinga Notifications Source next to mentioning it in the About section.
Refactor the telemetry.Stats to allow custom names. This enabled dynamic callback names for the Redis history sync, used by Icinga Notifications.
- Don't validate notifications config in a background Goroutine. - Clip pipeline slice to avoid reusing capability twice. - Rework notification Client.buildCommonEvent and depending methods. - Resubmit events after updating rules in one go. - Simplify Client.fetchHostServiceName based on Julian's suggestion. Co-Authored-By: Julian Brost <[email protected]>
ed7b0d1    to
    0ab4f90      
    Compare
  
    | Note: After talking with @lippserd, we realized that the custom vars are required for permission filtering in Web. I would check how to add the custom vars from Icinga DB in a "quick fix" manner. | 
After reintroducing Event.ExtraTags in the IGL and Icinga Notifications, Icinga DB populates events by their custom variables. At the moment, the required customvars are fetched from Redis for each event. Due to the Redis schema, at least on HGETALL with manual filtering is required. This might be a good candidate for further caching, and cache invalidation.
48248d9    to
    214bfe2      
    Compare
  
    | Icinga/icingadb-web#1289 now prepares this as filter: {
  "version": 1,
  "config": {
    "type": "all",
    "filter": "host.name=docker-master"
  },
  "queries": {
    "host": [
      "SELECT (1) FROM host WHERE ((host.id = ?) AND (host.environment_id = ?)) AND (host.name = ?) LIMIT 1",
      [":host_id",":environment_id","docker-master"]
    ],
    "service": [
      "SELECT (1) FROM service LEFT JOIN host service_host ON service_host.id = service.host_id WHERE ((service.id = ?) AND (service.environment_id = ?)) AND (service_host.name = ?) LIMIT 1",
      [":service_id",":environment_id","docker-master"]
    ]
  }
}
 | 
This is the first version to use Icinga DB as an event source for Icinga Notifications. If configured accordingly, Icinga DB forwards events crafted from the Redis pipeline to the Icinga Notifications API.
This required a small refactoring of the history synchronization to allow hooking into the Redis stream. Afterwards, the newly introduced notifications package handles the rest.
Note: As part of this architectural change, Icinga Notifications offers filters to be evaluated by Icinga DB. At the moment, these are SQL queries being executed on the Icinga DB relational database. Either consider both Icinga DB and Icinga Notifications to be part of the same trust domain or consider the security implications.
Furthermore, this change requires a change on Icinga Notifications as well. This will not work with the current version 0.1.1.
Other PRs are