@rafal-lal
Collaborator

@rafal-lal rafal-lal commented Feb 11, 2025

Generally ready for review, but not yet ready to merge, as the changes made here need to be reflected in the main cnti-testsuite.

Changes:

  • add logging scopes with Log.for()
  • clean up many unneeded log messages (used during development?)
  • refactor existing logs to be more informative: log.info should be a clear message in English, while log.debug should hold information useful for debugging without producing a wall of variable values or other development leftovers. Also add log.trace usage for very verbose logs such as command output.
  • raise exceptions when CMD execution fails. In my opinion one of the main problems with this shard was the lack of error handling: there was simply no way to tell whether some of the methods had failed. The simpler methods returned Process::Status, stdout and stderr as a NamedTuple, so that was OK (some methods returned a NamedTuple and some just a Bool; this was standardised as well). That is not the idiomatic, exception-based Crystal approach, but it is acceptable. In the more complex methods, a CLI command that fetches a resource from Kubernetes could fail, and the method would ignore the failure, continue with its logic and return some value, in most cases an empty one. This leaves the caller completely in the dark: did the method fail? Did it finish successfully and return an empty value? Is the empty value a valid response or not? Error handling is actually very important in the testsuite. Consider a test case that checks whether the number of replicas can be changed. Say the 3rd or 4th call to the KubectlClient shard fails due to a random connectivity error to the cluster, and the test case fails. Is the user told that the test case failed because of a network error and should be rerun, or does it fail without proper feedback, leaving the user with 0 points for it? Exceptions should be caught somewhere at test case level and clearly described to users. Where an exception is somewhat expected, it can be rescued and an action appropriate to the situation taken. For reference, Crystal's way of handling errors is raising and rescuing exceptions.
  • replace many non-generic Get submodule methods with a universal Get::resource one. Several methods
    repeated the very same pattern, kubectl get <kind> <resource_name>, with the kind hardcoded to deployment, service, pods, nodes... The only changing value was the kind. This makes the shard harder to maintain, debug and keep consistent; I've seen a lot of duplicated functionality across this shard, mostly because the methods were not generic enough.
  • remove some of the most exotic methods, e.g. container_digests_by_nodes. The main reason is that these methods were not used at all in the main cnti-testsuite. The second reason is that they are just too specific to make sense. We can take the golang kubectl-client package as a reference: it mostly provides functions to fetch or post resources, while any aggregation or filtering is specific to certain use cases and is done inside the consuming code, not inside the library. Some reasonable helpers are definitely okay, but here we have quite specific helpers that are not used anywhere. Instead, the main cnti-testsuite uses this shard to fetch resources (as it should) and leaves tons of these helpers here: forgotten, unmaintained, unused, and filled with TODOs and development-time logs.
  • remove commented out code
  • add a new file with a Wait submodule, containing methods extracted from the wider Get submodule; there were enough of them that creating a new submodule made sense
  • add new constants
  • simplify logic of some methods where possible
  • add argument and return type annotations to many of the methods
  • update test cases
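
The error-handling and scoped-logging points above can be sketched roughly as follows. This is a minimal illustration rather than the shard's actual code: `CMDError`, `Sketch`, and `run_cmd` are hypothetical names, and a plain shell command stands in for a kubectl call.

```crystal
require "log"

# Hypothetical exception type used only in this sketch.
class CMDError < Exception
  getter exit_code : Int32

  def initialize(message : String, @exit_code : Int32)
    super("CMD failed, exit code: #{@exit_code}, error: #{message}")
  end
end

module Sketch
  # Scoped logger, so every message carries its source.
  Log = ::Log.for("Sketch")

  # Runs a shell command and raises on failure instead of
  # silently returning an empty value.
  def self.run_cmd(cmd : String) : String
    stdout = IO::Memory.new
    stderr = IO::Memory.new
    status = Process.run(cmd, shell: true, output: stdout, error: stderr)
    Log.trace { "#{cmd} output: #{stdout}" }
    raise CMDError.new(stderr.to_s.strip, status.exit_code) unless status.success?
    stdout.to_s
  end
end

# Test-case level: rescue the error and tell the user why the run failed.
begin
  Sketch.run_cmd("exit 3")
rescue ex : CMDError
  puts "Test case could not complete: #{ex.message}"
end
```

With this shape the caller can distinguish "the command failed" (an exception) from "the command succeeded and returned nothing" (an empty string).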

This is not everything, but it is enough as a first step. The second would be to move this shard into the main cnti-testsuite, as maintaining these shards outside of the main codebase is problematic in itself.

@rafal-lal rafal-lal marked this pull request as draft February 11, 2025 12:31
@rafal-lal rafal-lal force-pushed the rlal-refactor-k8sclient-module branch from 8384b6b to fb5e1a9 Compare February 11, 2025 12:33
@rafal-lal rafal-lal requested a review from svteb February 17, 2025 12:38
@rafal-lal rafal-lal marked this pull request as ready for review February 17, 2025 12:38
@rafal-lal rafal-lal requested a review from martin-mat February 17, 2025 12:38
@rafal-lal rafal-lal changed the title Refactor KubectlClient module with focus on logging (without ::Get) Refactor KubectlClient module with focus on logging Feb 17, 2025
rafal-lal added a commit to lfn-cnti/testsuite that referenced this pull request Feb 20, 2025
- Also change regex error constants so they are more generic
- Update docker_client shard to latest release

Signed-off-by: Rafal Lal <[email protected]>
rafal-lal added a commit to cnf-testsuite/k8s_netstat that referenced this pull request Feb 24, 2025
rafal-lal added a commit to cnf-testsuite/k8s_netstat that referenced this pull request Feb 24, 2025
rafal-lal added a commit to cnf-testsuite/cluster_tools that referenced this pull request Feb 28, 2025
rafal-lal added a commit to lfn-cnti/testsuite that referenced this pull request Feb 28, 2025
rafal-lal added a commit to lfn-cnti/testsuite that referenced this pull request Mar 5, 2025
- Use log.debug for most of the 'get something' methods
- Use log.info for methods that are changing something on the cluster
- Use log.trace for very detailed logs
- Use log.warn for non-critical issues

Signed-off-by: Rafal Lal <[email protected]>
Signed-off-by: Rafal Lal <[email protected]>
Collaborator

@svteb svteb left a comment

Minor inconsistencies:

  • Sometimes when running a function from the same module you prefix it with KubectlClient::Module_name.function and at other times you only use function.
  • Occasionally there is no info message printed when running a function, this is kind of inconsistent. Especially in Get the very first message is usually debug while in other modules the first message is info.
  • Sometimes you print the result of function with info and other times not. Look at pods_by_resource_labels for example.
  • I don't quite understand why you sometimes specify the amount of resources returned (this is only done with pods, I believe). I have no qualms about this but it feels a little exotic/purposeless.
  • In the future we could consider making pod_ready and node_ready functions "unified", similarly to how it has been done with get_resource.

None of these propositions are too important so I'm approving.

Comment on lines +80 to +86
class K8sClientCMDException < Exception
  MSG_TEMPLATE = "kubectl CMD failed, exit code: %s, error: %s"

  def initialize(message : String?, exit_code : Int32, cause : Exception? = nil)
    super(MSG_TEMPLATE % {exit_code, message}, cause)
  end
end
Collaborator

I feel like using % is not exactly idiomatic to crystal, consider using #{}.

Collaborator Author

Different use case: here I'm filling a previously defined template string, not creating a new string from variables.
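
To illustrate the distinction being drawn, here is a small sketch comparing the two styles. The template string is the one from the diff above; the variable values are made up for the example.

```crystal
# A template defined once, ahead of time, has no variables in scope yet,
# so it relies on format-style placeholders.
MSG_TEMPLATE = "kubectl CMD failed, exit code: %s, error: %s"

exit_code = 1
message = "connection refused"

# Filling the predefined template: String#% with a tuple of values.
from_template = MSG_TEMPLATE % {exit_code, message}

# Interpolation builds a new string at the point where the variables exist.
interpolated = "kubectl CMD failed, exit code: #{exit_code}, error: #{message}"

puts from_template
puts from_template == interpolated
```

Both produce the same text; the difference is only whether the wording lives in a reusable constant or inline at the call site.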

Comment on lines +29 to +34
cmd = "kubectl get #{kind} #{resource_name}"
cmd = "#{cmd} --field-selector #{field_selector}" if field_selector && !resource_name
cmd = "#{cmd} --selector #{selector}" if selector && !resource_name
cmd = "#{cmd} -n #{namespace}" if namespace && !all_namespaces
cmd = "#{cmd} -A" if !namespace && all_namespaces
cmd = "#{cmd} -o json"
Collaborator

This will probably result in some uneven formatting if cmd ever gets printed (kubectl get #{kind} --field-selector #{field_selector}), consider normalizing the spacing in some final command.

Collaborator Author

Yes, one extra space will be printed in log.trace. This is such a nit that I honestly don't want to bother pushing a new commit.
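
For reference, one way to avoid the doubled space is to collect the command parts in an array and join them once at the end. This is only a sketch of that idea: `build_get_cmd` is a hypothetical helper that mirrors the flag conditions from the diff above, not a method of the shard.

```crystal
# Hypothetical helper: builds the kubectl command from parts so that
# an absent resource_name leaves no stray space behind.
def build_get_cmd(kind : String, resource_name : String? = nil,
                  namespace : String? = nil, all_namespaces : Bool = false,
                  selector : String? = nil, field_selector : String? = nil) : String
  parts = ["kubectl", "get", kind]
  parts << resource_name if resource_name
  # Selectors only make sense when no single resource is named.
  parts << "--field-selector" << field_selector if field_selector && !resource_name
  parts << "--selector" << selector if selector && !resource_name
  parts << "-n" << namespace if namespace && !all_namespaces
  parts << "-A" if !namespace && all_namespaces
  parts << "-o" << "json"
  parts.join(" ")
end

puts build_get_cmd("pods", field_selector: "status.phase=Running", all_namespaces: true)
# prints: kubectl get pods --field-selector status.phase=Running -A -o json
```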

# todo check for success/fail
JSON.parse(result[:output])
end
def self.privileged_containers(namespace : String? = nil, all_namespaces : Bool? = true) : Array(JSON::Any)
Collaborator

Defaulting all_namespaces to true here is a little destructive if a user passes namespace. Although I have no idea how to resolve it.

Collaborator Author

That's true; I'm hoping people will look up the signature before using it. Hypothetically I could add a validator, but that would mean adding validators to all methods for consistency, which is not something I would like to do.
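
As an illustration of what such a validator might look like, here is a minimal sketch. It assumes a different default than the PR (`all_namespaces` defaulting to `nil` rather than `true`), and `namespace_flags` is a hypothetical helper, not part of the shard.

```crystal
# Hypothetical helper: resolves the kubectl scope flags and rejects
# the contradictory combination instead of silently ignoring one argument.
def namespace_flags(namespace : String? = nil, all_namespaces : Bool? = nil) : Array(String)
  if namespace && all_namespaces
    raise ArgumentError.new("pass either namespace or all_namespaces: true, not both")
  end
  return ["-n", namespace] if namespace
  # No namespace given: search everywhere unless the caller opted out explicitly.
  all_namespaces == false ? [] of String : ["-A"]
end

puts namespace_flags(namespace: "demo").join(" ")
puts namespace_flags.join(" ")
```

Leaving the default unset means a caller who passes only `namespace` gets a namespaced query, while a caller who passes nothing still gets the cluster-wide behavior.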

@martin-mat martin-mat requested a review from rich-l March 11, 2025 13:01
@rafal-lal
Collaborator Author

> Minor inconsistencies:
>
> * Sometimes when running a function from the same module you prefix it with `KubectlClient::Module_name.function` and at other times you only use `function`.
>
> * Occasionally there is no `info` message printed when running a function, this is kind of inconsistent. Especially in `Get` the very first message is usually `debug` while in other modules the first message is `info`.
>
> * Sometimes you print the result of function with `info` and other times not. Look at `pods_by_resource_labels` for example.
>
> * I don't quite understand why you sometimes specify the amount of resources returned (this is only done with pods, I believe). I have no qualms about this but it feels a little exotic/purposeless.
>
> * In the future we could consider making `pod_ready` and `node_ready` functions "unified", similarly to how it has been done with `get_resource`.
>
> None of these propositions are too important so I'm approving.

  • If I'm calling something from another KubectlClient submodule I need to use KubectlClient::Module_name.function, otherwise it won't be found; methods from the same submodule can be called with just function.
  • It's on purpose, to avoid bloating the INFO logs. The reasoning is that methods which log at INFO actually modify something on the cluster, so that should be loudly reported. The Get methods use DEBUG as they only fetch data from the cluster, which is not that important unless you are investigating behavior closely.
  • This is on purpose: look at the end of pods_by_resource_labels, it calls 2 other methods which print logs, so printing here would duplicate logs.
  • That is a little inconsistent. I might change it actually.
  • Good point, but I would leave it for now.

@martin-mat martin-mat requested a review from Smitholi67 March 11, 2025 14:15

@rich-l rich-l left a comment

lgtm

@martin-mat martin-mat merged commit 1096aed into rlal-split-main-file Mar 11, 2025
1 check passed
rafal-lal added a commit that referenced this pull request Mar 12, 2025
Refactor KubectlClient module with focus on logging

Signed-off-by: Rafal Lal <[email protected]>
rafal-lal added a commit that referenced this pull request Mar 12, 2025
* Split kubectl_client.cr into smaller files (#16)

Signed-off-by: Rafal Lal <[email protected]>
rafal-lal added a commit to lfn-cnti/testsuite that referenced this pull request Mar 12, 2025
rafal-lal added a commit to cnf-testsuite/cluster_tools that referenced this pull request Mar 12, 2025
* Update kubectl_client version to v1.0.8

Signed-off-by: Rafal Lal <[email protected]>
martin-mat pushed a commit to cnf-testsuite/helm that referenced this pull request Mar 12, 2025
Update shard.* files

Signed-off-by: Rafal Lal <[email protected]>
martin-mat pushed a commit to cnf-testsuite/k8s_kernel_introspection that referenced this pull request Mar 12, 2025
martin-mat pushed a commit to cnf-testsuite/k8s_netstat that referenced this pull request Mar 12, 2025
rafal-lal added a commit to lfn-cnti/testsuite that referenced this pull request Mar 12, 2025
rafal-lal added a commit to lfn-cnti/testsuite that referenced this pull request Mar 14, 2025
)

* Update shard.yml with new versions of shards (which were also updated accordingly to cnf-testsuite/kubectl_client#17)

Signed-off-by: Rafal Lal <[email protected]>
LuciaSirova pushed a commit to lfn-cnti/testsuite that referenced this pull request Mar 27, 2025
)

* Update shard.yml with new versions of shards (which were also updated accordingly to cnf-testsuite/kubectl_client#17)

Signed-off-by: Rafal Lal <[email protected]>
5 participants