Add more prometheus metrics #33307

Open
wants to merge 23 commits into
base: main
Conversation

@TheFox0x7 (Contributor) commented Jan 16, 2025

Adds HTTP tracking metrics, a cache latency histogram, a cache hit/miss counter, a counter and histogram for git commands, and counters for migration success/failure plus a count of currently running migrations.

gitea_issues_open and gitea_issues_closed are deprecated in favor of labels on gitea_issues.
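
To illustrate the deprecation, here is a minimal sketch of how two metric names collapse into one labelled metric in the Prometheus text exposition format. The label name `state`, the gauge type, and the helper `renderIssues` are assumptions for illustration; the PR text does not state the actual label name.

```go
package main

import (
	"fmt"
	"strings"
)

// renderIssues writes a single metric with a hypothetical "state" label in
// the Prometheus text exposition format, instead of two separate metric
// names (gitea_issues_open and gitea_issues_closed).
func renderIssues(open, closed int) string {
	var b strings.Builder
	b.WriteString("# TYPE gitea_issues gauge\n")
	fmt.Fprintf(&b, "gitea_issues{state=%q} %d\n", "open", open)
	fmt.Fprintf(&b, "gitea_issues{state=%q} %d\n", "closed", closed)
	return b.String()
}

func main() {
	fmt.Print(renderIssues(12, 30))
}
```

With a label, the total can be obtained by summing over the label on the query side rather than by adding two differently named series.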

Closes: #14724

@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Jan 16, 2025
@github-actions github-actions bot added the modifies/go Pull requests that update Go code label Jan 16, 2025
@TheFox0x7 TheFox0x7 force-pushed the prometheus-reorganization branch from cc06fe3 to 28615e2 Compare January 16, 2025 22:40
@TheFox0x7 TheFox0x7 force-pushed the prometheus-reorganization branch 2 times, most recently from 46424ca to 2f86823 Compare January 17, 2025 23:09
@TheFox0x7 TheFox0x7 force-pushed the prometheus-reorganization branch from 2f86823 to 056a308 Compare January 23, 2025 23:00
@TheFox0x7 (Contributor, Author) commented Jul 8, 2025

List of added metrics (and notes):

migrations

gitea_repository_inflight_migrations
gitea_repository_migrations{result: success/fail} - not sure this one is useful. The in-flight gauge could drive alerting if someone suddenly tries to migrate a lot of repositories and abuse the instance, but a success/fail counter? Thoughts? I'd guess it would work better as a trace span, if anything.

cache

There's no standard here, so cache metrics stay in the gitea namespace until one emerges.

gitea_cache_response{state: hit/miss}
gitea_cache_latency - TODO: tune buckets, probably cap at 0.5s.

http

Well defined, so the standard is applied with a few deviations.

http_server_response_body_size{http_request_method,http_response_status_code,http_route} - buckets taken from the echo instrumentation.
http_server_request_body_size{http_request_method,http_response_status_code,http_route} - see above.
http_server_request_duration_seconds{http_request_method,http_response_status_code,http_route} - non-default buckets taken from the dotnet version.
http_server_active_requests{http_request_method}

db

Per spec.

db_client_operation_duration_seconds - TODO: missing db type label

git

Might be useful for spotting very long-running commands, or a lot of commands running at the same time. I'd like to add a label for which command was run, since version takes a very different amount of time than diff or log, but that's still just a concept.

gitea_git_command_duration_seconds
gitea_git_active_commands

cron

gitea_cron_active_tasks

Suggestions for more metrics, or comments, are welcome. I'll update this comment as things progress.

@TheFox0x7 (Contributor, Author) commented Jul 8, 2025

For completeness, from the linked issue:

  • memory usage - covered by the Go runtime metrics to a degree. Full data should be gathered by an external system (node_exporter/cadvisor).
  • cpu - out of scope for gitea, I feel. node_exporter/cadvisor can provide this.
  • running git process number - provided by this PR
  • running cron tasks number - provided
  • running migration tasks - provided
  • running queue worker number - TODO. I don't see a use for it, but it can be done.
  • admin notices - provided
  • successful/failed ssh logins - AFAIK this is doable only for the built-in ssh server, unless we scrape logs. An external ssh server would need ebpf or another external solution, unless I'm missing something; I haven't looked closely at how gitea handles ssh servers.
  • login attempts - provided
  • TCP connections - I feel these are out of scope and better reported by a dedicated exporter (ebpf/others) or a reverse proxy.
  • http latency - provided per route, method and status code.
  • errors with git repos - unsure what the expectation is here: a counter that increments on error codes, or one bumped manually on errors in repo handling? I think this is best left to logging (structured, in the future).
  • webhook calls - provided

Side note relating to #32866:

While this tries to follow the otel semconv (deviating on buckets and units), migrating to otel with a prometheus exporter is not on my roadmap as of now, due to the performance difference between the two.

@TheFox0x7 (Contributor, Author) commented Jul 9, 2025

This is now in a state where I wouldn't want it merged as is, but feedback or a test deployment would help to fine-tune the buckets and figure out whether the metrics are useful, so I'm taking the draft label down.
I'll try to daily drive this for a bit and see how it reports.

@TheFox0x7 TheFox0x7 marked this pull request as ready for review July 9, 2025 21:41
Labels
lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. modifies/dependencies modifies/go Pull requests that update Go code
Development

Successfully merging this pull request may close these issues.

Expose more informations on /metrics
4 participants