
Conversation

chewbranca
Contributor

This PR supersedes #5491 and includes @iilyak's excellent HTTP updates on top of it, as well as some final cleanup and documentation from me. I've copied the contents of CSRT.md into the PR description here:

Couch Stats Resource Tracker (CSRT)

CSRT (Couch Stats Resource Tracker) is a real time stats tracking system that
tracks the quantity of resources induced at the process level in a live
queryable manner that also generates process lifetime reports containing
statistics on the total resource load of a request, as a function of things like
dbs/docs opened, view and changes rows read, changes returned vs processed,
Javascript filter usage, duration, and more. This system is a paradigm shift in
CouchDB visibility and introspection, allowing for expressive real time querying
capabilities to introspect, understand, and aggregate CouchDB internal resource
usage, as well as powerful filtering facilities for conditionally generating
reports on "heavy usage" requests or "long/slow" requests. CSRT also extends
recon:proc_window with csrt:proc_window allowing for the same style of
battle hardened introspection with Recon's excellent proc_window, but with the
sample window over any of the CSRT tracked CouchDB stats!

CSRT does this by piggy-backing off of the existing metrics tracked by way of
couch_stats:increment_counter at the time when the local process induces those
metrics inc calls, and then CSRT updates an ets entry containing the context
information for the local process, such that global aggregate queries can be
performed against the ets table as well as the generation of the process
resource usage reports at the conclusion of the process's lifecycle. The ability
to do aggregate querying in realtime in addition to the process lifecycle
reports for post facto analysis over time, is a cornerstone of CSRT that is the
result of a series of iterations until a robust and scalable approach was built.

The real time querying is achieved by way of a global ets table with
read_concurrency, write_concurrency, and decentralized_counters enabled.
Great care was taken to ensure that zero concurrent writes to the same key
occur in this model, and this entire system is predicated on the fact that
incremental updates by way of ets:update_counter provide really fast and
efficient updates in an atomic and isolated fashion when coupled with
decentralized counters and write concurrency. Each process that calls
couch_stats:increment_counter tracks their local context in CSRT as well, with
zero concurrent writes from any other processes. Outside of the context setup
and teardown logic, only operations to ets:update_counter are performed, one
per process invocation of couch_stats:increment_counter, and one for
coordinators to update worker deltas in a single batch, resulting in a 1:1 ratio
of ets calls to real time stats updates for the primary workloads.
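
As a rough sketch of that write pattern (illustrative only; it assumes the
?CSRT_ETS table and the #rctx{} record fields described later in this document):

%% One atomic, isolated batch per update; no other process writes to this
%% key, so there is no contention on the row itself.
Ops = [{#rctx.ioq_calls, 2}, {#rctx.docs_read, 1}],
ets:update_counter(?CSRT_ETS, PidRef, Ops)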

The primary achievement of CSRT is the core framework itself for concurrent
process local stats tracking and real time RPC delta accumulation in a scalable
manner that allows for real time aggregate querying and process lifecycle
reports. This took several versions to find a scalable and robust approach that
induced minimal impact on maximum system throughput. Now that the framework is
in place, it can be extended to track any further desired process local uses of
couch_stats:increment_counter. That said, the currently selected set of stats
to track was heavily influenced by the challenges in retroactively understanding
the quantity of resources induced by a query like /db/_changes?since=$SEQ, or
similarly, /db/_find.

CSRT started as an extension of the Mango execution stats logic to _changes
feeds to get proper visibility into quantity of docs read and filtered per
changes request, but then the focus inverted with the realization that we should
instead use the existing stats tracking mechanisms that have already been deemed
critical information to track, which then also allows for the real time tracking
and aggregate query capabilities. The Mango execution stats can be ported into
CSRT itself and just become one subset of the stats tracked as a whole, and
similarly, any additional desired stats tracking can be easily added and will
be picked up in the RPC deltas and process lifetime reports.

CSRT Config Keys

-define(CSRT, "csrt").

config:get("csrt").

Primary CSRT config namespace: contains core settings for enabling different
layers of functionality in CSRT, along with global config settings for limiting
data volume generation.

-define(CSRT_MATCHERS_ENABLED, "csrt_logger.matchers_enabled").

config:get("csrt_logger.matchers_enabled").

Config toggles for enabling specific builtin logger matchers, see the dedicated
section below on # CSRT Default Matchers.

-define(CSRT_MATCHERS_THRESHOLD, "csrt_logger.matchers_threshold").

config:get("csrt_logger.matchers_threshold").

Config settings for defining the primary Threshold value of the builtin logger
matchers, see the dedicated section below on # CSRT Default Matchers.

-define(CSRT_MATCHERS_DBNAMES, "csrt_logger.dbnames_io").

config:get("csrt_logger.matchers_enabled").

Config section for setting $db_name = $threshold resulting in instantiating a
"dbname_io" logger matcher for each $db_name that will generate a CSRT
lifecycle report for any contexts that induced more operations on any one
field of ioq_calls|get_kv_node|get_kp_node|docs_read|rows_read that is greater
than $threshold and is on database $db_name.
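
For example, a hypothetical entry (the db name and threshold value here are
illustrative):

[csrt_logger.dbnames_io]
my_db = 100000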

This is basically a simple matcher for finding heavy IO requests on a particular
database, in a manner amenable to key/value pair specifications in this .ini
file until a more sophisticated declarative model exists. In particular, it's
not easy to sequentially generate matchspecs by way of ets:fun2ms/1, and so an
alternative mechanism for either dynamically assembling an #rctx{} to match
against or generating the raw matchspecs themselves is warranted.

-define(CSRT_INIT_P, "csrt.init_p").

config:get("csrt.init_p").

Config toggles for tracking counters on spawning of RPC fabric_rpc workers by
way of rexi_server:init_p. This allows us to conditionally enable new metrics
for the desired RPC operations in an expandable manner, without having to add
new stats for every single potential RPC operation. These are for the individual
metrics to track; the feature is enabled by way of the config toggle
config:get(?CSRT, "enable_init_p"), and these configs can be left alone for the
most part until new operations are tracked.

CSRT Code Markers

-define(CSRT_ETS, csrt_server).

This is the reference to the CSRT ets table; it's managed by csrt_server, so
that's where the name originates from.

-define(MATCHERS_KEY, {csrt_logger, all_csrt_matchers}).

This marker is where the active matchers are written in persistent_term, for
concurrent, parallel access to the logger matchers in the CSRT tracker
processes for lifecycle reporting.

CSRT Process Dictionary Markers

-define(PID_REF, {csrt, pid_ref}).

This marker stores the core PidRef identifier. The key idea here is that a
context lifecycle is contained within the given PidRef, meaning that a Pid
can instantiate different CSRT lifecycles and pass those to different workers.

This is specifically necessary for long running processes that need to handle
many CSRT context lifecycles over the course of that individual process's own
lifecycle. In practice, this is immediately needed for the actual
coordinator lifecycle tracking, as chttpd uses a worker pool of http request
handlers that can be re-used, so we need a way to create a CSRT lifecycle
corresponding to the given request currently being serviced. This is also
intended to be used in other long running processes, like IOQ or couch_js pids
such that we can track the specific context inducing the operations on the
couch_file pid or indexer or replicator or whatever.

Worker processes have a more clear cut lifecycle, but either style of process
can be exit'ed in a manner that skips the ability to do cleanup operations, so
additionally there's a dedicated tracker process spawned to monitor the process
that induced the CSRT context such that we can do the dynamic logger matching
directly in these tracker processes and also we can properly cleanup the ets
entries even if the Pid crashes.

-define(TRACKER_PID, {csrt, tracker}).

A handle to the spawned tracker process that does cleanup and logger matching
reports at the end of the process lifecycle. We store a reference to the tracker
pid so that for explicit context destruction, like in chttpd workers after a
request has been serviced, we can stop the tracker and perform the expected
cleanup directly.

-define(DELTA_TA, {csrt, delta_ta}).

This stores our last delta snapshot to track progress since the last incremental
streaming of stats back to the coordinator process. This will be updated after
the next delta is made, with the latest value. Eg this stores T0 so we can do
T1 = get_resource(), make_delta(T0, T1), and then we save T1 as the new T0
for use in our next delta.
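
A sketch of that flow (illustrative pseudo-Erlang; the real logic lives in
csrt's delta functions):

T0 = erlang:get(?DELTA_TA),           %% last snapshot
T1 = csrt:get_resource(),             %% current rctx snapshot
Delta = csrt_util:rctx_delta(T0, T1), %% ship this back to the coordinator
erlang:put(?DELTA_TA, T1)             %% T1 becomes the new T0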

-define(LAST_UPDATED, {csrt, last_updated}).

This stores the integer corresponding to the erlang:monotonic_time() value of
the most recent updated_at value. Basically this lets us utilize a pdict
value to be able to turn updated_at tracking into an incremental operation that
can be chained into the existing atomic ets:update_counter and
ets:update_element calls.

The issue being that our updates are of the form +2 to ioq_calls for $pid_ref,
which ets performs in a guaranteed atomic and isolated manner. The
strict use of the atomic operations for tracking these values is why this
system works efficiently at scale. This means that we can increment counters on
all of the stats counter fields in a batch, very quickly, but for tracking
updated_at timestamps we'd need to either do an extra ets call to get the last
updated_at value, or do an extra ets call to ets:update_element to set the
updated_at value to csrt_util:tnow(). The core problem with this is that the
batch inc operation is essentially the only write operation performed after the
initial context setting of dbname/handler/etc; this means that we'd literally
double the number of ets calls induced to track CSRT updates, just for tracking
the updated_at. So instead, we rely on the fact that the local process
corresponding to $pid_ref is the only process doing updates so we know the
last updated_at value will be the last time this process updated the data. So
we track that value in the pdict and then take a delta between tnow() and
updated_at, and then updated_at becomes a value we can sneak into the other
integer counter updates we're already performing!
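
In code, the trick looks roughly like this (a sketch; the exact op list is
illustrative):

Now = csrt_util:tnow(),
Last = erlang:put(?LAST_UPDATED, Now),  %% previous updated_at from the pdict
Ops = [
    {#rctx.ioq_calls, 2},               %% the batched counter increments
    {#rctx.updated_at, Now - Last}      %% updated_at advances by the delta
],
ets:update_counter(?CSRT_ETS, PidRef, Ops)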

Primary Config Toggles

CSRT (?CSRT="csrt") Config Settings

config:get(?CSRT, "enable", false).

Core enablement toggle for CSRT, defaults to false. Enabling this setting
initiates local CSRT stats collection as well as shipping deltas in RPC
responses to accumulate in the coordinator.

This does not trigger the new RPC spawn metrics, and it does not enable
reporting for any of the rctx types.

NOTE: you MUST have all nodes in the cluster running a CSRT aware CouchDB
before you enable it on any node, otherwise the old version nodes won't know
how to handle the new RPC formats including an embedded Delta payload.

config:get(?CSRT, "enable_init_p", false).

Enablement of tracking new metric counters for different fabric_rpc operation
types to track spawn rates of RPC work induced across the cluster. There are
corresponding config lookups into the ?CSRT_INIT_P namespace for keys of the
form: atom_to_list(Mod) ++ "__" ++ atom_to_list(Fun), eg "fabric_rpc__open_doc"
for enabling the specific RPC endpoints.

However, those individual settings can be ignored and this top level config
toggle is what should be used in general, as the function specific config
toggles predominantly exist to enable tracking a subset of total RPC operations
in the cluster, and new endpoints can be added here.

config:get(?CSRT, "enable_reporting", false).

This is the primary toggle for enabling CSRT process lifetime reports containing
detailed information about the quantity of work induced by the given
request/worker/etc. This is the top level toggle for enabling any reporting,
and there also exists config:get(?CSRT, "enable_rpc_reporting", false) to
disable the reporting of any individual RPC workers, leaving the coordinator
responsible for generating a report with the accumulated deltas.

config:get(?CSRT, "enable_rpc_reporting", false).

This enables the possibility of RPC workers generating reports. They still need
to hit the configured thresholds to induce a report, but this will generate CSRT
process lifetime reports for individual RPC workers that trigger the configured
logger thresholds. This allows for quantifying per node resource usage when
desired, as otherwise the reports are at the http request level and don't
provide per node stats.

The key idea here is that having RPC level CSRT process lifetime reporting is
incredibly useful, but can also generate large quantities of data. For example,
a view query on a Q=64 database will stream results from 64 shard replicas,
resulting in at least 64 RPC reports, plus any that might have been generated
from RPC workers that "lost" the race for a shard replica. This is very useful,
but a lot of data given the verbose nature of funneling it through the rsyslog
reports; however, the ability to write directly to something like ClickHouse or
another columnar store would be great.

Until there's an efficient storage mechanism to stream the results to, the
rsyslog entries work great and are very practical, but care must be taken to
not generate too much data for aggregate queries, as they generate at least Q
times more reports than the single report per http request from the coordinator.
This setting exists as a way to either a) utilize the logger matcher configured
thresholds to allow for any rctx's to be recorded when they induce heavy
operations, either Coordinator or RPC worker; or b) to only log workloads at
the coordinator level.

NOTE: this setting exists because we lack an expressive enough config
declaration to easily chain the matchspec constructions as ets:fun2ms/1 is a
special compile time parse transform macro that requires the full definition to
be specified directly; it cannot be iteratively constructed. That said, you
can register matchers through remsh with more specific and fine grained
pattern matching, and a more expressive system for defining matchers is being
explored.

config:get_boolean(?CSRT, "should_truncate_reports", true)

Enables truncation of the CSRT process lifetime reports to not include any
fields that are zero at the end of process lifetime, eg don't include
js_filter=0 in the report if the request did not induce Javascript filtering.

This can be disabled if you really care about consistent fields in the report
logs, but this is a log space saving mechanism, similar to disabling RPC
reporting by default, as it's a simple way to reduce overall volume.

config:get(?CSRT, "randomize_testing", true).

This is a make eunit only feature toggle that will induce randomness into the
cluster's csrt:is_enabled() state, specifically to utilize the test suite to
exercise edge case scenarios and failures when CSRT is only conditionally
enabled, ensuring that it gracefully and robustly handles errors without fallout
to the underlying http clients.

The idea here is to introduce randomness into whether CSRT is enabled across all
the nodes to simulate clusters with heterogeneous CSRT enablement and also to
ensure that CSRT works properly when toggled on/off without causing any
unexpected fallout to the client requests.

This is a config toggle specifically so that the actual CSRT tests can disable
it for making accurate assertions about resource usage tracking, and is not
intended to be used directly.

config:get_integer(?CSRT, "query_limit", ?QUERY_LIMIT)

Limit the quantity of rows that can be loaded in an http query.

CSRT_INIT_P (?CSRT_INIT_P="csrt.init_p") Config Settings

config:get(?CSRT_INIT_P, ModFunName, false).

These config toggles exist to conditionally enable additional tracking of RPC
endpoints of interest; it's a way to selectively enable tracking for
a subset of RPC operations, in a way we can extend later to add more. The
ModFunName is of the form atom_to_list(Mod) ++ "__" ++ atom_to_list(Fun), eg
"fabric_rpc__open_doc", and right now, only exists for fabric_rpc modules.

NOTE: this is a bit awkward and isn't meant to be used directly; instead,
utilize config:set(?CSRT, "enable_init_p", "true") to enable or disable these
as a whole.

The current set of operations, as copied in from default.ini

[csrt.init_p]
fabric_rpc__all_docs = true
fabric_rpc__changes = true
fabric_rpc__get_all_security = true
fabric_rpc__map_view = true
fabric_rpc__open_doc = true
fabric_rpc__open_shard = true
fabric_rpc__reduce_view = true
fabric_rpc__update_docs = true

CSRT Logger Matcher Enablement and Thresholds

There are currently six builtin default loggers designed to make it easy to
filter on requests that induce heavy resource usage or run long. These are
designed as a simple baseline of useful matchers, declared in a manner amenable
to default.ini based constructs. More expressive matcher declarations are
being explored, and matchers of arbitrary complexity can be registered directly
through remsh. The default matchers are all designed around an integer config
Threshold that triggers on a specific field, eg docs read, or on a delta of
fields for long requests and changes requests that process many rows but return
few.

The current default matchers are:

  • docs_read: match all requests reading more than N docs
  • rows_read: match all requests reading more than N rows
  • docs_written: match all requests writing more than N docs
  • long_reqs: match all requests lasting more than N milliseconds
  • changes_processed: match all changes requests that return at least N fewer
    rows than were loaded to complete the request (eg to find heavily
    filtered changes requests reading many rows but returning few).
  • ioq_calls: match all requests inducing more than N ioq_calls

Each of the default matchers has an enablement setting in
config:get(?CSRT_MATCHERS_ENABLED, Name) for toggling enablement of it, and a
corresponding threshold value setting in
config:get(?CSRT_MATCHERS_THRESHOLD, Name) that is an integer value
corresponding to the specific nature of that matcher.
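
For example, to enable the docs_read matcher with a lowered threshold (values
illustrative):

[csrt_logger.matchers_enabled]
docs_read = true

[csrt_logger.matchers_threshold]
docs_read = 50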

CSRT Logger Matcher Enablement (?CSRT_MATCHERS_ENABLED)

-define(CSRT_MATCHERS_ENABLED, "csrt_logger.matchers_enabled").

config:get_boolean(?CSRT_MATCHERS_ENABLED, "docs_read", false)

Enable the docs_read builtin matcher, with a default Threshold=1000, such
that any request that reads more than Threshold docs will generate a CSRT
process lifetime report with a summary of its resource consumption.

This is different from the rows_read filter in that a view with ?limit=1000
will read 1000 rows, but the same request with ?include_docs=true will also
induce an additional 1000 docs read.

config:get_boolean(?CSRT_MATCHERS_ENABLED, "rows_read", false)

Enable the rows_read builtin matcher, with a default Threshold=1000, such
that any request that reads more than Threshold rows will generate a CSRT
process lifetime report with a summary of its resource consumption.

This is different from the docs_read filter so that we can distinguish between
heavy view requests with lots of rows or heavy requests with lots of docs.

config:get_boolean(?CSRT_MATCHERS_ENABLED, "docs_written", false)

Enable the docs_written builtin matcher, with a default Threshold=500, such
that any request that writes more than Threshold docs will generate a CSRT
process lifetime report with a summary of its resource consumption.

config:get_boolean(?CSRT_MATCHERS_ENABLED, "ioq_calls", false)

Enable the ioq_calls builtin matcher, with a default Threshold=10000, such
that any request that induces more than Threshold IOQ calls will generate a
CSRT process lifetime report with a summary of its resource consumption.

config:get_boolean(?CSRT_MATCHERS_ENABLED, "long_reqs", false)

Enable the long_reqs builtin matcher, with a default Threshold=60000, such
that any request where the last CSRT rctx updated_at timestamp is at least
Threshold milliseconds greater than the started_at timestamp will generate a
CSRT process lifetime report with a summary of its resource consumption.

CSRT Logger Matcher Threshold (?CSRT_MATCHERS_THRESHOLD)

-define(CSRT_MATCHERS_THRESHOLD, "csrt_logger.matchers_threshold").

config:get_integer(?CSRT_MATCHERS_THRESHOLD, "docs_read", 1000)

Threshold for docs_read logger matcher, defaults to 1000 docs read.

config:get_integer(?CSRT_MATCHERS_THRESHOLD, "rows_read", 1000)

Threshold for rows_read logger matcher, defaults to 1000 rows read.

config:get_integer(?CSRT_MATCHERS_THRESHOLD, "docs_written", 500)

Threshold for docs_written logger matcher, defaults to 500 docs written.

config:get_integer(?CSRT_MATCHERS_THRESHOLD, "ioq_calls", 10000)

Threshold for ioq_calls logger matcher, defaults to 10000 IOQ calls made.

config:get_integer(?CSRT_MATCHERS_THRESHOLD, "long_reqs", 60000)

Threshold for long_reqs logger matcher, defaults to 60000 milliseconds.

Core CSRT API

The csrt(.erl) module is the primary entry point into CSRT, containing API
functionality for tracking the lifecycle of processes, inducing metric tracking
over that lifecycle, and also a variety of functions for aggregate querying.

It's worth noting that the CSRT context tracking functions are specifically
designed to not throw and be safe in the event of unexpected CSRT failures or
edge cases. The aggregate query API has some functions that will actually throw,
but aside from this, core CSRT operations will not bubble up exceptions, and will
either return the error value, or catch the error and move on rather than
chaining further errors.

PidRef API

These functions are CRUD operations around creating and storing the CSRT
PidRef handle.

-export([
    destroy_pid_ref/0,
    destroy_pid_ref/1,
    create_pid_ref/0,
    get_pid_ref/0,
    get_pid_ref/1,
    set_pid_ref/1
]).

Context Lifecycle API

These are the CRUD functions for handling a CSRT context lifecycle, where a
lifecycle context is created in a chttpd coordinator process by way of
csrt:create_coordinator_context/2, or in rexi_server:init_p by way of
csrt:create_worker_context/3. Additional functions are exposed for setting
context specific info like username/dbname/handler. get_resource fetches the
context being tracked corresponding to the given PidRef.

-export([
    create_context/2,
    create_coordinator_context/2,
    create_worker_context/3,
    destroy_context/0,
    destroy_context/1,
    get_resource/0,
    get_resource/1,
    set_context_dbname/1,
    set_context_dbname/2,
    set_context_handler_fun/1,
    set_context_handler_fun/2,
    set_context_username/1,
    set_context_username/2
]).
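
A hypothetical coordinator flow stitching these together (the argument shapes
are assumptions for illustration; the real call sites live in chttpd and
rexi_server):

%% On request start in the chttpd worker:
csrt:create_coordinator_context(HttpReq, Handler),
csrt:set_context_dbname(DbName),
csrt:set_context_username(UserName),
%% ... service the request, accumulating stats and worker deltas ...
%% On completion, trigger logger matching and ets cleanup:
csrt:destroy_context()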

Public API

The "Public" or miscellaneous API for lack of a better name. These are various
functions exposed for wider use and/or testing purposes.

-export([
    clear_pdict_markers/0,
    do_report/2,
    is_enabled/0,
    is_enabled_init_p/0,
    maybe_report/2,
    to_json/1
]).

Stats Collection API

This is the stats collection API utilized by way of
couch_stats:increment_counter to do local process tracking, and also in rexi
to add and extract delta contexts and then accumulate those values.

NOTE: make_delta/0 is a "destructive" operation that will induce a new delta
by way of the last local pdict's rctx delta snapshot, and then update to the
most recent version. Two individual rctx snapshots for a PidRef can safely
generate an actual delta by way of csrt_util:rctx_delta/2.
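
For example, a non-destructive delta between two snapshots of the same PidRef
(sketch):

T0 = csrt:get_resource(PidRef),
%% ... more work is induced ...
T1 = csrt:get_resource(PidRef),
Delta = csrt_util:rctx_delta(T0, T1)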

-export([
    accumulate_delta/1,
    add_delta/2,
    docs_written/1,
    extract_delta/1,
    get_delta/0,
    inc/1,
    inc/2,
    ioq_called/0,
    js_filtered/1,
    make_delta/0,
    rctx_delta/2,
    maybe_add_delta/1,
    maybe_add_delta/2,
    maybe_inc/2,
    should_track_init_p/1
]).

TODO: RPC/QUERY DOCS

%% RPC API
-export([
    rpc/2,
    call/1
]).

%% Aggregate Query API
-export([
    active/0,
    active/1,
    active_coordinators/0,
    active_coordinators/1,
    active_workers/0,
    active_workers/1,
    count_by/1,
    find_by_nonce/1,
    find_by_pid/1,
    find_by_pidref/1,
    find_workers_by_pidref/1,
    group_by/2,
    group_by/3,
    query/1,
    query/2,
    query_matcher/1,
    query_matcher/2,
    sorted/1,
    sorted_by/1,
    sorted_by/2,
    sorted_by/3
]).

Recon API Ports of https://github.com/ferd/recon/releases/tag/2.5.6

This is a "port" of recon:proc_window to csrt:proc_window, allowing for
proc_window style aggregations/sorting/filtering but with the stats fields
collected by CSRT! This is also a direct port of recon:proc_window in that it
utilizes the same underlying logic and efficient internal data structures as
recon:proc_window, but only changes the Sample function:

%% This is a recon:proc_window/3 [1] port with the same core logic but
%% recon_lib:proc_attrs/1 replaced with pid_ref_attrs/1, and returning on
%% pid_ref() rather than pid().
%% [1] https://github.com/ferd/recon/blob/c2a76855be3a226a3148c0dfc21ce000b6186ef8/src/recon.erl#L268-L300
-spec proc_window(AttrName, Num, Time) -> term() | throw(any()) when
    AttrName :: rctx_field(), Num :: non_neg_integer(), Time :: pos_integer().
proc_window(AttrName, Num, Time) ->
    Sample = fun() -> pid_ref_attrs(AttrName) end,
    {First, Last} = recon_lib:sample(Time, Sample),
    recon_lib:sublist_top_n_attrs(recon_lib:sliding_window(First, Last), Num).

In particular, our change is Sample = fun() -> pid_ref_attrs(AttrName) end,
and in fact, if recon upstream parameterized the option of AttrName or
SampleFunction, this could be reimplemented as:

%% csrt:proc_window
proc_window(AttrName, Num, Time) ->
    Sample = fun() -> pid_ref_attrs(AttrName) end,
    recon:proc_window(Sample, Num, Time).

This implementation is being highlighted here because recon:proc_window/3 is
battle hardened and recon_lib:sliding_window uses an efficient internal data
structure for storing the two samples that has been proven to work in production
systems with millions of active processes, so swapping the Sample function
with a CSRT version allows us to utilize the production grade recon
functionality, but extended out to the particular CouchDB statistics we're
especially interested in.

And on a fun note: any further stats tracking fields added to CSRT tracking will
automatically work with this too.
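
For example, to sample the top contexts by IOQ calls (a sketch; Time is in
milliseconds, following recon's convention):

%% Top 10 pid_refs by ioq_calls accumulated over a 3 second window:
csrt:proc_window(ioq_calls, 10, 3000)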

-export([
    pid_ref_attrs/1,
    pid_ref_matchspec/1,
    proc_window/3
]).

Core types and Maybe types

Before we look at the #rctx{} record fields, let's examine the core datatypes
defined by CSRT for use in Dialyzer typespecs. There are more, but these are the
essentials and demonstrate the "maybe" typespec approach utilized in CSRT.

Let's say we have a -type foo() :: #foo{} and a
-type maybe_foo() :: foo() | undefined. We can then construct functions of the
form -spec get_foo(id()) -> maybe_foo(), and use Dialyzer to statically assert
that all callers of get_foo/1 handle the maybe_foo() data type rather than
just foo(), and ensure that all subsequent callers do as well.

This approach of -type maybe_<Type>() :: <Type> | undefined is utilized
throughout CSRT and has greatly aided in the development, refactoring, and
static analysis of this system. Here's a useful snippet for running Dialyzer
while hacking on CSRT:

make && time make dialyze apps=couch_stats

-type pid_ref() :: {pid(), reference()}.
-type maybe_pid_ref() :: pid_ref() | undefined.

-type coordinator_rctx() :: #rctx{type :: coordinator()}.
-type rpc_worker_rctx() :: #rctx{type :: rpc_worker()}.
-type rctx() :: #rctx{} | coordinator_rctx() | rpc_worker_rctx().
-type rctxs() :: [#rctx{}] | [].
-type maybe_rctx() :: rctx() | undefined.

Above we have the core pid_ref() data type, which is just a tuple with a
pid() and a reference(), and naturally, maybe_pid_ref() handles the
optional presence of a pid_ref(), allowing for our APIs like
csrt:get_resource(maybe_pid_ref()) to handle ambiguity of the presence of a
pid_ref().

We define our core rctx() data type as an empty #rctx{}, or the more
specific coordinator_rctx() or rpc_worker_rctx() such that we can be
specific about the rctx() type in functions that need to distinguish. And then
as expected, we have the notion of maybe_rctx().
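
A minimal sketch of the handling pattern this enforces (the function is
illustrative, not part of the CSRT API):

-spec resource_dbname(maybe_pid_ref()) -> dbname() | undefined.
resource_dbname(PidRef) ->
    case csrt:get_resource(PidRef) of
        %% an absent context is a normal case every caller must handle
        undefined -> undefined;
        #rctx{dbname = DbName} -> DbName
    end.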

#rctx{}

This is the core data structure utilized to track a CSRT context for a
coordinator or rpc_worker process, represented by the #rctx{} record, and
stored in the ?CSRT_ETS table keyed on {keypos, #rctx.pid_ref}.

The Metadata fields store labeling data for the given process being tracked,
such as started_at and updated_at timings, the primary pid_ref id key, the
type of the process context, and some additional information like username,
dbname, and the nonce of the coordinator request.

The Stats Counters fields are non_neg_integer() monotonically increasing
counters corresponding to the couch_stats metrics counters we're interested in
tracking at a process level cardinality. The use of these purely integer counter
fields in a record stored in an ets table is the cornerstone of
CSRT and why it's able to operate at high throughput and high concurrency, as
ets:update_counter/{3,4} take increment operations to be performed atomically
and in isolation, in a manner that does not require fetching and loading the
data directly. We then take care to batch the accumulation of delta updates into
a single update_counter call, and even sneak in the updated_at tracking as an
integer counter update without inducing an extra ets call.

NOTE: the typespec's for these fields include '_' atoms as possible types as
that is the matchspec wildcard any of the fields can be set to when using an
existing #rctx{} record to search with.

-record(rctx, {
    %% Metadata
    started_at = csrt_util:tnow() :: integer() | '_',
    %% NOTE: updated_at must be after started_at to preserve time congruity
    updated_at = csrt_util:tnow() :: integer() | '_',
    pid_ref :: maybe_pid_ref() | {'_', '_'} | '_',
    nonce :: nonce() | undefined | '_',
    type :: rctx_type() | undefined | '_',
    dbname :: dbname() | undefined | '_',
    username :: username() | undefined | '_',

    %% Stats Counters
    db_open = 0 :: non_neg_integer() | '_',
    docs_read = 0 :: non_neg_integer() | '_',
    docs_written = 0 :: non_neg_integer() | '_',
    rows_read = 0 :: non_neg_integer() | '_',
    changes_returned = 0 :: non_neg_integer() | '_',
    ioq_calls = 0 :: non_neg_integer() | '_',
    js_filter = 0 :: non_neg_integer() | '_',
    js_filtered_docs = 0 :: non_neg_integer() | '_',
    get_kv_node = 0 :: non_neg_integer() | '_',
    get_kp_node = 0 :: non_neg_integer() | '_'
    %% "Example to extend CSRT"
    %%write_kv_node = 0 :: non_neg_integer() | '_',
    %%write_kp_node = 0 :: non_neg_integer() | '_'
}).

Metadata

We use csrt_util:tnow() for time tracking, which is a native format
erlang:monotonic_time() integer, which, notably, can be and often is a
negative value. You must either take a delta or convert the time to get it into
a usable format, as one might suspect by the use of native.

We make use of erlang:monotonic_time/0 as per the recommendation in
https://www.erlang.org/doc/apps/erts/time_correction.html#how-to-work-with-the-new-api
for the suggested way to Measure Elapsed Time, as quoted:

Take time stamps with erlang:monotonic_time/0 and calculate the time difference
using ordinary subtraction. The result is in native time unit. If you want to
convert the result to another time unit, you can use erlang:convert_time_unit/3.

An easier way to do this is to use erlang:monotonic_time/1 with the desired time
unit. However, you can then lose accuracy and precision.

So our csrt_util:tnow/0 is implemented as the following, and we store
timestamps in native format as long as possible to avoid precision loss at
higher units of time, eg 300 microseconds is zero milliseconds.

-spec tnow() -> integer().
tnow() ->
    erlang:monotonic_time().

We store timestamps in the node's local erlang representation of time,
specifically to be able to efficiently do time deltas, and then we track time
deltas from the local node's perspective to not send timestamps across the wire.
We then utilize calendar:system_time_to_rfc3339 to convert the local node's
native time representation to its corresponding time format when we generate the
process life cycle reports or send an http response.
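
For example, converting a native delta into milliseconds for a report (sketch):

StartedAt = csrt_util:tnow(),
%% ... the context does its work ...
UpdatedAt = csrt_util:tnow(),
DurationMs = erlang:convert_time_unit(UpdatedAt - StartedAt, native, millisecond)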

NOTE: because we do an inline definition and assignment of the
#rctx.started_at and #rctx.updated_at fields to csrt_util:tnow(), we
must declare #rctx.updated_at after #rctx.started_at to avoid
fundamental time incongruities.

#rctx.started_at = csrt_util:tnow() :: integer() | '_',

A static value corresponding to the local node's Erlang monotonic_time at which
this context was created.

#rctx.updated_at = csrt_util:tnow() :: integer() | '_',

A dynamic value corresponding to the local node's Erlang monotonic_time at which
this context was updated. Note: unlike #rctx.started_at, this value will
update over time, and in the process lifecycle reports the #rctx.updated_at
value corresponds to the point at which the context was destroyed, allowing for
calculation of the total duration of the request/context.

#rctx.pid_ref :: maybe_pid_ref() | {'_', '_'} | '_',

The primary identifier used to track the resources consumed by a given pid()
for a specific context identified with a make_ref(), combined together as a
unit because a given pid(), eg one from the chttpd worker pool, can have many
contexts over time.

#rctx.nonce :: nonce() | undefined | '_',

The Nonce value of the http request being serviced by the coordinator_rctx()
used as the primary grouping identifier of workers across the cluster, as the
Nonce is funneled through rexi_server.

#rctx.type :: rctx_type() | undefined | '_',

A subtype classifier for the #rctx{} contexts, right now only supporting
#rpc_worker{} and #coordinator{}, but CSRT was designed to accommodate
additional context types like #view_indexer{}, #search_indexer{},
#replicator{}, #compactor{}, #etc{}.

#rctx.dbname :: dbname() | undefined | '_',

The database name, filled in at some point after the initial context creation by
way of csrt:set_context_dbname/{1,2}.

#rctx.username :: username() | undefined | '_',

The requester's username, filled in at some point after the initial context
creation by way of csrt:set_context_username/{1,2}.

Stats Counters

All of these stats counters are strictly non_neg_integer() counter values that
are monotonically increasing, as we only induce positive counter increment calls
in CSRT. Not all of these values will be nonzero, eg if the context doesn't
induce Javascript filtering of documents, it won't inc the #rctx.js_filter
field. The "should_truncate_reports" config value described in this document
will conditionally exclude the zero valued fields from being included in the
process life cycle report.

#rctx.db_open = 0 :: non_neg_integer() | '_',

Tracking couch_stats:increment_counter([couchdb, couch_server, open])

The number of couch_server:open/2 invocations induced by this context.

#rctx.docs_read = 0 :: non_neg_integer() | '_',

Tracking couch_stats:increment_counter([couchdb, database_reads])

The number of couch_db:open_doc/3 invocations induced by this context.

#rctx.docs_written = 0 :: non_neg_integer() | '_',

A phony metric counting docs written by the context, induced by
csrt:docs_written(length(Docs0)), in fabric_rpc:update_docs/3 as a way to
count the magnitude of docs written, as the actual document writes happen in the
#db.main_pid couch_db_updater pid and subprocess tracking is not yet
supported in CSRT.

This can be replaced with direct counting once passthrough contexts work.

#rctx.rows_read = 0 :: non_neg_integer() | '_',

Tracking couch_stats:increment_counter([fabric_rpc, changes, processed]) and
also tracking couch_stats:increment_counter([fabric_rpc, view, rows_read])

A value tracking multiple possible metrics corresponding to rows streamed in
aggregate operations. This is used for view_rows/changes_rows/all_docs/etc.

#rctx.changes_returned = 0 :: non_neg_integer() | '_',

The number of fabric_rpc:changes_row/2 invocations induced by this context,
specifically tracking the number of changes rows streamed back to the client
request, allowing for distinguishing between the number of changes processed to
fulfill a request versus the number actually returned in the http response.

#rctx.ioq_calls = 0 :: non_neg_integer() | '_',

A phony metric counting invocations of ioq:call/3 induced by this context. As
with #rctx.docs_written, we need a proxy metric to represent these calls
until CSRT context passing is supported, so that the ioq_server pid can track
and return its own delta back to the worker pid.

#rctx.js_filter = 0 :: non_neg_integer() | '_',

A phony metric counting the number of couch_query_servers:filter_docs_int/5
(eg ddoc_prompt) invocations induced by this context. This is called by way of
csrt:js_filtered(length(JsonDocs)) which both increments js_filter by 1, and
js_filtered_docs by the length of the docs so we can track magnitude of docs
and doc revs being filtered.

#rctx.js_filtered_docs = 0 :: non_neg_integer() | '_',

A phony metric counting the quantity of documents filtered by way of
couch_query_servers:filter_docs_int/5 (eg ddoc_prompt) invocations induced by
this context. This is called by way of csrt:js_filtered(length(JsonDocs))
which both increments #rctx.js_filter by 1, and #rctx.js_filtered_docs by
the length of the docs so we can track magnitude of docs and doc revs being
filtered.

#rctx.get_kv_node = 0 :: non_neg_integer() | '_',

This metric tracks the number of invocations to couch_btree:get_node/2 in
which the NodeType returned by couch_file:pread_term/2 is kv_node, instead
of kp_node.

This provides a mechanism to quantify the impact of document count and document
size on the logarithmic complexity btree algorithms as the database btrees grow.

#rctx.get_kp_node = 0 :: non_neg_integer() | '_'

This metric tracks the number of invocations to couch_btree:get_node/2 in
which the NodeType returned by couch_file:pread_term/2 is kp_node, instead
of kv_node.

This provides a mechanism to quantify the impact of document count and document
size on the logarithmic complexity btree algorithms as the database btrees grow.

%% "Example to extend CSRT"
%%write_kv_node = 0 :: non_neg_integer() | '',
%%write_kp_node = 0 :: non_neg_integer() | '
'

} = R
) when
DbName =:= DbName1 andalso
((IOQ >= Threshold) or (KVN >= Threshold) or (KPN >= Threshold) or (Docs >= Threshold) or

Contributor

Since we used andalso, it probably makes sense to use orelse instead of or?

%% vs `ets:select` taking a `comp_match_spec()` is why our CSRT `matcher()`
%% type_spec funnels around both versions instead of just reference to the
%% compiled spec stored by ETS internally.
case config:get_boolean(?CSRT, "use_query_fold", false) of

Contributor

How large a difference is there between the two methods? Instead of keeping both around, I wonder if there is a middle ground of querying in smaller batches of limit = 500 and doing a bunch of them in a row until we reach the user's limit. It just seems like a lot of complexity added for a method that might be called once in a while by a single operator debugging or reporting a large io usage or investigating some performance issues?


Addressed in 36feba6. Although I went with 5000. The row tuples are small. I think we should be ok with bigger batches.

Comment on lines 382 to 401
update_topK(_Key, Value, #topK{size = S, capacity = S, min = Min} = Top) when Value < Min ->
    Top#topK{min = Value};
% when we are at capacity evict smallest value
update_topK(Key, Value, #topK{size = S, capacity = S, max = Max, seq = Seq} = Top) when
    Value > Max
->
    % capacity cannot be less than 1, so we can avoid handling the case when Seq is empty
    [_ | Truncated] = Seq,
    Top#topK{max = Value, seq = lists:keysort(2, [{Key, Value} | Truncated])};
% when we are at capacity and value is in between min and max evict smallest value
update_topK(Key, Value, #topK{size = S, capacity = S, seq = Seq} = Top) ->
    % capacity cannot be less than 1, so we can avoid handling the case when Seq is empty
    [_ | Truncated] = Seq,
    Top#topK{seq = lists:keysort(2, [{Key, Value} | Truncated])};
update_topK(Key, Value, #topK{size = S, min = Min, seq = Seq} = Top) when Value < Min ->
    Top#topK{size = S + 1, min = Value, seq = lists:keysort(2, [{Key, Value} | Seq])};
update_topK(Key, Value, #topK{size = S, max = Max, seq = Seq} = Top) when Value > Max ->
    Top#topK{size = S + 1, max = Value, seq = lists:keysort(2, [{Key, Value} | Seq])};
update_topK(Key, Value, #topK{size = S, seq = Seq} = Top) ->
    Top#topK{size = S + 1, seq = lists:keysort(2, [{Key, Value} | Seq])}.

Contributor

suggestion: (maybe overkill here, too) gb_trees could be used perhaps as it has take_largest/1 and take_smallest/1 functions and automatically keeps the entries sorted.

Contributor

The maximum length of the list is equal to capacity, which is equal to the number of entries we want to return. This is meant to be small (on the order of 5-20, maybe 100 at maximum). For a small number of elements the list would probably be faster. However, I like the idea of using gb_trees, especially because I wouldn't have to maintain size.


The gb_trees doesn't really fit here, because it gets quite expensive to modify on each update: each value would have to contain a list of keys corresponding to that value. The update would look something like:

update_topK(Key, Value, #topK{size = S, capacity = S, max = Max, seq = Seq} = Top) when
    Value > Max
->
    %% here Seq is a gb_tree mapping Value -> [Keys]
    {SmallestValue, [_ | RestOfKeys], Seq1} = gb_trees:take_smallest(Seq),
    Seq2 = gb_trees:insert(SmallestValue, RestOfKeys, Seq1),
    case gb_trees:take_any(Value, Seq2) of
        {Keys, Seq3} ->
            Top#topK{max = Value, seq = gb_trees:insert(Value, [Key | Keys], Seq3)};
        error ->
            Top#topK{max = Value, seq = gb_trees:insert(Value, [Key], Seq2)}
    end;

Contributor

Wonder why we would need to have a list of keys for each value separately; maybe we can keep them together? Maybe something like this could work:

topk(#{} = Map, T) when is_integer(T), T > 0 ->
    Fun = fun(K, V, Set0) ->
        Set1 = gb_sets:add({V, K}, Set0),
        case gb_sets:size(Set1) > T of
            true -> element(2, gb_sets:take_smallest(Set1));
            false -> Set1
        end
    end,
    Set = maps:fold(Fun, gb_sets:empty(), Map),
    lists:reverse([{K, V} || {V, K} <- gb_sets:to_list(Set)]).
> Map = #{a => 1.0, b => 1.0, c => 0.5, d => 0.75, e => 1.2, f => 1.2, g => 1.1, h => 3.8}.

> lists:foreach(fun(K) ->
    io:format(" topK ~p : ~p ~n", [K, topk:topk(Map, K)])
   end, lists:seq(1, 9)).

 topK 1 : [{h,3.8}]
 topK 2 : [{h,3.8},{f,1.2}]
 topK 3 : [{h,3.8},{f,1.2},{e,1.2}]
 topK 4 : [{h,3.8},{f,1.2},{e,1.2},{g,1.1}]
 topK 5 : [{h,3.8},{f,1.2},{e,1.2},{g,1.1},{b,1.0}]
 topK 6 : [{h,3.8},{f,1.2},{e,1.2},{g,1.1},{b,1.0},{a,1.0}]
 topK 7 : [{h,3.8},{f,1.2},{e,1.2},{g,1.1},{b,1.0},{a,1.0},{d,0.75}]
 topK 8 : [{h,3.8},{f,1.2},{e,1.2},{g,1.1},{b,1.0},{a,1.0},{d,0.75},{c,0.5}]
 topK 9 : [{h,3.8},{f,1.2},{e,1.2},{g,1.1},{b,1.0},{a,1.0},{d,0.75},{c,0.5}]

It should be a bit smaller and have better complexity, O(n log k) vs the previous O(n * k log k) (I think?), since we're not re-sorting the top k list on every insertion.


Thank you Nick. I did switch to gb_sets in c5b9fe0

Contributor

I would even skip the whole update_topK function and the record and just use the topk from above as is.

The update_topK skips the update when Value < Min; we would have many small values, so we could save a lot of updates.

Contributor

I think that's not worth it for this case. It's not in the main data path; it's an API that will be called very rarely by an operator when investigating an issue, so it's worth keeping the code simple and compact more than adding extra complexity for possible performance gains. If this is a large bottleneck in the future we can always add more optimizations later.

Contributor

topk(#{} = Map, T) when is_integer(T), T > 0 ->
    Fun = fun(K, V, Set0) ->
        case gb_sets:size(Set0) >= T of
            true ->
                case V =< element(1, gb_sets:smallest(Set0)) of
                    true -> Set0;
                    false -> element(2, gb_sets:take_smallest(gb_sets:add({V, K}, Set0)))
                end;
            false ->
                gb_sets:add({V, K}, Set0)
        end
    end,
    Set = maps:fold(Fun, gb_sets:empty(), Map),
    lists:reverse([{K, V} || {V, K} <- gb_sets:to_list(Set)]).

This is a more optimized version ^

The savings when getting top 100 items with various randomly generated maps of kv:

  • 11 -> 3 msec for 10k kvs
  • 95 -> 22 msec for 100k kvs
  • 862 -> 166 msec for 1m kvs

Contributor

Addressed in 27d4123

Contributor

Thank you Nick!!

@nickva left a comment (Contributor)

Found some time to take another look. A lot of nice improvements since last time! Thanks for the doc updates and lots of fixups!

I had noticed a few more minor nits and some suggestions for simplification and made some comments in-line (not as part of the review itself).

I also tried it out locally in my dev cluster but couldn't get anything to show up in the logs.

I used these config settings:


([email protected])2> config:set("csrt", "enable", "true").
ok

([email protected])3> config:set("csrt", "enable_init_p", "true").
ok

([email protected])4> config:set("csrt", "enable_reporting", "true").
ok

([email protected])5> config:set("csrt_logger.matchers_enabled", "all_coordinators", "true").

ok
([email protected])6> config:set("csrt_logger.matchers_enabled", "docs_read", "true").
ok

([email protected])7> config:set("csrt_logger.matchers_threshold", "docs_read", "50").
ok

([email protected])8> config:set("csrt_logger.matchers_enabled", "changes_processed", "true").
ok

([email protected])9> config:set("csrt_logger.matchers_threshold", "changes_processed", "23").

Then queried a Q=4 100k+ docs db with _all_docs and _changes every 5 seconds. There should be some reports logged based on the thresholds, it seems? But I probably misread the config docs somewhere...

The db info:

 http $DB/k6db_000000000000 | jq
{
  "instance_start_time": "1756866209",
  "db_name": "k6db_000000000000",
  "purge_seq": "0-g1AAAACjeJzLYWBgYMlgTmHgz8tPSTV0MDQy1zMAQsMcoARTHguQZDgApP4DQVYiAwGVDRCV-_GpTHIAkkn1BMxLZEiyhyjJAgAURCum",
  "update_seq": "115376-g1AAAAEveJzLYWBgYMlgTmHgz8tPSTV0MDQy1zMAQsMcoARTHguQZDgApP4DQVYGcxIDQ8GtXKAYe6qhoUGKpRGmRgKGNUAM2w817CLYMEODpMRUC3MSDEtyAJJJ9QhXLQUbZJFoYmJoQIqrEhmS7BGmJECcY2lskGJuiKkjCwCKNEt9",
  "sizes": {
    "file": 24961872,
    "external": 8076750,
    "active": 24208300
  },
  "props": {},
  "doc_del_count": 0,
  "doc_count": 107690,
  "disk_format_version": 8,
  "compact_running": false,
  "cluster": {
    "q": 4,
    "n": 1,
    "w": 1,
    "r": 1
  }
}

Running

 while true; do http $DB/k6db_000000000000/_changes > /dev/null ; sleep 5 ; done

or

while true; do http $DB/k6db_000000000000/_all_docs > /dev/null ; sleep 5 ; done

Logs only show

[notice] 2025-09-03T02:42:05.809427Z [email protected] <0.146.0> -------- config: [csrt] enable set to true for reason nil
[notice] 2025-09-03T02:42:13.140547Z [email protected] <0.146.0> -------- config: [csrt] enable_init_p set to true for reason nil
[notice] 2025-09-03T02:42:21.395497Z [email protected] <0.146.0> -------- config: [csrt] enable_reporting set to true for reason nil
[notice] 2025-09-03T02:42:27.055586Z [email protected] <0.146.0> -------- config: [csrt_logger.matchers_enabled] all_coordinators set to true for reason nil
[notice] 2025-09-03T02:42:34.243541Z [email protected] <0.146.0> -------- config: [csrt_logger.matchers_enabled] docs_read set to true for reason nil
[notice] 2025-09-03T02:42:42.085523Z [email protected] <0.146.0> -------- config: [csrt_logger.matchers_threshold] docs_read set to 50 for reason nil
[notice] 2025-09-03T02:42:47.549438Z [email protected] <0.146.0> -------- config: [csrt_logger.matchers_enabled] changes_processed set to true for reason nil
[notice] 2025-09-03T02:45:06.351838Z [email protected] <0.146.0> -------- config: [csrt_logger.matchers_threshold] changes_processed set to 23 for reason nil
[notice] 2025-09-03T02:45:15.073340Z [email protected] <0.10330.0> df4fe40d07 127.0.0.1:15984 127.0.0.1 adm GET /k6db_000000000000/_all_docs 200 ok 1373
[notice] 2025-09-03T02:45:21.866052Z [email protected] <0.10562.0> 044fdebadd 127.0.0.1:15984 127.0.0.1 adm GET /k6db_000000000000/_all_docs 200 ok 1439
[notice] 2025-09-03T02:45:39.550578Z [email protected] <0.10934.0> 075d287ea8 127.0.0.1:15984 127.0.0.1 adm GET /k6db_000000000000/_changes 200 ok 4171
[notice] 2025-09-03T02:45:58.241967Z [email protected] <0.11532.0> 7ee5dcd278 127.0.0.1:15984 127.0.0.1 adm GET /k6db_000000000000 200 ok 1
[notice] 2025-09-03T03:01:15.408360Z [email protected] <0.34151.0> 0191e19bf5 127.0.0.1:15984 127.0.0.1 adm GET /k6db_000000000000/_changes 200 ok 4457

@chewbranca
Contributor Author

Thanks for taking another look, @nickva!

That's odd that the reports aren't showing for you; the config settings you listed look okay, and couch_srt_logger has a config listener to reload the matchers on changes to any of the logger config groups (https://github.com/apache/couchdb/blob/couch-stats-resource-tracker-v3-rebase-main/src/couch_srt/src/couch_srt_logger.erl#L588-L598), so it's odd you're not seeing the changes picked up.

The first four commands you did should bootstrap the system into generating reports, eg these settings:

([email protected])2> config:set("csrt", "enable", "true").
ok

([email protected])3> config:set("csrt", "enable_init_p", "true").
ok

([email protected])4> config:set("csrt", "enable_reporting", "true").
ok

([email protected])5> config:set("csrt_logger.matchers_enabled", "all_coordinators", "true").

ok

I just did a fresh clone and with the following diff I'm properly getting reports generated like:

[notice] 2025-09-04T19:43:09.288686Z [email protected] <0.11224.0> 381d115cad localhost:15984 127.0.0.1 adm GET /_all_dbs 200 ok 3
[report] 2025-09-04T19:43:09.288790Z [email protected] <0.11274.0> -------- [csrt-pid-usage-lifetime db_open=5 get_kv_node=2 nonce="381d115cad" pid_ref="<0.11224.0>:#Ref<0.3500684338.1004273667.225747>" rows_read=2 started_at="2025-09-04T19:43:09.292z" type="coordinator-{chttpd_misc:handle_all_dbs_req}:GET:/_all_dbs" updated_at="2025-09-04T19:43:09.295z" username="adm"]

(chewbranca)-(jobs:0)-(/tmp/couchdb)
(! 15428)-> git diff
diff --git a/rel/overlay/etc/default.ini b/rel/overlay/etc/default.ini
index b2ffa87b7..3cfe9d1aa 100644
--- a/rel/overlay/etc/default.ini
+++ b/rel/overlay/etc/default.ini
@@ -1153,9 +1153,9 @@ url = {{nouveau_url}}
 
 ; Couch Stats Resource Tracker (CSRT)
 [csrt]
-;enable = false
-;enable_init_p = false
-;enable_reporting = false
+enable = true
+enable_init_p = true
+enable_reporting = true
 ;enable_rpc_reporting = false
 
 ; Truncate reports to not include zero values for counter fields. This is a
@@ -1223,7 +1223,7 @@ url = {{nouveau_url}}
 ; CSRT default matchers - enablement configuration
 ; The default CSRT loggers can be individually enabled below
 [csrt_logger.matchers_enabled]
-;all_coordinators = false
+all_coordinators = true
 ;all_rpc_workers = false
 ;docs_read = false
 ;rows_read = false

@iilyak
Contributor

iilyak commented Sep 16, 2025

@nickva
I also tried it out locally in my dev cluster but couldn't get anything to show up in the logs.

I did repeat the steps you did locally.

([email protected])2> config:set("csrt", "enable", "true").
ok

([email protected])3> config:set("csrt", "enable_init_p", "true").
ok

([email protected])4> config:set("csrt", "enable_reporting", "true").
ok

([email protected])5> config:set("csrt_logger.matchers_enabled", "all_coordinators", "true").

ok
([email protected])6> config:set("csrt_logger.matchers_enabled", "docs_read", "true").
ok

([email protected])7> config:set("csrt_logger.matchers_threshold", "docs_read", "50").
ok

([email protected])8> config:set("csrt_logger.matchers_enabled", "changes_processed", "true").
ok

([email protected])9> config:set("csrt_logger.matchers_threshold", "changes_processed", "23").

Except I didn't create a new database; I queried the _replicator db.

while true; do curl -u adm:pass http://127.0.0.1:15984/_replicator/_changes > /dev/null ; sleep 5 ; done

This is what I see in the logs

[notice] 2025-09-16T20:30:11.007168Z [email protected] <0.10531.0> ef994f45d5 127.0.0.1:15984 127.0.0.1 adm GET /_replicator/_changes 200 ok 1
[report] 2025-09-16T20:30:11.007228Z [email protected] <0.10580.0> -------- [csrt-pid-usage-lifetime db_open=5 dbname="_replicator" nonce="ef994f45d5" pid_ref="<0.10531.0>:#Ref<0.3096609386.3958636548.180199>" started_at="2025-09-16T20:30:11.006z" type="coordinator-{chttpd_db:handle_changes_req}:GET:/_replicator/_changes" updated_at="2025-09-16T20:30:11.006z" username="adm"]
[notice] 2025-09-16T20:30:16.056722Z [email protected] <0.10689.0> 9c9fcd161b 127.0.0.1:15984 127.0.0.1 adm GET /_replicator/_changes 200 ok 2
[report] 2025-09-16T20:30:16.056874Z [email protected] <0.10738.0> -------- [csrt-pid-usage-lifetime db_open=5 dbname="_replicator" nonce="9c9fcd161b" pid_ref="<0.10689.0>:#Ref<0.3096609386.3958636546.179197>" started_at="2025-09-16T20:30:16.054z" type="coordinator-{chttpd_db:handle_changes_req}:GET:/_replicator/_changes" updated_at="2025-09-16T20:30:16.056z" username="adm"]
