
Conversation


@pappz (Contributor) commented Nov 21, 2025

Bugfixes

Describe your changes

Issue ticket number and link

Stack

Checklist

  • Is it a bug fix
  • Is it a typo/documentation fix
  • Is it a feature enhancement
  • Is it a refactor
  • Created tests that fail without the change (if possible)

By submitting this pull request, you confirm that you have read and agree to the terms of the Contributor License Agreement.

Documentation

Select exactly one:

  • I added/updated documentation for this change
  • Documentation is not needed for this change (explain why)

Docs PR URL (required if "docs added" is checked)

Paste the PR link from https://github.com/netbirdio/docs here:

https://github.com/netbirdio/docs/pull/__

Summary by CodeRabbit

  • New Features

    • Job management system to create, run and track per-peer jobs (e.g., bundle jobs).
    • Debug bundle generation and remote upload for improved troubleshooting.
  • API Enhancements

    • New HTTP endpoints to list, create, and fetch per-peer jobs.
    • New bidirectional streaming Job API for real-time job requests/responses.
  • Chores

    • Increased Wasm build size threshold and updated related warning message.



coderabbitai bot commented Nov 21, 2025

Walkthrough

Introduces a per‑peer job subsystem (create/send/receive jobs), adds debug bundle generation and upload helpers, threads logPath/profile config through the client engine and run flows, refactors status conversion to use proto.FullStatus, updates the protobuf and OpenAPI definitions, and wires the JobManager across server, client, and tests.

Changes

  • CI/Build
    .github/workflows/wasm-build-validation.yml
    Increased the Wasm size threshold from 50 MB to 55 MB and updated the warning text.
  • Debug bundle (client)
    client/internal/debug/debug.go, client/internal/debug/upload.go, client/internal/debug/upload_test.go, client/server/debug.go, client/proto/daemon.proto
    Renamed LogFile to LogPath; removed the DebugBundleRequest.status proto field; moved upload logic into UploadDebugBundle; status seeding now uses the status recorder and proto.FullStatus.
  • Client run / engine / connect
    client/internal/connect.go, client/internal/engine.go, client/internal/engine_test.go, client/embed/embed.go, client/cmd/up.go, client/cmd/testutil_test.go
    ConnectClient.Run/run signatures now accept logPath; EngineConfig gains LogPath and ProfileConfig; NewEngine signature extended; job executor and receiveJobEvents()/handleBundle() added.
  • Client CLI / UI / status
    client/cmd/debug.go, client/cmd/status.go, client/ui/debug.go, client/status/status.go, client/status/status_test.go
    Removed status string plumbing from the debug UI; ConvertToStatusOutputOverview now accepts *proto.FullStatus and daemonVersion; added the ToProtoFullStatus helper; call sites updated.
  • Client job executor
    client/jobexec/executor.go
    New Executor with BundleJob to wait, generate, and upload debug bundles (60-minute cap).
  • Management: job core & types
    management/server/job/channel.go, management/server/job/manager.go, management/server/types/job.go
    New job subsystem: Channel event queue, Manager for per-peer job channels/pending jobs, Job and Workload types, validation and conversion helpers.
  • Management: store & account integration
    management/server/store/store.go, management/server/store/sql_store.go, management/server/account.go, management/server/account/manager.go, management/server/mock_server/account_mock.go
    Store interface and SQL store extended with peer job persistence methods; BuildManager gains a jobManager parameter; account manager methods added for creating/fetching peer jobs; mock updated.
  • Management: gRPC/server streaming
    management/internals/shared/grpc/server.go, management/internals/server/boot.go, management/internals/server/controllers.go, management/internals/server/modules.go
    NewServer now accepts jobManager; added the ManagementService Job streaming handler, handshake/response receiver, and per-peer job channel wiring; BaseServer.JobManager() added.
  • Management: HTTP API & OpenAPI
    shared/management/http/api/openapi.yml, shared/management/http/api/types.gen.go, management/server/http/handlers/peers/peers_handler.go, shared/management/http/api/generate.sh
    Introduced Jobs API endpoints and new workload/job schemas; generated types for the discriminated Workload union (bundle); added CreateJob, ListJobs, and GetJob handlers; updated the codegen script to oapi-codegen v2.
  • Shared management client (grpc/mock)
    shared/management/client/client.go, shared/management/client/grpc.go, shared/management/client/mock.go, shared/management/client/client_test.go
    Added the Job streaming method to the client interface; implemented client-side streaming/handshake/encryption helpers and a mock stub; tests updated to wire the JobManager.
  • Tests & wiring
    multiple tests: management/..._test.go, client/..._test.go, shared/..._test.go
    Passed job.NewJobManager(...) into many test BuildManager/NewServer call sites; updated various test call signatures.
  • Misc / deps
    go.mod, other small server/client imports
    Added github.com/oapi-codegen/runtime and indirect github.com/apapsch/go-jsonmerge/v2; adjusted imports to reflect the new modules and removed unused helpers.

Sequence Diagram(s)

sequenceDiagram
    actor Mgmt as Management API
    participant AM as AccountManager / JobManager
    participant GRPC as Mgmt gRPC (Job stream)
    participant Peer as Client (Engine)
    participant Executor as Job Executor / Bundler
    Mgmt->>AM: CreatePeerJob(account, peer, job)
    AM->>GRPC: enqueue job event for peer
    GRPC->>Peer: stream JobRequest (encrypted)
    Peer->>Peer: receiveJobEvents()
    Peer->>Executor: BundleJob(...)  -- generate & upload
    Executor->>Executor: collect logs/status, create bundle
    Executor->>Mgmt: upload returns key (JobResponse)
    Peer->>GRPC: stream JobResponse (encrypted)
    GRPC->>AM: HandleResponse(job response) -> persist completion

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Areas to focus review on:

  • concurrency and channel/cleanup semantics in management/server/job/manager.go and channel.go
  • gRPC job streaming, handshake, encryption, and error paths in management/internals/shared/grpc/server.go and shared/management/client/grpc.go
  • Engine/jobExecutor integration and goroutine lifecycle (client/internal/engine.go, job executor)
  • Proto/OpenAPI changes and generated union handling (shared/management/http/api/types.gen.go, shared/management/proto/management.proto)
  • DB migrations and SQL store job methods (management/server/store/sql_store.go)
  • Call-site updates for status conversion (client/status/status.go and all callers)

Possibly related PRs

Suggested reviewers

  • pascal-fischer
  • crn4
  • mlsmaycon

Poem

🐰 Hop, hop — a job queue wakes,

Bundles stitched from logs and flakes,
Streams that whisper, channels hum,
Upload keys arrive — drum drum drum,
I nibble bugs and watch it run. 🥕

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
  • Description check (⚠️ Warning): The PR description is largely incomplete. While it has the required template structure with checklist items marked (feature enhancement and refactor), it lacks substantive content under key sections, including "Describe your changes" and "Issue ticket number and link". Resolution: complete the description with details about the changes made, reference the related issue number, and clarify whether documentation updates were needed for this feature.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 16.13%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
  • Title check (✅ Passed): The PR title clearly describes the main objective: triggering debug bundle runs from the API/Dashboard, which aligns with the extensive changes across the management and client packages.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feature/remote-debug-release



coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
client/status/status.go (1)

234-315: ToProtoFullStatus mapping is good; add nil‑safety when reading latency/handshake

The new ToProtoFullStatus helper correctly maps:

  • Management and signal state (URL, connected, error).
  • Local peer state (IP, pub key, kernel, FQDN, Rosenpass flags, networks from route keys).
  • Per‑peer fields (status, timestamps, transfer stats, Rosenpass, networks, latency, SSH host key).
  • Relays and DNS group state, including error strings.

Downstream, mapPeers assumes Latency and LastWireguardHandshake are always set:

lastHandshake := pbPeerState.GetLastWireguardHandshake().AsTime().Local()
...
Latency: pbPeerState.GetLatency().AsDuration(),

That’s safe for FullStatus objects produced by ToProtoFullStatus, but will panic if a PeerState arrives with these fields unset (e.g., older daemon or other producer). To harden this:

var lastHandshake time.Time
if ts := pbPeerState.GetLastWireguardHandshake(); ts != nil {
    lastHandshake = ts.AsTime().Local()
}

var latency time.Duration
if d := pbPeerState.GetLatency(); d != nil {
    latency = d.AsDuration()
}

and then use those locals when building PeerStateDetailOutput.

Also applies to: 549-635

client/internal/engine.go (2)

34-53: Engine config/struct wiring for jobs is fine, but c *profilemanager.Config is unused

The new debug/job additions to EngineConfig (ProfileConfig, LogPath) and Engine (jobExecutor, jobExecutorWG) are wired in cleanly, and creating jobExecutor in NewEngine is reasonable.

However, the extra parameter c *profilemanager.Config on NewEngine is never used in the function body. Go compiles unused parameters without complaint, but this one is dead weight and suggests the config was meant to be wired in. Either remove the parameter or actually thread it into the engine configuration, e.g.:

config.ProfileConfig = c
// or
engine.config.ProfileConfig = c

depending on where you intend to own that reference.

Also applies to: 83-142, 145-222, 235-276


278-366: Job stream consumption and bundle handling are generally solid; watch for restart behavior and nil profile config

The new job flow on the client side looks good overall:

  • receiveJobEvents hooks into mgmClient.Job with a per‑message handler that defaults responses to JobStatus_failed and branches on WorkloadParameters – currently only Bundle is handled, everything else returns ErrJobNotImplemented.
  • jobExecutorWG ensures Stop() waits for the Job stream goroutine to exit before tearing down the engine.
  • handleBundle builds debug.GeneratorDependencies from engine state, calls jobExecutor.BundleJob, and wraps the resulting upload key in a JobResponse_Bundle.

Two points to be aware of:

  • When mgmClient.Job returns any error (including when e.ctx is canceled during Stop()), you treat it as a hard failure and call CtxGetState(e.ctx).Wrap(ErrResetConnection) + e.clientCancel(), which triggers a full client restart. That mirrors how Sync/Signal errors are handled but also means a graceful “down” or engine stop will trigger a restart path instead of a quiet shutdown. If that’s not intended, you may want to distinguish context cancellation from other errors.

  • handleBundle assumes e.config.ProfileConfig and e.config.ProfileConfig.ManagementURL are non‑nil:

    InternalConfig: e.config.ProfileConfig,
    ...
    uploadKey, err := e.jobExecutor.BundleJob(..., e.config.ProfileConfig.ManagementURL.String())

    If ProfileConfig or its ManagementURL can be unset (e.g., in tests or some platforms), this will panic. A defensive check that returns a failed JobResponse with a clear reason would make the behavior safer.

Also applies to: 942-1012
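
If it helps, a minimal defensive guard along the lines below could sit at the top of handleBundle; it assumes the engine fields and proto types referenced above (jobReq and the exact return shape are illustrative) and turns a would‑be panic into a failed JobResponse:

if e.config.ProfileConfig == nil || e.config.ProfileConfig.ManagementURL == nil {
    // report a clear failure instead of dereferencing a nil config
    return &proto.JobResponse{
        ID:     jobReq.ID,
        Status: proto.JobStatus_failed,
        Reason: []byte("profile config or management URL is not set"),
    }
}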

🧹 Nitpick comments (38)
shared/management/client/grpc.go (5)

112-160: Centralized withMgmtStream helper looks good; double‑check backoff choice and final logging

The withMgmtStream wrapper nicely unifies readiness checks, server key fetching, and retry for both Sync and Job. One thing to double‑check is whether you explicitly want streaming calls to use defaultBackoff(ctx) while other RPCs still use nbgrpc.Backoff(...); if not, consider reusing the same backoff helper for consistency, or documenting the intentional difference.

Also, withMgmtStream logs "unrecoverable error" for any non‑nil err after backoff.Retry, including context.Canceled / DeadlineExceeded. If you expect normal shutdowns via context cancellation, you may want to special‑case those errors before logging at warn to avoid noisy logs.
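
As a sketch of that logging tweak (assuming the errors package is imported and operation names the retried closure; this mirrors, rather than reproduces, the actual withMgmtStream body):

err := backoff.Retry(operation, defaultBackoff(ctx))
if err != nil {
    if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
        // normal shutdown: keep it quiet
        log.Debugf("management stream stopped: %v", err)
        return err
    }
    log.Warnf("unrecoverable error: %v", err)
    return err
}
return nil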


162-216: Job stream error handling is solid; refine connection notifications for better state reporting

The overall handleJobStream flow (open stream → handshake → recv/process/respond loop) and error classification by gRPC code look good.

A couple of refinements around connection state notifications:

  • notifyDisconnected(err) is called for every receive error, including codes.Unimplemented. In that case, the server is reachable but doesn’t support Job; marking management as “disconnected” can mislead consumers of ConnStateNotifier. Consider moving notifyDisconnected into the switch and skipping it for codes.Unimplemented (and possibly codes.Canceled, which usually indicates local shutdown).
  • The Job path never calls notifyConnected(), so this stream can only move the state toward “disconnected”. If ConnStateNotifier is used for user‑visible connectivity, you might want to call notifyConnected() once the stream and handshake succeed (or when the first job is successfully received) to keep the state transitions balanced.
  • On sendJobResponse failure you return the error but don’t notify disconnection; if this error typically indicates a broken stream, it may be worth also invoking notifyDisconnected(err) there.

These are behavioral tweaks rather than correctness issues but would make connection state reporting more accurate.
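
One possible shape for the first two points, assuming the gRPC status/codes packages (the gstatus import alias is hypothetical) and the notifyDisconnected helper already used in this file:

if s, ok := gstatus.FromError(err); ok {
    switch s.Code() {
    case codes.Unimplemented:
        // server reachable but Job not supported: don't flag a disconnect
        log.Debugf("management server does not implement the Job API: %v", err)
        return err
    case codes.Canceled:
        // local shutdown: don't flag a disconnect
        return err
    }
}
c.notifyDisconnected(err)
return err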


218-252: Job helpers mirror existing encryption patterns; consider cleaning up unused ctx parameters

sendHandshake and receiveJobRequest correctly follow the existing EncryptMessage / DecryptMessage pattern and use the WireGuard public key in the EncryptedMessage as elsewhere in this file.

Right now the ctx parameter passed into these helpers isn’t used inside them; the only effective cancellation point is the stream’s own context from realClient.Job(ctx). That’s fine behaviorally, but for clarity you could either:

  • Drop ctx from these helper signatures, or
  • Start using it explicitly (e.g., check ctx.Err() before/after blocking operations, or for future timeouts / per‑job cancellation).

Given this is internal code, I’d treat it as a readability/maintenance cleanup to do when convenient.


254-295: Ensure JobResponse is always well‑formed for correlation and logging

processJobRequest nicely guards against a nil handler result by synthesizing a JobResponse with Status: JobStatus_failed and a reason. Two minor robustness tweaks you might consider:

  • If the handler returns a non‑nil JobResponse but forgets to populate ID, you could default it to jobReq.ID so the management server can always correlate responses:
jobResp := msgHandler(jobReq)
if jobResp == nil {
    jobResp = &proto.JobResponse{
        ID:     jobReq.ID,
        Status: proto.JobStatus_failed,
        Reason: []byte("handler returned nil response"),
    }
} else if len(jobResp.ID) == 0 {
    jobResp.ID = jobReq.ID
}
  • For logging, string(jobReq.ID) / string(resp.ID) assumes the IDs are valid UTF‑8. If these are ever changed to non‑textual bytes, consider logging them as %x or via hex.EncodeToString to avoid odd output.

Not blockers, but they make the job channel a bit more defensive and easier to debug.


297-395: Sync refactor via connectToSyncStream/receiveUpdatesEvents looks consistent with existing patterns

The split into connectToSyncStream and receiveUpdatesEvents reads clean and matches the encryption/decryption contract used elsewhere (EncryptMessage(serverPubKey, c.key, req) and DecryptMessage(serverPubKey, c.key, update.Body, resp)).

A small optional improvement: GetNetworkMap now reimplements the “single Recv + decrypt SyncResponse” logic that’s very similar to what receiveUpdatesEvents does in a loop. If you find yourself touching this code again, you might consider a tiny shared helper (e.g., decryptSyncUpdate(serverPubKey, update *proto.EncryptedMessage) (*proto.SyncResponse, error)) to keep the decryption path fully DRY.

Functionally, the new sync connection flow looks correct.
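
A rough sketch of such a helper, assuming the receiver and field names used elsewhere in this file (GrpcClient, c.key) and the existing encryption.DecryptMessage contract:

func (c *GrpcClient) decryptSyncUpdate(serverPubKey wgtypes.Key, update *proto.EncryptedMessage) (*proto.SyncResponse, error) {
    resp := &proto.SyncResponse{}
    if err := encryption.DecryptMessage(serverPubKey, c.key, update.Body, resp); err != nil {
        return nil, fmt.Errorf("failed decrypting sync update: %w", err)
    }
    return resp, nil
}

Both receiveUpdatesEvents and GetNetworkMap could then call it after their Recv.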

client/internal/debug/upload_test.go (1)

1-1: Test now exercises the exported UploadDebugBundle correctly

Switching to package debug and calling UploadDebugBundle(context.Background(), testURL+types.GetURLPath, testURL, file) matches the new API and keeps the expectations (getURLHash(testURL) prefix, stored file content) intact.

You might consider using a context with timeout here (and/or a simple readiness wait on srv.Start) to make this integration-style test more robust under slow CI environments.

Also applies to: 41-48
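
For example, the call could be wrapped in a bounded context; a sketch, where testURL, types.GetURLPath, and file are the fixtures already used in the test and only the context handling changes:

ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
// then invoke UploadDebugBundle(ctx, testURL+types.GetURLPath, testURL, file)
// exactly as before, with ctx instead of context.Background()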

client/cmd/debug.go (1)

338-360: Debug bundle generator now uses LogPath and shared default upload URL

Passing LogPath: logFilePath into debug.NewBundleGenerator matches the updated GeneratorDependencies and ensures the bundle generator knows where to look for logs. Reusing types.DefaultBundleURL as the default for --upload-bundle-url in both debug bundle and debug for keeps the CLI consistent with the upload-server configuration.

Consider clarifying in the flag help text that the default is the NetBird-hosted debug upload service, so self‑hosters know they should override it if needed.

Also applies to: 369-379

client/server/server_test.go (1)

21-21: JobManager wiring into test management server is correct

Creating a jobManager := job.NewJobManager(nil, store) and passing it into both server.BuildManager(...) and nbgrpc.NewServer(...) matches the updated constructor signatures and ensures job flows are exercised in this test setup.

The if err != nil { return nil, "", err } right after eventStore := &activity.InMemoryEventStore{} still refers to the earlier err from NewTestStoreFromSQL, which has already been handled and is guaranteed nil here. It can be removed to avoid confusion.

Also applies to: 298-329

client/internal/debug/debug.go (1)

30-34: Status seeding via FullStatus/nbstatus is coherent with anonymization strategy

Using statusRecorder.GetFullStatus() → nbstatus.ToProtoFullStatus → ConvertToStatusOutputOverview → ParseToFullDetailSummary aligns the status.txt contents with the regular netbird status output and correctly injects event history and profile name. Seeding the bundle anonymizer via seedFromStatus(g.anonymizer, &fullStatus) leverages this richer status snapshot to anonymize later artifacts more consistently.

One minor nit: seedFromStatus runs even when g.anonymize is false, which is harmless but unnecessary work; you could optionally guard it with if g.anonymize { ... } for clarity.

Also applies to: 382-405, 852-880

shared/management/http/api/generate.sh (1)

14-16: oapi-codegen v2 install path looks correct; consider pinning a version

The switch to github.com/oapi-codegen/oapi-codegen/v2/cmd/oapi-codegen@latest matches the v2 module layout and should work fine with recent Go toolchains. For more reproducible codegen in CI, you might eventually want to pin a specific version instead of @latest.

client/embed/embed.go (1)

175-185: Recorder integration and updated Run call look consistent

Wiring a peer.NewRecorder into NewConnectClient and updating Run to Run(run, "") matches the new client API and should keep embedded startup behavior intact. If you later surface a log/debug path in Options, this is the right place to thread it through instead of an empty string.

management/server/http/testing/testing_tools/channel/channel.go (1)

19-39: JobManager wiring into BuildManager is correct; reuse metrics instead of nil

Injecting a concrete job.Manager into BuildManager here is a nice improvement in test realism. Given you already construct metrics in this helper, you can pass it into the job manager instead of nil so job‑related metrics behavior is also covered in tests:

-	peersUpdateManager := update_channel.NewPeersUpdateManager(nil)
-	jobManager := job.NewJobManager(nil, store)
+	peersUpdateManager := update_channel.NewPeersUpdateManager(nil)
+	jobManager := job.NewJobManager(metrics, store)
...
-	am, err := server.BuildManager(ctx, nil, store, networkMapController, jobManager, nil, "", &activity.InMemoryEventStore{}, geoMock, false, validatorMock, metrics, proxyController, settingsManager, permissionsManager, false)
+	am, err := server.BuildManager(ctx, nil, store, networkMapController, jobManager, nil, "", &activity.InMemoryEventStore{}, geoMock, false, validatorMock, metrics, proxyController, settingsManager, permissionsManager, false)

Also applies to: 53-79

management/server/account_test.go (1)

35-50: Account test manager now wires a JobManager; prefer passing metrics instead of nil

Importing job and passing a concrete JobManager into BuildManager in createManager matches the new account manager wiring and lets tests exercise job‑related paths. Since you already compute metrics in this helper, you can avoid a nil metrics field on the JobManager and better mirror production setup by doing:

-	updateManager := update_channel.NewPeersUpdateManager(metrics)
-	requestBuffer := NewAccountRequestBuffer(ctx, store)
-	networkMapController := controller.NewController(ctx, store, metrics, updateManager, requestBuffer, MockIntegratedValidator{}, settingsMockManager, "netbird.cloud", port_forwarding.NewControllerMock(), &config.Config{})
-	manager, err := BuildManager(ctx, nil, store, networkMapController, job.NewJobManager(nil, store), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)
+	updateManager := update_channel.NewPeersUpdateManager(metrics)
+	requestBuffer := NewAccountRequestBuffer(ctx, store)
+	networkMapController := controller.NewController(ctx, store, metrics, updateManager, requestBuffer, MockIntegratedValidator{}, settingsMockManager, "netbird.cloud", port_forwarding.NewControllerMock(), &config.Config{})
+	manager, err := BuildManager(ctx, nil, store, networkMapController, job.NewJobManager(metrics, store), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)

Also applies to: 2931-2969

management/server/route_test.go (1)

22-34: Route tests correctly inject a JobManager; reuse metrics for consistency

The additional job import and passing job.NewJobManager(nil, store) into BuildManager in createRouterManager are aligned with the new constructor and ensure route tests run with job support enabled. Since this helper already initializes metrics, you can tighten the fidelity of the test setup by wiring it into the JobManager too:

-	updateManager := update_channel.NewPeersUpdateManager(metrics)
-	requestBuffer := NewAccountRequestBuffer(ctx, store)
-	networkMapController := controller.NewController(ctx, store, metrics, updateManager, requestBuffer, MockIntegratedValidator{}, settingsMockManager, "netbird.selfhosted", port_forwarding.NewControllerMock(), &config.Config{})
-
-	am, err := BuildManager(context.Background(), nil, store, networkMapController, job.NewJobManager(nil, store), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)
+	updateManager := update_channel.NewPeersUpdateManager(metrics)
+	requestBuffer := NewAccountRequestBuffer(ctx, store)
+	networkMapController := controller.NewController(ctx, store, metrics, updateManager, requestBuffer, MockIntegratedValidator{}, settingsMockManager, "netbird.selfhosted", port_forwarding.NewControllerMock(), &config.Config{})
+
+	am, err := BuildManager(context.Background(), nil, store, networkMapController, job.NewJobManager(metrics, store), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)

Also applies to: 1258-1302

management/server/peer_test.go (1)

35-35: Prefer initializing job.Manager with real metrics in tests

The wiring of BuildManager looks correct (argument order matches the updated signature), but all these test setups construct the job manager with job.NewJobManager(nil, s) while metrics is already available. This leaves job.Manager.metrics nil in tests, which can (a) hide metric-related bugs and (b) potentially panic later if the manager starts using metrics without nil checks.

Consider passing the same metrics instance you already create:

-	am, err := BuildManager(context.Background(), nil, s, networkMapController, job.NewJobManager(nil, s), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)
+	am, err := BuildManager(context.Background(), nil, s, networkMapController, job.NewJobManager(metrics, s), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)

Apply the same pattern to the other BuildManager invocations in this file for consistency.

Also applies to: 1296-1296, 1381-1381, 1534-1534, 1614-1614

management/server/nameserver_test.go (1)

19-19: Use the existing metrics instance when creating JobManager

createNSManager correctly wires the new jobManager dependency into BuildManager, but you’re passing job.NewJobManager(nil, store) even though a real metrics instance is already created and used for update_channel.NewPeersUpdateManager.

To avoid a nil metrics inside job.Manager in tests (and potential surprises if metrics are used later), prefer:

-	return BuildManager(context.Background(), nil, store, networkMapController, job.NewJobManager(nil, store), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)
+	return BuildManager(context.Background(), nil, store, networkMapController, job.NewJobManager(metrics, store), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)

Also applies to: 798-799

management/server/dns_test.go (1)

17-17: Align test JobManager construction with real metrics

createDNSManager correctly injects a JobManager into BuildManager, but uses job.NewJobManager(nil, store) despite having a metrics instance already.

For more realistic tests and to avoid a nil metrics field inside job.Manager, consider:

-	return BuildManager(context.Background(), nil, store, networkMapController, job.NewJobManager(nil, store), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)
+	return BuildManager(context.Background(), nil, store, networkMapController, job.NewJobManager(metrics, store), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)

Also applies to: 229-230

management/server/management_proto_test.go (1)

32-32: JobManager wiring in tests looks correct; consider reusing metrics

The test correctly instantiates job.Manager and threads it through BuildManager and nbgrpc.NewServer, matching the new signatures. To exercise metrics inside the job manager during tests, you might consider passing the metrics instance instead of nil when creating jobManager, but that's optional.

Also applies to: 342-343, 369-371, 380-381

client/cmd/testutil_test.go (1)

19-19: Consistent JobManager setup in test helper; minor polish possible

The JobManager is correctly created and passed into BuildManager and nbgrpc.NewServer, aligning with the new signatures. Two small nits you could optionally address later:

  • Pass the metrics instance to job.NewJobManager instead of nil if you want job metrics in tests.
  • The if err != nil { return nil, nil } check after eventStore := &activity.InMemoryEventStore{} is dead code and can be removed.

Also applies to: 92-96, 123-124, 129-130

management/server/account.go (1)

18-18: JobManager injection is clean; clarify non‑nil contract

Adding jobManager *job.Manager to DefaultAccountManager and threading it via BuildManager is consistent and keeps wiring explicit. However, methods like CreatePeerJob dereference am.jobManager without nil checks, so BuildManager effectively requires a non‑nil JobManager in all call sites. Consider either:

  • Documenting that jobManager must be non‑nil (and enforcing via tests), or
  • Adding a defensive nil check that returns a clear status.Internal/status.PreconditionFailed error instead of panicking if it’s miswired.

Would you like a small ast-grep/rg script to verify that all BuildManager call sites now pass a non‑nil JobManager?

Also applies to: 68-76, 179-197, 203-224
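
If you go the defensive route, a guard as small as the following at the top of CreatePeerJob (and its siblings) would do; the error constructor and return shape are illustrative, not the actual implementation:

if am.jobManager == nil {
    return nil, status.Errorf(status.Internal, "job manager is not configured")
}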

management/server/management_test.go (1)

31-31: Test server wiring with JobManager is correct

The test server now correctly constructs a JobManager and passes it through to BuildManager and nbgrpc.NewServer, matching the new APIs. As with other tests, you could optionally pass metrics instead of nil into job.NewJobManager if you want job metrics exercised, but the current setup is functionally fine.

Also applies to: 183-185, 212-228, 235-247

management/server/http/handlers/peers/peers_handler.go (1)

31-41: New peer job HTTP endpoints are well‑structured; tighten validation and rely on fixed ownership check

The new /peers/{peerId}/jobs and /peers/{peerId}/jobs/{jobId} handlers are consistent with existing patterns (auth from context, util.WriteError/WriteJSONObject, and conversion helpers). A few small points:

  • Once GetPeerJobByID is fixed to assert job.PeerID == peerID (see account manager comment), these handlers will correctly enforce per‑peer job ownership.
  • For consistency with other handlers in this file (e.g. HandlePeer, GetAccessiblePeers), you may want to add an explicit empty‑peerId check in CreateJob, ListJobs, and GetJob to return a clear InvalidArgument instead of silently passing "" through.
  • toSingleJobResponse cleanly maps the domain Job to the HTTP JobResponse, including optional FailedReason and typed Workload; that looks good.

Overall, the HTTP surface for jobs is in good shape once the backend ownership check is tightened.

Also applies to: 51-142, 618-638
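
For the empty‑peerId point, a guard mirroring the existing handlers could look like the sketch below (it assumes gorilla/mux path vars and the util/status helpers already imported in this file; argument order follows their usage elsewhere):

peerID := mux.Vars(r)["peerId"]
if len(peerID) == 0 {
    util.WriteError(r.Context(), status.Errorf(status.InvalidArgument, "invalid peer ID"), w)
    return
}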

shared/management/proto/management.proto (1)

52-53: Refine Job proto types for clarity and forward compatibility

The Job RPC and related messages look structurally sound and align with the existing EncryptedMessage pattern. A few tweaks would make them easier to use:

  • JobResponse.Reason as bytes is unusual for a human-readable explanation; a string would be more ergonomic unless you explicitly plan to ship arbitrary binary blobs.
  • JobStatus only has unknown_status, succeeded, failed. If you intend to reflect persistent job lifecycle (pending/running vs completed) it may be worth adding an explicit in‑progress state instead of overloading unknown_status as a placeholder.
  • Consider documenting expected encoding/format for JobRequest.ID / JobResponse.ID (stringified UUID vs raw bytes) so store/types.Job integration remains consistent.

Functionally this is fine, these are just protocol polish suggestions.

Also applies to: 66-89

client/internal/debug/upload.go (1)

15-101: Upload helper is solid; consider small robustness improvements

The end-to-end flow (get presigned URL → size check → PUT upload) looks good and side‑effect free. A few minor refinements:

  • In upload, prefer %w when wrapping the HTTP error to preserve the error chain:
    return fmt.Errorf("upload failed: %w", err)
  • getUploadURL builds the query as url+"?id="+id. If url might ever include its own query string, using net/url to append id would be safer.
  • Both getUploadURL and upload strictly require StatusOK. If the upload service ever returns another 2xx (e.g. 201/204), this will be treated as failure. If you control that service and guarantee 200, current code is fine; otherwise consider if putResp.StatusCode/100 != 2.

None of these are blockers; the current implementation should work as intended.
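
For the query‑building point, a net/url based variant might look like this, where uploadURL and id stand for the values getUploadURL already receives:

u, err := url.Parse(uploadURL)
if err != nil {
    return "", fmt.Errorf("failed parsing upload URL: %w", err)
}
q := u.Query()
q.Set("id", id)
u.RawQuery = q.Encode()
// use u.String() as the request URL instead of uploadURL + "?id=" + id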

client/server/debug.go (1)

27-39: Use the incoming context (and possibly avoid holding the mutex) during upload

The switch to debug.UploadDebugBundle and LogPath: s.logFile looks correct.

Two follow‑up improvements worth considering:

  • DebugBundle ignores the incoming context and calls UploadDebugBundle with context.Background(). That means client cancellation/timeouts won’t stop the upload. Renaming the parameter to ctx context.Context and passing ctx through would make this RPC better behaved under cancellation.
  • The upload is done while s.mutex is held. For large bundles or slow networks this can block other server operations behind this lock. If state safety allows it, you might generate the bundle under the lock, then release the mutex before starting the network upload.

Behavior is fine for a rarely used debug path, but these tweaks would make it more responsive.

Also applies to: 49-57

management/server/account/manager.go (1)

127-129: Job APIs on Manager look good; consider pagination/filters

The new job methods are consistent with the rest of the Manager interface (accountID/userID/peerID ordering, types.Job usage).

Two design points to keep in mind for implementations:

  • GetAllPeerJobs can grow unbounded over time; if jobs are expected to accumulate, consider adding pagination and/or server‑side filtering (e.g. by status or time range) at some point.
  • Ensure implementations consistently enforce userID authorization, since these methods expose per‑peer operational history.

No changes required here, just points to watch in the concrete manager.

client/internal/engine_test.go (1)

1591-1592: JobManager wiring in startManagement test mirrors production; consider reusing metrics

The new jobManager := job.NewJobManager(nil, store) and its injection into BuildManager and nbgrpc.NewServer properly mirror the production wiring, so tests now exercise the job pipeline end‑to‑end.

Since you already create metrics, err := telemetry.NewDefaultAppMetrics(...) a few lines later, you may want to:

  • Construct metrics first, then
  • Call job.NewJobManager(metrics, store)

This would keep metrics behavior consistent between the JobManager and the rest of the management stack even in tests. Not required for correctness, but a bit cleaner.

Also applies to: 1622-1623, 1628-1629

shared/management/http/api/openapi.yml (2)

41-121: Align bundle/workload schema with actual bundle generator config

The Bundle/Workload schemas look reasonable, but there are a couple of potential contract mismatches to double‑check:

  • BundleParameters exposes bundle_for and bundle_for_time while the current debug.BundleConfig only has Anonymize, IncludeSystemInfo, and LogFileCount. There is no way for callers to control IncludeSystemInfo, and the semantics of bundle_for/bundle_for_time vs. what the client actually does (currently just a wait + regular bundle) should be clarified or adjusted so the API is not misleading.
  • You’re using a discriminator on WorkloadRequest/WorkloadResponse with propertyName: type. Make sure the generated clients you care about support this pattern where the discriminator property lives only in the concrete types (BundleWorkloadRequest/Response) and not also declared at the base level; some generators are picky here.

I’d suggest either:

  • Extending BundleConfig + implementation to truly honor all declared parameters, or
  • Tightening the OpenAPI description and fields to only what is actually used today, to avoid surprising API consumers.

35-38: Clarify experimental nature and pagination/filtering for Jobs endpoints

The Jobs tag is marked x-experimental: true, but the individual operations are not. If you rely on tooling that inspects operation/vendor extensions rather than tags, you may want to duplicate x-experimental: true on the new /api/peers/{peerId}/jobs and /api/peers/{peerId}/jobs/{jobId} operations.

Additionally, listing jobs currently returns an unbounded array with no filtering or pagination parameters. If job counts can become large per peer, it’s worth at least documenting ordering (e.g. newest first) and planning for page/page_size/status filters, even if you defer implementing them now.

Also applies to: 2353-2456

shared/management/client/client_test.go (1)

25-26: JobManager wiring in tests looks correct; consider reusing metrics if you need job metrics

The new jobManager := job.NewJobManager(nil, store) and its injection into both BuildManager and nbgrpc.NewServer align with the updated constructor signatures and keep the tests realistic.

If you ever want to exercise job‑related metrics in tests, you could pass the already‑created metrics instead of nil to NewJobManager; otherwise this setup is fine as‑is for functional tests.

Also applies to: 76-77, 123-124, 130-132

management/server/store/sql_store.go (1)

136-207: Job CRUD helpers mostly fine; consider tightening error handling and update scope

Functionality looks correct overall (ID scoping, account/peer filters, status transitions), but a few details could be improved:

  • CompletePeerJob updates by id only; since ID is the primary key this is safe, but for consistency with GetPeerJobByID you may want to include account_id in the WHERE as well.
  • CreatePeerJob, CompletePeerJob, and MarkPendingJobsAsFailed correctly wrap errors with status.Internal, but GetPeerJobByID / GetPeerJobs return raw GORM errors on internal failure. Consider wrapping those in status.Errorf(status.Internal, ...) as done elsewhere in this store so callers don’t depend on storage-specific errors.
  • If types.Job.ApplyResponse only populates a subset of fields, CompletePeerJob’s Updates(job) call is fine; if it ever starts zeroing other fields, you may want to switch to an explicit field list/map to avoid unintended overwrites.
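
On the error‑wrapping point above, the read paths could mirror the store's existing pattern; a sketch, where result names the GORM query result and the not‑found mapping is left to whatever this store already uses:

if err := result.Error; err != nil {
    if errors.Is(err, gorm.ErrRecordNotFound) {
        return nil, err // or a dedicated not-found status error, as elsewhere in this store
    }
    return nil, status.Errorf(status.Internal, "failed to get peer jobs from store: %v", err)
}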
management/internals/shared/grpc/server.go (1)

184-219: Job stream handling is generally sound; consider surfacing send failures

The new Job RPC and helpers hold up well:

  • Handshake reads a single encrypted JobRequest and reuses the existing parseRequest path, so message framing and crypto stay consistent with Sync/Login.
  • Account/peer lookup mirrors the Sync flow and correctly maps missing peers to PermissionDenied / Unauthenticated.
  • CreateJobChannel + deferred CloseChannel provide a clear per-peer lifecycle, and startResponseReceiver cleanly demultiplexes JobResponse messages into the job manager.

One point to consider tightening:

  • In sendJobsLoop, any failure in sendJob is logged and then the method returns nil, so the gRPC stream terminates with an OK status even if job delivery to the client failed. For parity with handleUpdates/sendUpdate, you may want to return the error (or a mapped status error) so callers can observe transport failures on the Job stream.

Also applies to: 333-397, 450-467
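
Concretely, the sendJobsLoop point could be addressed by propagating the failure; in the sketch below, sendJob, srv, event, and peerKey follow the names used in the review text or are placeholders, and the exact error mapping is up to you:

if err := s.sendJob(srv, event); err != nil {
    log.WithContext(ctx).Errorf("failed to send job to peer %s: %v", peerKey, err)
    return err
}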

management/server/job/manager.go (2)

44-151: Pending‑job cleanup semantics are coarse; minor concurrency detail worth double‑checking

Overall behavior is coherent (fail stuck jobs, enqueue new ones, complete on response), but a few aspects could be sharpened:

  • CreateJobChannel calls MarkPendingJobsAsFailed for (accountID, peerID) before taking the lock. That’s fine functionally, but means “stuck too long” in the DB is actually “any pending job when a fresh channel is created” — if you later add explicit TTL semantics, you may want a more targeted query.
  • In SendJob, you look up the Channel under RLock and then call ch.AddEvent after releasing the lock. If CreateJobChannel or CloseChannel can call ch.Close() concurrently, make sure Channel.AddEvent is written to safely detect a closed channel and return ErrJobChannelClosed rather than panic on send to a closed chan.
  • Both cleanup and CloseChannel use MarkPendingJobsAsFailed(accountID, peerID, reason), which marks all pending DB jobs for that peer, not just the specific jobID. That’s acceptable if you only ever have a single pending job per peer, but if you later support multiple in‑flight jobs, you’ll likely want a job‑scoped failure path in the store.
  • In CloseChannel, the loop over jm.pending calls MarkPendingJobsAsFailed once per event for the same peer; a minor optimization is to call it once per peerID, then delete all relevant pending entries.

These are mostly behavioral clarifications and small optimizations rather than blockers.
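
On the AddEvent concern specifically, a close‑safe variant could look like the sketch below; the mu, closed, and events fields are assumptions about Channel's internals, not its actual layout:

func (c *Channel) AddEvent(ev *Event) error {
    c.mu.Lock()
    defer c.mu.Unlock()
    if c.closed {
        return ErrJobChannelClosed
    }
    select {
    case c.events <- ev:
        return nil
    default:
        return fmt.Errorf("job channel buffer is full")
    }
}

Close() would then take the same mutex, set closed, and close the underlying chan, so concurrent AddEvent calls fail cleanly instead of panicking.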


90-117: HandleResponse flow is correct but could simplify error‑response coupling

HandleResponse correctly:

  • Looks up the pending event by jobID.
  • Builds a types.Job from the response via ApplyResponse.
  • Calls Store.CompletePeerJob and always removes the entry from pending.

Given that the event.Response field is only set when CompletePeerJob succeeds and then immediately dropped from pending, you could omit event.Response entirely or set it before calling the store to simplify reasoning. Current behavior is valid; this is mainly an internal API cleanup opportunity.

client/status/status.go (1)

122-169: Overview mapping from FullStatus looks correct; consider a nil guard

Switching ConvertToStatusOutputOverview to use *proto.FullStatus directly and threading in daemonVersion matches how the proto is structured and keeps the output fields aligned with the underlying status (management, signal, local peer, relays, DNS, events, SSH).

One small robustness improvement: if pbFullStatus can ever be nil (e.g., from an older daemon or error path), pbFullStatus.Get...() calls will panic. A quick upfront check like

if pbFullStatus == nil {
    return OutputOverview{}
}

would make the function safer against unexpected inputs.

management/server/types/job.go (2)

105-138: Minor: preserve wrapped error instead of using err.Error()

In BuildWorkloadResponse, you wrap bundle build errors as:

if err := j.buildBundleResponse(&wl); err != nil {
    return nil, status.Errorf(status.InvalidArgument, err.Error())
}

Using err.Error() as the format string loses the original error as a distinct argument and makes further wrapping/debugging slightly harder. Consider:

- return nil, status.Errorf(status.InvalidArgument, err.Error())
+ return nil, status.Errorf(status.InvalidArgument, "%v", err)

to keep the original error value intact and consistent with the rest of the file.


140-162: Bundle parameter validation is clear; consider tightening semantics only if needed

The validation for BundleForTime (1–5 minutes when BundleFor is true) and LogFileCount (1–1000) is straightforward and matches the API docs. You also normalize stored parameters to the BundleParameters JSON and initialize Result to {}, which keeps BuildWorkloadResponse happy for pending/failed jobs.

If you ever want to hard‑fail obviously nonsensical inputs when BundleFor is false (e.g., negative BundleForTime), you could extend the check here, but that’s optional and not strictly required by the current contract.

shared/management/http/api/types.gen.go (1)

2112-2169: WorkloadRequest helpers match server usage; consider using the WorkloadType constant

The As*/From*/Merge* helpers and Discriminator/ValueByDiscriminator for WorkloadRequest are consistent with how NewJob and validateAndBuildBundleParams consume the workload. One small optional improvement would be to use WorkloadTypeBundle instead of the literal "bundle" to avoid drift if the constant ever changes:

-func (t *WorkloadRequest) FromBundleWorkloadRequest(v BundleWorkloadRequest) error {
-    v.Type = "bundle"
+func (t *WorkloadRequest) FromBundleWorkloadRequest(v BundleWorkloadRequest) error {
+    v.Type = WorkloadTypeBundle

Same for MergeBundleWorkloadRequest and the switch in ValueByDiscriminator.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 32146e5 and 62ae8d8.

⛔ Files ignored due to path filters (4)
  • client/proto/daemon.pb.go is excluded by !**/*.pb.go
  • go.sum is excluded by !**/*.sum
  • shared/management/proto/management.pb.go is excluded by !**/*.pb.go
  • shared/management/proto/management_grpc.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (52)
  • .github/workflows/wasm-build-validation.yml (1 hunks)
  • client/cmd/debug.go (1 hunks)
  • client/cmd/status.go (1 hunks)
  • client/cmd/testutil_test.go (3 hunks)
  • client/cmd/up.go (1 hunks)
  • client/embed/embed.go (1 hunks)
  • client/internal/connect.go (7 hunks)
  • client/internal/debug/debug.go (7 hunks)
  • client/internal/debug/upload.go (1 hunks)
  • client/internal/debug/upload_test.go (2 hunks)
  • client/internal/engine.go (13 hunks)
  • client/internal/engine_test.go (9 hunks)
  • client/jobexec/executor.go (1 hunks)
  • client/proto/daemon.proto (0 hunks)
  • client/server/debug.go (2 hunks)
  • client/server/server.go (4 hunks)
  • client/server/server_test.go (3 hunks)
  • client/status/status.go (4 hunks)
  • client/status/status_test.go (1 hunks)
  • client/ui/debug.go (2 hunks)
  • go.mod (2 hunks)
  • management/internals/server/boot.go (1 hunks)
  • management/internals/server/controllers.go (3 hunks)
  • management/internals/server/modules.go (1 hunks)
  • management/internals/shared/grpc/server.go (11 hunks)
  • management/server/account.go (4 hunks)
  • management/server/account/manager.go (1 hunks)
  • management/server/account_test.go (2 hunks)
  • management/server/activity/codes.go (2 hunks)
  • management/server/dns_test.go (2 hunks)
  • management/server/http/handlers/peers/peers_handler.go (3 hunks)
  • management/server/http/testing/testing_tools/channel/channel.go (3 hunks)
  • management/server/job/channel.go (1 hunks)
  • management/server/job/manager.go (1 hunks)
  • management/server/management_proto_test.go (4 hunks)
  • management/server/management_test.go (4 hunks)
  • management/server/mock_server/account_mock.go (2 hunks)
  • management/server/nameserver_test.go (2 hunks)
  • management/server/peer.go (1 hunks)
  • management/server/peer_test.go (5 hunks)
  • management/server/route_test.go (2 hunks)
  • management/server/store/sql_store.go (3 hunks)
  • management/server/store/store.go (1 hunks)
  • management/server/types/job.go (1 hunks)
  • shared/management/client/client.go (1 hunks)
  • shared/management/client/client_test.go (3 hunks)
  • shared/management/client/grpc.go (8 hunks)
  • shared/management/client/mock.go (2 hunks)
  • shared/management/http/api/generate.sh (1 hunks)
  • shared/management/http/api/openapi.yml (2 hunks)
  • shared/management/http/api/types.gen.go (8 hunks)
  • shared/management/proto/management.proto (2 hunks)
💤 Files with no reviewable changes (1)
  • client/proto/daemon.proto
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-11-13T00:29:53.247Z
Learnt from: lixmal
Repo: netbirdio/netbird PR: 4015
File: client/cmd/ssh_exec_unix.go:53-74
Timestamp: 2025-11-13T00:29:53.247Z
Learning: In client/ssh/server/executor_unix.go, the method ExecuteWithPrivilegeDrop(ctx context.Context, config ExecutorConfig) has a void return type (no error return). It handles failures by exiting the process directly with appropriate exit codes rather than returning errors to the caller.

Applied to files:

  • client/jobexec/executor.go
📚 Learning: 2025-11-14T13:05:31.729Z
Learnt from: lixmal
Repo: netbirdio/netbird PR: 4015
File: client/ssh/server/userswitching_windows.go:89-139
Timestamp: 2025-11-14T13:05:31.729Z
Learning: In client/ssh/server/executor_windows.go, the WindowsExecutorConfig struct's Pty, PtyWidth, and PtyHeight fields are intentionally left unused for now and will be implemented in a future update.

Applied to files:

  • client/jobexec/executor.go
🧬 Code graph analysis (42)
shared/management/client/client.go (3)
management/server/types/job.go (1)
  • Job (34-58)
shared/management/http/api/types.gen.go (2)
  • JobRequest (708-710)
  • JobResponse (713-721)
shared/management/proto/management.pb.go (6)
  • JobRequest (388-398)
  • JobRequest (413-413)
  • JobRequest (428-430)
  • JobResponse (463-475)
  • JobResponse (490-490)
  • JobResponse (505-507)
management/internals/server/controllers.go (3)
management/internals/server/server.go (1)
  • BaseServer (45-68)
management/server/job/manager.go (2)
  • Manager (23-30)
  • NewJobManager (32-42)
management/internals/server/container.go (1)
  • Create (6-10)
client/status/status_test.go (1)
client/status/status.go (1)
  • ConvertToStatusOutputOverview (122-169)
client/internal/debug/upload_test.go (2)
client/internal/debug/upload.go (1)
  • UploadDebugBundle (17-28)
upload-server/types/upload.go (1)
  • GetURLPath (9-9)
management/server/store/store.go (1)
management/server/types/job.go (1)
  • Job (34-58)
management/internals/server/boot.go (1)
management/internals/shared/grpc/server.go (1)
  • NewServer (84-147)
client/internal/debug/upload.go (1)
upload-server/types/upload.go (3)
  • GetURLResponse (15-18)
  • ClientHeader (5-5)
  • ClientHeaderValue (7-7)
client/cmd/up.go (1)
util/log.go (1)
  • FindFirstLogPath (77-84)
management/server/account_test.go (2)
management/server/account.go (1)
  • BuildManager (180-268)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
management/server/peer.go (7)
management/server/account.go (1)
  • DefaultAccountManager (68-113)
management/server/types/job.go (2)
  • Job (34-58)
  • Workload (60-64)
management/server/permissions/modules/module.go (1)
  • Peers (7-7)
management/server/permissions/operations/operation.go (1)
  • Delete (9-9)
shared/management/status/error.go (5)
  • NewPermissionValidationError (213-215)
  • NewPermissionDeniedError (209-211)
  • NewPeerNotPartOfAccountError (105-107)
  • Errorf (70-75)
  • Type (46-46)
management/server/store/store.go (3)
  • Store (50-211)
  • LockingStrengthNone (47-47)
  • LockingStrengthUpdate (43-43)
management/server/activity/codes.go (1)
  • JobCreatedByUser (183-183)
management/server/http/testing/testing_tools/channel/channel.go (2)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
management/server/account.go (1)
  • BuildManager (180-268)
management/internals/server/modules.go (1)
management/server/account.go (1)
  • BuildManager (180-268)
shared/management/client/mock.go (3)
shared/management/http/api/types.gen.go (2)
  • JobRequest (708-710)
  • JobResponse (713-721)
shared/management/proto/management.pb.go (6)
  • JobRequest (388-398)
  • JobRequest (413-413)
  • JobRequest (428-430)
  • JobResponse (463-475)
  • JobResponse (490-490)
  • JobResponse (505-507)
management/server/types/job.go (1)
  • Job (34-58)
management/server/account/manager.go (1)
management/server/types/job.go (1)
  • Job (34-58)
client/embed/embed.go (1)
client/internal/connect.go (1)
  • NewConnectClient (51-62)
client/cmd/status.go (1)
client/status/status.go (1)
  • ConvertToStatusOutputOverview (122-169)
management/server/management_test.go (1)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
management/server/peer_test.go (2)
management/server/account.go (1)
  • BuildManager (180-268)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
client/server/server.go (1)
client/status/status.go (1)
  • ToProtoFullStatus (549-635)
management/server/mock_server/account_mock.go (1)
management/server/types/job.go (1)
  • Job (34-58)
management/server/nameserver_test.go (2)
management/server/account.go (1)
  • BuildManager (180-268)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
management/server/job/channel.go (1)
management/server/job/manager.go (1)
  • Event (17-21)
management/server/http/handlers/peers/peers_handler.go (4)
management/server/context/auth.go (1)
  • GetUserAuthFromContext (25-30)
shared/management/http/util/util.go (3)
  • WriteError (84-120)
  • WriteErrorResponse (70-80)
  • WriteJSONObject (27-35)
shared/management/http/api/types.gen.go (3)
  • JobRequest (708-710)
  • JobResponse (713-721)
  • JobResponseStatus (724-724)
management/server/types/job.go (3)
  • NewJob (67-103)
  • Job (34-58)
  • Workload (60-64)
client/jobexec/executor.go (3)
client/internal/debug/debug.go (3)
  • GeneratorDependencies (238-243)
  • BundleConfig (232-236)
  • NewBundleGenerator (245-264)
client/internal/debug/upload.go (1)
  • UploadDebugBundle (17-28)
upload-server/types/upload.go (1)
  • DefaultBundleURL (11-11)
client/server/debug.go (1)
client/internal/debug/upload.go (1)
  • UploadDebugBundle (17-28)
client/internal/engine.go (6)
client/internal/profilemanager/config.go (1)
  • Config (89-160)
client/jobexec/executor.go (3)
  • Executor (23-24)
  • NewExecutor (26-28)
  • ErrJobNotImplemented (20-20)
shared/management/client/client.go (1)
  • Client (14-27)
shared/management/proto/management.pb.go (21)
  • JobRequest (388-398)
  • JobRequest (413-413)
  • JobRequest (428-430)
  • JobResponse (463-475)
  • JobResponse (490-490)
  • JobResponse (505-507)
  • JobStatus_failed (30-30)
  • JobRequest_Bundle (457-459)
  • JobRequest_Bundle (461-461)
  • JobStatus_succeeded (29-29)
  • BundleParameters (554-563)
  • BundleParameters (578-578)
  • BundleParameters (593-595)
  • JobResponse_Bundle (548-550)
  • JobResponse_Bundle (552-552)
  • SyncResponse (721-738)
  • SyncResponse (753-753)
  • SyncResponse (768-770)
  • BundleResult (625-631)
  • BundleResult (646-646)
  • BundleResult (661-663)
client/internal/state.go (1)
  • CtxGetState (31-33)
client/internal/debug/debug.go (2)
  • GeneratorDependencies (238-243)
  • BundleConfig (232-236)
management/internals/shared/grpc/server.go (6)
management/server/account/manager.go (1)
  • Manager (27-131)
management/server/job/manager.go (2)
  • Manager (23-30)
  • Event (17-21)
shared/management/proto/management_grpc.pb.go (1)
  • ManagementService_JobServer (427-431)
shared/management/proto/management.pb.go (9)
  • JobRequest (388-398)
  • JobRequest (413-413)
  • JobRequest (428-430)
  • JobResponse (463-475)
  • JobResponse (490-490)
  • JobResponse (505-507)
  • EncryptedMessage (322-333)
  • EncryptedMessage (348-348)
  • EncryptedMessage (363-365)
management/server/job/channel.go (2)
  • Channel (18-21)
  • ErrJobChannelClosed (15-15)
encryption/message.go (1)
  • EncryptMessage (10-24)
management/server/store/sql_store.go (1)
management/server/types/job.go (3)
  • Job (34-58)
  • JobStatusPending (18-18)
  • JobStatusFailed (20-20)
management/server/job/manager.go (3)
shared/management/proto/management.pb.go (6)
  • JobRequest (388-398)
  • JobRequest (413-413)
  • JobRequest (428-430)
  • JobResponse (463-475)
  • JobResponse (490-490)
  • JobResponse (505-507)
management/server/job/channel.go (2)
  • Channel (18-21)
  • NewChannel (23-29)
management/server/types/job.go (1)
  • Job (34-58)
shared/management/client/client_test.go (3)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
management/server/account.go (1)
  • BuildManager (180-268)
management/internals/shared/grpc/server.go (1)
  • NewServer (84-147)
client/internal/connect.go (2)
client/internal/engine.go (2)
  • NewEngine (235-276)
  • EngineConfig (83-142)
client/internal/profilemanager/config.go (1)
  • Config (89-160)
management/server/dns_test.go (2)
management/server/account.go (1)
  • BuildManager (180-268)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
shared/management/client/grpc.go (3)
shared/management/proto/management.pb.go (13)
  • JobRequest (388-398)
  • JobRequest (413-413)
  • JobRequest (428-430)
  • JobResponse (463-475)
  • JobResponse (490-490)
  • JobResponse (505-507)
  • EncryptedMessage (322-333)
  • EncryptedMessage (348-348)
  • EncryptedMessage (363-365)
  • JobStatus_failed (30-30)
  • SyncResponse (721-738)
  • SyncResponse (753-753)
  • SyncResponse (768-770)
shared/management/proto/management_grpc.pb.go (2)
  • ManagementService_JobClient (169-173)
  • ManagementService_SyncClient (89-92)
encryption/message.go (2)
  • EncryptMessage (10-24)
  • DecryptMessage (27-40)
management/server/types/job.go (3)
shared/management/proto/management.pb.go (22)
  • JobStatus (25-25)
  • JobStatus (57-59)
  • JobStatus (61-63)
  • JobStatus (70-72)
  • JobRequest (388-398)
  • JobRequest (413-413)
  • JobRequest (428-430)
  • BundleParameters (554-563)
  • BundleParameters (578-578)
  • BundleParameters (593-595)
  • BundleResult (625-631)
  • BundleResult (646-646)
  • BundleResult (661-663)
  • JobResponse (463-475)
  • JobResponse (490-490)
  • JobResponse (505-507)
  • JobStatus_succeeded (29-29)
  • JobStatus_failed (30-30)
  • JobResponse_Bundle (548-550)
  • JobResponse_Bundle (552-552)
  • JobRequest_Bundle (457-459)
  • JobRequest_Bundle (461-461)
shared/management/http/api/types.gen.go (8)
  • JobRequest (708-710)
  • WorkloadResponse (1948-1950)
  • BundleParameters (361-373)
  • BundleResult (376-378)
  • BundleWorkloadResponse (391-399)
  • WorkloadTypeBundle (195-195)
  • WorkloadRequest (1943-1945)
  • JobResponse (713-721)
shared/management/status/error.go (4)
  • Errorf (70-75)
  • BadRequest (36-36)
  • InvalidArgument (27-27)
  • Error (54-57)
client/server/server_test.go (3)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
management/server/account.go (1)
  • BuildManager (180-268)
management/internals/shared/grpc/server.go (1)
  • NewServer (84-147)
client/internal/debug/debug.go (4)
util/log.go (1)
  • SpecialLogs (25-28)
client/internal/profilemanager/profilemanager.go (1)
  • NewProfileManager (56-58)
client/status/status.go (3)
  • ToProtoFullStatus (549-635)
  • ConvertToStatusOutputOverview (122-169)
  • ParseToFullDetailSummary (532-547)
version/version.go (1)
  • NetbirdVersion (18-20)
client/internal/engine_test.go (5)
client/internal/engine.go (2)
  • NewEngine (235-276)
  • EngineConfig (83-142)
shared/management/client/mock.go (1)
  • MockClient (13-24)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
management/server/account.go (1)
  • BuildManager (180-268)
management/internals/shared/grpc/server.go (1)
  • NewServer (84-147)
shared/management/http/api/types.gen.go (2)
shared/management/proto/management.pb.go (9)
  • BundleParameters (554-563)
  • BundleParameters (578-578)
  • BundleParameters (593-595)
  • BundleResult (625-631)
  • BundleResult (646-646)
  • BundleResult (661-663)
  • JobRequest (388-398)
  • JobRequest (413-413)
  • JobRequest (428-430)
management/server/types/job.go (1)
  • Workload (60-64)
management/server/route_test.go (2)
management/server/account.go (1)
  • BuildManager (180-268)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
client/status/status.go (2)
client/proto/daemon.pb.go (21)
  • FullStatus (1994-2008)
  • FullStatus (2021-2021)
  • FullStatus (2036-2038)
  • ManagementState (1682-1689)
  • ManagementState (1702-1702)
  • ManagementState (1717-1719)
  • SignalState (1621-1628)
  • SignalState (1641-1641)
  • SignalState (1656-1658)
  • LocalPeerState (1528-1539)
  • LocalPeerState (1552-1552)
  • LocalPeerState (1567-1569)
  • PeerState (1347-1369)
  • PeerState (1382-1382)
  • PeerState (1397-1399)
  • RelayState (1743-1750)
  • RelayState (1763-1763)
  • RelayState (1778-1780)
  • NSGroupState (1803-1811)
  • NSGroupState (1824-1824)
  • NSGroupState (1839-1841)
client/internal/peer/status.go (6)
  • FullStatus (151-161)
  • ManagementState (128-132)
  • SignalState (121-125)
  • LocalPeerState (106-112)
  • RosenpassState (135-138)
  • NSGroupState (142-148)
management/server/management_proto_test.go (3)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
management/server/account.go (1)
  • BuildManager (180-268)
management/internals/shared/grpc/server.go (1)
  • NewServer (84-147)
client/cmd/testutil_test.go (3)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
management/server/account.go (1)
  • BuildManager (180-268)
management/internals/shared/grpc/server.go (1)
  • NewServer (84-147)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: release_ui
  • GitHub Check: release_ui_darwin
  • GitHub Check: release
  • GitHub Check: iOS / Build
  • GitHub Check: Android / Build
  • GitHub Check: JS / Lint
  • GitHub Check: Windows
  • GitHub Check: Linux
  • GitHub Check: Darwin
  • GitHub Check: Client / Unit
  • GitHub Check: Client / Unit
  • GitHub Check: Build Cache
  • GitHub Check: Check External GPL/AGPL Licenses
  • GitHub Check: Client / Unit

Comment on lines +15 to +17
const (
    MaxBundleWaitTime = 60 * time.Minute // maximum wait time for bundle generation (1 hour)
)

⚠️ Potential issue | 🟠 Major

Respect context cancellation during pre‑bundle wait

waitFor currently swallows ctx.Done() and only logs:

select {
case <-time.After(duration):
case <-ctx.Done():
    log.Infof("wait cancelled: %v", ctx.Err())
}

BundleJob ignores whether the wait was cancelled and always proceeds to generate and upload the bundle. That means a cancelled job (e.g. client disconnected or management cancelled the job) will still cause heavy work on the client.

To make bundle jobs behave like other context‑aware operations, consider:

  • Letting waitFor return error (or bool) and, in BundleJob, short‑circuit if ctx.Err() != nil after the wait:
-func waitFor(ctx context.Context, duration time.Duration) {
+func waitFor(ctx context.Context, duration time.Duration) error {
     log.Infof("wait for %v minutes before executing debug bundle", duration.Minutes())
     select {
     case <-time.After(duration):
-    case <-ctx.Done():
-        log.Infof("wait cancelled: %v", ctx.Err())
+        return nil
+    case <-ctx.Done():
+        log.Infof("wait cancelled: %v", ctx.Err())
+        return ctx.Err()
     }
+    return nil
}

-    if waitForDuration > 0 {
-        waitFor(ctx, waitForDuration)
-    }
+    if waitForDuration > 0 {
+        if err := waitFor(ctx, waitForDuration); err != nil {
+            return "", err
+        }
+    }

This keeps the behavior for successful waits while correctly aborting on cancellation.

Also applies to: 30-57, 59-66
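
For reference, a self-contained sketch of the cancellation-aware wait described above. The `waitFor` name and log message mirror the diff, while the package name and logger import are assumptions; the real executor's signatures may differ.

```go
package jobexec

import (
	"context"
	"time"

	log "github.com/sirupsen/logrus"
)

// waitFor blocks for the requested duration but returns early with the
// context error if the job is cancelled, so the caller can skip the
// expensive bundle generation and upload.
func waitFor(ctx context.Context, duration time.Duration) error {
	log.Infof("wait for %v minutes before executing debug bundle", duration.Minutes())
	select {
	case <-time.After(duration):
		return nil
	case <-ctx.Done():
		log.Infof("wait cancelled: %v", ctx.Err())
		return ctx.Err()
	}
}
```

The caller would then propagate the error, e.g. `if err := waitFor(ctx, waitForDuration); err != nil { return "", err }`, as in the diff above.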

Comment on lines +11 to +21
// todo consider the channel buffer size when we allow to run multiple jobs
const jobChannelBuffer = 1

var (
    ErrJobChannelClosed = errors.New("job channel closed")
)

type Channel struct {
    events chan *Event
    once   sync.Once
}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Guard against send‑on‑closed channel and reconsider where timeout is enforced

Two concerns around Channel’s concurrency and semantics:

  1. Potential panic on send after Close
    Close() closes jc.events, and AddEvent always does jc.events <- event in the select case without checking for closure. If Close can race with AddEvent (which is likely when job streams disconnect while new jobs are being queued), a send on a closed channel will panic the process.
    Consider adding internal state to mark the channel as closed and have AddEvent short‑circuit with ErrJobChannelClosed (or similar) instead of ever sending to a potentially closed channel, or ensure by design that only the sender closes and that no further AddEvent calls happen after Close.

  2. Timeout at enqueue vs. job response (already noted by your TODO)
    The current responseWait timeout only limits how long you wait to enqueue into the buffered channel, not how long you wait for the peer’s job response. With jobChannelBuffer = 1, this effectively prevents a second job from being queued while the first one is still pending, but does not clean up pending jobs when a peer never responds. Moving timeout handling to the job lifecycle (pending → failed) in the manager would better match the comment.

I’d treat the send‑on‑closed‑channel risk as something to fix before merging; the timeout refactor can follow.

Also applies to: 31-41, 43-59


I'll examine the full implementation to validate the concurrency concerns.


---

🏁 Script executed:

```shell
#!/bin/bash
# Retrieve the full channel.go file to examine Close(), AddEvent(), and Event() implementations
find . -name "channel.go" -path "*/job/*" -type f | head -5 | xargs -I {} cat {}
```

Length of output: 1183


Guard against send-on-closed channel and reconsider where timeout is enforced

Two concerns around Channel's concurrency and semantics:

  1. Potential panic on send after Close
    Close() uses sync.Once to close jc.events exactly once, but this only prevents double-close, not send-on-closed-channel panics. If AddEvent() is called concurrently with Close(), the select case that sends to jc.events can execute after the channel is closed, causing a panic. The sync.Once guard does not prevent this race.
    Add an internal closed flag (guarded by mutex) that AddEvent checks before entering the select, or ensure by documented contract and design that AddEvent cannot be called after Close(). This must be fixed before merging.

  2. Timeout at enqueue vs. job response (confirmed by TODO)
    The responseWait timeout only limits how long to wait to enqueue into the buffered channel, not how long to wait for the peer's job response. With jobChannelBuffer = 1, this prevents a second job from being queued while the first is pending, but does not clean up pending jobs when a peer never responds. The TODO comment (lines 34–35) confirms this is a known issue. Move timeout handling to the job lifecycle (pending → failed) in the manager.

🤖 Prompt for AI Agents
In management/server/job/channel.go around lines 11–21, AddEvent can panic if it
races with Close because closing the channel is guarded only by sync.Once; add
an internal boolean closed guarded by a mutex (or RWMutex) that Close sets and
AddEvent checks before attempting to send, returning ErrJobChannelClosed if
closed to avoid send-on-closed panics; also make the send non-blocking/select
with a default path to return an error if the channel is closed or full.
Separately, move the timeout logic out of the enqueue path into the job
lifecycle in the manager: stop using responseWait to gate channel enqueueing
(keep enqueue quick or increase buffer), and implement a manager-side
pending→failed transition with a timer for each job so unresponsive peers are
timed out and cleaned up rather than relying on the channel buffer to enforce
timeouts.
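
As an illustration of the first point above, a minimal close-safe variant could look like the sketch below. `Event` is a placeholder type, `ErrJobChannelFull` is hypothetical, and the buffer size only mirrors the snippet above; this is not the project's implementation.

```go
package job

import (
	"errors"
	"sync"
)

var (
	ErrJobChannelClosed = errors.New("job channel closed")
	ErrJobChannelFull   = errors.New("job channel full") // hypothetical error value
)

type Event struct{} // placeholder for the real Event type

type Channel struct {
	mu     sync.Mutex
	closed bool
	events chan *Event
}

func NewChannel() *Channel {
	return &Channel{events: make(chan *Event, 1)}
}

// AddEvent checks the closed flag under the same mutex that Close uses,
// so a racing Close can never trigger a send on a closed channel.
func (c *Channel) AddEvent(e *Event) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.closed {
		return ErrJobChannelClosed
	}
	select {
	case c.events <- e:
		return nil
	default:
		// Non-blocking enqueue: response timeouts belong in the manager's job lifecycle.
		return ErrJobChannelFull
	}
}

func (c *Channel) Close() {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.closed {
		return
	}
	c.closed = true
	close(c.events)
}

// Events exposes the receive side for the consumer goroutine.
func (c *Channel) Events() <-chan *Event {
	return c.events
}
```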

Comment on lines +317 to +387
func (am *DefaultAccountManager) CreatePeerJob(ctx context.Context, accountID, peerID, userID string, job *types.Job) error {
    // todo: Create permissions for job
    allowed, err := am.permissionsManager.ValidateUserPermissions(ctx, accountID, userID, modules.Peers, operations.Delete)
    if err != nil {
        return status.NewPermissionValidationError(err)
    }
    if !allowed {
        return status.NewPermissionDeniedError()
    }

    peerAccountID, err := am.Store.GetAccountIDByPeerID(ctx, store.LockingStrengthNone, peerID)
    if err != nil {
        return err
    }

    if peerAccountID != accountID {
        return status.NewPeerNotPartOfAccountError()
    }

    // check if peer connected
    if !am.jobManager.IsPeerConnected(peerID) {
        return status.Errorf(status.BadRequest, "peer not connected")
    }

    // check if already has pending jobs
    // todo: The job checks here are not protected. The user can run this function from multiple threads,
    // and each thread can think there is no job yet. This means entries in the pending job map will be overwritten,
    // and only one will be kept, but potentially another one will overwrite it in the queue.
    if am.jobManager.IsPeerHasPendingJobs(peerID) {
        return status.Errorf(status.BadRequest, "peer already has pending job")
    }

    jobStream, err := job.ToStreamJobRequest()
    if err != nil {
        return status.Errorf(status.BadRequest, "invalid job request %v", err)
    }

    // try sending job first
    if err := am.jobManager.SendJob(ctx, accountID, peerID, jobStream); err != nil {
        return status.Errorf(status.Internal, "failed to send job: %v", err)
    }

    var peer *nbpeer.Peer
    var eventsToStore func()

    // persist job in DB only if send succeeded
    err = am.Store.ExecuteInTransaction(ctx, func(transaction store.Store) error {
        peer, err = transaction.GetPeerByID(ctx, store.LockingStrengthUpdate, accountID, peerID)
        if err != nil {
            return err
        }
        if err := transaction.CreatePeerJob(ctx, job); err != nil {
            return err
        }

        jobMeta := map[string]any{
            "for_peer_name": peer.Name,
            "job_type":      job.Workload.Type,
        }

        eventsToStore = func() {
            am.StoreEvent(ctx, userID, peer.ID, accountID, activity.JobCreatedByUser, jobMeta)
        }
        return nil
    })
    if err != nil {
        return err
    }
    eventsToStore()
    return nil
}

⚠️ Potential issue | 🟠 Major

Peer job management: ownership check bug, nil‑safety, and ordering concerns

The new peer‑job methods are a good start, but there are a few issues worth addressing:

  1. Job ownership validation bug (path vs data mismatch)
    • GetPeerJobByID ignores the peerID argument when returning the job:
      job, err := am.Store.GetPeerJobByID(ctx, accountID, jobID)
      // no check that job.PeerID == peerID
    • The HTTP handler GET /peers/{peerId}/jobs/{jobId} passes both peerID and jobID, but the account manager currently only checks that the peer belongs to the account, not that the job belongs to that peer.
    • Result: a caller can request /peers/{peerA}/jobs/{jobOfPeerB} (within the same account) and see the job for peerB.
    • Fix: after loading the job, assert job.PeerID == peerID and return status.NotFound or status.PermissionDenied otherwise:
      func (am *DefaultAccountManager) GetPeerJobByID(ctx context.Context, accountID, userID, peerID, jobID string) (*types.Job, error) {
          // ... existing permission + peerAccountID checks ...

          job, err := am.Store.GetPeerJobByID(ctx, accountID, jobID)
          if err != nil {
              return nil, err
          }

      +   if job.PeerID != peerID {
      +       return nil, status.NewPeerNotPartOfAccountError()
      +   }
          return job, nil
      }
    
    
  2. Potential nil‑pointer on am.jobManager

    • CreatePeerJob assumes am.jobManager is non‑nil:
      if !am.jobManager.IsPeerConnected(peerID) { ... }
      if am.jobManager.IsPeerHasPendingJobs(peerID) { ... }
      if err := am.jobManager.SendJob(...); err != nil { ... }
    • If BuildManager is ever called with a nil JobManager (e.g., in older tests or misconfigured environments), this will panic on first use of the job API.
    • Consider a defensive check up front:
      if am.jobManager == nil {
        return status.Errorf(status.Internal, "job manager is not configured")
      }
    • Or make the non‑nil requirement explicit and enforced in construction/tests.
  3. Send‑before‑persist ordering can desync DB and in‑memory job state

    • The method currently:
      1. Checks for pending jobs.
      2. Builds jobStream and calls SendJob, which will enqueue/send work to the peer.
      3. Only then writes the job to the DB via CreatePeerJob inside a transaction.
    • If step (3) fails (DB error, constraint, etc.), CreatePeerJob returns an error but the job has already been dispatched to the peer, potentially leaving:
      • A job executing on the client but absent from persistent storage and from /jobs listings.
    • Consider flipping the order:
      • Persist the job first (status = pending), then send it to the peer, and if sending fails, update the job to failed with an appropriate FailedReason. That keeps DB and runtime state consistent at the cost of occasionally having “failed to send” jobs in the DB.
  4. Permissions reused from Peers/Delete

    • Using modules.Peers + operations.Delete as the gate for creating and viewing jobs is strict but workable short‑term. The TODO about introducing dedicated job permissions is valid; when you add those, these sites will be the key ones to update.

Overall structure is solid; tightening ownership checks and jobManager assumptions will make the new job API much safer.

Also applies to: 389-441, 618-638
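
To make the persist-first ordering from point 3 concrete, here is a rough sketch that reuses the fields and helpers shown in the snippet above; `createAndDispatchJob` and `markJobFailed` are hypothetical names, not existing methods, and the event-storage and transaction details are omitted.

```go
// Sketch only: persist the job as pending before dispatching it, and mark it
// failed if dispatch does not succeed, so DB state and runtime state stay in sync.
func (am *DefaultAccountManager) createAndDispatchJob(ctx context.Context, accountID, peerID string, job *types.Job) error {
	jobStream, err := job.ToStreamJobRequest()
	if err != nil {
		return status.Errorf(status.BadRequest, "invalid job request %v", err)
	}

	// 1. Persist first (status = pending) so the job always appears in /jobs listings.
	if err := am.Store.CreatePeerJob(ctx, job); err != nil {
		return err
	}

	// 2. Dispatch to the peer only after the job is durable.
	if err := am.jobManager.SendJob(ctx, accountID, peerID, jobStream); err != nil {
		// 3. Keep the records consistent: flip the stored job to failed.
		am.markJobFailed(ctx, job, err) // hypothetical helper that sets JobStatusFailed and FailedReason
		return status.Errorf(status.Internal, "failed to send job: %v", err)
	}
	return nil
}
```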

Comment on lines +164 to +201
// ApplyResponse validates and maps a proto.JobResponse into the Job fields.
func (j *Job) ApplyResponse(resp *proto.JobResponse) error {
    if resp == nil {
        return nil
    }

    j.ID = string(resp.ID)
    now := time.Now().UTC()
    j.CompletedAt = &now
    switch resp.Status {
    case proto.JobStatus_succeeded:
        j.Status = JobStatusSucceeded
    case proto.JobStatus_failed:
        j.Status = JobStatusFailed
        if len(resp.Reason) > 0 {
            reason := string(resp.Reason)
            if len(resp.Reason) > MaxJobReasonLength {
                reason = string(resp.Reason[:MaxJobReasonLength]) + "... (truncated)"
            }
            j.FailedReason = fmt.Sprintf("Client error: '%s'", reason)
        }
        return nil
    default:
        return fmt.Errorf("unexpected job status: %v", resp.Status)
    }

    // Handle workload results (oneof)
    var err error
    switch r := resp.WorkloadResults.(type) {
    case *proto.JobResponse_Bundle:
        if j.Workload.Result, err = json.Marshal(r.Bundle); err != nil {
            return fmt.Errorf("failed to marshal workload results: %w", err)
        }
    default:
        return fmt.Errorf("unsupported workload response type: %T", r)
    }
    return nil
}

⚠️ Potential issue | 🔴 Critical

Likely JSON shape mismatch between proto.BundleResult and api.BundleResult

In ApplyResponse, for a succeeded bundle job you serialize the proto result directly:

case *proto.JobResponse_Bundle:
    if j.Workload.Result, err = json.Marshal(r.Bundle); err != nil {
        return fmt.Errorf("failed to marshal workload results: %w", err)
    }

Later, buildBundleResponse does:

var r api.BundleResult
if err := json.Unmarshal(j.Workload.Result, &r); err != nil {
    return fmt.Errorf("invalid result for bundle job: %w", err)
}

However, the types differ:

  • proto.BundleResult has UploadKey String (a protobuf message type, not a plain string).
  • api.BundleResult has UploadKey *string.

Encoding a protobuf message with encoding/json will typically produce an object like:

{ "upload_key": { "value": "..." } }

whereas api.BundleResult expects:

{ "upload_key": "..." }

This will cause json.Unmarshal in buildBundleResponse to fail (cannot unmarshal object into Go value of type string), meaning succeeded jobs will not be convertible to HTTP WorkloadResponse.

I’d strongly recommend normalizing here by mapping the proto result into the HTTP shape instead of serializing the proto struct directly, e.g.:

case *proto.JobResponse_Bundle:
-   if j.Workload.Result, err = json.Marshal(r.Bundle); err != nil {
-       return fmt.Errorf("failed to marshal workload results: %w", err)
-   }
+   apiResult := api.BundleResult{}
+   if r.Bundle != nil {
+       // Assuming proto.String carries a `Value` or similar; adapt field access accordingly.
+       key := r.Bundle.UploadKey // map to string as appropriate
+       apiResult.UploadKey = &key
+   }
+   if j.Workload.Result, err = json.Marshal(apiResult); err != nil {
+       return fmt.Errorf("failed to marshal workload results: %w", err)
+   }

(Adjust field access/unwrap logic to whatever String actually exposes.)

This keeps the stored JSON shape aligned with api.BundleResult and avoids runtime errors in BuildWorkloadResponse.
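
The mismatch is easy to reproduce in isolation. The sketch below uses a hypothetical WrappedKey stand-in for a protobuf wrapper message (the real proto type may be shaped differently), while PlainResult mirrors the api.BundleResult shape with a plain *string field.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// WrappedKey stands in for a protobuf wrapper message type (hypothetical).
type WrappedKey struct {
	Value string `json:"value,omitempty"`
}

// ProtoShaped mimics serializing the proto result struct directly.
type ProtoShaped struct {
	UploadKey WrappedKey `json:"upload_key,omitempty"`
}

// PlainResult mirrors the HTTP api.BundleResult shape.
type PlainResult struct {
	UploadKey *string `json:"upload_key,omitempty"`
}

func main() {
	raw, _ := json.Marshal(ProtoShaped{UploadKey: WrappedKey{Value: "abc123"}})
	fmt.Println(string(raw)) // {"upload_key":{"value":"abc123"}}

	var plain PlainResult
	if err := json.Unmarshal(raw, &plain); err != nil {
		// json: cannot unmarshal object into Go value of type string
		fmt.Println("unmarshal error:", err)
	}
}
```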

@sonarqubecloud

Quality Gate failed

Failed conditions
1 New issue
1 New Code Smells (required ≤ 0)

See analysis details on SonarQube Cloud

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 62ae8d8 and de78255.

📒 Files selected for processing (3)
  • client/internal/connect.go (7 hunks)
  • client/internal/engine.go (13 hunks)
  • client/internal/engine_test.go (9 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
client/internal/engine_test.go (5)
client/internal/engine.go (2)
  • NewEngine (235-276)
  • EngineConfig (83-142)
shared/management/client/mock.go (1)
  • MockClient (13-24)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
management/server/account.go (1)
  • BuildManager (180-268)
management/internals/shared/grpc/server.go (1)
  • NewServer (84-147)
client/internal/engine.go (7)
client/internal/profilemanager/config.go (1)
  • Config (89-160)
client/jobexec/executor.go (3)
  • Executor (23-24)
  • NewExecutor (26-28)
  • ErrJobNotImplemented (20-20)
shared/management/client/client.go (1)
  • Client (14-27)
shared/management/proto/management.pb.go (19)
  • JobRequest (388-398)
  • JobRequest (413-413)
  • JobRequest (428-430)
  • JobResponse (463-475)
  • JobResponse (490-490)
  • JobResponse (505-507)
  • JobStatus_failed (30-30)
  • JobRequest_Bundle (457-459)
  • JobRequest_Bundle (461-461)
  • JobStatus_succeeded (29-29)
  • BundleParameters (554-563)
  • BundleParameters (578-578)
  • BundleParameters (593-595)
  • SyncResponse (721-738)
  • SyncResponse (753-753)
  • SyncResponse (768-770)
  • BundleResult (625-631)
  • BundleResult (646-646)
  • BundleResult (661-663)
client/internal/state.go (1)
  • CtxGetState (31-33)
client/internal/debug/debug.go (2)
  • GeneratorDependencies (238-243)
  • BundleConfig (232-236)
client/server/server.go (1)
  • New (96-107)
client/internal/connect.go (3)
client/internal/engine.go (2)
  • NewEngine (235-276)
  • EngineConfig (83-142)
client/internal/profilemanager/config.go (1)
  • Config (89-160)
shared/management/proto/management.pb.go (3)
  • PeerConfig (2082-2098)
  • PeerConfig (2113-2113)
  • PeerConfig (2128-2130)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (23)
  • GitHub Check: Relay / Unit (amd64, -race)
  • GitHub Check: Management / Unit (amd64, mysql)
  • GitHub Check: Management / Unit (amd64, postgres)
  • GitHub Check: Management / Unit (amd64, sqlite)
  • GitHub Check: Relay / Unit (386)
  • GitHub Check: Management / Integration (amd64, postgres)
  • GitHub Check: Management / Benchmark (API) (amd64, postgres)
  • GitHub Check: Client (Docker) / Unit
  • GitHub Check: Management / Benchmark (API) (amd64, sqlite)
  • GitHub Check: Client / Unit (amd64)
  • GitHub Check: Management / Benchmark (amd64, sqlite)
  • GitHub Check: Management / Benchmark (amd64, postgres)
  • GitHub Check: Client / Unit (386)
  • GitHub Check: Client / Unit
  • GitHub Check: release
  • GitHub Check: release_ui_darwin
  • GitHub Check: Android / Build
  • GitHub Check: Linux
  • GitHub Check: Client / Unit
  • GitHub Check: Darwin
  • GitHub Check: Windows
  • GitHub Check: JS / Lint
  • GitHub Check: Client / Unit
🔇 Additional comments (16)
client/internal/connect.go (5)

65-66: LGTM! logPath parameter threading is clean.

The public Run signature now accepts logPath and forwards it correctly. Mobile variants appropriately default to empty string, which is sensible given platform-specific logging constraints.

Also applies to: 87-87, 105-105


108-108: LGTM!

Internal run signature correctly updated to receive logPath.


251-251: LGTM!

Call to createEngineConfig correctly passes logPath parameter.


275-275: LGTM!

NewEngine call correctly updated with profile config parameter, enabling debug bundle generation support.


414-414: LGTM! EngineConfig enriched for debug bundle support.

The function signature and field assignments correctly thread logPath and ProfileConfig through to EngineConfig. Note that the entire profilemanager.Config is stored in ProfileConfig—this appears intentional for debug bundle generation, which may need access to ManagementURL and other config details.

Also applies to: 449-452

client/internal/engine_test.go (3)

28-28: LGTM!

Import added to support JobManager usage in test setup.


256-256: LGTM! Test updates are consistent.

All NewEngine calls correctly updated with two additional nil parameters (checks, profileConfig). Using nil is appropriate for tests that don't exercise debug bundle or checks functionality.

Also applies to: 418-424, 637-643, 802-808, 1004-1010, 1536-1536


1595-1595: LGTM! Test infrastructure correctly wired.

JobManager created and threaded through BuildManager and NewServer, enabling job subsystem testing. Using nil for metrics is appropriate for test context.

Also applies to: 1626-1626, 1632-1632

client/internal/engine.go (8)

34-34: LGTM!

Imports added to support debug bundle generation and job execution.

Also applies to: 52-52


138-141: LGTM!

EngineConfig fields added to support debug bundle generation. Clear comment documents the purpose.


206-207: LGTM! Proper concurrency primitives added.

syncRespMux provides dedicated locking for sync response persistence, separate from syncMsgMux. jobExecutorWG ensures clean shutdown of job-related goroutines.

Also applies to: 220-221


234-234: LGTM!

NewEngine signature correctly extended with profile config parameter. jobExecutor initialized during construction for handling management-service job requests.

Also applies to: 254-254


335-335: LGTM! Proper shutdown ordering.

jobExecutorWG.Wait() ensures job-related goroutines complete before resource cleanup. The preceding e.cancel() will signal job goroutines to exit via context cancellation.


523-523: LGTM! receiveJobEvents follows established patterns.

The function correctly:

  • Increments jobExecutorWG before spawning goroutine
  • Handles job requests synchronously via callback
  • Triggers client restart on Job stream errors (consistent with Management/Signal event handlers)
  • Returns proper error codes for unimplemented job types

Also applies to: 942-977
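
The lifecycle pattern described above can be illustrated in isolation; the names below (worker, receiveEvents) are generic placeholders, not the actual engine API, and the restart-on-stream-error behavior is omitted.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

type worker struct {
	wg sync.WaitGroup
}

// receiveEvents increments the WaitGroup before spawning, handles each
// item synchronously via the callback, and exits when the context is cancelled.
func (w *worker) receiveEvents(ctx context.Context, handle func(int)) {
	w.wg.Add(1)
	go func() {
		defer w.wg.Done()
		for i := 0; ; i++ {
			select {
			case <-ctx.Done():
				return
			case <-time.After(100 * time.Millisecond):
				handle(i)
			}
		}
	}()
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	w := &worker{}
	w.receiveEvents(ctx, func(i int) { fmt.Println("handled job", i) })

	time.Sleep(350 * time.Millisecond)
	cancel()    // mirrors e.cancel() signalling shutdown
	w.wg.Wait() // mirrors jobExecutorWG.Wait() before resource cleanup
}
```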


803-814: LGTM! Proper lock separation for sync response persistence.

The code correctly uses syncRespMux (not syncMsgMux) to protect sync response persistence state. RLock for reading the flag, then separate write lock for storing the response—proper lock upgrade pattern.


1874-1886: LGTM! Thread-safe sync response persistence.

Both functions correctly use syncRespMux for protection. GetLatestSyncResponse properly:

  • Reads state under RLock
  • Releases lock before expensive clone operation
  • Returns a clone to prevent external mutation

Also applies to: 1889-1911
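
A compact, standalone illustration of that read-under-RLock-then-clone approach; SyncResponse here is a stand-in struct, not the real proto message, and the field names are illustrative only.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

type SyncResponse struct {
	NetworkSerial uint64
}

func (s *SyncResponse) clone() *SyncResponse {
	c := *s
	return &c
}

type engine struct {
	syncRespMux sync.RWMutex
	latestSync  *SyncResponse
}

func (e *engine) storeLatestSyncResponse(resp *SyncResponse) {
	e.syncRespMux.Lock()
	defer e.syncRespMux.Unlock()
	e.latestSync = resp
}

// getLatestSyncResponse reads under RLock, releases the lock before the
// (potentially expensive) clone, and returns a copy so callers cannot
// mutate shared state.
func (e *engine) getLatestSyncResponse() (*SyncResponse, error) {
	e.syncRespMux.RLock()
	resp := e.latestSync
	e.syncRespMux.RUnlock()

	if resp == nil {
		return nil, errors.New("no sync response stored yet")
	}
	return resp.clone(), nil
}

func main() {
	e := &engine{}
	e.storeLatestSyncResponse(&SyncResponse{NetworkSerial: 42})
	if got, err := e.getLatestSyncResponse(); err == nil {
		fmt.Println("serial:", got.NetworkSerial)
	}
}
```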

Comment on lines +979 to +1012
func (e *Engine) handleBundle(params *mgmProto.BundleParameters) (*mgmProto.JobResponse_Bundle, error) {
    log.Infof("handle remote debug bundle request: %s", params.String())
    syncResponse, err := e.GetLatestSyncResponse()
    if err != nil {
        log.Warnf("get latest sync response: %v", err)
    }

    bundleDeps := debug.GeneratorDependencies{
        InternalConfig: e.config.ProfileConfig,
        StatusRecorder: e.statusRecorder,
        SyncResponse:   syncResponse,
        LogPath:        e.config.LogPath,
    }

    bundleJobParams := debug.BundleConfig{
        Anonymize:         params.Anonymize,
        IncludeSystemInfo: true,
        LogFileCount:      uint32(params.LogFileCount),
    }

    waitFor := time.Duration(params.BundleForTime) * time.Minute

    uploadKey, err := e.jobExecutor.BundleJob(e.ctx, bundleDeps, bundleJobParams, waitFor, e.config.ProfileConfig.ManagementURL.String())
    if err != nil {
        return nil, err
    }

    response := &mgmProto.JobResponse_Bundle{
        Bundle: &mgmProto.BundleResult{
            UploadKey: uploadKey,
        },
    }
    return response, nil
}

⚠️ Potential issue | 🟡 Minor

Potential nil pointer dereference on Line 1001.

The code accesses e.config.ProfileConfig.ManagementURL.String() without checking if ProfileConfig is nil. While tests pass nil, they don't exercise this path. For robustness, consider adding a nil check.

Apply this defensive check:

+	if e.config.ProfileConfig == nil || e.config.ProfileConfig.ManagementURL == nil {
+		return nil, errors.New("profile config or management URL not available for bundle generation")
+	}
+
 	uploadKey, err := e.jobExecutor.BundleJob(e.ctx, bundleDeps, bundleJobParams, waitFor, e.config.ProfileConfig.ManagementURL.String())
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
  func (e *Engine) handleBundle(params *mgmProto.BundleParameters) (*mgmProto.JobResponse_Bundle, error) {
      log.Infof("handle remote debug bundle request: %s", params.String())
      syncResponse, err := e.GetLatestSyncResponse()
      if err != nil {
          log.Warnf("get latest sync response: %v", err)
      }
      bundleDeps := debug.GeneratorDependencies{
          InternalConfig: e.config.ProfileConfig,
          StatusRecorder: e.statusRecorder,
          SyncResponse:   syncResponse,
          LogPath:        e.config.LogPath,
      }
      bundleJobParams := debug.BundleConfig{
          Anonymize:         params.Anonymize,
          IncludeSystemInfo: true,
          LogFileCount:      uint32(params.LogFileCount),
      }
      waitFor := time.Duration(params.BundleForTime) * time.Minute
+     if e.config.ProfileConfig == nil || e.config.ProfileConfig.ManagementURL == nil {
+         return nil, errors.New("profile config or management URL not available for bundle generation")
+     }
      uploadKey, err := e.jobExecutor.BundleJob(e.ctx, bundleDeps, bundleJobParams, waitFor, e.config.ProfileConfig.ManagementURL.String())
      if err != nil {
          return nil, err
      }
      response := &mgmProto.JobResponse_Bundle{
          Bundle: &mgmProto.BundleResult{
              UploadKey: uploadKey,
          },
      }
      return response, nil
  }
🤖 Prompt for AI Agents
In client/internal/engine.go around lines 979 to 1012, the call to
e.config.ProfileConfig.ManagementURL.String() can panic if ProfileConfig or its
ManagementURL is nil; add a defensive nil-check before calling String() and pass
a safe fallback (e.g., empty string or configured default) into
e.jobExecutor.BundleJob. Specifically, ensure e.config and
e.config.ProfileConfig are non-nil and that ProfileConfig.ManagementURL is
non-nil, compute a local managementURL string variable accordingly, and use that
variable in the BundleJob call so the function never dereferences a nil pointer.

