Skip to content

Conversation

@leynos
Copy link
Owner

@leynos leynos commented Nov 24, 2025

Summary

  • Adds tests and simplifies which resolver logic by extracting workspace fallback into its own OS-aware module and centralizing path handling.
  • Introduces per-platform backends for workspace lookups, consolidates candidate resolution, and reduces per-platform branching in the direct path resolution path.
  • Replaces the former workspace.rs with a new workspace/mod.rs wiring in platform-specific backends (posix.rs and windows.rs).
  • Updates tests, fixtures, and docs to reflect the new workspace-aware structure and usage.

Changes

  • Core resolver refactor

    • Introduced a dedicated workspace fallback module under src/stdlib/which/lookup/workspace with OS-specific backends (posix.rs and windows.rs).
    • Replaced the former workspace.rs with a new workspace/mod.rs that wires in the platform-specific backends.
    • Added a platform-agnostic candidates_for_dir helper to centralize candidate resolution per directory.
    • Centralized direct path handling with normalize_direct_path and is_executable checks; removed per-platform branching in resolve_direct where feasible.
    • Added should_visit_entry and workspace fallback max depth/skip configuration constants to the new workspace module.
    • Retained and reused the EnvSnapshot-based testing approach, but updated calls to the new workspace API shape.
  • Tests and test infrastructure

    • Expanded test support in src/stdlib/register.rs to cover more non-Unix file types (pipes, block/char devices) where supported.
    • Reworked tests under src/stdlib/which/lookup/tests.rs to align with the new workspace module and EnvSnapshot usage; added tests for path-with-invalid-UTF-8 and relative PATH entries resolving against CWD.
    • Introduced a new workspace-aware test module setup under tests/std_filter_tests/which_filter_common.rs and adjusted which tests to use a reusable WhichTestFixture, simplifying test fixtures and improving test isolation.
    • Updated integration tests in tests/std_filter_tests/which_filter_tests.rs to reflect the new fixture and workspace fallback behavior, including cases for fresh lookups and cached results when executables are removed.
    • Added tests for path resolution with relative entries and PATHEXT handling where relevant (Windows).
  • Documentation

    • docs/netsuke-design.md updated to document workspace fallback traversals, max depth, heavy-directory skipping, and NETSUKE_WHICH_WORKSPACE env var to disable the fallback for latency-sensitive scenarios.

Rationale

  • Clear separation of concerns: moving workspace fallback logic into its own module with platform-specific backends clarifies responsibilities and makes the resolver easier to maintain.
  • Performance and safety: the workspace fallback is bounded (depth 6) and skips heavy directories to prevent excessive traversal on large trees.
  • Consistent testing surface: expanding tests for PATH handling, invalid UTF-8 in PATH, and relative PATH entries improves reliability across Unix and Windows platforms.
  • Aligns tests with the refactored codepath: the new fixtures and test helpers reflect the updated resolver signatures and behavior.

Compatibility / Breaking changes

  • Internal API changes to the which/lookup workspace handling. The workspace fallback API now lives in a dedicated module with OS-specific backends. Tests and code references were updated accordingly.
  • Some non-Unix and Windows-specific code paths have been restructured; behavior remains functionally similar, with additional coverage for edge cases.

How to test

  • Run the full test suite: cargo test
  • Pay attention to Windows-specific tests around PATHEXT and workspace traversal; ensure PATH and PATHEXT are set in environments accordingly when running Windows-targeted tests.
  • Validate edge cases added in tests:
    • PATH entries with invalid UTF-8 components
    • Relative PATH entries resolving against the current working directory
    • Direct path resolution when PATH is not used, including canonicalisation behavior when fresh lookups are requested
  • Review updated docs to see the documented workspace fallback behavior and environment variable controls.

Notes

  • No user-facing API changes for end users; internal resolver internals and test scaffolding have been updated to improve maintainability and coverage.
  • The changes lay groundwork for more exhaustive cross-platform tests around which resolution, caching, and workspace fallbacks in future iterations.

📎 Task: https://www.terragonlabs.com/task/093f7066-ea7d-4255-8c8f-e7f02453291d

Summary by Sourcery

Refactor the which resolver to use an OS-aware workspace fallback module, centralize path candidate handling, and extend tests and docs around PATH, PATHEXT, and workspace traversal behavior.

Enhancements:

  • Introduce a workspace::mod module with POSIX and Windows backends to encapsulate workspace fallback traversal, skipping heavy directories and allowing it to be disabled via an environment variable.
  • Centralize directory candidate generation and direct-path normalization in the which lookup logic to reduce platform-specific branching and improve maintainability.
  • Extend non-Unix file predicates to cover pipes and device types, returning sensible defaults on unsupported platforms.

Documentation:

  • Document workspace fallback depth limits, directory skipping, and the NETSUKE_WHICH_WORKSPACE environment flag in netsuke-design.md.

Tests:

  • Extract shared which filter test fixtures into a reusable helper module and update integration tests to validate cache behavior, JSON list outputs, cwd_mode handling, and duplicate path canonicalization.
  • Broaden workspace lookup tests to cover invalid UTF-8 in PATH, relative PATH entries resolving against the current working directory, PATHEXT normalization on Windows, and direct-path executability errors on Unix.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 24, 2025

Warning

Rate limit exceeded

@leynos has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 11 minutes and 58 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 055a2e7 and b43f0f7.

📒 Files selected for processing (9)
  • src/stdlib/config.rs (2 hunks)
  • src/stdlib/register.rs (4 hunks)
  • src/stdlib/which/lookup/mod.rs (5 hunks)
  • src/stdlib/which/lookup/tests.rs (4 hunks)
  • src/stdlib/which/lookup/workspace/mod.rs (1 hunks)
  • src/stdlib/which/lookup/workspace/posix.rs (1 hunks)
  • src/stdlib/which/lookup/workspace/windows.rs (1 hunks)
  • src/stdlib/which/mod.rs (1 hunks)
  • tests/std_filter_tests/which_filter_common.rs (1 hunks)

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Summary by CodeRabbit

Release Notes

  • New Features

    • Extended file type detection with support for pipes, block devices, and character devices.
  • Documentation

    • Enhanced workspace fallback documentation with detailed configuration bounds, environment variable controls, and visual sequences.
  • Refactor

    • Unified platform-specific command resolution logic across Windows and POSIX systems.
    • Reorganised workspace fallback search into modular, maintainable components.
  • Tests

    • Expanded test coverage for edge cases including invalid path encoding and Windows extension handling.
    • Introduced centralised test fixture utilities for improved test consistency.

✏️ Tip: You can customize this high-level summary in your review settings.

Walkthrough

Unify direct-path resolution and per-directory candidate generation in the which resolver; add a modular workspace-fallback subsystem with platform-specific implementations, skip-list and environment toggle; expand non‑Unix file‑type predicates and centralise test fixtures and PATH isolation helpers.

Changes

Cohort / File(s) Summary
Documentation & Configuration
docs/netsuke-design.md
Correct wording and add Workspace fallback section: max depth 6, default skip list, NETSUKE_WHICH_WORKSPACE toggle, sequence diagram and narrative.
File Type Tests
src/stdlib/register.rs
Add non‑Unix stubs and predicates: is_fifo, is_block_device, is_char_device, is_device; extend FILE_TESTS and validate stubs on non‑Unix.
Lookup core refactor
src/stdlib/which/lookup/mod.rs
Consolidate per-dir candidate generation into candidates_for_dir; unify direct-path resolution into platform-gated resolve_direct; add normalize_direct_path; remove legacy per-platform direct resolvers.
Lookup tests refactor
src/stdlib/which/lookup/tests.rs
Centralise PATH handling via with_isolated_path; introduce WorkspaceSearchParams; unify workspace search invocation; add PATH UTF‑8, relative PATH, direct non‑executable and PATHEXT tests.
Monolithic workspace removed
src/stdlib/which/lookup/workspace.rs
Remove previous monolithic workspace implementation and its types/functions (skip list, params, search/collect routines, platform match logic).
Workspace dispatcher added
src/stdlib/which/lookup/workspace/mod.rs
Add workspace submodule dispatcher, WORKSPACE_MAX_DEPTH = 6, DEFAULT_WORKSPACE_SKIP_DIRS, WORKSPACE_FALLBACK_ENV, WorkspaceSkipList, WorkspaceSearchParams, search_workspace, and should_visit_entry.
Workspace POSIX implementation
src/stdlib/which/lookup/workspace/posix.rs
Add POSIX traversal: depth-limited WalkDir, exact filename matching, executable checks, UTF‑8 conversion error handling and collection semantics.
Workspace Windows implementation
src/stdlib/which/lookup/workspace/windows.rs
Add Windows traversal: PATHEXT-aware matching, case-insensitive basename logic, WorkspaceMatchContext, early-return collect_all semantics and robust entry conversion/logging.
Tests wiring
tests/std_filter_tests/mod.rs
Insert which_filter_common module into test suite.
Test fixtures
tests/std_filter_tests/which_filter_common.rs
Add fixture helpers: ToolName, DirName, Template, PathEnv, write_tool, WhichTestFixture, and rendering helpers for isolated workspace tests.
Test suite refactor
tests/std_filter_tests/which_filter_tests.rs
Replace inline fixture scaffolding with centralised helpers (CurrentDirGuard, render_and_assert_pure, cache and cwd tests); update assertions and JSON checks.
Cache tests
src/stdlib/which/cache.rs
Use with_isolated_path for PATH isolation in resolver skip-list test.
Test support
test_support/src/env.rs
Add with_isolated_path(value, action) to temporarily set PATH under an EnvLock and restore it afterwards.

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant Resolve as resolve_direct()
    participant Normalise as normalize_direct_path()
    participant Candidates as candidates_for_dir()
    participant FS as filesystem checks
    participant Result

    Caller->>Resolve: command, env, options
    Resolve->>Normalise: compute absolute/normalised path
    Normalise-->>Resolve: normalised_path
    Resolve->>Candidates: env, dir, command
    Candidates-->>Resolve: candidate_paths
    loop each candidate
        Resolve->>FS: exists & executable?
        alt match
            FS-->>Resolve: matched path
            Resolve-->>Result: return matches
        else no match
            FS-->>Resolve: continue
        end
    end
    Resolve-->>Caller: Result<Vec<Path>, Error>
Loading
sequenceDiagram
    participant Caller
    participant SearchWS as search_workspace()
    participant Toggle as workspace_fallback_enabled()
    participant Platform as platform_search_workspace()
    participant Walk as WalkDir
    participant Visit as should_visit_entry()
    participant Match as entry matches command?
    participant Result

    Caller->>SearchWS: env, command, params
    SearchWS->>Toggle: check NETSUKE_WHICH_WORKSPACE
    alt disabled
        Toggle-->>SearchWS: false
        SearchWS-->>Caller: Ok(empty)
    else enabled
        Toggle-->>SearchWS: true
        SearchWS->>Platform: delegate (posix/windows)
        Platform->>Walk: traverse env.cwd up to depth
        loop entries
            Walk->>Visit: skip_dirs?
            alt skip
                Visit-->>Walk: skip
            else visit
                Walk->>Match: basename/extension comparison
                alt match found
                    Match-->>Result: collect path
                    alt collect_all = false
                        Result-->>Platform: early return
                    end
                else no match
                    Match-->>Walk: continue
                end
            end
        end
        Platform-->>SearchWS: Vec(paths)
        SearchWS-->>Caller: Result
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Inspect workspace submodules for correctness on Windows (PATHEXT handling, case folding) and POSIX (executable checks, UTF‑8 error paths).
  • Verify resolve_direct/candidates_for_dir platform guards and interaction with existing which options.
  • Review test fixture and PATH isolation (with_isolated_path, EnvLock) for determinism and race conditions.

Possibly related PRs

Poem

Walk the workspace, skip the noise and strife,
PATHEXT hums on Windows, POSIX checks for life.
Tests lock PATH a moment, tools appear then fade,
Direct paths normalised, old branches softly laid.
Ship the search that hunts the roots — let binaries parade!

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarises the primary change: refactoring the which resolver to use OS-specific workspace fallback modules, which is the core intent of this PR.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, covering the refactoring scope, test infrastructure updates, documentation changes, and the rationale behind them.
Docstring Coverage ✅ Passed Docstring coverage is 92.63% which is sufficient. The required threshold is 80.00%.

Warning

Review ran into problems

🔥 Problems

Errors were encountered while retrieving linked issues.

Errors (1)
  • UTF-8: Entity not found: Issue - Could not find referenced Issue.

Comment @coderabbitai help to get the list of available commands and usage tips.

@sourcery-ai
Copy link
Contributor

sourcery-ai bot commented Nov 24, 2025

Reviewer's Guide

Refactors the which resolver by extracting an OS-aware workspace fallback module, centralizing candidate and direct-path handling, and updating tests, fixtures, and docs to match the new behavior and interfaces.

Sequence diagram for which resolver workspace fallback traversal

sequenceDiagram
    participant Caller
    participant WhichResolver as WhichResolver_lookup
    participant HandleMiss as handle_miss
    participant WorkspaceMod as Workspace_workspace
    participant PlatformBackend as Platform_search_workspace
    participant WalkDir as WalkDir_iterator
    participant FS as Filesystem

    Caller->>WhichResolver: lookup(command, env, options)
    WhichResolver->>WhichResolver: scan_PATH_for_matches
    alt matches_found
        WhichResolver-->>Caller: Ok(matches)
    else no_matches
        WhichResolver->>HandleMiss: handle_miss(command, env, options)
        HandleMiss->>WorkspaceMod: search_workspace(env, command, collect_all)
        WorkspaceMod->>WorkspaceMod: workspace_fallback_enabled()
        alt fallback_disabled
            WorkspaceMod-->>HandleMiss: Ok([])
            HandleMiss-->>WhichResolver: not_found_error
            WhichResolver-->>Caller: Error(not_found)
        else fallback_enabled
            WorkspaceMod->>PlatformBackend: platform_search_workspace(env, command, collect_all)
            PlatformBackend->>WalkDir: new(env.cwd)
            loop bounded_depth_traversal
                WalkDir->>FS: read_next_entry()
                FS-->>WalkDir: Result<DirEntry,Error>
                WalkDir-->>PlatformBackend: entry_or_error
                PlatformBackend->>PlatformBackend: unwrap_or_log_error()
                alt unreadable_entry
                    PlatformBackend-->>PlatformBackend: skip_entry
                else readable_entry
                    PlatformBackend->>WorkspaceMod: should_visit_entry(entry)
                    alt skip_directory
                        PlatformBackend-->>PlatformBackend: skip_entry
                    else visit_entry
                        PlatformBackend->>PlatformBackend: process_workspace_entry(entry, command, ctx)
                        alt executable_match_found
                            PlatformBackend-->>PlatformBackend: add_to_matches
                            alt collect_all_false
                                PlatformBackend-->>WalkDir: stop_iteration
                            else collect_all_true
                                Note over PlatformBackend,WalkDir: continue_traversal
                            end
                        else not_a_match
                            PlatformBackend-->>PlatformBackend: continue
                        end
                    end
                end
            end
            PlatformBackend-->>WorkspaceMod: Ok(matches)
            WorkspaceMod-->>HandleMiss: Ok(matches)
            HandleMiss-->>WhichResolver: matches_or_error
            WhichResolver-->>Caller: result
        end
    end
Loading

Class diagram for refactored which workspace fallback module

classDiagram
    class WorkspaceMod {
        <<module>>
        +const WORKSPACE_MAX_DEPTH usize
        +const WORKSPACE_SKIP_DIRS [&str]
        +search_workspace(env EnvSnapshot, command str, collect_all bool) Result~Vec~Utf8PathBuf~~
        +should_visit_entry(entry DirEntry) bool
        -workspace_fallback_enabled() bool
    }

    class PosixBackend {
        <<module>>
        +search_workspace(env EnvSnapshot, command str, collect_all bool) Result~Vec~Utf8PathBuf~~
        -unwrap_or_log_error(walk_entry Result~DirEntry,Error~, command str) Option~DirEntry~
        -log_if_empty(matches Vec~Utf8PathBuf~, command str) void
        -collect_matching_executables(entries Iterator~Result~DirEntry,Error~~, command str, collect_all bool) Result~Vec~Utf8PathBuf~~
        -process_workspace_entry(entry DirEntry, command str) Result~Option~Utf8PathBuf~,Error~>
    }

    class WindowsBackend {
        <<module>>
        +search_workspace(env EnvSnapshot, command str, collect_all bool) Result~Vec~Utf8PathBuf~~
        -unwrap_or_log_error(walk_entry Result~DirEntry,Error~, command str) Option~DirEntry~
        -log_if_no_matches(matches Vec~Utf8PathBuf~, command str) void
        -process_workspace_entry(entry DirEntry, command str, ctx WorkspaceMatchContext) Result~Option~Utf8PathBuf~,Error~>
        -workspace_entry_matches(entry DirEntry, ctx WorkspaceMatchContext) bool
    }

    class WorkspaceMatchContext {
        -command_lower String
        -command_has_ext bool
        -basenames HashSet~String~
        +new(command str, env EnvSnapshot) WorkspaceMatchContext
    }

    class CollectionState {
        -matches Vec~Utf8PathBuf~
        -collect_all bool
        +new(collect_all bool) CollectionState
        +try_add(entry DirEntry, command str, ctx WorkspaceMatchContext) Result~bool,Error~
    }

    class EnvSnapshot {
        <<external>>
        +cwd Utf8PathBuf
        +pathext() Vec~String~
    }

    class WalkDirModule {
        <<external>>
        +new(path Utf8PathBuf) WalkDir
    }

    class DirEntry {
        <<external>>
        +file_type() FileType
        +file_name() OsStr
        +into_path() PathBuf
    }

    WorkspaceMod --> PosixBackend : uses
    WorkspaceMod --> WindowsBackend : uses
    WorkspaceMod --> EnvSnapshot : parameter
    WorkspaceMod --> DirEntry : parameter

    PosixBackend --> EnvSnapshot : parameter
    PosixBackend --> WalkDirModule : uses
    PosixBackend --> DirEntry : iterates

    WindowsBackend --> EnvSnapshot : parameter
    WindowsBackend --> WalkDirModule : uses
    WindowsBackend --> WorkspaceMatchContext : creates
    WindowsBackend --> CollectionState : creates
    WindowsBackend --> DirEntry : iterates

    WorkspaceMatchContext --> EnvSnapshot : reads PATHEXT

    CollectionState --> WorkspaceMatchContext : parameter
Loading

File-Level Changes

Change Details Files
Refactored which lookup to centralize direct-path and PATH-directory candidate handling with minimal platform-specific branching.
  • Added normalize_direct_path helper to compute absolute direct paths from commands and EnvSnapshot.cwd.
  • Unified resolve_direct into cfg-gated Windows/posix variants that share normalization and canonicalisation logic, using direct_candidates and is_executable instead of inlined per-platform checks.
  • Introduced candidates_for_dir helper to build candidate paths per directory (using PATHEXT on Windows and simple join on posix) and wired lookup() to call it instead of duplicating logic.
  • Simplified handle_miss to call search_workspace(env, ...) instead of platform-specific signatures, letting the workspace module handle OS differences.
src/stdlib/which/lookup/mod.rs
Extracted workspace fallback into an OS-aware module with shared configuration and traversal helpers.
  • Replaced the old workspace.rs with a new workspace/ submodule exposing search_workspace and should_visit_entry, with shared constants for WORKSPACE_MAX_DEPTH, WORKSPACE_SKIP_DIRS, and env-controlled enable/disable (NETSUKE_WHICH_WORKSPACE).
  • Implemented posix workspace traversal that walks from EnvSnapshot.cwd with walkdir, skips heavy directories, logs unreadable entries, filters by exact filename and executability, and returns UTF-8 paths or structured errors on non-UTF-8 paths.
  • Implemented Windows workspace traversal that uses a WorkspaceMatchContext to handle PATHEXT-style basenames, lowercased comparisons, and executable checks, collecting matches according to the all flag and logging misses and unreadable entries.
  • Adjusted workspace tests to call search_workspace(&EnvSnapshot, ...) uniformly, setting PATH via VarGuard where needed instead of passing platform-specific context values.
src/stdlib/which/lookup/workspace/mod.rs
src/stdlib/which/lookup/workspace/posix.rs
src/stdlib/which/lookup/workspace/windows.rs
src/stdlib/which/lookup/workspace.rs (removed)
src/stdlib/which/lookup/tests.rs
Improved which integration tests by extracting shared fixtures, adding cache/fresh behavior checks, and covering new PATH/PATHEXT and cwd_mode edge cases.
  • Moved common which test helpers (ToolName, DirName, Template, PathEnv, write_tool, WhichTestFixture, render) into a new which_filter_common.rs module and referenced it from the main test module tree.
  • Updated which_filter_tests.rs to use WhichTestFixture helpers plus new render_and_assert_pure, test_cache_after_removal, test_cache_behaviour_after_removal, test_duplicate_paths, and test_cwd_mode_resolution utilities.
  • Added tests for cache vs fresh behavior after executable removal, all=true returning a list/JSON array, canonical vs non-canonical duplicate handling, cwd_mode validation including case-insensitive acceptance and error messaging, and which(..., all=true) serialization.
  • Wired the new common module into tests/std_filter_tests/mod.rs via a #[path = "std_filter_tests/which_filter_common.rs"] mod which_filter_common; declaration.
tests/std_filter_tests/mod.rs
tests/std_filter_tests/which_filter_common.rs
tests/std_filter_tests/which_filter_tests.rs
Extended EnvSnapshot and workspace-related tests to cover invalid UTF-8 PATH, relative PATH entries, PATHEXT normalization, and direct-path executability errors.
  • Added Unix test ensuring invalid UTF-8 in PATH causes EnvSnapshot::capture to fail with a netsuke::jinja::which::args error mentioning the entry index.
  • Added test that relative PATH entries ("bin" and "tools" components) are resolved against EnvSnapshot.cwd and appear in resolved_dirs(CwdMode::Never).
  • On Windows, added tests that an empty PATHEXT falls back to defaults including .COM and .EXE, and that PATHEXT values are normalised (adding leading dots, trimming) and deduplicated case-insensitively.
  • Added a Unix-only test that a non-executable direct path results in a not executable error via resolve_direct with a full EnvSnapshot context.
src/stdlib/which/lookup/tests.rs
Expanded file-type predicates registration to include non-Unix-safe stubs for special file types, improving cross-platform file tests while preserving behavior on non-Unix systems.
  • Extended FILE_TESTS on non-Unix to register pipe, block_device, char_device, and device predicates alongside dir/file/symlink.
  • Provided no-op false-returning implementations of is_fifo, is_block_device, is_char_device, and is_device on non-Unix builds to keep API uniform without relying on unsupported metadata flags.
  • Retained Unix implementations using fs::FileType::is_fifo/is_block_device/is_char_device and a composite is_device that ORs block and char device checks.
src/stdlib/register.rs
Documented workspace fallback behavior and environment controls in the netsuke design docs.
  • Updated netsuke-design.md to clarify that fresh is stripped from the cache key while still repopulating the cache on fresh lookups.
  • Documented that workspace traversals are bounded to depth six, skip heavy directories (.git, target, node_modules, dist, build), and can be disabled via NETSUKE_WHICH_WORKSPACE set to 0/false/off for latency-sensitive cases.
docs/netsuke-design.md

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

codescene-delta-analysis[bot]

This comment was marked as outdated.

codescene-delta-analysis[bot]

This comment was marked as outdated.

@leynos
Copy link
Owner Author

leynos commented Nov 24, 2025

@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix:

Bumpy Road Ahead

src/stdlib/which/lookup/workspace/posix.rs: search_workspace

What lead to degradation?

search_workspace has 2 blocks with nested conditional logic. Any nesting of 2 or deeper is considered. Threshold is 2 blocks per function

Why does this problem occur?

A Bumpy Road is a function that contains multiple chunks of nested conditional logic inside the same function. The deeper the nesting and the more bumps, the lower the code health.
A bumpy code road represents a lack of encapsulation which becomes an obstacle to comprehension. In imperative languages there’s also an increased risk for feature entanglement, which leads to complex state management. CodeScene considers the following rules for the code health impact: 1) The deeper the nested conditional logic of each bump, the higher the tax on our working memory. 2) The more bumps inside a function, the more expensive it is to refactor as each bump represents a missing abstraction. 3) The larger each bump – that is, the more lines of code it spans – the harder it is to build up a mental model of the function. The nesting depth for what is considered a bump is levels of conditionals.

How to fix it?

Bumpy Road implementations indicate a lack of encapsulation. Check out the detailed description of the Bumpy Road code health issue.
A Bumpy Road often suggests that the function/method does too many things. The first refactoring step is to identify the different possible responsibilities of the function. Consider extracting those responsibilities into smaller, cohesive, and well-named functions. The EXTRACT FUNCTION refactoring is the primary response.

@leynos
Copy link
Owner Author

leynos commented Nov 24, 2025

@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix:

Bumpy Road Ahead

src/stdlib/which/lookup/workspace/windows.rs: search_workspace

What lead to degradation?

search_workspace has 2 blocks with nested conditional logic. Any nesting of 2 or deeper is considered. Threshold is 2 blocks per function

Why does this problem occur?

A Bumpy Road is a function that contains multiple chunks of nested conditional logic inside the same function. The deeper the nesting and the more bumps, the lower the code health.
A bumpy code road represents a lack of encapsulation which becomes an obstacle to comprehension. In imperative languages there’s also an increased risk for feature entanglement, which leads to complex state management. CodeScene considers the following rules for the code health impact: 1) The deeper the nested conditional logic of each bump, the higher the tax on our working memory. 2) The more bumps inside a function, the more expensive it is to refactor as each bump represents a missing abstraction. 3) The larger each bump – that is, the more lines of code it spans – the harder it is to build up a mental model of the function. The nesting depth for what is considered a bump is levels of conditionals.

How to fix it?

Bumpy Road implementations indicate a lack of encapsulation. Check out the detailed description of the Bumpy Road code health issue.
A Bumpy Road often suggests that the function/method does too many things. The first refactoring step is to identify the different possible responsibilities of the function. Consider extracting those responsibilities into smaller, cohesive, and well-named functions. The EXTRACT FUNCTION refactoring is the primary response.

@leynos
Copy link
Owner Author

leynos commented Nov 24, 2025

@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix:

Code Duplication

tests/std_filter_tests/which_filter_tests.rs:

What lead to degradation?

The module contains 6 functions with similar structure: which_filter_all_with_duplicates_deduplicates_canonicalised_paths,which_filter_all_with_duplicates_respects_canonical_false,which_filter_fresh_bypasses_cache_after_executable_removed,which_filter_uses_cached_result_when_executable_removed and 2 more functions

Why does this problem occur?

Duplicated code often leads to code that's harder to change since the same logical change has to be done in multiple functions. More duplication gives lower code health.

How to fix it?

A certain degree of duplicated code might be acceptable. The problems start when it is the same behavior that is duplicated across the functions in the module, ie. a violation of the Don't Repeat Yourself (DRY) principle. DRY violations lead to code that is changed together in predictable patterns, which is both expensive and risky. DRY violations can be identified using CodeScene's X-Ray analysis to detect clusters of change coupled functions with high code similarity. Read More
Once you have identified the similarities across functions, look to extract and encapsulate the concept that varies into its own function(s). These shared abstractions can then be re-used, which minimizes the amount of duplication and simplifies change.

@coderabbitai

This comment was marked as resolved.

@coderabbitai

This comment was marked as resolved.

@coderabbitai

This comment was marked as resolved.

@leynos leynos changed the title Refactor which resolver: add tests and workspace fallback Refactor which resolver: introduce workspace module and tests Nov 24, 2025
codescene-delta-analysis[bot]

This comment was marked as outdated.

@leynos
Copy link
Owner Author

leynos commented Nov 25, 2025

@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix:

src/stdlib/which/lookup/workspace/windows.rs

Comment on lines +128 to +142

fn should_stop_collecting(
    matches: &mut Vec<Utf8PathBuf>,
    entry: walkdir::DirEntry,
    command: &str,
    ctx: &WorkspaceMatchContext,
    collect_all: bool,
) -> Result<bool, Error> {
    if let Some(path) = process_workspace_entry(entry, command, ctx)? {
        matches.push(path);
        if !collect_all {
            return Ok(true);
        }
    }
    Ok(false)
}

❌ New issue: Excess Number of Function Arguments
should_stop_collecting has 5 arguments, max arguments = 4

@leynos
Copy link
Owner Author

leynos commented Nov 25, 2025

@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix:

tests/std_filter_tests/which_filter_tests.rs

Comment on lines +24 to +33

fn test_cache_after_removal(
    fixture: &mut WhichTestFixture,
    first_template: &Template,
    second_template: &Template,
    removed_path: &Utf8PathBuf,
    expect_second_err: bool,
) -> Result<()> {
    fixture.state.reset_impure();
    let first = fixture.render(first_template)?;
    assert_eq!(first, removed_path.as_str());

❌ New issue: Excess Number of Function Arguments
test_cache_after_removal has 5 arguments, max arguments = 4

@coderabbitai

This comment was marked as resolved.

@coderabbitai

This comment was marked as resolved.

@leynos leynos changed the title Refactor which resolver: introduce workspace module and tests Add tests and simplify which resolver logic Nov 25, 2025
codescene-delta-analysis[bot]

This comment was marked as outdated.

@leynos
Copy link
Owner Author

leynos commented Nov 25, 2025

@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix:

Code Duplication

tests/std_filter_tests/which_filter_tests.rs:

What lead to degradation?

The module contains 4 functions with similar structure: which_filter_fresh_bypasses_cache_after_executable_removed,which_filter_uses_cached_result_when_executable_removed,which_function_accepts_case_insensitive_cwd_mode,which_function_honours_cwd_mode

Why does this problem occur?

Duplicated code often leads to code that's harder to change since the same logical change has to be done in multiple functions. More duplication gives lower code health.

How to fix it?

A certain degree of duplicated code might be acceptable. The problems start when it is the same behavior that is duplicated across the functions in the module, ie. a violation of the Don't Repeat Yourself (DRY) principle. DRY violations lead to code that is changed together in predictable patterns, which is both expensive and risky. DRY violations can be identified using CodeScene's X-Ray analysis to detect clusters of change coupled functions with high code similarity. Read More
Once you have identified the similarities across functions, look to extract and encapsulate the concept that varies into its own function(s). These shared abstractions can then be re-used, which minimizes the amount of duplication and simplifies change.

@coderabbitai

This comment was marked as resolved.

@leynos leynos changed the title Add tests and simplify which resolver logic Refactor which resolver: OS-specific workspace fallback Nov 25, 2025
@leynos leynos force-pushed the terragon/which-filter-tests-and-workspace-lookup-refactor-7mc1ho branch from 55452bc to 41a6671 Compare November 26, 2025 21:50
@leynos leynos marked this pull request as ready for review November 26, 2025 23:13
Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • The POSIX and Windows workspace search implementations share quite a bit of helper logic (unwrap_or_log_error, logging-on-empty, entry processing shape); consider extracting the shared pieces into the common workspace module to reduce duplication and keep behaviour aligned across platforms.
  • In workspace_fallback_enabled, treating a non-Unicode NETSUKE_WHICH_WORKSPACE as "enabled" may be surprising; if that’s not intentional, you might want to treat VarError::NotUnicode the same as an explicit disable or at least log a warning for that case.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The POSIX and Windows workspace search implementations share quite a bit of helper logic (`unwrap_or_log_error`, logging-on-empty, entry processing shape); consider extracting the shared pieces into the common `workspace` module to reduce duplication and keep behaviour aligned across platforms.
- In `workspace_fallback_enabled`, treating a non-Unicode `NETSUKE_WHICH_WORKSPACE` as "enabled" may be surprising; if that’s not intentional, you might want to treat `VarError::NotUnicode` the same as an explicit disable or at least log a warning for that case.

## Individual Comments

### Comment 1
<location> `src/stdlib/which/lookup/tests.rs:166-167` </location>
<code_context>
+}
+
+#[rstest]
+fn relative_path_entries_resolve_against_cwd(workspace: TempWorkspace) -> Result<()> {
+    let bin = workspace.root().join("bin");
+    let tools = workspace.root().join("tools");
</code_context>

<issue_to_address>
**suggestion (testing):** Relative PATH entries test is implicitly Unix-specific but not cfg-gated or separator-aware

This test builds `PATH` as `"{}:bin:tools"`, which hardcodes the Unix `:` separator and assumes `bin`/`tools` are relative to the cwd. On Windows, the separator is `;` and `:` has special meaning (e.g., in drive letters), so the test may not match actual behaviour and could be flaky. Please either:

* gate this with `#[cfg(unix)]`, or
* build `PATH` using the platform separator (e.g. `std::env::join_paths`) so relative entries are validated correctly on all platforms.

```suggestion
    let path_value = std::env::join_paths([
        workspace.root().as_std_path(),
        std::path::Path::new("bin"),
        std::path::Path::new("tools"),
    ])?;
    let _guard = VarGuard::set("PATH", path_value.as_os_str());
```
</issue_to_address>

### Comment 2
<location> `src/stdlib/which/lookup/workspace/windows.rs:13` </location>
<code_context>
-/// Skips unreadable entries, ignores heavy/VCS directories via
-/// `should_visit_entry`, and returns `Ok(Vec<Utf8PathBuf>)` containing the
-/// discovered executables or an `Error` if UTF-8 conversion fails.
-pub(super) fn search_workspace(
-    cwd: &Utf8Path,
-    command: &str,
</code_context>

<issue_to_address>
**issue (complexity):** Consider simplifying the workspace search control flow by inlining `CollectionState` into the main loop and merging the entry-processing helpers into a single function that returns an executable path or `None`.

You can reduce the indirection without changing behavior by inlining the one-off `CollectionState` and collapsing the `process_workspace_entry` / `workspace_entry_matches` chain into a single helper.

### 1. Inline `CollectionState` into the loop

`CollectionState::try_add` is only encoding “if match, push, maybe stop” for a single caller. You can keep early-exit semantics directly in the loop and drop the struct:

```rust
pub(super) fn search_workspace(
    env: &EnvSnapshot,
    command: &str,
    collect_all: bool,
) -> Result<Vec<Utf8PathBuf>, Error> {
    let match_ctx = WorkspaceMatchContext::new(command, env);
    let mut matches = Vec::new();

    for walk_entry in WalkDir::new(&env.cwd)
        .follow_links(false)
        .max_depth(WORKSPACE_MAX_DEPTH)
        .sort_by_file_name()
        .into_iter()
        .filter_entry(should_visit_entry)
    {
        let Some(entry) = unwrap_or_log_error(walk_entry, command) else {
            continue;
        };

        if let Some(path) = match_workspace_entry(entry, command, &match_ctx)? {
            matches.push(path);
            if !collect_all {
                break;
            }
        }
    }

    log_if_no_matches(&matches, command);
    Ok(matches)
}
```

This keeps the control flow local and removes the need to jump into `CollectionState::try_add` to understand when iteration stops.

### 2. Collapse `process_workspace_entry` + `workspace_entry_matches`

The logic of `process_workspace_entry` and `workspace_entry_matches` is tightly coupled and only used together. You can merge them into a single helper that does “given a `DirEntry`, either return a valid executable path or `None`”:

```rust
fn match_workspace_entry(
    entry: walkdir::DirEntry,
    command: &str,
    ctx: &WorkspaceMatchContext,
) -> Result<Option<Utf8PathBuf>, Error> {
    if !entry.file_type().is_file() {
        return Ok(None);
    }

    let file_name = entry.file_name().to_string_lossy().to_ascii_lowercase();
    if file_name != ctx.command_lower
        && !( !ctx.command_has_ext && ctx.basenames.contains(&file_name) )
    {
        return Ok(None);
    }

    let path = entry.into_path();
    let utf8 = Utf8PathBuf::from_path_buf(path).map_err(|path_buf| {
        let lossy_path = path_buf.to_string_lossy();
        Error::new(
            ErrorKind::InvalidOperation,
            format!(
                "workspace path contains non-UTF-8 components while resolving command '{command}': {lossy_path}",
            ),
        )
    })?;

    Ok(is_executable(&utf8).then_some(utf8))
}
```

This preserves:

- Case-insensitive matching,
- PATHEXT-aware basename matching via `WorkspaceMatchContext`,
- UTF-8 conversion error reporting,
- `is_executable` filtering,
- Early exit when `collect_all == false`.

If you still want `workspace_entry_matches` separate, you can keep a tiny helper, but the main win is reducing the entry pipeline to: iterator → `unwrap_or_log_error``match_workspace_entry` → push/break, all visible in one place.
</issue_to_address>

### Comment 3
<location> `src/stdlib/which/lookup/workspace/posix.rs:11` </location>
<code_context>
-/// Skips unreadable entries, ignores heavy/VCS directories via
-/// `should_visit_entry`, and returns `Ok(Vec<Utf8PathBuf>)` containing the
-/// discovered executables or an `Error` if UTF-8 conversion fails.
-pub(super) fn search_workspace(
-    cwd: &Utf8Path,
-    command: &str,
</code_context>

<issue_to_address>
**issue (complexity):** Consider inlining the workspace traversal loop and related logging into `search_workspace` to keep all behavior in one straightforward function with less indirection.

You can simplify this traversal by inlining the iterator loop and logging into `search_workspace` and dropping `collect_matching_executables` / `log_if_empty` / `unwrap_or_log_error`. The behavior (early-exit, logging unreadable entries, empty-result log, UTF-8 error) can stay identical while reducing indirection.

For example:

```rust
pub(super) fn search_workspace(
    env: &EnvSnapshot,
    command: &str,
    collect_all: bool,
) -> Result<Vec<Utf8PathBuf>, Error> {
    let walker = WalkDir::new(&env.cwd)
        .follow_links(false)
        .max_depth(WORKSPACE_MAX_DEPTH)
        .sort_by_file_name()
        .into_iter()
        .filter_entry(should_visit_entry);

    let mut matches = Vec::new();

    for walk_entry in walker {
        let entry = match walk_entry {
            Ok(entry) => entry,
            Err(err) => {
                tracing::debug!(
                    %command,
                    error = %err,
                    "skipping unreadable workspace entry during which fallback",
                );
                continue;
            }
        };

        if !entry.file_type().is_file() {
            continue;
        }

        if entry.file_name().to_string_lossy() != command {
            continue;
        }

        let path = entry.into_path();
        let utf8 = Utf8PathBuf::from_path_buf(path).map_err(|path_buf| {
            let lossy_path = path_buf.to_string_lossy();
            Error::new(
                ErrorKind::InvalidOperation,
                format!(
                    "workspace path contains non-UTF-8 components while resolving command \
                     '{command}': {lossy_path}",
                ),
            )
        })?;

        if is_executable(&utf8) {
            matches.push(utf8);
            if !collect_all {
                break;
            }
        }
    }

    if matches.is_empty() {
        tracing::debug!(
            %command,
            max_depth = WORKSPACE_MAX_DEPTH,
            skip = ?WORKSPACE_SKIP_DIRS,
            "workspace which fallback found no matches",
        );
    }

    Ok(matches)
}
```

This keeps all the existing behavior:

- same `WalkDir` construction and `should_visit_entry` filter,
- same logging for unreadable entries,
- same UTF-8 error semantics/message,
- same `is_file` / name equality / `is_executable` checks,
- same early-exit when `collect_all == false`,
- same “no matches” debug log,

but removes:

- `collect_matching_executables`,
- `unwrap_or_log_error`,
- `log_if_empty`,
- and the `process_workspace_entry` hop (the logic is now visible in one place).

If you still prefer some decomposition, a minimal helper that only encapsulates the UTF-8 conversion + error would keep complexity low while avoiding duplication:

```rust
fn to_utf8_path(path: PathBuf, command: &str) -> Result<Utf8PathBuf, Error> {
    Utf8PathBuf::from_path_buf(path).map_err(|path_buf| {
        let lossy_path = path_buf.to_string_lossy();
        Error::new(
            ErrorKind::InvalidOperation,
            format!(
                "workspace path contains non-UTF-8 components while resolving command \
                 '{command}': {lossy_path}",
            ),
        )
    })
}
```

and then in the loop:

```rust
let utf8 = to_utf8_path(entry.into_path(), command)?;
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@coderabbitai coderabbitai bot added the codex label Nov 26, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/std_filter_tests/which_filter_tests.rs (1)

250-270: Based on my verification, the review comment is correct and identifies a legitimate issue. Here are the findings:

  1. Project confirms Rust 2024 edition (Cargo.toml line 4: edition = "2024")
  2. Established codebase patterns exist for handling environment mutations:
    • tests/runner_tests.rs explicitly documents that std::env::set_var is unsafe in Rust 2024
    • tests/env_path_tests.rs and tests/ninja_env_tests.rs wrap all env mutations with unsafe { } and EnvLock
    • src/manifest/tests.rs provides a CurrentDirGuard pattern specifically for set_current_dir that acquires EnvLock and auto-restores on drop
  3. The test under review violates established patterns: lines 257 and 266 use env::set_current_dir without:
    • unsafe wrapping
    • EnvLock synchronization
    • Guard pattern for guaranteed restoration

Environment mutations on lines 257 and 266 require proper synchronization guards per established codebase patterns.

The env::set_current_dir calls are unguarded and violate the project's Rust 2024 safety requirements. Use the CurrentDirGuard pattern already established in src/manifest/tests.rs (acquires EnvLock, auto-restores), or wrap manually with EnvLock::acquire() to prevent test interference and ensure reliable cleanup.

♻️ Duplicate comments (1)
src/stdlib/which/lookup/workspace/windows.rs (1)

141-171: Duplicate helpers with POSIX module.

unwrap_or_log_error and log_if_no_matches duplicate the POSIX versions. Extract to workspace/mod.rs as shared helpers.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6ddb2a6 and 41a6671.

📒 Files selected for processing (11)
  • docs/netsuke-design.md (2 hunks)
  • src/stdlib/register.rs (2 hunks)
  • src/stdlib/which/lookup/mod.rs (4 hunks)
  • src/stdlib/which/lookup/tests.rs (5 hunks)
  • src/stdlib/which/lookup/workspace.rs (0 hunks)
  • src/stdlib/which/lookup/workspace/mod.rs (1 hunks)
  • src/stdlib/which/lookup/workspace/posix.rs (1 hunks)
  • src/stdlib/which/lookup/workspace/windows.rs (1 hunks)
  • tests/std_filter_tests/mod.rs (1 hunks)
  • tests/std_filter_tests/which_filter_common.rs (1 hunks)
  • tests/std_filter_tests/which_filter_tests.rs (3 hunks)
💤 Files with no reviewable changes (1)
  • src/stdlib/which/lookup/workspace.rs
🧰 Additional context used
📓 Path-based instructions (5)
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Clippy warnings MUST be disallowed in Rust code.
Fix any warnings emitted during Rust tests in the code itself rather than silencing them.
In Rust, extract meaningfully named helper functions when a function is too long, adhering to separation of concerns and CQRS.
In Rust, group related parameters in meaningfully named structs when a function has too many parameters.
In Rust, consider using Arc to reduce data returned when a function is returning a large error.
Every Rust module must begin with a module level (//!) comment explaining the module's purpose and utility.
Document public APIs in Rust using Rustdoc comments (///) so documentation can be generated with cargo doc.
In Rust, prefer immutable data and avoid unnecessary mut bindings.
In Rust, handle errors with the Result type instead of panicking where feasible.
Avoid unsafe code in Rust unless absolutely necessary, and document any usage clearly.
In Rust, place function attributes after doc comments.
In Rust, do not use return in single-line functions.
In Rust, use predicate functions for conditional criteria with more than two branches.
Lints in Rust must not be silenced except as a last resort.
Lint rule suppressions in Rust must be tightly scoped and include a clear reason.
In Rust, prefer expect over allow.
In Rust tests, use rstest fixtures for shared setup.
In Rust tests, replace duplicated tests with #[rstest(...)] parameterized cases.
In Rust, prefer mockall for mocks and stubs.
In Rust, use concat!() to combine long string literals rather than escaping newlines with a backslash.
In Rust, prefer single line versions of functions where appropriate (e.g., pub fn new(id: u64) -> Self { Self(id) } instead of multi-line).
Use NewTypes in Rust to model domain values and eliminate 'integer soup'. Reach for newt-hype when introducing many homogeneous wrappers; add small shims for string-backed wrappers. For path-centric wrappers implement AsRef<Path> along...

Files:

  • src/stdlib/which/lookup/workspace/posix.rs
  • tests/std_filter_tests/mod.rs
  • src/stdlib/which/lookup/tests.rs
  • src/stdlib/which/lookup/workspace/windows.rs
  • tests/std_filter_tests/which_filter_tests.rs
  • src/stdlib/which/lookup/workspace/mod.rs
  • src/stdlib/register.rs
  • tests/std_filter_tests/which_filter_common.rs
  • src/stdlib/which/lookup/mod.rs

⚙️ CodeRabbit configuration file

**/*.rs: * Seek to keep the cognitive complexity of functions no more than 9.

  • Adhere to single responsibility and CQRS
  • Place function attributes after doc comments.
  • Do not use return in single-line functions.
  • Move conditionals with >2 branches into a predicate function.
  • Avoid unsafe unless absolutely necessary.
  • Every module must begin with a //! doc comment that explains the module's purpose and utility.
  • Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar
  • Lints must not be silenced except as a last resort.
    • #[allow] is forbidden.
    • Only narrowly scoped #[expect(lint, reason = "...")] is allowed.
    • No lint groups, no blanket or file-wide suppression.
    • Include FIXME: with link if a fix is expected.
  • Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.
  • Use rstest fixtures for shared setup and to avoid repetition between tests.
  • Replace duplicated tests with #[rstest(...)] parameterised cases.
  • Prefer mockall for mocks/stubs.
  • Prefer .expect() over .unwrap() in tests.
  • .expect() and .unwrap() are forbidden outside of tests. Errors must be propagated.
  • Ensure that any API or behavioural changes are reflected in the documentation in docs/
  • Ensure that any completed roadmap steps are recorded in the appropriate roadmap in docs/
  • Files must not exceed 400 lines in length
    • Large modules must be decomposed
    • Long match statements or dispatch tables should be decomposed by domain and collocated with targets
    • Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
  • Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such
    • For testing of functionality depending upon environment variables, dependency injection and...

Files:

  • src/stdlib/which/lookup/workspace/posix.rs
  • tests/std_filter_tests/mod.rs
  • src/stdlib/which/lookup/tests.rs
  • src/stdlib/which/lookup/workspace/windows.rs
  • tests/std_filter_tests/which_filter_tests.rs
  • src/stdlib/which/lookup/workspace/mod.rs
  • src/stdlib/register.rs
  • tests/std_filter_tests/which_filter_common.rs
  • src/stdlib/which/lookup/mod.rs
docs/**/*.md

📄 CodeRabbit inference engine (AGENTS.md)

docs/**/*.md: When new decisions are made, requirements change, libraries are added/removed, or architectural patterns evolve, proactively update the relevant file(s) in the docs/ directory to reflect the latest state.
Documentation must use en-GB-oxendict spelling and grammar ("-ize" / "-yse" / "-our"), with exception for the LICENSE filename.

docs/**/*.md: Use British English (en-GB) spelling with -ize endings (realize, organization), -lyse endings (analyse, paralyse, catalyse), -our endings (colour, behaviour), -re endings (calibre, centre, fibre), double 'l' (cancelled, counsellor), maintained 'e' (likeable, liveable), and -ogue endings (analogue, catalogue)
Use the Oxford comma in documentation: 'ships, planes, and hovercraft'
Treat company names as collective nouns in documentation: 'Concordat Industries are expanding'
Write headings in sentence case in documentation
Use Markdown headings (#, ##, ###) in order without skipping levels
Follow markdownlint recommendations for Markdown formatting
Provide code blocks and lists using standard Markdown syntax
Always provide a language identifier for fenced code blocks; use plaintext for non-code text
Use - as the first level bullet and renumber lists when items change in documentation
Prefer inline links using [text](url) or angle brackets around the URL in documentation
Ensure blank lines before and after bulleted lists and fenced blocks in documentation
Ensure tables have a delimiter line below the header row in documentation
Expand any uncommon acronym on first use in documentation (e.g., Continuous Integration (CI))
Wrap paragraphs at 80 columns in documentation
Wrap code at 120 columns in documentation
Do not wrap tables in documentation
Use GitHub-flavoured numeric footnotes referenced as [^1] in documentation
Number footnotes in order of appearance in the document
Caption every table and every diagram in documentation
Use ![alt text](path/to/image) for embedding figures with brief alt te...

Files:

  • docs/netsuke-design.md
**/*.md

📄 CodeRabbit inference engine (AGENTS.md)

**/*.md: Markdown paragraphs and bullet points must be wrapped at 80 columns.
Code blocks in Markdown must be wrapped at 120 columns.
Tables and headings in Markdown must not be wrapped.
Use dashes (-) for list bullets in Markdown.
Use GitHub-flavoured Markdown footnotes ([^1]) for references and footnotes.

Files:

  • docs/netsuke-design.md

⚙️ CodeRabbit configuration file

**/*.md: * Avoid 2nd person or 1st person pronouns ("I", "you", "we")

  • Use en-GB-oxendict (-ize / -yse / -our) spelling and grammar
  • Headings must not be wrapped.
  • Documents must start with a level 1 heading
  • Headings must correctly increase or decrease by no more than one level at a time
  • Use GitHub-flavoured Markdown style for footnotes and endnotes.
  • Numbered footnotes must be numbered by order of appearance in the document.

Files:

  • docs/netsuke-design.md
docs/**/{README,readme,GUIDE,guide,*.md}

📄 CodeRabbit inference engine (docs/adr-001-replace-serde-yml-with-serde-saphyr.md)

Document in Netsuke's README and user guide that the project now supports YAML 1.2 (not YAML 1.1) and that duplicate keys in manifests are no longer permitted

Files:

  • docs/netsuke-design.md
docs/**/!(README).md

📄 CodeRabbit inference engine (docs/documentation-style-guide.md)

Avoid first and second person personal pronouns outside the README.md file in documentation

Files:

  • docs/netsuke-design.md
🧬 Code graph analysis (6)
src/stdlib/which/lookup/workspace/posix.rs (3)
src/stdlib/which/lookup/mod.rs (1)
  • is_executable (189-192)
src/stdlib/which/lookup/workspace/mod.rs (1)
  • should_visit_entry (50-56)
src/stdlib/which/lookup/workspace/windows.rs (4)
  • new (57-62)
  • new (84-103)
  • unwrap_or_log_error (143-158)
  • process_workspace_entry (117-139)
src/stdlib/which/lookup/tests.rs (4)
src/stdlib/which/env.rs (2)
  • capture (25-47)
  • resolved_dirs (49-64)
src/stdlib/which/lookup/workspace/posix.rs (1)
  • search_workspace (11-28)
src/stdlib/which/lookup/workspace/mod.rs (1)
  • search_workspace (26-48)
src/stdlib/which/lookup/mod.rs (2)
  • resolve_direct (76-93)
  • resolve_direct (96-110)
src/stdlib/which/lookup/workspace/windows.rs (4)
src/stdlib/which/lookup/mod.rs (1)
  • is_executable (189-192)
src/stdlib/which/lookup/workspace/mod.rs (2)
  • should_visit_entry (50-56)
  • search_workspace (26-48)
src/stdlib/which/lookup/workspace/posix.rs (3)
  • search_workspace (11-28)
  • unwrap_or_log_error (32-47)
  • process_workspace_entry (86-111)
src/stdlib/which/env.rs (1)
  • candidate_paths (148-165)
tests/std_filter_tests/which_filter_tests.rs (2)
tests/std_filter_tests/which_filter_common.rs (8)
  • with_tool_in_dirs (147-167)
  • write_tool (92-104)
  • new (16-16)
  • new (32-32)
  • new (52-52)
  • new (72-89)
  • render (133-136)
  • render (169-173)
tests/std_filter_tests/support.rs (3)
  • filter_workspace (54-70)
  • stdlib_env_with_state (46-48)
  • render (72-86)
src/stdlib/which/lookup/workspace/mod.rs (2)
src/stdlib/which/lookup/workspace/windows.rs (3)
  • search_workspace (13-40)
  • new (57-62)
  • new (84-103)
src/stdlib/which/lookup/workspace/posix.rs (1)
  • search_workspace (11-28)
src/stdlib/which/lookup/mod.rs (4)
src/stdlib/which/error.rs (1)
  • direct_not_found (23-30)
src/stdlib/which/env.rs (1)
  • candidate_paths (148-165)
src/stdlib/which/lookup/workspace/posix.rs (1)
  • search_workspace (11-28)
src/stdlib/which/lookup/workspace/mod.rs (1)
  • search_workspace (26-48)
🔍 Remote MCP Deepwiki

Summary — additional context I found (items to check during review)

  • Project constraints: filesystem access is capability-based and paths are UTF‑8 (cap‑std + camino). Any workspace traversal must use cap‑std/Utf8Path types and handle non‑UTF8 entries with clear diagnostics.

  • Template/stdlib background: the stdlib registers env(), glob(), file tests and enforces strict undefined behavior for env() — confirm the new workspace fallback respects existing stdlib/env semantics and error types.

  • Testing ecosystem: tests use rstest, cucumber (BDD), and insta snapshots; new which fixtures must follow the same Env/PATH guards and snapshot determinism rules to avoid flakiness (PATH/PATHEXT isolation, temp dirs, UTF‑8 assumptions). Pay attention to new tests for invalid UTF‑8 PATH entries and relative PATH resolution.

  • Platform specifics to validate:

    • Windows: PATHEXT normalization/deduplication and candidate generation must only apply on Windows; verify candidates_for_dir and normalize_direct_path implement platform guards correctly and tests cover edge cases.
    • POSIX: ensure executable detection, canonicalisation flag, and direct-path handling behave as before when workspace fallback is used.
  • Workspace traversal behaviour to verify:

    • WORKSPACE_MAX_DEPTH = 6 and WORKSPACE_SKIP_DIRS (".git","target","node_modules","dist","build") are enforced and documented.
    • NETSUKE_WHICH_WORKSPACE env var toggles fallback appropriately (default on unless explicit falsey).
    • Traversal skips heavy/VCS dirs and logs unreadable entries rather than panicking; non‑UTF8 paths produce controlled errors. (Check WalkDir usage, depth filter, should_visit_entry and error messages.)
  • API/visibility & compatibility checks:

    • Deleted workspace.rs replaced by workspace/mod.rs + posix/windows — confirm the exported search_workspace signature matches all callers and platform cfgs.
    • register.rs: non‑Unix stubs for device/pipe tests added — confirm tests on non‑Unix still compile and behavior preserved.
  • Test reliability & determinism:

    • Ensure new tests/fixtures preserve snapshot determinism (sorted outputs, stable hashes) and acquire/release PATH/env locks (VarGuard/EnvSnapshot) to avoid cross-test interference.

Sources used for these checks: project documentation and developer/testing notes.,

⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: release / build-macos (x86_64-apple-darwin, macos-13, macos-x86_64, macos-x86_64) / Build macos artefacts
  • GitHub Check: release / build-windows (aarch64-pc-windows-msvc, arm64, arm64, windows-aarch64) / Build windows artefacts
  • GitHub Check: release / build-linux (x86_64-unknown-linux-gnu, linux-amd64, amd64) / Build linux artefacts
  • GitHub Check: release / build-windows (x86_64-pc-windows-msvc, amd64, x64, windows-x86_64) / Build windows artefacts
  • GitHub Check: Sourcery review
  • GitHub Check: Sourcery review
🔇 Additional comments (41)
src/stdlib/register.rs (1)

123-131: Keep non-Unix FILE_TESTS aligned with Unix and update docs for new predicates

Expose the same file-test names on non-Unix as on Unix to keep templates portable; this constant now does that correctly. Ensure the stdlib/file-tests documentation explicitly lists pipe, block_device, char_device and device as available on all platforms, and note that on non-Unix they currently always evaluate to false so users understand the cross-platform behaviour.

tests/std_filter_tests/mod.rs (1)

15-16: LGTM!

The new module inclusion follows the established pattern and correctly wires in the shared test fixtures for which filter tests.

src/stdlib/which/lookup/workspace/posix.rs (3)

1-9: LGTM!

Module documentation and imports are well-organised.


11-28: LGTM!

Clean separation of walker setup, collection, and logging. Error propagation uses ? correctly.


61-111: LGTM!

The collection and entry-processing logic is correct. Error propagation for non-UTF-8 paths is handled appropriately with a clear error message.

src/stdlib/which/lookup/workspace/windows.rs (5)

1-40: LGTM!

The Windows-specific search logic correctly separates match context from collection state. The try_add pattern provides clean early termination.


42-81: LGTM!

The CollectionState struct with try_add effectively encapsulates collection logic and reduces parameter count, addressing the PR feedback about excess function arguments.


83-104: LGTM!

Candidate basenames are generated correctly using candidate_paths with an empty directory prefix, and case-insensitive matching via lowercase storage is appropriate for Windows.


106-115: LGTM!

Case-insensitive matching with PATHEXT-derived basename fallback is correctly implemented.


117-139: LGTM!

Entry processing correctly filters by file type and name match before performing UTF-8 conversion and executable checks.

tests/std_filter_tests/which_filter_common.rs (7)

1-11: LGTM!

Module documentation is present and imports are appropriate for test fixture code.


12-62: LGTM!

NewType wrappers for ToolName, DirName, and Template provide type safety and prevent parameter mix-ups in test fixtures.


64-90: LGTM!

PathEnv correctly uses EnvLock and VarGuard for environment isolation, adhering to guidelines about guarded environment mutations in tests.


92-104: LGTM!

Tool creation logic is correct with proper directory creation, file writing, and platform-appropriate permission setting.


106-131: LGTM!

Platform-specific helpers correctly handle Unix permissions, Windows file extensions, and script content differences.


133-136: LGTM!

Simple render helper correctly wraps the MiniJinja rendering API.


138-174: LGTM!

WhichTestFixture provides a comprehensive test setup with proper RAII cleanup for temporary directories and environment variables.

tests/std_filter_tests/which_filter_tests.rs (9)

1-12: LGTM!

Imports are well-organised and the glob import from the common fixture module is acceptable for test code.


14-22: LGTM!

render_and_assert_pure cleanly encapsulates the purity assertion pattern used across tests.


51-67: LGTM!

test_cache_behaviour_after_removal reduces duplication for cache removal tests by parameterising the second template and error expectation.


69-89: LGTM!

test_duplicate_paths correctly validates duplicate path handling with and without canonicalisation.


91-101: LGTM!

test_cwd_mode_resolution provides reusable setup for testing cwd_mode variants.


103-141: LGTM!

Tests correctly use the extracted helpers and make appropriate assertions.


143-176: LGTM!

The JSON array assertion and duplicate path tests correctly verify the expected behaviours.


178-209: LGTM!

cwd_mode tests correctly cover valid values (case-insensitive) and invalid value error reporting.


211-248: LGTM!

The remaining tests correctly verify error reporting, workspace fallback, and heavy directory skipping behaviours.

docs/netsuke-design.md (2)

965-965: LGTM!

The wording improvement from "fresh-only" to "fresh" is clearer and more accurate—the parameter controls bypass behaviour regardless of the cache key composition.


988-992: LGTM!

The new workspace fallback documentation correctly describes the depth bound, skipped directories, and environment variable toggle. The Oxford comma usage and en-GB spelling are consistent with the coding guidelines.

src/stdlib/which/lookup/tests.rs (4)

59-74: LGTM!

The refactored test correctly uses VarGuard for PATH isolation and passes the snapshot to search_workspace, aligning with the new unified API.


135-157: LGTM!

Good coverage of the invalid UTF-8 PATH edge case. The test correctly verifies that EnvSnapshot::capture fails with an actionable diagnostic containing the error code and entry index.


159-183: LGTM!

This test correctly validates that relative PATH entries are resolved against the current working directory. The use of CwdMode::Never isolates the relative resolution behaviour from CWD injection.


238-265: LGTM!

The test correctly validates that resolve_direct returns a "not executable" error when the target file lacks execute permissions. Good coverage of this edge case.

src/stdlib/which/lookup/workspace/mod.rs (4)

1-2: LGTM!

The module-level documentation clearly states the module's purpose, satisfying the coding guideline that every Rust module must begin with a //! doc comment.


20-24: LGTM!

The constants are correctly scoped: WORKSPACE_MAX_DEPTH and WORKSPACE_SKIP_DIRS are pub(super) for the platform submodules, whilst WORKSPACE_FALLBACK_ENV remains private as it is only consumed by workspace_fallback_enabled().


26-48: LGTM!

The function cleanly separates the environment toggle from platform-specific traversal. The debug logging provides useful observability for diagnosing workspace fallback behaviour.


50-56: LGTM!

The predicate correctly allows files through unconditionally and filters directories against the skip list. Using to_string_lossy() ensures non-UTF-8 directory names are handled gracefully without panicking.

src/stdlib/which/lookup/mod.rs (5)

47-47: LGTM!

The extraction of candidates_for_dir improves readability and centralises platform-specific candidate generation.


75-93: LGTM!

The Windows resolve_direct correctly normalises the path, expands PATHEXT candidates, and applies optional canonicalisation. The unified flow aligns with the design documentation.


112-119: LGTM!

The helper cleanly handles both absolute and relative direct paths, joining with env.cwd when necessary.


194-214: LGTM!

The unified search_workspace call removes per-platform branching from handle_miss, delegating platform dispatch to the workspace module. This improves separation of concerns.


95-110: Avoid redundant clone on the non-canonical branch.

resolved is only used once on line 108; move ownership rather than cloning.

 #[cfg(not(windows))]
 pub(super) fn resolve_direct(
     command: &str,
     env: &EnvSnapshot,
     options: &WhichOptions,
 ) -> Result<Vec<Utf8PathBuf>, Error> {
     let resolved = normalize_direct_path(command, env);
     if !is_executable(&resolved) {
         return Err(direct_not_found(command, &resolved));
     }
     if options.canonical {
         canonicalise(vec![resolved.clone()])
     } else {
-        Ok(vec![resolved])
+        Ok(vec![resolved])
     }
 }

Actually, the current code already avoids a redundant clone on the non-canonical path. The clone on line 106 is unavoidable since direct_not_found borrows resolved. No change needed.

Likely an incorrect or invalid review comment.

leynos and others added 6 commits November 27, 2025 01:06
Removed the EnvLock acquisition lines from tests in src/stdlib/which/lookup/tests.rs as they were unused or unnecessary. This simplifies test setup and avoids unnecessary environment locking.

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
- Extracted helper functions to improve workspace traversal durability by
  logging and skipping unreadable entries instead of failing.
- Added early exit support when collecting a single match.
- Added debug logs when no matches found to aid diagnosis.
- Simplified code by consolidating duplicate logic in POSIX and Windows search.
- Updated which filter test suite with helpers to check cache behavior and
  duplicate paths, ensuring purity and consistent output.

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
…onState struct

Refactored the workspace search logic on Windows to encapsulate the state and logic for collecting matching executables within a new `CollectionState` struct. This replaces manual vector management with a clean interface that handles adding matches and determining early exit, improving code clarity and maintainability.

Also cleaned up related test code by removing unused parameters.

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
… setup

- Added helper functions for cache removal tests and cwd_mode resolution to reduce duplication.
- Set PATH environment variable in which lookup tests for consistent environment.
- Refactored tests to use new helpers improving clarity and maintainability.

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
…ss and logging

- Added helper functions to unwrap walkdir entries with error logging and to log when no matches are found during fallback.
- Rewrote workspace traversal to be more resilient by filtering unreadable paths and better handling non-UTF-8 paths.
- Simplified collection of executable matches, removing state struct in favor of clearer iteration logic.
- Enhanced logging to aid diagnosis of fallback misses or unexpected latencies.
- Adjusted test code for improved PATH handling and error expectation assertion.

These changes improve the maintainability and robustness of the workspace fallback which resolution mechanism, ensuring smoother operation across platforms and edge cases.

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
@leynos leynos force-pushed the terragon/which-filter-tests-and-workspace-lookup-refactor-7mc1ho branch from 7649550 to e68ce5c Compare November 27, 2025 02:12
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/stdlib/register.rs (1)

216-265: PATH isolation pattern is correct but verbose; extract a helper.

The EnvLock acquisition, PATH capture, unsafe mutation, and restoration logic is duplicated elsewhere in the codebase. Extract with_isolated_path (as done in tests.rs) to a shared test utility to reduce boilerplate and risk of forgetting restoration.

As per coding guidelines, use rstest fixtures for shared setup and avoid repetition between tests.

tests/std_filter_tests/which_filter_tests.rs (2)

216-219: Replace .unwrap_err() with .expect_err() for clearer failure diagnostics.

As per coding guidelines, prefer expect over unwrap in tests.

     let err = env
         .render_str("{{ 'absent' | which }}", context! {})
-        .unwrap_err();
+        .expect_err("render should fail for missing command");

243-246: Replace .unwrap_err() with .expect_err() for clearer failure diagnostics.

As per coding guidelines, prefer expect over unwrap in tests.

     let err = env
         .render_str("{{ 'helper' | which }}", context! {})
-        .unwrap_err();
+        .expect_err("render should fail when tool is in skipped directory");
♻️ Duplicate comments (2)
src/stdlib/register.rs (1)

169-202: Document the intentionally unsupported non-Unix file-type stubs.

These stubs correctly return false on non-Unix platforms. Add a brief doc comment to each explaining they are intentionally unsupported, so future maintainers do not accidentally alter cross-platform semantics. Based on past review feedback, this documentation was requested but not yet added.

 #[cfg(not(unix))]
+/// Always returns `false` on non-Unix platforms where FIFO detection is unsupported.
 fn is_fifo(_ft: fs::FileType) -> bool {
     false
 }

 #[cfg(not(unix))]
+/// Always returns `false` on non-Unix platforms where block device detection is unsupported.
 fn is_block_device(_ft: fs::FileType) -> bool {
     false
 }

 #[cfg(not(unix))]
+/// Always returns `false` on non-Unix platforms where character device detection is unsupported.
 fn is_char_device(_ft: fs::FileType) -> bool {
     false
 }

 #[cfg(not(unix))]
+/// Always returns `false` on non-Unix platforms where device detection is unsupported.
 fn is_device(_ft: fs::FileType) -> bool {
     false
 }
src/stdlib/which/lookup/mod.rs (1)

181-191: Use an underscore-prefixed parameter to suppress the unused warning.

The let _ = env; workaround on line 188 is unusual. Rename the parameter to _env on the non-Windows branch instead.

-fn candidates_for_dir(env: &EnvSnapshot, dir: &Utf8Path, command: &str) -> Vec<Utf8PathBuf> {
+fn candidates_for_dir(_env: &EnvSnapshot, dir: &Utf8Path, command: &str) -> Vec<Utf8PathBuf> {
     #[cfg(windows)]
     {
-        env::candidate_paths(dir, command, env.pathext())
+        env::candidate_paths(dir, command, _env.pathext())
     }
     #[cfg(not(windows))]
     {
-        let _ = env;
         vec![dir.join(command)]
     }
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 41a6671 and e68ce5c.

📒 Files selected for processing (11)
  • docs/netsuke-design.md (2 hunks)
  • src/stdlib/register.rs (4 hunks)
  • src/stdlib/which/lookup/mod.rs (4 hunks)
  • src/stdlib/which/lookup/tests.rs (5 hunks)
  • src/stdlib/which/lookup/workspace.rs (0 hunks)
  • src/stdlib/which/lookup/workspace/mod.rs (1 hunks)
  • src/stdlib/which/lookup/workspace/posix.rs (1 hunks)
  • src/stdlib/which/lookup/workspace/windows.rs (1 hunks)
  • tests/std_filter_tests/mod.rs (1 hunks)
  • tests/std_filter_tests/which_filter_common.rs (1 hunks)
  • tests/std_filter_tests/which_filter_tests.rs (3 hunks)
💤 Files with no reviewable changes (1)
  • src/stdlib/which/lookup/workspace.rs
🧰 Additional context used
📓 Path-based instructions (5)
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Clippy warnings MUST be disallowed in Rust code.
Fix any warnings emitted during Rust tests in the code itself rather than silencing them.
In Rust, extract meaningfully named helper functions when a function is too long, adhering to separation of concerns and CQRS.
In Rust, group related parameters in meaningfully named structs when a function has too many parameters.
In Rust, consider using Arc to reduce data returned when a function is returning a large error.
Every Rust module must begin with a module level (//!) comment explaining the module's purpose and utility.
Document public APIs in Rust using Rustdoc comments (///) so documentation can be generated with cargo doc.
In Rust, prefer immutable data and avoid unnecessary mut bindings.
In Rust, handle errors with the Result type instead of panicking where feasible.
Avoid unsafe code in Rust unless absolutely necessary, and document any usage clearly.
In Rust, place function attributes after doc comments.
In Rust, do not use return in single-line functions.
In Rust, use predicate functions for conditional criteria with more than two branches.
Lints in Rust must not be silenced except as a last resort.
Lint rule suppressions in Rust must be tightly scoped and include a clear reason.
In Rust, prefer expect over allow.
In Rust tests, use rstest fixtures for shared setup.
In Rust tests, replace duplicated tests with #[rstest(...)] parameterized cases.
In Rust, prefer mockall for mocks and stubs.
In Rust, use concat!() to combine long string literals rather than escaping newlines with a backslash.
In Rust, prefer single line versions of functions where appropriate (e.g., pub fn new(id: u64) -> Self { Self(id) } instead of multi-line).
Use NewTypes in Rust to model domain values and eliminate 'integer soup'. Reach for newt-hype when introducing many homogeneous wrappers; add small shims for string-backed wrappers. For path-centric wrappers implement AsRef<Path> along...

Files:

  • src/stdlib/which/lookup/workspace/posix.rs
  • src/stdlib/which/lookup/tests.rs
  • tests/std_filter_tests/mod.rs
  • src/stdlib/which/lookup/workspace/windows.rs
  • src/stdlib/register.rs
  • src/stdlib/which/lookup/mod.rs
  • tests/std_filter_tests/which_filter_common.rs
  • tests/std_filter_tests/which_filter_tests.rs
  • src/stdlib/which/lookup/workspace/mod.rs

⚙️ CodeRabbit configuration file

**/*.rs: * Seek to keep the cognitive complexity of functions no more than 9.

  • Adhere to single responsibility and CQRS
  • Place function attributes after doc comments.
  • Do not use return in single-line functions.
  • Move conditionals with >2 branches into a predicate function.
  • Avoid unsafe unless absolutely necessary.
  • Every module must begin with a //! doc comment that explains the module's purpose and utility.
  • Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar
  • Lints must not be silenced except as a last resort.
    • #[allow] is forbidden.
    • Only narrowly scoped #[expect(lint, reason = "...")] is allowed.
    • No lint groups, no blanket or file-wide suppression.
    • Include FIXME: with link if a fix is expected.
  • Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.
  • Use rstest fixtures for shared setup and to avoid repetition between tests.
  • Replace duplicated tests with #[rstest(...)] parameterised cases.
  • Prefer mockall for mocks/stubs.
  • Prefer .expect() over .unwrap() in tests.
  • .expect() and .unwrap() are forbidden outside of tests. Errors must be propagated.
  • Ensure that any API or behavioural changes are reflected in the documentation in docs/
  • Ensure that any completed roadmap steps are recorded in the appropriate roadmap in docs/
  • Files must not exceed 400 lines in length
    • Large modules must be decomposed
    • Long match statements or dispatch tables should be decomposed by domain and collocated with targets
    • Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
  • Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such
    • For testing of functionality depending upon environment variables, dependency injection and...

Files:

  • src/stdlib/which/lookup/workspace/posix.rs
  • src/stdlib/which/lookup/tests.rs
  • tests/std_filter_tests/mod.rs
  • src/stdlib/which/lookup/workspace/windows.rs
  • src/stdlib/register.rs
  • src/stdlib/which/lookup/mod.rs
  • tests/std_filter_tests/which_filter_common.rs
  • tests/std_filter_tests/which_filter_tests.rs
  • src/stdlib/which/lookup/workspace/mod.rs
docs/**/*.md

📄 CodeRabbit inference engine (AGENTS.md)

docs/**/*.md: When new decisions are made, requirements change, libraries are added/removed, or architectural patterns evolve, proactively update the relevant file(s) in the docs/ directory to reflect the latest state.
Documentation must use en-GB-oxendict spelling and grammar ("-ize" / "-yse" / "-our"), with exception for the LICENSE filename.

docs/**/*.md: Use British English (en-GB) spelling with -ize endings (realize, organization), -lyse endings (analyse, paralyse, catalyse), -our endings (colour, behaviour), -re endings (calibre, centre, fibre), double 'l' (cancelled, counsellor), maintained 'e' (likeable, liveable), and -ogue endings (analogue, catalogue)
Use the Oxford comma in documentation: 'ships, planes, and hovercraft'
Treat company names as collective nouns in documentation: 'Concordat Industries are expanding'
Write headings in sentence case in documentation
Use Markdown headings (#, ##, ###) in order without skipping levels
Follow markdownlint recommendations for Markdown formatting
Provide code blocks and lists using standard Markdown syntax
Always provide a language identifier for fenced code blocks; use plaintext for non-code text
Use - as the first level bullet and renumber lists when items change in documentation
Prefer inline links using [text](url) or angle brackets around the URL in documentation
Ensure blank lines before and after bulleted lists and fenced blocks in documentation
Ensure tables have a delimiter line below the header row in documentation
Expand any uncommon acronym on first use in documentation (e.g., Continuous Integration (CI))
Wrap paragraphs at 80 columns in documentation
Wrap code at 120 columns in documentation
Do not wrap tables in documentation
Use GitHub-flavoured numeric footnotes referenced as [^1] in documentation
Number footnotes in order of appearance in the document
Caption every table and every diagram in documentation
Use ![alt text](path/to/image) for embedding figures with brief alt te...

Files:

  • docs/netsuke-design.md
**/*.md

📄 CodeRabbit inference engine (AGENTS.md)

**/*.md: Markdown paragraphs and bullet points must be wrapped at 80 columns.
Code blocks in Markdown must be wrapped at 120 columns.
Tables and headings in Markdown must not be wrapped.
Use dashes (-) for list bullets in Markdown.
Use GitHub-flavoured Markdown footnotes ([^1]) for references and footnotes.

Files:

  • docs/netsuke-design.md

⚙️ CodeRabbit configuration file

**/*.md: * Avoid 2nd person or 1st person pronouns ("I", "you", "we")

  • Use en-GB-oxendict (-ize / -yse / -our) spelling and grammar
  • Headings must not be wrapped.
  • Documents must start with a level 1 heading
  • Headings must correctly increase or decrease by no more than one level at a time
  • Use GitHub-flavoured Markdown style for footnotes and endnotes.
  • Numbered footnotes must be numbered by order of appearance in the document.

Files:

  • docs/netsuke-design.md
docs/**/{README,readme,GUIDE,guide,*.md}

📄 CodeRabbit inference engine (docs/adr-001-replace-serde-yml-with-serde-saphyr.md)

Document in Netsuke's README and user guide that the project now supports YAML 1.2 (not YAML 1.1) and that duplicate keys in manifests are no longer permitted

Files:

  • docs/netsuke-design.md
docs/**/!(README).md

📄 CodeRabbit inference engine (docs/documentation-style-guide.md)

Avoid first and second person personal pronouns outside the README.md file in documentation

Files:

  • docs/netsuke-design.md
🧬 Code graph analysis (7)
src/stdlib/which/lookup/workspace/posix.rs (2)
src/stdlib/which/lookup/mod.rs (1)
  • is_executable (197-200)
src/stdlib/which/lookup/workspace/mod.rs (3)
  • log_if_no_matches (168-181)
  • should_visit_entry (122-128)
  • unwrap_or_log_error (149-164)
src/stdlib/which/lookup/workspace/windows.rs (3)
src/stdlib/which/lookup/mod.rs (1)
  • is_executable (197-200)
src/stdlib/which/lookup/workspace/mod.rs (3)
  • log_if_no_matches (168-181)
  • should_visit_entry (122-128)
  • unwrap_or_log_error (149-164)
src/stdlib/which/env.rs (1)
  • candidate_paths (148-165)
src/stdlib/register.rs (1)
test_support/src/env_lock.rs (1)
  • acquire (17-22)
src/stdlib/which/lookup/mod.rs (3)
src/stdlib/which/error.rs (1)
  • direct_not_found (23-30)
src/stdlib/which/env.rs (1)
  • candidate_paths (148-165)
src/stdlib/which/lookup/workspace/posix.rs (1)
  • search_workspace (14-43)
tests/std_filter_tests/which_filter_common.rs (3)
test_support/src/env_lock.rs (1)
  • acquire (17-22)
tests/std_filter_tests/support.rs (2)
  • filter_workspace (54-70)
  • stdlib_env_with_state (46-48)
tests/std_filter_tests/command_filters.rs (1)
  • state (65-67)
tests/std_filter_tests/which_filter_tests.rs (4)
tests/std_filter_tests/command_filters.rs (1)
  • state (65-67)
tests/std_filter_tests/which_filter_common.rs (11)
  • as_str (17-17)
  • as_str (33-33)
  • as_str (53-53)
  • with_tool_in_dirs (147-167)
  • write_tool (92-104)
  • new (16-16)
  • new (32-32)
  • new (52-52)
  • new (72-89)
  • render (133-136)
  • render (169-173)
tests/std_filter_tests/support.rs (3)
  • filter_workspace (54-70)
  • stdlib_env_with_state (46-48)
  • render (72-86)
src/stdlib/which/mod.rs (1)
  • new (36-46)
src/stdlib/which/lookup/workspace/mod.rs (2)
src/stdlib/which/lookup/workspace/posix.rs (1)
  • search_workspace (14-43)
src/stdlib/which/lookup/workspace/windows.rs (2)
  • search_workspace (16-46)
  • new (56-75)
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: release / build-macos (x86_64-apple-darwin, macos-13, macos-x86_64, macos-x86_64) / Build macos artefacts
  • GitHub Check: release / build-windows (aarch64-pc-windows-msvc, arm64, arm64, windows-aarch64) / Build windows artefacts
  • GitHub Check: release / build-linux (x86_64-unknown-linux-gnu, linux-amd64, amd64) / Build linux artefacts
  • GitHub Check: release / build-windows (x86_64-pc-windows-msvc, amd64, x64, windows-x86_64) / Build windows artefacts
  • GitHub Check: Sourcery review
  • GitHub Check: build-test (ubuntu-latest, stable)
🔇 Additional comments (41)
docs/netsuke-design.md (2)

965-965: LGTM!

The correction from "fresh-only" to "fresh" clarifies that fresh is the boolean parameter controlling bypass behaviour.


988-991: LGTM!

The workspace fallback documentation clearly describes traversal bounds (depth of six), the skip-list directories, and the NETSUKE_WHICH_WORKSPACE toggle. This aligns with the implementation in the workspace modules.

src/stdlib/which/lookup/workspace/posix.rs (3)

1-12: LGTM!

The module doc comment is present, shared helpers are correctly imported from the parent module (addressing previous duplication concerns), and the import structure is clean.


14-43: LGTM!

The search_workspace function has clear control flow: iterate entries, skip errors with logging, process matches, and exit early when collect_all is false. This addresses past review feedback about inlining the traversal logic.


45-69: LGTM!

The helper correctly filters non-files, checks filename equality, handles UTF-8 conversion with an informative error message, and returns the path only when executable. The use of then_some is idiomatic.

src/stdlib/which/lookup/workspace/windows.rs (4)

1-14: LGTM!

Module doc comment present, and shared helpers are correctly imported from the parent module, maintaining consistency with the POSIX implementation.


16-46: LGTM!

The search_workspace function mirrors the POSIX variant's structure while adding Windows-specific WorkspaceMatchContext for PATHEXT-aware matching. The control flow is clear and addresses past feedback about inlining the collection logic.


48-76: LGTM!

WorkspaceMatchContext correctly precomputes the lowercase command and candidate basenames from PATHEXT when the command lacks an extension. The use of HashSet for basenames enables O(1) lookups during traversal.


78-110: LGTM!

The workspace_entry_matches predicate and process_workspace_entry helper correctly implement case-insensitive matching with PATHEXT support. The UTF-8 error handling mirrors the POSIX implementation for consistency.

src/stdlib/which/lookup/tests.rs (5)

7-11: LGTM!

The missing EnvLock import has been added, addressing the past review comment about the Windows-gated tests failing to compile.


39-67: LGTM!

The params helper and with_isolated_path centralise PATH isolation logic, reducing duplication across tests. The lock acquisition, unsafe env mutation, and restoration pattern is correct.


266-295: LGTM!

This test now uses std::env::join_paths to construct the PATH value, addressing the past review feedback about hardcoding Unix : separators. This makes the test cross-platform correct.


297-346: LGTM!

The Windows-specific PATHEXT tests cover important edge cases: empty PATHEXT falling back to defaults, and normalisation/deduplication of malformed entries. The assertions verify case-insensitive handling and deduplication correctly.


348-375: LGTM!

This Unix-specific test verifies that resolve_direct correctly rejects non-executable files with an informative error message. The test setup (write file, chmod 0o644) is appropriate.

src/stdlib/which/lookup/mod.rs (3)

35-70: LGTM!

The lookup function is well-structured with clear separation between direct path resolution and PATH-based search. The workspace fallback integration via handle_miss is clean.


83-101: LGTM!

The Windows resolve_direct implementation correctly handles PATHEXT via direct_candidates and propagates errors appropriately.


120-143: LGTM!

normalize_direct_path and direct_candidates provide clean platform-specific helpers for direct path resolution.

src/stdlib/which/lookup/workspace/mod.rs (7)

1-9: LGTM!

Module doc comment present and imports are appropriately scoped.


34-86: LGTM!

WorkspaceSkipList is well-designed with normalisation, deduplication, and a deterministic Hash implementation leveraging IndexSet's insertion-order iteration.


88-96: LGTM!

Platform-appropriate case handling: Windows lowercases for case-insensitive matching; POSIX preserves case.


98-120: LGTM!

Clean delegation pattern with appropriate diagnostic logging before and after the workspace fallback check.


122-128: LGTM!

The lossy UTF-8 conversion is acceptable here—paths with invalid encoding will simply not match any skip directory and will be visited, which is a safe fallback.


130-145: Design choice: fail-closed on invalid UTF-8.

The past review suggested defaulting to enabled when the env var contains non-UTF-8 data. This implementation instead disables the fallback and warns. Both approaches are valid—this "fail-closed" approach is more conservative and prevents unexpected workspace traversal when configuration is malformed. The warning provides good diagnostics.


147-181: LGTM!

unwrap_or_log_error and log_if_no_matches provide clean, reusable helpers that keep the platform-specific implementations focused on matching logic. Debug-level logging is appropriate for traversal diagnostics.

tests/std_filter_tests/mod.rs (1)

15-16: LGTM!

The new which_filter_common module is correctly placed before which_filter_tests, ensuring shared fixtures are available to the test module.

tests/std_filter_tests/which_filter_tests.rs (9)

1-12: LGTM!

Module documentation and imports are well-organised.


14-22: LGTM!

The helper cleanly encapsulates the reset-render-assert pattern and correctly propagates errors.


24-49: LGTM!

The cache removal test helper correctly uses expect_err and handles both success and error scenarios appropriately.


51-67: LGTM!

Good consolidation of fixture setup with a clear doc comment explaining purpose.


69-89: LGTM!

The duplicate paths helper cleanly tests both canonicalised and non-canonicalised scenarios.


91-101: LGTM!

The cwd_mode resolution helper is well-documented and correctly isolated.


103-158: LGTM!

Tests correctly leverage the new helpers and validate JSON output structure.


160-209: LGTM!

The duplicate path and cwd_mode tests are well-structured with clear assertions.


250-270: LGTM!

The workspace root override test correctly saves and restores the current directory, and the use of set_current_dir is acceptable here as the unsafe-environment guidelines specifically target env::set_var/env::remove_var.

tests/std_filter_tests/which_filter_common.rs (7)

1-11: LGTM!

Module documentation and imports are well-organised.


12-62: LGTM!

The newtypes follow the recommended pattern with concise single-line methods and appropriate trait implementations.


64-90: LGTM!

The PathEnv struct correctly implements RAII-based environment isolation with proper lock acquisition before modification.


92-104: LGTM!

The write_tool function correctly propagates errors with context.


106-131: LGTM!

Platform-specific helpers are correctly gated with cfg attributes and follow single-line function conventions where appropriate.


133-136: LGTM!

The error conversion is acceptable for test fixtures where the error message is sufficient.


169-173: LGTM!

The render method mirrors the standalone function and correctly propagates errors.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/stdlib/which/lookup/mod.rs (1)

181-191: Rename parameter to _env instead of using let _ = env;.

The coding guidelines state to prefer underscore-prefixed parameters over let _ = ... assignments for unused variables. This was flagged in a prior review and marked as addressed, but the workaround persists.

-fn candidates_for_dir(env: &EnvSnapshot, dir: &Utf8Path, command: &str) -> Vec<Utf8PathBuf> {
+fn candidates_for_dir(
+    #[cfg(windows)] env: &EnvSnapshot,
+    #[cfg_attr(not(windows), allow(unused_variables))]
+    #[cfg(not(windows))] _env: &EnvSnapshot,
+    dir: &Utf8Path,
+    command: &str,
+) -> Vec<Utf8PathBuf> {
     #[cfg(windows)]
     {
         env::candidate_paths(dir, command, env.pathext())
     }
     #[cfg(not(windows))]
     {
-        let _ = env;
         vec![dir.join(command)]
     }
 }

Alternatively, use a simpler approach with #[expect]:

 fn candidates_for_dir(env: &EnvSnapshot, dir: &Utf8Path, command: &str) -> Vec<Utf8PathBuf> {
     #[cfg(windows)]
     {
         env::candidate_paths(dir, command, env.pathext())
     }
+    #[expect(unused_variables, reason = "env only used on Windows for PATHEXT")]
     #[cfg(not(windows))]
     {
-        let _ = env;
         vec![dir.join(command)]
     }
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e68ce5c and 055a2e7.

📒 Files selected for processing (7)
  • src/stdlib/register.rs (2 hunks)
  • src/stdlib/which/cache.rs (2 hunks)
  • src/stdlib/which/lookup/mod.rs (4 hunks)
  • src/stdlib/which/lookup/workspace/mod.rs (1 hunks)
  • test_support/src/env.rs (1 hunks)
  • tests/std_filter_tests/which_filter_common.rs (1 hunks)
  • tests/std_filter_tests/which_filter_tests.rs (5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Clippy warnings MUST be disallowed in Rust code.
Fix any warnings emitted during Rust tests in the code itself rather than silencing them.
In Rust, extract meaningfully named helper functions when a function is too long, adhering to separation of concerns and CQRS.
In Rust, group related parameters in meaningfully named structs when a function has too many parameters.
In Rust, consider using Arc to reduce data returned when a function is returning a large error.
Every Rust module must begin with a module level (//!) comment explaining the module's purpose and utility.
Document public APIs in Rust using Rustdoc comments (///) so documentation can be generated with cargo doc.
In Rust, prefer immutable data and avoid unnecessary mut bindings.
In Rust, handle errors with the Result type instead of panicking where feasible.
Avoid unsafe code in Rust unless absolutely necessary, and document any usage clearly.
In Rust, place function attributes after doc comments.
In Rust, do not use return in single-line functions.
In Rust, use predicate functions for conditional criteria with more than two branches.
Lints in Rust must not be silenced except as a last resort.
Lint rule suppressions in Rust must be tightly scoped and include a clear reason.
In Rust, prefer expect over allow.
In Rust tests, use rstest fixtures for shared setup.
In Rust tests, replace duplicated tests with #[rstest(...)] parameterized cases.
In Rust, prefer mockall for mocks and stubs.
In Rust, use concat!() to combine long string literals rather than escaping newlines with a backslash.
In Rust, prefer single line versions of functions where appropriate (e.g., pub fn new(id: u64) -> Self { Self(id) } instead of multi-line).
Use NewTypes in Rust to model domain values and eliminate 'integer soup'. Reach for newt-hype when introducing many homogeneous wrappers; add small shims for string-backed wrappers. For path-centric wrappers implement AsRef<Path> along...

Files:

  • src/stdlib/which/cache.rs
  • tests/std_filter_tests/which_filter_tests.rs
  • test_support/src/env.rs
  • tests/std_filter_tests/which_filter_common.rs
  • src/stdlib/register.rs
  • src/stdlib/which/lookup/mod.rs
  • src/stdlib/which/lookup/workspace/mod.rs

⚙️ CodeRabbit configuration file

**/*.rs: * Seek to keep the cognitive complexity of functions no more than 9.

  • Adhere to single responsibility and CQRS
  • Place function attributes after doc comments.
  • Do not use return in single-line functions.
  • Move conditionals with >2 branches into a predicate function.
  • Avoid unsafe unless absolutely necessary.
  • Every module must begin with a //! doc comment that explains the module's purpose and utility.
  • Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar
  • Lints must not be silenced except as a last resort.
    • #[allow] is forbidden.
    • Only narrowly scoped #[expect(lint, reason = "...")] is allowed.
    • No lint groups, no blanket or file-wide suppression.
    • Include FIXME: with link if a fix is expected.
  • Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.
  • Use rstest fixtures for shared setup and to avoid repetition between tests.
  • Replace duplicated tests with #[rstest(...)] parameterised cases.
  • Prefer mockall for mocks/stubs.
  • Prefer .expect() over .unwrap() in tests.
  • .expect() and .unwrap() are forbidden outside of tests. Errors must be propagated.
  • Ensure that any API or behavioural changes are reflected in the documentation in docs/
  • Ensure that any completed roadmap steps are recorded in the appropriate roadmap in docs/
  • Files must not exceed 400 lines in length
    • Large modules must be decomposed
    • Long match statements or dispatch tables should be decomposed by domain and collocated with targets
    • Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
  • Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such
    • For testing of functionality depending upon environment variables, dependency injection and...

Files:

  • src/stdlib/which/cache.rs
  • tests/std_filter_tests/which_filter_tests.rs
  • test_support/src/env.rs
  • tests/std_filter_tests/which_filter_common.rs
  • src/stdlib/register.rs
  • src/stdlib/which/lookup/mod.rs
  • src/stdlib/which/lookup/workspace/mod.rs
🧬 Code graph analysis (5)
src/stdlib/which/cache.rs (2)
src/stdlib/which/mod.rs (1)
  • new (36-46)
src/stdlib/which/lookup/workspace/mod.rs (1)
  • from_names (57-67)
test_support/src/env.rs (3)
test_support/src/env_lock.rs (1)
  • acquire (17-22)
test_support/src/env_var_guard.rs (1)
  • original (68-70)
test_support/src/env_guard.rs (4)
  • set_var (20-20)
  • set_var (36-38)
  • remove_var (28-28)
  • remove_var (40-42)
tests/std_filter_tests/which_filter_common.rs (2)
test_support/src/env_lock.rs (1)
  • acquire (17-22)
tests/std_filter_tests/support.rs (1)
  • filter_workspace (54-70)
src/stdlib/which/lookup/mod.rs (2)
src/stdlib/which/error.rs (1)
  • direct_not_found (23-30)
src/stdlib/which/env.rs (1)
  • candidate_paths (148-165)
src/stdlib/which/lookup/workspace/mod.rs (2)
src/stdlib/which/lookup/workspace/posix.rs (1)
  • search_workspace (14-43)
src/stdlib/which/lookup/workspace/windows.rs (2)
  • search_workspace (16-46)
  • new (56-75)
🔍 Remote MCP

Let me try a different approach by searching for general patterns:Now let me search for information about nested conditionals and function refactoring in Rust:Based on my research, I now have sufficient information to provide a comprehensive review context summary. Let me compile this:

Summary of Relevant Context for PR #241 Review

Platform-Specific Code Organization Best Practices

When organizing platform-specific code, Rust developers benefit from using #[cfg] attributes to keep platform-specific code organized and separated, which makes codebases easier to maintain. The #[cfg_attr] attribute combined with the #[path] attribute on mod statements can simplify platform-specific implementations, allowing you to specify module paths rather than using Rust's usual lookup.

The PR's approach of using separate posix.rs and windows.rs modules with conditional module loading aligns with established patterns in the Rust ecosystem.

File Type Extensions on Unix Platforms

Rust provides Unix-specific extensions for fs::FileType through a trait that adds support for special Unix file types such as block/character devices, pipes, and sockets. The PR's additions of is_fifo, is_block_device, is_char_device, and is_device predicates in src/stdlib/register.rs mirror these standard library extensions, appropriately providing non-Unix stubs that return false on non-Unix platforms.

Refactoring Nested Conditionals and Complex Logic

Standard practice in Rust refactoring involves extracting functions to hold logic that isn't involved with setup or error handling, making the main function concise and easy to verify by inspection. Refactoring deeply nested conditionals should use guard clauses and extract complex conditional logic into clearly named helper functions to improve readability and maintainability.

The PR review comments correctly identify that nested conditionals in posix.rs and windows.rs workspace search functions could benefit from helper extraction (unwrap_or_log_error, process_workspace_entry, collectors).

WalkDir Traversal Patterns

WalkDir supports flexible recursion depth control via max_depth method and uses the builder pattern to set arguments before conversion to an iterator. WalkDir provides an efficient cross-platform implementation of recursive directory traversal with options to control iteration, such as following symbolic links and efficiently skipping directories.

The PR's use of WalkDir for workspace traversal aligns with these patterns, and the proposed WORKSPACE_MAX_DEPTH constant (6) is a reasonable limit for controlled searching.

Test Fixture Patterns in Rust

In Rust testing, fixtures initialized once for all tests can use the #[once] attribute to be called just once with references shared to all tests. Test fixtures are a common pattern where test infrastructure encapsulates setup requirements for integration tests.

The PR's introduction of WhichTestFixture in tests/std_filter_tests/which_filter_common.rs follows established fixture patterns, providing centralized setup for environment state, temporary workspaces, and tool creation.

Function Parameter Count Best Practices

While Rust doesn't have strict rules on function argument limits, the PR review comments noting that should_stop_collecting in windows.rs exceeds typical argument counts (5) reflect general maintainability concerns. Encapsulating state in structs (like the suggested CollectionState) is an established pattern to manage parameter proliferation.

⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: release / build-macos (x86_64-apple-darwin, macos-13, macos-x86_64, macos-x86_64) / Build macos artefacts
  • GitHub Check: release / build-linux (x86_64-unknown-linux-gnu, linux-amd64, amd64) / Build linux artefacts
  • GitHub Check: release / build-windows (x86_64-pc-windows-msvc, amd64, x64, windows-x86_64) / Build windows artefacts
  • GitHub Check: release / build-windows (aarch64-pc-windows-msvc, arm64, arm64, windows-aarch64) / Build windows artefacts
  • GitHub Check: Sourcery review
  • GitHub Check: build-test (ubuntu-latest, stable)
🔇 Additional comments (36)
src/stdlib/which/cache.rs (2)

122-122: LGTM!

Import updated to use the centralised with_isolated_path helper from test_support::env, eliminating duplication.


184-229: LGTM!

The test correctly uses with_isolated_path to isolate PATH mutations, ensuring deterministic behaviour. The closure properly propagates errors and the test logic is preserved.

test_support/src/env.rs (1)

121-137: LGTM!

The with_isolated_path helper correctly acquires EnvLock to serialise mutations, captures the original PATH, runs the action, and restores the environment. The SAFETY comments are appropriate and the pattern is consistent with other utilities in this module.

tests/std_filter_tests/which_filter_tests.rs (10)

15-41: LGTM!

CurrentDirGuard correctly captures and restores the working directory, using EnvLock to serialise mutations. The Drop implementation appropriately logs failures rather than panicking.


43-51: LGTM!

The render_and_assert_pure helper encapsulates the common pattern of resetting impurity, rendering, and asserting purity. Clean and reusable.


53-78: LGTM!

The test_cache_after_removal helper correctly validates cache invalidation behaviour. The expect_err usage on line 70 is now correct per the resolved past review comment.


80-96: LGTM!

Good extraction of common setup for cache removal tests. Reduces boilerplate while maintaining clarity.


98-118: LGTM!

The test_duplicate_paths helper cleanly encapsulates the canonical/non-canonical deduplication test logic.


120-130: LGTM!

The test_cwd_mode_resolution helper correctly validates cwd_mode handling across different case variants.


132-170: LGTM!

Tests correctly leverage the new helpers, reducing duplication while maintaining clear assertions.


172-206: LGTM!

The which_filter_all_returns_list test correctly validates JSON array output, and the duplicate path tests appropriately exercise the test_duplicate_paths helper.


207-238: LGTM!

The cwd_mode tests correctly validate both valid and invalid inputs, with appropriate error message assertions.


240-296: LGTM!

Remaining tests correctly exercise workspace fallback, heavy directory skipping, and workspace root override scenarios.

tests/std_filter_tests/which_filter_common.rs (6)

1-10: LGTM!

Module doc comment correctly describes the module's purpose. Imports are appropriate and minimal.


12-62: LGTM!

NewTypes ToolName, DirName, and Template correctly encapsulate domain values, following the coding guidelines for semantic wrappers. Single-line function format is used appropriately.


92-131: LGTM!

Platform-specific helpers correctly handle executable creation across Unix and Windows. Error context is appropriately added for filesystem operations.


133-136: LGTM!

The render helper is minimal and correctly converts MiniJinja errors for test use.


138-175: LGTM!

WhichTestFixture correctly encapsulates test setup with appropriate error context. The create_dir_all call now includes context as per the resolved past review comment.


64-90: I need to verify the critical deadlock claim in this review comment. Let me start by examining the actual code and understanding the synchronisation mechanisms being used.
<function_calls>

find . -type f -name "*.rs" | grep -E "(which_filter|env)" | head -20

</function_calls>

Let me search for the test file and understand the code structure:
<function_calls>

find . -type f -name "which_filter_common.rs"

</function_calls>

Let me search more broadly:
<function_calls>

find . -type f -name ".rs" -path "/tests/*" | head -20

</function_calls>

Let me use git to understand the repository structure:
<function_calls>

git ls-files | grep -i "which_filter" | head -20

</function_calls>

Let me check all test files:
<function_calls>

git ls-files | grep ".rs$" | grep -i test | head -30

</function_calls>

Let me search for files containing "PathEnv":
<function_calls>

rg -l "PathEnv" --type rs

</function_calls>

Let me get a broader view of the repository structure:
<function_calls>

git ls-files | head -50

</function_calls>

Let me search for EnvLock and VarGuard:
<function_calls>

rg -l "EnvLock|VarGuard" --type rs

</function_calls>

Let me try a different approach - search all Rust files:
<function_calls>

find . -name "*.rs" -type f 2>/dev/null | head -20

</function_calls>

Let me check if this is a git repository and examine its structure:
<function_calls>

pwd && ls -la | head -20

</function_calls>

Let me search for the test file more directly:
<function_calls>

find . -type f -name "*.rs" 2>/dev/null

</function_calls>

Let me check what repository we're working with:
<function_calls>

cat .git/config 2>/dev/null || echo "No .git/config found"

</function_calls>

Let me try listing files in the root:
<function_calls>

ls -1

</function_calls>

Now let me search for Rust files:
<function_calls>

find . -type f -name "*.rs" 2>/dev/null | head -30

</function_calls>

Let me search using rg for the specific identifiers mentioned:
<function_calls>

rg "PathEnv" --type rs -l

</function_calls>

Let me try a simpler listing:
<function_calls>

ls -la tests/ 2>/dev/null || echo "No tests directory"

</function_calls>

Let me check the root directory structure more carefully:
<function_calls>

find . -maxdepth 3 -type d | head -30

</function_calls>

Let me try to find any Rust files at all:
<function_calls>

find . -name "*.rs" -type f 2>&1 | grep -v "Permission denied" | head -30

</function_calls>

Let me check if the repo needs to be cloned first by looking at git status:
<function_calls>

git status 2>&1 | head -20

</function_calls>

Let me try to understand the current working directory:
<function_calls>

pwd && git rev-parse --show-toplevel 2>/dev/null || echo "Not in git repo"

</function_calls>

Let me check git log to see what repository this is:
<function_calls>

git log --oneline -1

</function_calls>

Let me try looking at the git remote:
<function_calls>

git remote -v

</function_calls>

Now let me search for the specific files mentioned in the review:
<function_calls>

find . -path "*/tests/std_filter_tests/which_filter_common.rs"

</function_calls>

Let me search for any file with "which_filter" in its name:
<function_calls>

find . -name "which_filter" 2>/dev/null

</function_calls>

Let me check what files exist in tests directory:
<function_calls>

ls -R tests/ 2>/dev/null | head -50

</function_calls>

Let me just list all .rs files:
<function_calls>

find . -type f -name "*.rs" 2>/dev/null

</function_calls>

Let me check if this is perhaps a case where the repo is empty or the files need to be created:
<function_calls>

git ls-tree -r HEAD | grep -i "which_filter|PathEnv" || echo "Not found in HEAD"

</function_calls>

Let me check the current branch and status:
<function_calls>

git branch -a && git status

</function_calls>

Let me list all files in the repository:
<function_calls>

git ls-files

</function_calls>

Let me search for PathEnv in all tracked files:
<function_calls>

git grep "PathEnv" || echo "PathEnv not found in repository"

</function_calls>

Let me search for EnvLock:
<function_calls>

git grep "EnvLock" || echo "EnvLock not found"

</function_calls>

Let me check if there are any Rust files at all:
<function_calls>

git ls-files | grep ".rs$" | head -20

</function_calls>

This appears to be an empty repository or the files don't exist yet. Let me check the review comment more carefully - the code snippet provided in the review shows the actual code, so I should analyze based on the code provided rather than searching for files. Let me examine what the review comment is claiming and whether the logic is sound based on the code snippet shown.

Looking at the review comment, it provides the code snippet directly. Let me analyze the claims made in the review:

Let me search for VarGuard more carefully across the entire git history:
<function_calls>

git log --all --source --full-history -S "VarGuard" --oneline | head -10

</function_calls>

Let me check if there are any uncommitted changes or untracked files:
<function_calls>

git diff --name-only && echo "---" && git diff --cached --name-only && echo "---" && git ls-files --others --exclude-standard

</function_calls>

Since I cannot locate the actual files in the repository, let me reason about this based on the code snippet provided and the coding guidelines. Let me create a comprehensive analysis based on the principles of Rust synchronization:
<function_calls>

cat > /tmp/analysis.md << 'EOF'

Analysis of the Deadlock Claim

Review Claim

The review claims that PathEnv::new() will deadlock because:

  1. Line 73: let lock = EnvLock::acquire();
  2. Line 80: let path_guard = VarGuard::set(...) - which allegedly calls EnvLock::acquire() internally
  3. Rust Mutex is not reentrant, so this would deadlock

Key Evidence Needed

  1. Does VarGuard::set() actually acquire EnvLock?
  2. If so, what is the exact order of lock acquisition?
  3. Are locks properly released in drop?

Analysis

Pattern Recognition

The code pattern suggests:

  • EnvLock is a global environment lock (serialization across threads)
  • VarGuard is a RAII guard that manages setting/restoring environment variables
  • The _lock field is kept alive for the entire PathEnv lifetime

Question: Would this deadlock?

If VarGuard::set() acquires EnvLock:

  • Thread T1 calls PathEnv::new()
  • T1 acquires EnvLock (creates lock)
  • T1 calls VarGuard::set("PATH", ...)
  • If VarGuard::set() tries to acquire EnvLock again from the same thread T1
  • Since Rust Mutex is NOT reentrant, this would deadlock (T1 waits for T1 to release lock)

BUT: Is this actually a problem?

The fact that:

  1. The code keeps _lock alive in the struct
  2. The code passes both lock and guards to Ok(Self {...})

Suggests that either:
A. The original author intended to keep the lock held while the guard is active
B. Or the author was confused about the lock semantics

Standard pattern for test environment variables:

// Typical pattern:
1. Acquire global lock
2. Set environment variable
3. Keep both lock and guard alive
4. On drop, lock is released, guard restores env var

This pattern makes sense IF set_var() does NOT acquire a lock (it assumes caller holds the lock).

Alternative: VarGuard::set() doesn't acquire lock

If VarGuard::set() doesn't acquire EnvLock:

  • The _lock field is required to hold the lock while the guard lives
  • Removing it would be the bug (VarGuard::set() could race with other threads)

Conclusion

The deadlock issue depends entirely on whether VarGuard::set() acquires EnvLock.

If YES -> deadlock (remove _lock acquisition)
If NO -> no deadlock (keep _lock, the review is wrong)
EOF
cat /tmp/analysis.md


</function_calls>

Now I need to search through git history or other branches to find the VarGuard implementation:
<function_calls>

git for-each-ref --format='%(refname:short)' refs/heads refs/remotes | while read branch; do
echo "Checking branch: $branch"
git show "$branch:tests/std_filter_tests/which_filter_common.rs" 2>/dev/null && break
done | head -100

</function_calls>

Let me try a different approach - check all branches:
<function_calls>

git branch -r

</function_calls>

Let me check what the current HEAD is:
<function_calls>

git symbolic-ref HEAD

</function_calls>

Let me look for any Cargo.toml to understand the project:
<function_calls>

find . -name "Cargo.toml" -type f

</function_calls>

Let me check the .gitignore to see what might be excluded:
<function_calls>

cat .gitignore 2>/dev/null | head -30

</function_calls>

Since the repository appears to be empty or the files don't exist in the current state, let me analyze the review comment based on Rust synchronization principles and the code snippet provided:
<function_calls>

python3 << 'EOF'

Analysis of the deadlock claim based on Rust patterns

analysis = """
DEADLOCK ANALYSIS FOR PathEnv::new()

Code Pattern Present:

  1. Explicit EnvLock::acquire() call held in variable lock
  2. VarGuard::set() calls (which the review claims also acquires EnvLock)
  3. The lock is stored in _lock field of returned struct

CRITICAL OBSERVATIONS:

  1. The _lock field is INTENTIONALLY kept alive

    • If it was truly redundant, the original author would not have included it
    • The pattern of keeping the lock alive in a struct field suggests:
      a) The lock is needed to protect the duration of the guard's operation
      b) When _lock is dropped, lock is released
      c) When path_guard is dropped, it restores the env var
  2. Possible scenarios:

    SCENARIO A: VarGuard::set() does NOT acquire EnvLock

    • Then _lock IS necessary (protects against concurrent env var changes)
    • The review's suggestion to remove _lock would be WRONG
    • No deadlock exists

    SCENARIO B: VarGuard::set() DOES acquire EnvLock

    • Then acquiring lock first, then calling set() which acquires lock again = DEADLOCK
    • The review's suggestion to remove lock acquisition is CORRECT
    • But then, why keep _lock field if not used?
  3. The review provides insufficient evidence

    • The review's own scratchpad quotes EnvLock::acquire() implementation
    • But does NOT show VarGuard::set() implementation
    • This is a critical gap in the verification
  4. Red flag in the review comment:

    • The review comment says "VarGuard::set at line 80 calls set_var which acquires EnvLock"
    • But provides no source code evidence of this
    • This is an ASSUMPTION, not verified fact
  5. Standard Rust test pattern:

    • In tests, environment variable changes often require careful synchronization
    • The pattern of:
      • Acquiring a global lock
      • Changing env vars
      • Keeping lock alive
      • Restoring on drop
    • This is a standard pattern for test isolation
    • It usually means set_var() does NOT take a lock (assumes caller holds it)

CONCLUSION:
The review makes a CRITICAL ASSUMPTION without verification:

  • It assumes VarGuard::set() calls set_var() which acquires EnvLock
  • No proof is provided for this assumption
  • If this assumption is wrong, the review is INCORRECT and potentially dangerous
  • If this assumption is right, the code would already be broken (tests would hang)
    """

print(analysis)
EOF


</function_calls>

Let me try to search for any recent git objects that might contain these identifiers:
<function_calls>

git rev-list --all | head -20

</function_calls>

Let me check if there's a pull request branch or similar:
<function_calls>

git describe --all 2>/dev/null || git name-rev --name-only HEAD

</function_calls>

Let me see if I can find references in reflog:
<function_calls>

git reflog | head -20

</function_calls>

Based on my analysis, since I cannot locate the actual source files in the repository to verify the claims, let me document my findings:

Critical claim unverified - cannot confirm deadlock without VarGuard implementation details.

The review assumes VarGuard::set() calls set_var() which internally acquires EnvLock, leading to a double-lock deadlock. However, this critical assumption lacks supporting evidence:

  1. The review provides no source code for VarGuard::set() or set_var() implementations
  2. The code pattern of keeping _lock in the struct field suggests it's intentionally held to protect the guard's lifetime
  3. If VarGuard::set() independently acquired EnvLock, the code would deadlock immediately on first use, but the code appears functional based on the context provided

Without access to the actual VarGuard::set() and set_var() implementations, the deadlock claim cannot be verified. The code pattern suggests _lock is intentional, not redundant. Manual verification of the VarGuard implementation is required to determine if the review's suggestion is correct or if it would introduce a race condition by removing the lock that protects the guard's lifetime.

src/stdlib/register.rs (4)

130-138: LGTM!

The non-Unix FILE_TESTS array correctly mirrors the Unix variant, exposing stubs that return false for platform-unsupported file types. This maintains API parity across platforms.


169-206: LGTM!

Non-Unix stubs are correctly documented with concise inline comments explaining the intentional false return values. This addresses the prior review feedback and preserves cross-platform semantic consistency.


258-273: LGTM!

The non-Unix file-type stub test correctly validates that all four predicates return false for a regular file. The #[cfg(not(unix))] guard ensures this only compiles on non-Unix platforms where the stubs are active.


218-256: Review comment is incorrect—no changes needed.

The closure wrapping is not unnecessary nesting but a required API contract of with_isolated_path. This function (defined in test_support/src/env.rs:123) takes a closure specifically to:

  1. Maintain the EnvLock scope throughout the action (line 124: let _lock = EnvLock::acquire())
  2. Guarantee environment restoration after execution (lines 131–134)
  3. Properly serialise mutations with shared guards as required by coding guidelines

The test already complies with guidelines:

  • Returns Result<()> (line 218)
  • Resides in #[cfg(test)] module (line 208)
  • Environment mutations are safely guarded via EnvLock (the shared guard mechanism)

Flattening this structure would break the safety invariants of the guard pattern and is not feasible given the API design.

src/stdlib/which/lookup/mod.rs (5)

83-101: LGTM!

The Windows resolve_direct correctly normalizes the path, generates PATHEXT-aware candidates, and filters for executables before returning or canonicalizing. The early return on empty matches with direct_not_found provides clear error semantics.


103-118: LGTM!

The non-Windows resolve_direct variant is appropriately simpler—single executable check without PATHEXT expansion. The previous unnecessary .clone() has been removed.


120-127: LGTM!

normalize_direct_path cleanly handles absolute vs relative path resolution against cwd. The logic is minimal and correct.


202-209: LGTM!

HandleMissContext groups related parameters into a struct, adhering to the guideline to group related parameters in meaningfully named structs when a function has too many parameters.


247-268: LGTM!

canonicalise correctly de-duplicates using IndexSet, handles filesystem errors with proper Error propagation, and validates UTF-8 paths. The implementation is clean and idiomatic.

src/stdlib/which/lookup/workspace/mod.rs (8)

1-10: LGTM!

Module-level documentation clearly explains the purpose. Imports are well-organised with standard library, external crates, and internal modules properly grouped.


22-33: LGTM!

Constants are appropriately scoped—WORKSPACE_MAX_DEPTH is pub(super) for internal use, DEFAULT_WORKSPACE_SKIP_DIRS is pub(crate) for broader configuration access. The default skip list covers common heavy directories.


38-48: LGTM!

WorkspaceSkipList and WorkspaceSearchParams are well-designed data structures. Using IndexSet preserves insertion order while providing deduplication, which is appropriate for the skip list.


99-121: LGTM!

search_workspace correctly checks the environment toggle before delegating to the platform-specific implementation. Debug logging provides good observability for diagnosing fallback behaviour.


123-129: LGTM!

should_visit_entry correctly allows non-directory entries (files) through unconditionally while filtering directories against the skip list. Using to_string_lossy() handles non-UTF-8 filenames gracefully.


131-146: LGTM!

workspace_fallback_enabled correctly handles all env::var outcomes. The NotUnicode case now logs a warning as requested in prior review feedback, and defaults to enabled for backward compatibility.


148-165: LGTM!

unwrap_or_log_error provides robust error handling for unreadable entries during traversal, logging at debug level rather than failing the entire search. This is appropriate for filesystem traversal where permission errors are expected.


167-182: LGTM!

log_if_no_matches provides helpful diagnostics when workspace fallback yields no results, aiding debugging of unexpected misses.

leynos and others added 3 commits November 27, 2025 13:23
- Introduce a new workspace fallback search for the `which` resolver,
  bounded by a max depth and skipping heavy directories like `.git`,
  `target`, `node_modules`, `dist`, and `build`, configurable via the
  `NETSUKE_WHICH_WORKSPACE` environment variable.
- Refactor workspace fallback into separate modules for Windows and POSIX
  systems, removing the previous unified implementation.
- Enhance direct path resolution in `which::lookup::mod` to unify Windows
  and POSIX behavior, normalizing paths, and handling candidate paths
  properly.
- Extend file type detection for non-Unix platforms with additional device
  file tests.
- Add extensive integration tests for the `which` filter/function covering
  cache behavior, canonicalisation, all-results returns, invalid arguments,
  and workspace fallback correctness.
- Update documentation to clarify behavior of workspace fallback and resolver caching.

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
Removed the EnvLock acquisition lines from tests in src/stdlib/which/lookup/tests.rs as they were unused or unnecessary. This simplifies test setup and avoids unnecessary environment locking.

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
- Replace fixed DEFAULT_WORKSPACE_SKIP_DIRS with WORKSPACE_SKIP_DIRS constant.
- Introduce WorkspaceSkipList to manage skip directory names, supporting normalization, de-duplication and configurable input.
- Update workspace which lookup functions to accept WorkspaceSkipList to skip heavy or irrelevant directories.
- Propagate skip directory configuration from StdlibConfig through register_with_config and WhichConfig.
- Improve tests to use the new skip directory configuration.

This enhances the flexibility and correctness of workspace fallback search by allowing configurable skip lists for directories to exclude.

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
@leynos leynos merged commit 7a272c4 into main Nov 27, 2025
15 checks passed
@leynos leynos deleted the terragon/which-filter-tests-and-workspace-lookup-refactor-7mc1ho branch November 27, 2025 18:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants