WIP: Add basic filter command as suggested in #834
base: main
Review by Korbit AI
Korbit automatically attempts to detect when you fix issues in new commits.
| Issue | Status |
|---|---|
| Incorrect exception type caught for argparse errors | |
| Magic String Attribute Lookup | |
| Command execution logged at DEBUG level | |
| Naive string splitting for command parsing | |
| Memory inefficient stdin processing | |
| Inefficient O(n*m) project path lookup | |
| Expensive path resolution per file | |
| Mixed Responsibilities in Entry Point | |
| Unclear list variable names | |
| Non-descriptive tuple return type | |
Files scanned
| File Path | Reviewed |
|---|---|
| dfetch/log.py | ✅ |
| dfetch/util/util.py | ✅ |
| dfetch/main.py | ✅ |
| dfetch/util/cmdline.py | ✅ |
| dfetch/commands/filter.py | ✅ |
dfetch/__main__.py (outdated)
if args.verbose or not getattr(args.func, "SILENT", False):
    logger.print_title()
Magic String Attribute Lookup 
What is the issue?
The use of a magic string 'SILENT' as an attribute lookup makes the code's intent unclear without additional context.
Why this matters
Future maintainers will need to search for where SILENT is defined and understand its purpose. This creates cognitive overhead and potential maintenance issues.
Suggested change
# Define a constant at module level
SILENT_COMMAND_FLAG = 'SILENT'

# Use in the code
if args.verbose or not getattr(args.func, SILENT_COMMAND_FLAG, False):
    logger.print_title()
if not isinstance(cmd, list):
    cmd = cmd.split(" ")
Naive string splitting for command parsing 
What is the issue?
Splitting on a single space breaks commands that contain consecutive spaces or quoted arguments with embedded spaces.
Why this matters
This naive splitting approach will create empty strings in the command list when there are multiple spaces, potentially causing subprocess execution failures or incorrect argument parsing.
Suggested change
Use shlex.split() instead of str.split(" ") to properly handle shell-like command parsing with quoted arguments and multiple spaces:
import shlex

if not isinstance(cmd, list):
    cmd = shlex.split(cmd)
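The difference is easy to see on a made-up command line that has a quoted argument and a doubled space. The command string is illustrative only.

```python
import shlex

cmd = 'black --config="my config.toml"  src'  # note the double space before src

# Naive split: the quoted value is torn apart and the doubled space yields ''.
naive = cmd.split(" ")
# naive == ['black', '--config="my', 'config.toml"', '', 'src']

# shlex.split applies shell-like quoting rules and ignores repeated whitespace.
safe = shlex.split(cmd)
# safe == ['black', '--config=my config.toml', 'src']
```

Check one caveat before adopting shlex.split. By default it uses POSIX rules, so backslashes act as escape characters. An unquoted Windows path such as C:\tools\x.exe comes out as C:toolsx.exe. dfetch is also tested on Windows, so passing commands as lists avoids the issue altogether.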
dfetch/commands/filter.py (outdated)
for project_path in project_paths:
    try:
        file.relative_to(project_path)
        return project_path
    except ValueError:
        continue
Inefficient O(n*m) project path lookup 
What is the issue?
The file-in-project check runs a linear search through all project paths for each file. This gives O(n*m) complexity, where n is the number of files and m is the number of projects.
Why this matters
With many files and projects, the cost of this nested loop grows with the product of the two counts. Filtering can slow down noticeably as the number of projects grows.
Suggested change
Pre-sort project paths by depth (deepest first) and use early termination, or consider using a trie-based structure for path prefix matching to reduce average case complexity.
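Instead of sorting or building a trie, you can look up each file's ancestors in a hash set of project paths. This technique is swapped in here and is not Korbit's proposal. The sketch assumes that both the files and the project paths are already absolute and normalised. PurePosixPath keeps the example free of filesystem access, and find_project is a hypothetical helper.

```python
from pathlib import PurePosixPath
from typing import Optional


def find_project(
    file: PurePosixPath, project_paths: set[PurePosixPath]
) -> Optional[PurePosixPath]:
    """Return the innermost project directory containing ``file``, or None."""
    # Check the file itself, then each ancestor from the innermost outwards.
    # Each check is an average O(1) set lookup, so the cost per file is
    # O(path depth) rather than O(number of projects).
    for candidate in (file, *file.parents):
        if candidate in project_paths:
            return candidate
    return None
```

For nested projects this returns the innermost match. The original loop instead returns whichever matching project the set iteration happens to yield first.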
dfetch/commands/filter.py (outdated)
block_outside: list[str] = []

for path_or_arg in input_list:
    arg_abs_path = Path(pwd / path_or_arg.strip()).resolve()
Expensive path resolution per file 
What is the issue?
Path resolution with resolve() is called for every input file, which involves expensive filesystem operations including symlink resolution and path canonicalization.
Why this matters
The resolve() method performs multiple filesystem syscalls per file, creating significant I/O overhead that scales linearly with the number of input files and can become a bottleneck for large file sets.
Suggested change
Cache resolved paths, or build absolute paths without full resolution when symlink handling isn't critical:
arg_abs_path = (pwd / path_or_arg.strip()).absolute()
Only call resolve() when symlink handling is actually needed.
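One caveat applies to the absolute() suggestion. Path.absolute() does not collapse ".." components, and a relative_to() containment check is purely lexical. The filter command also blocks path traversal, so dropping resolve() without normalising could let such arguments through. The minimal sketch below assumes POSIX-style paths. It uses posixpath.normpath, which collapses ".." without any syscalls, much as os.path.abspath does for the current platform. is_inside is a hypothetical helper.

```python
import posixpath
from pathlib import PurePosixPath


def is_inside(path: PurePosixPath, root: PurePosixPath) -> bool:
    """Lexical containment check, similar in spirit to path.relative_to(root)."""
    try:
        path.relative_to(root)
        return True
    except ValueError:
        return False


root = PurePosixPath("/repo")
arg = "ext/../../etc/passwd"

# Joining without normalisation keeps the ".." parts. Path.absolute() behaves
# the same way for a path that is already absolute.
joined = root / arg
# normpath collapses "." and ".." lexically. Unlike resolve(), it does not
# touch the filesystem and does not follow symlinks.
normalised = PurePosixPath(posixpath.normpath(str(joined)))

print(is_inside(joined, root))      # True: the traversal slips through
print(is_inside(normalised, root))  # False: "/etc/passwd" is outside /repo
```

Lexical normalisation is not always equivalent to resolve(). If a component before ".." is a symlink, normpath and the filesystem can disagree, so resolve() remains the safer choice where symlinks matter.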
    help="Arguments to pass to the command",
)

def __call__(self, args: argparse.Namespace) -> None:
Mixed Responsibilities in Entry Point 
What is the issue?
The __call__ method mixes configuration, business logic, and output handling in a single method.
Why this matters
This violates the Single Responsibility Principle, makes the code harder to maintain, and makes individual components harder to test.
Suggested change
Split the __call__ method into separate methods for configuration, filtering, and output handling:
def __call__(self, args: argparse.Namespace) -> None:
    self._configure_logging(args)
    filtered_args = self._process_filtering(args)
    self._handle_output(args, filtered_args)
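Here is a runnable version of the suggested split. It assumes a toy filtering rule: keep arguments inside a project when dfetched is set, and outside otherwise. FilterSketch and its helpers are hypothetical and only follow the names proposed above.

```python
import argparse
import logging
import sys
from pathlib import PurePosixPath


class FilterSketch:
    """Hypothetical filter command showing __call__ split into three steps.

    The helper names follow the review suggestion; they are not dfetch's API.
    """

    def __init__(self, project_paths: set[PurePosixPath]) -> None:
        self.project_paths = project_paths

    def __call__(self, args: argparse.Namespace) -> None:
        self._configure_logging(args)
        filtered_args = self._process_filtering(args)
        self._handle_output(args, filtered_args)

    def _configure_logging(self, args: argparse.Namespace) -> None:
        # Configuration only: no filtering decisions are made here.
        level = logging.DEBUG if args.verbose else logging.INFO
        logging.getLogger(__name__).setLevel(level)

    def _process_filtering(self, args: argparse.Namespace) -> list[str]:
        # Pure step: the result depends only on args and self.project_paths.
        kept = []
        for arg in args.args:
            path = PurePosixPath(arg)
            inside = any(p == path or p in path.parents for p in self.project_paths)
            if inside == args.dfetched:
                kept.append(arg)
        return kept

    def _handle_output(self, args: argparse.Namespace, filtered_args: list[str]) -> None:
        # Output only: print one path per line. args stays in the signature to
        # match the suggestion (e.g. to decide whether to run a command instead).
        for arg in filtered_args:
            print(arg, file=sys.stdout)
```

Keeping _process_filtering free of logging and printing means it can be unit-tested by building an argparse.Namespace directly, without capturing stdout.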
dfetch/commands/filter.py (outdated)
block_inside: list[str] = []
block_outside: list[str] = []
Unclear list variable names 
What is the issue?
The variable names 'block_inside' and 'block_outside' are not immediately clear about what they represent in the context of file filtering.
Why this matters
Unclear variable names force readers to trace through the code to understand their purpose, increasing cognitive load.
Suggested change
files_inside_projects: list[str] = []
files_outside_projects: list[str] = []
dfetch/commands/filter.py (outdated)
def _filter_files(
    self, pwd: Path, topdir: Path, project_paths: set[Path], input_list: list[str]
) -> tuple[list[str], list[str]]:
Non-descriptive tuple return type 
What is the issue?
The return type annotation using tuple[list[str], list[str]] is not descriptive enough to understand what the two lists represent.
Why this matters
Generic tuple return types make it difficult to understand the meaning of each component without looking at the implementation.
Suggested change
from typing import NamedTuple

class FilterResult(NamedTuple):
    files_inside_projects: list[str]
    files_outside_projects: list[str]

def _filter_files(
    self, pwd: Path, topdir: Path, project_paths: set[Path], input_list: list[str]
) -> FilterResult:
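A NamedTuple is still a tuple, so existing call sites that unpack two values keep working, while new code can use the field names. In this minimal sketch, split_by_prefix is a hypothetical stand-in for _filter_files that uses a plain string-prefix test.

```python
from typing import NamedTuple


class FilterResult(NamedTuple):
    """Named pair of lists; still unpacks like the original two-element tuple."""

    files_inside_projects: list[str]
    files_outside_projects: list[str]


def split_by_prefix(prefix: str, input_list: list[str]) -> FilterResult:
    # Hypothetical stand-in for _filter_files: a string-prefix test keeps the
    # focus on the return type rather than on path handling.
    inside = [arg for arg in input_list if arg.startswith(prefix)]
    outside = [arg for arg in input_list if not arg.startswith(prefix)]
    return FilterResult(inside, outside)


result = split_by_prefix("ext/", ["ext/a/x.c", "src/main.c"])
print(result.files_inside_projects)  # ['ext/a/x.c']
inside, outside = result  # existing unpacking call sites keep working
```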
Force-pushed from 2dbdf3c to b97688f.
Note: Other AI code review bot(s) detected
CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.
Walkthrough
A new
Changes
Sequence Diagram(s)
sequenceDiagram
    actor User
    participant CLI as dfetch filter
    participant Resolve as Argument Resolution
    participant Filter as File Filtering
    participant Exec as Command Execution
    User->>CLI: dfetch filter [--dfetched|-D] [cmd args...]
    CLI->>Resolve: _get_arguments() + _resolve_args()
    Resolve->>Resolve: Read stdin or CLI args
    Resolve->>Resolve: Expand to all non-.git files if empty
    Resolve-->>CLI: Map args to resolved Paths
    CLI->>Filter: _filter_files(topdir, project_paths, input_paths, block_strategy)
    Filter->>Filter: Determine blocklist based on FilterType<br/>(BLOCK_IF_INSIDE, BLOCK_IF_OUTSIDE, BLOCK_ONLY_PATH_TRAVERSAL)
    Filter-->>CLI: Filtered argument list
    alt cmd provided
        CLI->>Exec: run_on_cmdline_uncaptured(filtered_args)
        Exec-->>User: Execute command with filtered args
    else no cmd
        CLI-->>User: Print filtered args to stdout
    end
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 inconclusive)
✅ Passed checks (3 passed)
Support for pre-commit hooks
Fixes #19
Force-pushed from b97688f to c2f12c3.
Force-pushed from 926f21e to 022d7b7.
@coderabbitai review
✅ Actions performed: Review triggered.
Actionable comments posted: 1
🧹 Nitpick comments (1)
dfetch/log.py (1)
60-62: Consider enhancing the docstring with parameter details.
The function wraps coloredlogs.set_level but doesn't document which level values are accepted. Consider adding an Args section that specifies valid levels (e.g., "DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL").
Apply this diff to improve the documentation:
 def set_level(level: str) -> None:
-    """Set the level of the logger."""
+    """Set the level of the logger.
+
+    Args:
+        level: The logging level (e.g., "DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL").
+    """
     coloredlogs.set_level(level)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (15)
- .github/workflows/run.yml (2 hunks)
- .pre-commit-config.yaml (2 hunks)
- CHANGELOG.rst (1 hunks)
- dfetch/__main__.py (4 hunks)
- dfetch/commands/command.py (1 hunks)
- dfetch/commands/filter.py (1 hunks)
- dfetch/log.py (1 hunks)
- dfetch/util/cmdline.py (1 hunks)
- dfetch/util/util.py (1 hunks)
- doc/asciicasts/filter.cast (1 hunks)
- doc/generate-casts/filter-demo.sh (1 hunks)
- doc/generate-casts/generate-casts.sh (1 hunks)
- doc/manual.rst (2 hunks)
- features/filter-projects.feature (1 hunks)
- features/steps/generic_steps.py (5 hunks)
🧰 Additional context used
🪛 Shellcheck (0.11.0)
doc/generate-casts/filter-demo.sh
[warning] 9-9: Use 'pushd ... || exit' or 'pushd ... || return' in case pushd fails.
(SC2164)
[warning] 21-21: PROMPT_TIMEOUT appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 26-26: Use 'popd ... || exit' or 'popd ... || return' in case popd fails.
(SC2164)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: test (macos-latest, 3.13)
- GitHub Check: test (windows-latest, 3.14)
- GitHub Check: test (windows-latest, 3.13)
- GitHub Check: test (macos-latest, 3.10)
- GitHub Check: test (windows-latest, 3.9)
- GitHub Check: test (windows-latest, 3.11)
- GitHub Check: test (windows-latest, 3.12)
- GitHub Check: test (windows-latest, 3.10)
- GitHub Check: test (macos-latest, 3.14)
- GitHub Check: test-cygwin
- GitHub Check: DevContainer Build & Test
- GitHub Check: build (macos-latest)
- GitHub Check: build (windows-latest)
- GitHub Check: build (ubuntu-latest)
- GitHub Check: test
🔇 Additional comments (11)
CHANGELOG.rst (1)
18-18: LGTM!
The changelog entry correctly documents the new filter command and references the related issue.
dfetch/util/util.py (1)
66-73: LGTM!
The change to always yield str improves consistency in the API. The function now accepts both str and Path as input but consistently returns str, which aligns well with the new filter command's string-based path handling.
.pre-commit-config.yaml (2)
17-18: LGTM!
The integration of the new filter command into pre-commit hooks is well-structured and consistent. The --not-dfetched flag correctly stops isort, black, and codespell from processing dfetch-managed files, which addresses the requirement from issue #19.
Also applies to: 24-25, 104-105
107-107: Clarify the commented exclude line.
The commented exclusion for the asciinema player file is unclear. Should this be uncommented, or is it left as a reference? If it's no longer needed because the filter command handles exclusions, consider removing it entirely to avoid confusion.
features/steps/generic_steps.py (2)
34-56: LGTM!
The tee_stdout context manager is well-implemented. It correctly duplicates stdout to both the original stream and an in-memory buffer, which is essential for capturing output from the new filter command that writes directly to stdout.
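For readers unfamiliar with the pattern, here is one way such a tee can be written. This hypothetical sketch is not the PR's tee_stdout. It swaps sys.stdout for a minimal object that forwards each write to both the real stream and a StringIO buffer, and it restores the original stream in a finally block.

```python
import contextlib
import io
import sys
from typing import Iterator, TextIO


class _Tee:
    """Minimal file-like object that forwards writes to several streams."""

    def __init__(self, *streams: TextIO) -> None:
        self._streams = streams

    def write(self, text: str) -> int:
        for stream in self._streams:
            stream.write(text)
        return len(text)

    def flush(self) -> None:
        for stream in self._streams:
            stream.flush()


@contextlib.contextmanager
def tee_stdout() -> Iterator[io.StringIO]:
    """Copy everything written to sys.stdout into an in-memory buffer."""
    buffer = io.StringIO()
    original = sys.stdout
    sys.stdout = _Tee(original, buffer)  # type: ignore[assignment]
    try:
        yield buffer
    finally:
        # Always restore the real stdout, even if the block raises.
        sys.stdout = original
```

Code that inspects attributes such as sys.stdout.encoding or isatty() would need those forwarded as well. The sketch only covers write() and flush().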
67-78: LGTM!
The updated call_command function correctly uses the new tee_stdout context manager to capture both the traditional captured output and the raw stdout. This dual capture properly supports testing commands that write directly to stdout as well as those that use the logging framework.
dfetch/commands/command.py (1)
34-43: LGTM!
The silent() method is a well-designed addition to the Command base class. It gives commands (like the new Filter command) a clean way to opt out of printing the dfetch title, which suits commands designed for use in pipelines or scripts.
.github/workflows/run.yml (1)
46-46: LGTM!
The integration of the filter command into CI workflows demonstrates both usage modes: standalone (line 46) and with piped input from find (line 60). This validates that the command works correctly in automated environments.
Also applies to: 60-60
dfetch/util/cmdline.py (1)
78-83: Use shell-aware splitting to preserve complex commands.
Duplicating the cmd.split(" ") logic means quoted arguments or consecutive spaces still break. For example, --flag="two words" is split into the two tokens --flag="two and words". Since this helper is new, please switch to shlex.split (and mirror the change in run_on_cmdline) so both helpers can execute real-world commands without mangling their argv.
-import logging
+import logging
+import shlex
@@
-    if not isinstance(cmd, list):
-        cmd = cmd.split(" ")
+    if not isinstance(cmd, list):
+        cmd = shlex.split(cmd)
@@
-    if not isinstance(cmd, list):
-        cmd = cmd.split(" ")
+    if not isinstance(cmd, list):
+        cmd = shlex.split(cmd)
dfetch/__main__.py (1)
71-72: Fix crash when dfetch is invoked without a subcommand.
parser.set_defaults(func=_help) means args.func can be the plain _help function, which has no silent() attribute. Calling args.func.silent() then raises AttributeError, so running plain dfetch (or any path that leaves the default handler in place) crashes before the help text can be shown. Please guard this call (e.g., via getattr(args.func, "silent", lambda: False)()) so non-command invocations keep working.
-    if args.verbose or not args.func.silent():
+    silent_check = getattr(args.func, "silent", lambda: False)
+    if args.verbose or not silent_check():
         logger.print_title()
dfetch/commands/filter.py (1)
99-105: Allow combined dfetched/not-dfetched mode to work as advertised.
If a user supplies both --dfetched and --not-dfetched, they’re clearly asking to forward every file while still blocking path traversal. Today that combination falls into the second branch and drops the dfetched files instead, making the “allow both” use case impossible and leaving FilterType.BLOCK_ONLY_PATH_TRAVERSAL effectively dead code. Reorder the conditions (or add an explicit check for both flags) so the combined mode yields BLOCK_ONLY_PATH_TRAVERSAL.
-    if args.dfetched and not args.not_dfetched:
-        block_type = FilterType.BLOCK_IF_OUTSIDE
-    elif args.not_dfetched:
-        block_type = FilterType.BLOCK_IF_INSIDE
-    else:
-        block_type = FilterType.BLOCK_ONLY_PATH_TRAVERSAL
+    if args.dfetched and args.not_dfetched:
+        block_type = FilterType.BLOCK_ONLY_PATH_TRAVERSAL
+    elif args.dfetched:
+        block_type = FilterType.BLOCK_IF_OUTSIDE
+    elif args.not_dfetched:
+        block_type = FilterType.BLOCK_IF_INSIDE
+    else:
+        block_type = FilterType.BLOCK_ONLY_PATH_TRAVERSAL
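The reordered conditions boil down to a small mapping from the two flags to a FilterType. The sketch assumes boolean flags. Only the member names come from the PR; the enum values and select_block_type are hypothetical.

```python
from enum import Enum, auto


class FilterType(Enum):
    """Hypothetical mirror of the FilterType members named in the review."""

    BLOCK_IF_INSIDE = auto()
    BLOCK_IF_OUTSIDE = auto()
    BLOCK_ONLY_PATH_TRAVERSAL = auto()


def select_block_type(dfetched: bool, not_dfetched: bool) -> FilterType:
    # Exactly one flag set: block the other side.
    if dfetched and not not_dfetched:
        return FilterType.BLOCK_IF_OUTSIDE
    if not_dfetched and not dfetched:
        return FilterType.BLOCK_IF_INSIDE
    # Both or neither: forward everything except paths escaping the top dir.
    return FilterType.BLOCK_ONLY_PATH_TRAVERSAL
```

As a standalone function, the mapping is cheap to test for all four flag combinations, and the combined-flag case is exactly where the original ordering went wrong.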
Filter
------
.. argparse::
   :module: dfetch.__main__
   :func: create_parser
   :prog: dfetch
   :path: filter

.. automodule:: dfetch.commands.filter
Remove duplicate Filter section.
The Filter section is duplicated in the documentation. It appears at lines 118-130 (after Freeze) and again here at lines 155-164 (after Import). The second occurrence should be removed to avoid duplication and potential documentation build issues.
Apply this diff to remove the duplicate:
 .. automodule:: dfetch.commands.import_
-
-Filter
-------
-.. argparse::
-   :module: dfetch.__main__
-   :func: create_parser
-   :prog: dfetch
-   :path: filter
-
-.. automodule:: dfetch.commands.filter
🤖 Prompt for AI Agents
In doc/manual.rst around lines 155 to 164, the "Filter" section is a duplicate
of the earlier section (lines ~118-130); remove the entire duplicate block
(lines 155-164) so only the first "Filter" section remains and update any
surrounding spacing or TOC references if necessary to keep formatting
consistent.
Support for pre-commit hooks
Fixes #19
Description by Korbit AI
What change is being made?
Add a basic dfetch filter command that can list or pass through files to a command, integrate it into the CLI, and update supporting utilities, logging, and tooling configuration (pre-commit hooks, changelog, docs).
Why are these changes being made?
To provide a first-class file-filtering capability that can operate on manifest-scoped projects or stdin/args, and to wire it into the existing CLI and supporting utilities for robust usage and testing. This PR also updates tooling integration and documentation to reflect the new feature.
Summary by CodeRabbit
New Features
Bug Fixes
Documentation