@stewones
Owner

@stewones stewones commented Sep 8, 2025

Summary by CodeRabbit

  • New Features

    • Directory ingestion now automatically skips binary/non-text files to avoid ingesting non-text content.
  • Bug Fixes

    • Improved robustness of file-reading during ingestion and enhanced logging to include execution context and totals.
  • Tests

    • Updated tests to validate binary-detection behavior and ensure proper test cleanup.
  • Chores

    • Bumped package version to 0.3.1.
    • Removed obsolete commented coverage-report steps from CI.
    • Broadened .gitignore to ignore all .sqlite files and removed an unused spec field.

@codecov

codecov bot commented Sep 8, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (e74cb55) to head (4963770).

Additional details and impacted files

@@            Coverage Diff            @@
##              main        #3   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           18        18           
  Lines         2180      2185    +5     
  Branches       434       442    +8     
=========================================
+ Hits          2180      2185    +5     
Flag        Coverage Δ
unittests   100.00% <100.00%> (ø)

Files with missing lines   Coverage Δ
src/core/mcp.ts            100.00% <100.00%> (ø)
src/lib/store.ts           100.00% <100.00%> (ø)

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
@coderabbitai

coderabbitai bot commented Sep 8, 2025

Walkthrough

Replaces extension-based file filtering with a heuristic binary detector and asynchronous recursive traversal in McpTool; updates tests to mock binary reads and path behavior; bumps package version to 0.3.1; removes two commented coverage-report steps from CI workflow; expands .gitignore sqlite pattern and enhances ingestion log messages; removes sourceId from an Angular docs tool spec.

Changes

| Cohort / File(s) | Summary |
|---|---|
| CI workflow cleanup: .github/workflows/ci.yml | Removed two commented coverage-report steps (markdown preparation and PR comment); retained active Codecov upload. |
| Version bump: package.json | Package version updated from 0.3.0 to 0.3.1. |
| Core: binary-safe directory ingestion: src/core/mcp.ts | Added isBinaryFile heuristic (reads first 8KB; null-byte and non-printable ratio checks). Rewrote fetchFromDirectory to an async recursive walk that excludes detected binaries and reads non-binary files as UTF-8. Minor ServerResult typing formatting changes. |
| Tests updated for binary detection: test/src/core/mcp.test.ts | Reworked fs/path mocks to simulate binary vs text reads (supports Buffer returns when encoding is null), added node:path mock, added open/read/close mocks for readSync usage, adjusted assertions and cleanup unmocking. |
| Ignore patterns: .gitignore | Changed .sqlite ignore pattern to *.sqlite to ignore all SQLite files by extension. |
| Logging enhancements: src/lib/store.ts | Augmented two log statements in SqliteEmbedStore.ingest to include [mcpId/toolId] context and chunk totals in messages. |
| Spec field removal: src/mcps/angular/tools/docs/index.ts | Removed sourceId: 'angular-llm-context' property from the spec object. |
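
For readers who want a concrete picture of the heuristic summarized for src/core/mcp.ts (null-byte check plus non-printable ratio over the first 8KB), here is a minimal sketch as a pure function over a byte buffer. It is not the PR's code: the name `looksBinary`, the 30% threshold, and the exact set of bytes treated as printable are assumptions made for illustration.

```typescript
// Hypothetical helper, not the PR's implementation.
const SAMPLE_SIZE = 8192; // first 8KB, as described in the summary
const NON_PRINTABLE_THRESHOLD = 0.3; // assumed cutoff, not taken from the PR

function looksBinary(data: Uint8Array): boolean {
  const chunk = data.subarray(0, SAMPLE_SIZE);
  if (chunk.length === 0) return false; // assumption: empty files count as text

  let nonPrintable = 0;
  for (let i = 0; i < chunk.length; i++) {
    const byte = chunk[i];
    if (byte === 0) return true; // a null byte is a strong binary signal
    const isWhitespace = byte === 9 || byte === 10 || byte === 13; // tab, LF, CR
    const isPrintableAscii = byte >= 32 && byte <= 126;
    // Note: bytes above 126 are counted as non-printable here, so heavily
    // non-ASCII UTF-8 text can cross the threshold.
    if (!isWhitespace && !isPrintableAscii) nonPrintable++;
  }
  return nonPrintable / chunk.length > NON_PRINTABLE_THRESHOLD;
}
```

Keeping the check separate from file I/O makes it easy to unit test with in-memory buffers instead of fs mocks.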

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Caller
  participant McpTool
  participant FS as FileSystem

  Caller->>McpTool: fetchFromDirectory(contextDir)
  activate McpTool
  loop recursive directory walk
    McpTool->>FS: read first 8KB (no encoding)
    FS-->>McpTool: Buffer
    alt binary detected (null byte or high non-printable ratio)
      McpTool-->>McpTool: skip file
    else non-binary
      McpTool->>FS: readFile(file, "utf-8")
      FS-->>McpTool: text content
      McpTool-->>McpTool: append content block
    end
  end
  McpTool-->>Caller: return ServerResult with collected docs
  deactivate McpTool

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

I hopped through folders, sniffing bytes and light,
Found sneaky nulls hiding from my sight 🐇
Async paws skip the binary treat,
Only UTF-8 crumbs make it to my seat.
CI trimmed, version bumped—quiet delight.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
test/src/core/mcp.test.ts (1)

66-71: Fix meta assertion: code now includes dir in ingest meta

init() passes meta.dir (undefined when no contextDir). The exact-match assertion will fail. Allow extra keys or include dir: undefined.

-		expect(ingestSpy).toHaveBeenCalledWith(
-			{ id: 'source-1', meta: { name: 'Bar-MCP', url: 'http://example.com', file: undefined } },
+		expect(ingestSpy).toHaveBeenCalledWith(
+			{ id: 'source-1', meta: expect.objectContaining({ name: 'Bar-MCP', url: 'http://example.com', file: undefined }) },
 			['c1', 'c2'],
 			{ mcpId: 'foo', toolId: 'bar' }
 		);

If you prefer strict equality, add dir: undefined to the expected meta instead.

🧹 Nitpick comments (4)
src/core/mcp.ts (3)

257-261: Call new isBinaryFile signature

If you implement the refactor above, adjust the call to drop the readFileSync argument.

-							const isBinary = await this.isBinaryFile(full, readFileSync);
+							const isBinary = await this.isBinaryFile(full);

243-268: Skip common heavy folders during walk

To avoid needless traversal and I/O (node_modules, .git, dist, build, coverage, etc.), add a small ignore set.

-			const walk = async (dir: string) => {
+			const walk = async (dir: string) => {
+				const IGNORE = new Set(['node_modules', '.git', 'dist', 'build', 'coverage']);
 				let entries: string[] = [];
 				try {
 					entries = readdirSync(dir);
 				} catch {
 					return;
 				}
 				for (const entry of entries) {
+					if (IGNORE.has(entry)) continue;
 					const full = pathMod.join(dir, entry);

274-275: Consider debug logging on read failures

Silently swallowing per-file read errors can hide actionable issues (permissions, encoding). Emit a debug-level log with the path once per failure.

test/src/core/mcp.test.ts (1)

203-205: Move unmocking to afterEach for isolation

To avoid accidental leakage if future assertions throw before cleanup, consider unmocking in an afterEach hook.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e74cb55 and 55e7d94.

📒 Files selected for processing (4)
  • .github/workflows/ci.yml (0 hunks)
  • package.json (1 hunks)
  • src/core/mcp.ts (4 hunks)
  • test/src/core/mcp.test.ts (2 hunks)
💤 Files with no reviewable changes (1)
  • .github/workflows/ci.yml
🔇 Additional comments (4)
package.json (1)

3-3: Version bump looks good

No other changes detected here.

src/core/mcp.ts (1)

338-345: LGTM: explicit ServerResult typing

The explicit type annotation is clear and safe; appending prompt later is fine.

test/src/core/mcp.test.ts (2)

116-136: Good binary-detection mocking

Clear separation between binary-probe reads (encoding: null) and text reads.


138-152: Path mock is adequate for the test scope

Join/relative/dirname implementations are minimal and sufficient here.

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/core/mcp.ts (1)

259-261: Fix signature mismatch: remove extra argument to isBinaryFile
Call site still passes readFileSync, but isBinaryFile now accepts only (filePath). This is a TS error.

-							const isBinary = await this.isBinaryFile(full, readFileSync);
+							const isBinary = await this.isBinaryFile(full);

Run to ensure no other call sites pass extra args:

#!/bin/bash
rg -nP -C1 'isBinaryFile\s*\([^,\n)]+,\s*' --type=ts
🧹 Nitpick comments (3)
src/core/mcp.ts (3)

211-227: Heuristic may misclassify valid UTF‑8 as binary
Counting bytes >126 as “non‑printable” penalizes multi‑byte UTF‑8 text (non‑ASCII languages). Consider a quick UTF‑8 sanity check (e.g., attempt decode and detect U+FFFD rate) before concluding “binary.” I can draft a lightweight check if you want.
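
One way the suggested UTF-8 sanity check could look is sketched below: decode the sampled bytes leniently and measure how often the decoder had to substitute U+FFFD. The function name `looksLikeUtf8Text` and the 1% tolerance are assumptions, not part of the PR. The tolerance is there because an 8KB sample can cut a multi-byte sequence at the end, which yields a replacement character even for valid text.

```typescript
// Hypothetical helper; the name and the default tolerance are assumptions.
function looksLikeUtf8Text(chunk: Uint8Array, maxReplacementRatio = 0.01): boolean {
  if (chunk.length === 0) return true; // nothing to object to
  if (chunk.indexOf(0) !== -1) return false; // null bytes still mean binary

  // Non-fatal decoding replaces each malformed sequence with U+FFFD.
  const text = new TextDecoder("utf-8", { fatal: false }).decode(chunk);
  if (text.length === 0) return true; // e.g. the sample was only a BOM

  let replacements = 0;
  for (let i = 0; i < text.length; i++) {
    if (text.charCodeAt(i) === 0xfffd) replacements++;
  }
  return replacements / text.length <= maxReplacementRatio;
}
```

A check like this could run before the non-printable ratio test, so that valid non-ASCII text is accepted early and only undecodable input falls through to the stricter heuristic.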


239-242: Build paths with path.join for cross‑platform correctness
Avoid string concatenation for paths.

-			const dirToRead = `${baseDir}/${this.spec.contextDir}`;
-			const { readdirSync, statSync, readFileSync } = await import('node:fs');
-			const pathMod = await import('node:path');
+			const pathMod = await import('node:path');
+			const dirToRead = pathMod.join(baseDir, this.spec.contextDir!);
+			const { readdirSync, statSync, readFileSync } = await import('node:fs');

254-257: Skip symlinks to avoid cycles and surprises
Following symlinks can create loops or escape the intended dir. Prefer lstatSync and skip symbolic links.

-						const st = statSync(full);
+						const st = lstatSync(full);
+						if (st.isSymbolicLink?.()) {
+							continue;
+						}

Also update the import:

-			const { readdirSync, statSync, readFileSync } = await import('node:fs');
+			const { readdirSync, lstatSync, readFileSync } = await import('node:fs');
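
Put together, the symlink skip combines naturally with the ignore-set idea from the earlier nitpick. The following is a standalone sketch under stated assumptions: `collectFiles` is a hypothetical name, the ignore list simply mirrors the folders named in the review, and the real `fetchFromDirectory` also performs binary detection and file reads that are omitted here.

```typescript
import { lstatSync, readdirSync, Stats } from "node:fs";
import { join } from "node:path";

// Assumed ignore list, mirroring the folders named in the review.
const IGNORE = new Set(["node_modules", ".git", "dist", "build", "coverage"]);

// Hypothetical helper: returns regular files under `root`, never following symlinks.
function collectFiles(root: string): string[] {
  const files: string[] = [];

  const walk = (dir: string): void => {
    let entries: string[];
    try {
      entries = readdirSync(dir);
    } catch {
      return; // unreadable or missing directory: skip it
    }
    for (const entry of entries) {
      if (IGNORE.has(entry)) continue;
      const full = join(dir, entry);
      let st: Stats;
      try {
        st = lstatSync(full); // lstat reports on the link itself, not its target
      } catch {
        continue;
      }
      if (st.isSymbolicLink()) continue; // avoids cycles and escaping the root
      if (st.isDirectory()) walk(full);
      else if (st.isFile()) files.push(full);
    }
  };

  walk(root);
  return files;
}
```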
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 55e7d94 and f41ff63.

📒 Files selected for processing (1)
  • src/core/mcp.ts (4 hunks)
🔇 Additional comments (1)
src/core/mcp.ts (1)

339-346: LGTM on the ServerResult construction
Type annotation and payload wiring look correct.

Comment on lines +196 to +205
private async isBinaryFile(filePath: string): Promise<boolean> {
	const { openSync, readSync, closeSync } = await import('node:fs');
	try {
		// Read first 8KB only
		const fd = openSync(filePath, 'r');
		const buffer = Buffer.allocUnsafe(8192);
		const bytesRead = readSync(fd, buffer, 0, 8192, 0);
		closeSync(fd);
		const chunk = buffer.subarray(0, bytesRead);

⚠️ Potential issue

Close the file descriptor on all paths (use finally)
If readSync throws, closeSync isn’t reached, leaking the fd. Wrap closeSync in a finally and guard it.

-	private async isBinaryFile(filePath: string): Promise<boolean> {
-		const { openSync, readSync, closeSync } = await import('node:fs');
-		try {
-			// Read first 8KB only
-			const fd = openSync(filePath, 'r');
-			const buffer = Buffer.allocUnsafe(8192);
-			const bytesRead = readSync(fd, buffer, 0, 8192, 0);
-			closeSync(fd);
-			const chunk = buffer.subarray(0, bytesRead);
+	private async isBinaryFile(filePath: string): Promise<boolean> {
+		const { openSync, readSync, closeSync } = await import('node:fs');
+		let fd: number | undefined;
+		try {
+			// Read first 8KB only
+			fd = openSync(filePath, 'r');
+			const buffer = Buffer.allocUnsafe(8192);
+			const bytesRead = readSync(fd, buffer, 0, 8192, 0);
+			const chunk = buffer.subarray(0, bytesRead);
@@
-		} catch {
-			// If we can't read the file, assume it's binary to be safe
-			return true;
-		}
+		} catch {
+			// If we can't read the file, assume it's binary to be safe
+			return true;
+		} finally {
+			if (fd !== undefined) {
+				try { closeSync(fd); } catch {}
+			}
+		}

Also applies to: 228-231

🤖 Prompt for AI Agents
In src/core/mcp.ts around lines 196 to 205 the file descriptor opened with
openSync may not be closed if readSync throws; change the function to ensure
closeSync is always called by moving closeSync into a finally block (track the
fd variable outside try, check fd !== undefined before closing) so the
descriptor is closed on success and on error. Apply the same pattern to the
related code at lines 228-231 to guard against fd leaks.
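
Since the diff above elides the middle of the function, a complete standalone version of the pattern may be easier to follow. This is a sketch, not the PR's code: `readHead` is a hypothetical helper that isolates the open/read/close step, and returning `undefined` on failure is an assumption (the PR's function returns `true`, treating unreadable files as binary).

```typescript
import { closeSync, openSync, readSync } from "node:fs";

// Hypothetical helper: returns up to `size` leading bytes, or undefined on failure.
function readHead(filePath: string, size = 8192): Buffer | undefined {
  let fd: number | undefined; // declared outside try so finally can see it
  try {
    fd = openSync(filePath, "r");
    const buffer = Buffer.alloc(size);
    const bytesRead = readSync(fd, buffer, 0, size, 0);
    return buffer.subarray(0, bytesRead);
  } catch {
    return undefined; // caller decides how to treat unreadable files
  } finally {
    // Runs on success and on error, so the descriptor is never leaked.
    if (fd !== undefined) {
      try {
        closeSync(fd);
      } catch {
        // ignore close errors
      }
    }
  }
}
```

The `fd !== undefined` guard matters because `openSync` itself can throw, in which case there is nothing to close.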

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (6)
.gitignore (1)

19-19: Ignore SQLite sidecar files too

Add WAL/SHM/journal patterns so devs don’t accidentally commit SQLite sidecars.

 *.sqlite
+*.sqlite-wal
+*.sqlite-shm
+*.sqlite-journal
src/lib/store.ts (1)

176-179: Make progress logs 1-based (friendlier) and less ambiguous

Current logs show 0-based chunk numbers. Consider 1-based display; also “Finished ingestion with X chunks” can be read as inserted count when it’s the processed index. Keep as-is if you prefer, but this improves clarity.

- log.warn(`[${mcpId}/${toolId}] Embedding chunk ${idx} of ${chunks.length}`);
+ log.warn(`[${mcpId}/${toolId}] Embedding chunk ${idx + 1} of ${chunks.length}`);

- log.warn(`[${mcpId}/${toolId}] Inserting chunk ${idx} of ${chunks.length}`);
+ log.warn(`[${mcpId}/${toolId}] Inserting chunk ${idx + 1} of ${chunks.length}`);

Optional: also log “processed vs inserted” totals at the end to reflect skips due to hash matches.
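
As a rough illustration of both suggestions (1-based progress and a "processed vs inserted" summary), the message formatting could be factored out as below. The helper names and the summary wording are assumptions; the PR's `SqliteEmbedStore.ingest` builds its messages inline.

```typescript
// Hypothetical helpers; names and message wording are assumptions.
function progressMessage(
  mcpId: string,
  toolId: string,
  action: "Embedding" | "Inserting",
  idx: number, // 0-based loop index
  total: number
): string {
  // Display idx + 1 so the last chunk reads "N of N" rather than "N-1 of N".
  return `[${mcpId}/${toolId}] ${action} chunk ${idx + 1} of ${total}`;
}

function summaryMessage(mcpId: string, toolId: string, processed: number, inserted: number): string {
  const skipped = processed - inserted; // e.g. chunks skipped due to hash matches
  return `[${mcpId}/${toolId}] Finished ingestion: ${processed} processed, ${inserted} inserted, ${skipped} skipped`;
}
```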

test/src/core/mcp.test.ts (4)

116-137: Drop readFileSync “binary detection” branch to match implementation

Binary sniffing is now via openSync/readSync; keeping a readFileSync({ encoding: null }) path adds confusion. Simplify to text-read paths only here.


153-185: Track fd→path instead of peeking last open call

Using the last open call to infer path is brittle if multiple fds were used. Map file descriptors to paths for robustness.

Example:

const fdToPath = new Map<number, string>();
let nextFd = 100;

const openSyncMock = vi.fn((p: string) => {
  const fd = nextFd++;
  fdToPath.set(fd, p);
  return fd;
});

const readSyncMock = vi.fn((fd: number, buffer: Buffer, offset: number, length: number) => {
  const mockPath = fdToPath.get(fd) || '';
  // ... choose content from mockPath ...
});

228-231: Assert binaries weren’t text-read

Strengthen the expectation to ensure binary files were skipped for text reads.

expect(readFileMock).toHaveBeenCalled();
expect(readFileMock.mock.calls.some(([p, opts]) => /img\.png$/.test(p) && opts?.encoding !== null)).toBe(false);
expect(readFileMock.mock.calls.some(([p]) => /a\.md$/.test(p))).toBe(true);
expect(readFileMock.mock.calls.some(([p]) => /b\.txt$/.test(p))).toBe(true);

239-242: Reset module registry after unmock to avoid cross-test bleed

Follow unmock with resetModules so subsequent tests don’t retain cached mocked dependencies.

vi.doUnmock('node:fs');
vi.doUnmock('node:path');
vi.resetModules();
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between f41ff63 and 4963770.

📒 Files selected for processing (5)
  • .gitignore (1 hunks)
  • src/core/mcp.ts (4 hunks)
  • src/lib/store.ts (1 hunks)
  • src/mcps/angular/tools/docs/index.ts (0 hunks)
  • test/src/core/mcp.test.ts (2 hunks)
💤 Files with no reviewable changes (1)
  • src/mcps/angular/tools/docs/index.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/core/mcp.ts
🧰 Additional context used
🧬 Code graph analysis (1)
src/lib/store.ts (1)
src/lib/log.ts (1)
  • log (76-128)

Comment on lines 186 to +196
vi.doMock('node:fs', () => ({
	readdirSync: readdirMock,
	statSync: statMock,
	readFileSync: readFileMock,
	openSync: openSyncMock,
	readSync: readSyncMock,
	closeSync: closeSyncMock,
}));

vi.doMock('node:path', () => pathMock);

⚠️ Potential issue

doMock after importing McpTool won’t affect it — reset modules and re-import

These tests mock node:fs/node:path with vi.doMock, but McpTool was imported earlier at file top. Since doMock isn’t hoisted, the mocks won’t be applied to the already-loaded module, causing flakiness and leaking real FS. Reset and re-import the SUT, then define classes against that import.

Minimal fix inside each test after doMock:

vi.resetModules();
const { McpTool: LocalMcpTool } = await import('../../../src/core/mcp');
// then: class DirTool extends LocalMcpTool { ... }

Also applies to: 250-255, 289-294, 328-333

🤖 Prompt for AI Agents
In test/src/core/mcp.test.ts around lines 186 to 196, the test calls vi.doMock
after McpTool was already imported so the mocks won’t be applied to the loaded
module; after calling vi.doMock add vi.resetModules() and re-import McpTool from
'../../../src/core/mcp' (e.g. const { McpTool: LocalMcpTool } = await
import(...)) and then define any test-local subclasses against LocalMcpTool;
apply the same change at the other affected ranges (lines 250-255, 289-294,
328-333).
