Conversation

@wes-mil wes-mil commented Dec 22, 2025

Description

Creates SQL files to populate AD and AZ extension schemas

Motivation and Context

Resolves BED-6721

Why is this change required? What problem does it solve?

All schema definitions are being moved over to postgres. This incremental change moves the AD and AZ schemas over to postgres without removing existing functionality.

How Has This Been Tested?

Code is generated by running just generate.

Data population is run on app startup.

Screenshots (optional):

  • Screenshot from 2025-12-22 14-50-22
  • Screenshot from 2025-12-22 14-50-36
  • Screenshot from 2025-12-22 14-50-48

Types of changes

  • New feature (non-breaking change which adds functionality)
  • Database Migrations (sorta)

Checklist:

Summary by CodeRabbit

  • New Features

    • Added automatic extension data population during system initialization.
    • Introduced schema definitions for Active Directory and Azure extensions, including node and relationship types.
  • Chores

    • Enhanced structured logging for error reporting throughout the migration and initialization processes.


coderabbitai bot commented Dec 22, 2025

Walkthrough

This pull request introduces extension data population as a new initialization step in the database setup flow. It adds the PopulateExtensionData function across bootstrap, database, and migration layers, implements SQL generation for Active Directory and Azure extensions, and integrates the step into service entrypoints and test fixtures to populate extension metadata before graph migrations.

Changes

Cohort / File(s) Summary
Bootstrap Wrapper
cmd/api/src/bootstrap/server.go
Added new public function PopulateExtensionData that wraps database extension data population.
Database Interface & Implementation
cmd/api/src/database/db.go, cmd/api/src/database/mocks/db.go
Added PopulateExtensionData(ctx context.Context) error to Database interface and BloodhoundDB implementation; invokes migrator ExecuteExtensionDataPopulation. Updated mocks accordingly.
Migration System
cmd/api/src/database/migration/migration.go, cmd/api/src/database/migration/stepwise.go
Embedded extension SQL files via //go:embed extensions, added ExtensionsData field to Migrator, and implemented ExecuteExtensionDataPopulation() method to read and execute .sql files from extensions directory.
Extension Schema Definitions
cmd/api/src/database/migration/extensions/ad.sql, cmd/api/src/database/migration/extensions/az.sql
PostgreSQL migration scripts defining Active Directory and Azure extensions with node kinds and edge kinds schema metadata.
Schema Generation
packages/go/schemagen/generator/sql.go, packages/go/schemagen/main.go
Added NodeIcon type, NodeIcons map for UI metadata, and SQL generation functions (GenerateExtensionSQLActiveDirectory, GenerateExtensionSQLAzure) to produce extension SQL files. Integrated GenerateSQL into main workflow.
Logging Enhancement
packages/go/schemagen/generator/cue.go
Replaced fmt.Sprintf debug log with structured slog.Debug call.
Service Integration
cmd/api/src/services/entrypoint.go, packages/go/graphify/graph/graph.go, cmd/api/src/services/graphify/graphify_integration_test.go, cmd/api/src/test/integration/database.go, cmd/api/src/test/lab/fixtures/postgres.go
Integrated PopulateExtensionData call into initialization flows after MigrateDB and before graph migrations across multiple service entrypoints and test fixtures.
Integration Tests
cmd/api/src/daemons/changelog/ingestion_integration_test.go, cmd/api/src/daemons/datapipe/datapipe_integration_test.go, cmd/api/src/database/database_integration_test.go
Added PopulateExtensionData invocation to test setup routines after database migration.

Sequence Diagram

sequenceDiagram
    participant Svc as Service/Entrypoint
    participant Boot as Bootstrap Layer
    participant DB as Database
    participant Mig as Migrator
    participant FS as File System
    
    Svc->>Boot: MigrateDB(ctx, db)
    Boot->>DB: Migrate(ctx)
    DB->>Mig: Migrate(ctx)
    Mig->>FS: Read & execute migration files
    FS-->>Mig: SQL executed
    Mig-->>DB: ✓ Success
    DB-->>Boot: ✓ Success
    
    Svc->>Boot: PopulateExtensionData(ctx, db)
    Boot->>DB: PopulateExtensionData(ctx)
    DB->>Mig: ExecuteExtensionDataPopulation()
    Mig->>FS: ReadDir(extensions/)
    FS-->>Mig: [ad.sql, az.sql]
    Mig->>FS: ReadFile(ad.sql)
    FS-->>Mig: SQL content
    Mig->>DB: Execute AD extension SQL (transaction)
    DB-->>Mig: ✓ Inserted extension metadata
    Mig->>FS: ReadFile(az.sql)
    FS-->>Mig: SQL content
    Mig->>DB: Execute AZ extension SQL (transaction)
    DB-->>Mig: ✓ Inserted extension metadata
    Mig-->>DB: ✓ Success
    DB-->>Svc: ✓ Success
    
    Svc->>Svc: Proceed to graph migrations

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Key areas requiring attention:
    • Schema generation logic in packages/go/schemagen/generator/sql.go — verify NodeIcons mapping completeness and SQL string assembly correctness for node/edge kinds
    • Extension SQL files (ad.sql, az.sql) — ensure all node and edge kind definitions are correct and match schema expectations
    • Error handling consistency in ExecuteExtensionDataPopulation() — verify transactional integrity and error propagation across multiple .sql file executions
    • Integration points across services — confirm PopulateExtensionData is called in the correct order relative to migrations and graph initialization in all code paths

Possibly related PRs

Suggested labels

enhancement, dbmigration

Suggested reviewers

  • LawsonWillard
  • AD7ZJ
  • superlinkx

Poem

🐰 Hop, hop! Extensions bloom,
Data populates every room,
AD and Azure schemas take flight,
Migrations dance through the night!
A fluffy PR, pure delight! 🌸

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 40.00%, which is below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Title check ✅ Passed: The title clearly and specifically summarizes the main change: generating AD and AZ extension schemas and applying them at startup, with direct reference to the Jira ticket.
  • Description check ✅ Passed: The description addresses all required template sections: description of changes, motivation/context with ticket reference, testing details, types of changes, and completed checklist items.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (3)
cmd/api/src/bootstrap/server.go (1)

75-81: Function implementation is correct, but can be simplified.

The function correctly wraps the database's PopulateExtensionData method. However, the explicit return nil on line 80 is redundant since the error is already nil when reaching that point.

🔎 Optional simplification
 func PopulateExtensionData(ctx context.Context, db database.Database) error {
-	if err := db.PopulateExtensionData(ctx); err != nil {
-		return err
-	}
-
-	return nil
+	return db.PopulateExtensionData(ctx)
 }
cmd/api/src/test/lab/fixtures/postgres.go (1)

44-45: Extension data population correctly added, but consider reusing DB instance.

The extension data population step is properly integrated with correct error handling. However, this fixture creates multiple BloodhoundDB instances (lines 40, 42, 44, and 47). While this works correctly, consider creating a single instance at the beginning and reusing it throughout the setup chain for better efficiency.

🔎 Optional refactor to reduce instance creation
 var PostgresFixture = lab.NewFixture(func(harness *lab.Harness) (*database.BloodhoundDB, error) {
 	testCtx := context.Background()
 	if labConfig, ok := lab.Unpack(harness, ConfigFixture); !ok {
 		return nil, fmt.Errorf("unable to unpack ConfigFixture")
 	} else if pgdb, err := database.OpenDatabase(labConfig.Database.PostgreSQLConnectionString()); err != nil {
 		return nil, err
-	} else if err := integration.Prepare(testCtx, database.NewBloodhoundDB(pgdb, auth.NewIdentityResolver())); err != nil {
+	} else {
+		bhdb := database.NewBloodhoundDB(pgdb, auth.NewIdentityResolver())
+		if err := integration.Prepare(testCtx, bhdb); err != nil {
-		return nil, fmt.Errorf("failed ensuring database: %v", err)
+			return nil, fmt.Errorf("failed ensuring database: %v", err)
-	} else if err := bootstrap.MigrateDB(testCtx, labConfig, database.NewBloodhoundDB(pgdb, auth.NewIdentityResolver()), config.NewDefaultAdminConfiguration); err != nil {
+		} else if err := bootstrap.MigrateDB(testCtx, labConfig, bhdb, config.NewDefaultAdminConfiguration); err != nil {
-		return nil, fmt.Errorf("failed migrating database: %v", err)
+			return nil, fmt.Errorf("failed migrating database: %v", err)
-	} else if err := bootstrap.PopulateExtensionData(testCtx, database.NewBloodhoundDB(pgdb, auth.NewIdentityResolver())); err != nil {
+		} else if err := bootstrap.PopulateExtensionData(testCtx, bhdb); err != nil {
-		return nil, fmt.Errorf("failed populating extension data: %v", err)
+			return nil, fmt.Errorf("failed populating extension data: %v", err)
-	} else {
+		}
-		return database.NewBloodhoundDB(pgdb, auth.NewIdentityResolver()), nil
+		return bhdb, nil
-	}
+	}
 }, nil)
packages/go/schemagen/generator/sql.go (1)

212-229: Consider adding error context for debugging.

The filesystem operations (stat, mkdir, open, write) return errors without additional context. While the current error handling is functionally correct, wrapping errors with context would aid debugging when generation fails.

🔎 Example using error wrapping
 if _, err := os.Stat(dir); err != nil {
     if !os.IsNotExist(err) {
-        return err
+        return fmt.Errorf("failed to stat directory %s: %w", dir, err)
     }

     if err := os.MkdirAll(dir, defaultPackageDirPermission); err != nil {
-        return err
+        return fmt.Errorf("failed to create directory %s: %w", dir, err)
     }
 }

 if fout, err := os.OpenFile(path.Join(dir, "ad.sql"), fileOpenMode, defaultSourceFilePermission); err != nil {
-    return err
+    return fmt.Errorf("failed to open ad.sql: %w", err)
 } else {
     defer fout.Close()

     _, err := fout.WriteString(sb.String())
-    return err
+    if err != nil {
+        return fmt.Errorf("failed to write ad.sql: %w", err)
+    }
+    return nil
 }
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8e2e3a8 and 7c6b164.

📒 Files selected for processing (18)
  • cmd/api/src/bootstrap/server.go
  • cmd/api/src/daemons/changelog/ingestion_integration_test.go
  • cmd/api/src/daemons/datapipe/datapipe_integration_test.go
  • cmd/api/src/database/database_integration_test.go
  • cmd/api/src/database/db.go
  • cmd/api/src/database/migration/extensions/ad.sql
  • cmd/api/src/database/migration/extensions/az.sql
  • cmd/api/src/database/migration/migration.go
  • cmd/api/src/database/migration/stepwise.go
  • cmd/api/src/database/mocks/db.go
  • cmd/api/src/services/entrypoint.go
  • cmd/api/src/services/graphify/graphify_integration_test.go
  • cmd/api/src/test/integration/database.go
  • cmd/api/src/test/lab/fixtures/postgres.go
  • packages/go/graphify/graph/graph.go
  • packages/go/schemagen/generator/cue.go
  • packages/go/schemagen/generator/sql.go
  • packages/go/schemagen/main.go
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-06-06T23:12:14.181Z
Learnt from: elikmiller
Repo: SpecterOps/BloodHound PR: 1563
File: packages/go/graphschema/azure/azure.go:24-24
Timestamp: 2025-06-06T23:12:14.181Z
Learning: In BloodHound, files in packages/go/graphschema/*/`*.go` are generated from CUE schemas. When `just prepare-for-codereview` is run, it triggers code generation that may automatically add import aliases or other formatting changes. These changes are legitimate outputs of the generation process, not manual edits that would be overwritten.

Applied to files:

  • packages/go/schemagen/main.go
  • packages/go/schemagen/generator/cue.go
  • packages/go/schemagen/generator/sql.go
📚 Learning: 2025-06-25T17:52:33.291Z
Learnt from: superlinkx
Repo: SpecterOps/BloodHound PR: 1606
File: cmd/api/src/analysis/azure/post.go:33-35
Timestamp: 2025-06-25T17:52:33.291Z
Learning: In BloodHound Go code, prefer using explicit slog type functions like slog.Any(), slog.String(), slog.Int(), etc. over simple key-value pairs for structured logging. This provides better type safety and makes key-value pairs more visually distinct. For error types, use slog.Any("key", err) or slog.String("key", err.Error()).

Applied to files:

  • packages/go/schemagen/main.go
  • packages/go/schemagen/generator/cue.go
📚 Learning: 2025-11-25T22:11:53.518Z
Learnt from: LawsonWillard
Repo: SpecterOps/BloodHound PR: 2107
File: cmd/api/src/database/graphschema.go:86-100
Timestamp: 2025-11-25T22:11:53.518Z
Learning: In cmd/api/src/database/graphschema.go, the CreateSchemaEdgeKind method intentionally does not use AuditableTransaction or audit logging because it would create too much noise in the audit log, unlike CreateGraphSchemaExtension which does use auditing.

Applied to files:

  • packages/go/schemagen/generator/sql.go
🧬 Code graph analysis (12)
cmd/api/src/bootstrap/server.go (1)
cmd/api/src/database/db.go (1)
  • Database (72-192)
packages/go/graphify/graph/graph.go (1)
cmd/api/src/bootstrap/server.go (1)
  • PopulateExtensionData (75-81)
cmd/api/src/daemons/datapipe/datapipe_integration_test.go (1)
cmd/api/src/bootstrap/server.go (1)
  • PopulateExtensionData (75-81)
cmd/api/src/database/migration/stepwise.go (1)
cmd/api/src/database/migration/migration.go (1)
  • Migrator (47-51)
cmd/api/src/services/entrypoint.go (1)
cmd/api/src/bootstrap/server.go (1)
  • PopulateExtensionData (75-81)
cmd/api/src/test/lab/fixtures/postgres.go (3)
cmd/api/src/bootstrap/server.go (1)
  • PopulateExtensionData (75-81)
cmd/api/src/database/db.go (1)
  • NewBloodhoundDB (225-227)
cmd/api/src/auth/model.go (1)
  • NewIdentityResolver (74-76)
cmd/api/src/database/db.go (2)
cmd/api/src/bootstrap/server.go (1)
  • PopulateExtensionData (75-81)
cmd/api/src/database/migration/migration.go (1)
  • NewMigrator (54-64)
cmd/api/src/services/graphify/graphify_integration_test.go (1)
cmd/api/src/bootstrap/server.go (1)
  • PopulateExtensionData (75-81)
cmd/api/src/test/integration/database.go (1)
cmd/api/src/bootstrap/server.go (1)
  • PopulateExtensionData (75-81)
cmd/api/src/database/mocks/db.go (1)
cmd/api/src/bootstrap/server.go (1)
  • PopulateExtensionData (75-81)
cmd/api/src/database/database_integration_test.go (1)
cmd/api/src/bootstrap/server.go (1)
  • PopulateExtensionData (75-81)
packages/go/schemagen/generator/sql.go (3)
packages/go/schemagen/model/schema.go (2)
  • ActiveDirectory (62-72)
  • Azure (50-60)
packages/go/schemagen/generator/golang.go (1)
  • SchemaSourceName (32-32)
packages/go/schemagen/csgen/models.go (1)
  • Symbol (23-23)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build BloodHound Container Image / Build and Package Container
  • GitHub Check: run-tests
  • GitHub Check: build-ui
  • GitHub Check: run-analysis
🔇 Additional comments (21)
packages/go/schemagen/generator/cue.go (1)

98-101: LGTM! Structured logging adopted.

The change from unstructured logging to structured slog.Debug with explicit type functions improves observability and aligns with the project's logging standards.

Based on learnings, BloodHound prefers explicit slog type functions like slog.String() for better type safety and visual distinction.

cmd/api/src/services/graphify/graphify_integration_test.go (1)

88-89: LGTM: Extension data population properly integrated into test setup.

The extension data population step is correctly positioned after database migration and before graph schema assertion, with appropriate error handling.

cmd/api/src/daemons/datapipe/datapipe_integration_test.go (1)

93-94: LGTM: Consistent test setup pattern.

The extension data population is correctly integrated with proper error handling and sequencing.

cmd/api/src/database/database_integration_test.go (1)

61-62: LGTM: Extension data population added to database integration tests.

The new initialization step is properly integrated with appropriate error handling.

cmd/api/src/test/integration/database.go (1)

139-140: LGTM: Extension data population integrated into Prepare flow.

The new step is properly sequenced and includes appropriate error wrapping. The deprecation notice for this file doesn't affect the correctness of this change.

cmd/api/src/daemons/changelog/ingestion_integration_test.go (1)

119-119: LGTM: Extension data population added to changelog integration test.

The initialization step is correctly positioned with proper error handling.

packages/go/graphify/graph/graph.go (1)

178-179: LGTM: Extension data population integrated into service initialization.

The extension data population is correctly sequenced between database migration and graph migration, with appropriate error handling and propagation.

cmd/api/src/database/mocks/db.go (1)

2413-2425: LGTM! Generated mock aligns with interface changes.

The gomock-generated PopulateExtensionData method correctly implements the new Database interface method. The mock follows the established pattern and will support testing scenarios where extension data population is invoked.

cmd/api/src/services/entrypoint.go (1)

83-84: LGTM! Extension data population correctly sequenced.

The new PopulateExtensionData step is properly placed after RDBMS migrations and before graph migrations, with appropriate error handling and descriptive error messages.

packages/go/schemagen/main.go (2)

76-86: LGTM! SQL generation function follows established patterns.

The GenerateSQL function mirrors the structure of GenerateGolang, GenerateSharedTypeScript, and GenerateCSharp, providing a consistent interface for generating extension SQL files.


92-93: Good use of structured logging.

The migration to slog with attr.Error and slog.String improves observability and follows Go structured logging best practices.

Based on learnings, this aligns with the preferred pattern in BloodHound for structured logging.

Also applies to: 98-99, 105-105

cmd/api/src/database/migration/stepwise.go (1)

199-239: LGTM! Extension data population is well-structured.

The ExecuteExtensionDataPopulation method properly:

  • Iterates through extension data sources
  • Filters for SQL files
  • Executes each file in a transaction
  • Provides clear error messages with file context

The SQL files are idempotent (DELETE before INSERT), so re-execution is safe if this method is called multiple times.

cmd/api/src/database/migration/migration.go (1)

30-31: LGTM! Clean separation of extension data from migrations.

The new ExtensionMigrations embed and ExtensionsData field provide a clear separation between schema migrations and extension data population, improving maintainability.

Also applies to: 49-49, 59-61

cmd/api/src/database/db.go (2)

101-101: LGTM! Interface extension is focused and well-defined.

The PopulateExtensionData method addition to the Database interface provides a clear contract for extension data initialization.


278-285: LGTM! Implementation properly delegates and logs errors.

The PopulateExtensionData implementation:

  • Delegates to the migrator's ExecuteExtensionDataPopulation
  • Uses structured logging with attr.Error for clear diagnostics
  • Provides descriptive error messages for the extensions data population phase
cmd/api/src/database/migration/extensions/ad.sql (2)

24-130: LGTM! Generated SQL follows correct pattern.

The DO block properly:

  • Captures the new extension id with RETURNING ... INTO
  • Uses the captured id for all subsequent inserts
  • Provides idempotency through initial DELETE

18-18: Foreign key constraints are properly configured with CASCADE deletes.

The schema_node_kinds, schema_edge_kinds, schema_properties, schema_environments, and schema_relationship_findings tables all reference schema_extensions(id) with ON DELETE CASCADE, so the DELETE operation will safely cascade without constraint violations.
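Putting the two observations together, the generated file follows roughly this shape. This is a hand-written sketch with one illustrative node kind, not the actual generated ad.sql:

```sql
-- Idempotent: the DELETE cascades to schema_node_kinds, schema_edge_kinds,
-- etc. via ON DELETE CASCADE before the rows are re-inserted.
DELETE FROM schema_extensions WHERE name = 'AD';

DO $$
DECLARE
    new_extension_id INT;
BEGIN
    INSERT INTO schema_extensions (name, display_name, version, is_builtin)
    VALUES ('AD', 'Active Directory', 'v0.0.1', true)
    RETURNING id INTO new_extension_id;

    -- All child rows reference the freshly captured extension id.
    INSERT INTO schema_node_kinds (schema_extension_id, name, display_name, description, is_display_kind, icon, icon_color)
    VALUES (new_extension_id, 'User', 'User', '', true, 'user', '#17E625');
END $$;
```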

cmd/api/src/database/migration/extensions/az.sql (2)

18-18: Verify foreign key constraints allow deletion.

Same as ad.sql: ensure ON DELETE CASCADE is configured for foreign keys referencing schema_extensions to prevent deletion failures when re-running this script.


24-97: LGTM! Generated SQL follows correct pattern.

The Azure extension SQL properly captures the new extension id and uses it for all node and edge kind inserts, maintaining referential integrity.

packages/go/schemagen/generator/sql.go (2)

27-163: LGTM!

The NodeIcon struct and NodeIcons map provide a clean way to associate UI metadata with schema node types. The hardcoded icon and color mappings are appropriate for built-in AD and Azure node types.


180-180: This is static SQL file generation, not a security vulnerability.

The code generates SQL INSERT statements and writes them to a .sql file. Values come from CUE schemas and a hardcoded NodeIcons map—both developer-controlled sources. Since the SQL is written to a static file rather than executed with user input, there is no SQL injection vector here.

If you want to add defensive escaping for robustness (in case schema definitions ever contain special characters), that's reasonable as a code quality improvement, but this should not be treated as a security issue.

Likely an incorrect or invalid review comment.

Comment on lines +178 to +188
for i, kind := range adSchema.NodeKinds {
    if iconInfo, found := NodeIcons[kind.Symbol]; found {
        sb.WriteString(fmt.Sprintf("\t\t(new_extension_id, '%s', '%s', '', %t, '%s', '%s')", kind.GetRepresentation(), kind.GetName(), found, iconInfo.Icon, iconInfo.Color))
    } else {
        sb.WriteString(fmt.Sprintf("\t\t(new_extension_id, '%s', '%s', '', %t, '', '')", kind.GetRepresentation(), kind.GetName(), found))
    }

    if i != len(adSchema.NodeKinds)-1 {
        sb.WriteString(",\n")
    }
}

⚠️ Potential issue | 🟠 Major

Semantic mismatch: found controls is_display_kind.

The boolean found (indicating whether a NodeIcon entry exists) is used directly as the is_display_kind value in the SQL. This creates a semantic coupling where node types without icons are marked as non-displayable. If is_display_kind should reflect UI display policy rather than icon availability, this is incorrect. If a new node type is added to the schema without a corresponding icon, it would incorrectly be marked is_display_kind=false.

Verify the intended semantics of is_display_kind in the database schema:

#!/bin/bash
# Search for is_display_kind usage and schema definitions
rg -n "is_display_kind" --type=go --type=sql -C3

for i, kind := range adSchema.RelationshipKinds {
    _, traversable := traversableMap[kind.Symbol]

    sb.WriteString(fmt.Sprintf("\t\t(new_extension_id, '%s', '', %t)", kind.GetRepresentation(), traversable))

⚠️ Potential issue | 🟠 Major

SQL injection risk from unescaped string interpolation.

Line 203 has the same SQL injection risk as identified in the node kinds section (lines 180, 182). kind.GetRepresentation() is interpolated without escaping.

🤖 Prompt for AI Agents
In packages/go/schemagen/generator/sql.go around line 203, the code interpolates
kind.GetRepresentation() directly into an SQL string causing SQL injection risk;
escape single quotes in the representation before injection (e.g., replace '
with ''), or better yet build these inserts using parameterized
statements/driver-specific escaping; update the code to sanitize/escape
kind.GetRepresentation() (or switch to parameters) before calling fmt.Sprintf so
generated SQL cannot be broken by embedded quotes.

Comment on lines +232 to +297
func GenerateExtensionSQLAzure(dir string, azSchema model.Azure) error {
    var sb strings.Builder

    sb.WriteString(fmt.Sprintf("-- Code generated by Cuelang code gen. DO NOT EDIT!\n-- Cuelang source: %s/\n", SchemaSourceName))

    sb.WriteString("DELETE FROM schema_extensions WHERE name = 'AZ';\n\n")

    sb.WriteString("DO $$\nDECLARE\n\tnew_extension_id INT;\nBEGIN\n")

    sb.WriteString("\tINSERT INTO schema_extensions (name, display_name, version, is_builtin) VALUES ('AZ', 'Azure', 'v0.0.1', true) RETURNING id INTO new_extension_id;\n\n")

    sb.WriteString("\tINSERT INTO schema_node_kinds (schema_extension_id, name, display_name, description, is_display_kind, icon, icon_color) VALUES\n")

    for i, kind := range azSchema.NodeKinds {
        if iconInfo, found := NodeIcons[kind.Symbol]; found {
            sb.WriteString(fmt.Sprintf("\t\t(new_extension_id, '%s', '%s', '', %t, '%s', '%s')", kind.GetRepresentation(), kind.GetName(), found, iconInfo.Icon, iconInfo.Color))
        } else {
            sb.WriteString(fmt.Sprintf("\t\t(new_extension_id, '%s', '%s', '', %t, '', '')", kind.GetRepresentation(), kind.GetName(), found))
        }

        if i != len(azSchema.NodeKinds)-1 {
            sb.WriteString(",\n")
        }
    }

    sb.WriteString(";\n\n")

    sb.WriteString("\tINSERT INTO schema_edge_kinds (schema_extension_id, name, description, is_traversable) VALUES\n")

    traversableMap := make(map[string]struct{})

    for _, kind := range azSchema.PathfindingRelationships {
        traversableMap[kind.Symbol] = struct{}{}
    }

    for i, kind := range azSchema.RelationshipKinds {
        _, traversable := traversableMap[kind.Symbol]

        sb.WriteString(fmt.Sprintf("\t\t(new_extension_id, '%s', '', %t)", kind.GetRepresentation(), traversable))

        if i != len(azSchema.RelationshipKinds)-1 {
            sb.WriteString(",\n")
        }
    }

    sb.WriteString(";\nEND $$;")

    if _, err := os.Stat(dir); err != nil {
        if !os.IsNotExist(err) {
            return err
        }

        if err := os.MkdirAll(dir, defaultPackageDirPermission); err != nil {
            return err
        }
    }

    if fout, err := os.OpenFile(path.Join(dir, "az.sql"), fileOpenMode, defaultSourceFilePermission); err != nil {
        return err
    } else {
        defer fout.Close()

        _, err := fout.WriteString(sb.String())
        return err
    }
}

🛠️ Refactor suggestion | 🟠 Major

Extract common SQL generation logic to eliminate duplication.

GenerateExtensionSQLAzure is nearly identical to GenerateExtensionSQLActiveDirectory. The functions differ only in extension metadata (name, display name, file name) and schema type. This duplication violates DRY and creates maintenance risk—future changes must be applied consistently to both functions.

Additionally, this function inherits the same issues identified in the AD function:

  • Lines 247, 249: found controls is_display_kind (semantic mismatch)
  • Lines 247, 249, 270: SQL injection risk from unescaped interpolation
🔎 Proposed refactor to eliminate duplication

Create a generic helper that accepts schema interface and extension metadata:

type ExtensionMetadata struct {
    Name        string
    DisplayName string
    Version     string
    FileName    string
}

type SchemaProvider interface {
    GetNodeKinds() []StringEnum
    GetRelationshipKinds() []StringEnum
    GetPathfindingRelationships() []StringEnum
}

func generateExtensionSQL(dir string, metadata ExtensionMetadata, schema SchemaProvider) error {
    var sb strings.Builder
    
    sb.WriteString(fmt.Sprintf("-- Code generated by Cuelang code gen. DO NOT EDIT!\n-- Cuelang source: %s/\n", SchemaSourceName))
    sb.WriteString(fmt.Sprintf("DELETE FROM schema_extensions WHERE name = '%s';\n\n", metadata.Name))
    
    // ... rest of common logic using schema.GetNodeKinds(), etc.
    
    if fout, err := os.OpenFile(path.Join(dir, metadata.FileName), fileOpenMode, defaultSourceFilePermission); err != nil {
        return err
    } else {
        defer fout.Close()
        _, err := fout.WriteString(sb.String())
        return err
    }
}

func GenerateExtensionSQLActiveDirectory(dir string, adSchema model.ActiveDirectory) error {
    return generateExtensionSQL(dir, ExtensionMetadata{
        Name:        "AD",
        DisplayName: "Active Directory",
        Version:     "v0.0.1",
        FileName:    "ad.sql",
    }, adSchemaAdapter{adSchema})
}

func GenerateExtensionSQLAzure(dir string, azSchema model.Azure) error {
    return generateExtensionSQL(dir, ExtensionMetadata{
        Name:        "AZ",
        DisplayName: "Azure",
        Version:     "v0.0.1",
        FileName:    "az.sql",
    }, azSchemaAdapter{azSchema})
}

Then implement adapter types to satisfy the SchemaProvider interface.

@@ -0,0 +1,131 @@
-- Copyright 2025 Specter Ops, Inc.

Consider renaming the ad.sql and az.sql files to something more descriptive, like ad_graph_schema.sql or ad_kinds.sql.

"github.com/specterops/bloodhound/packages/go/schemagen/model"
)

type NodeIcon struct {

NodeIcon and NodeIcons likely don't need exporting.

},
}

func GenerateExtensionSQLActiveDirectory(dir string, adSchema model.ActiveDirectory) error {

Agree with CodeRabbit here: there is an opportunity to DRY up this function and the corresponding Azure generator.

},
}

func GenerateExtensionSQLActiveDirectory(dir string, adSchema model.ActiveDirectory) error {
@brandonshearin brandonshearin commented Dec 23, 2025

Some simple unit testing may be helpful for maintaining this new stuff too. You can use TempDir from testing.T to do something like:

func TestGenerateExtensionSQLActiveDirectory(t *testing.T) {
    // Setup test data
    adSchema := model.ActiveDirectory{
        NodeKinds: []model.NodeKind{
            {Symbol: "User", Name: "User"},
            {Symbol: "Computer", Name: "Computer"},
            {Symbol: "Group", Name: "Group"},
        },
        RelationshipKinds: []model.RelationshipKind{
            {Symbol: "MemberOf"},
            {Symbol: "AdminTo"},
        },
        PathfindingRelationships: []model.PathfindingRelationship{
            {Symbol: "MemberOf"},
        },
    }

    // Create temp directory
    tmpDir := t.TempDir()

    // Execute
    err := GenerateExtensionSQLActiveDirectory(tmpDir, adSchema)

    // Assert
    require.NoError(t, err)

    // Verify file was created
    sqlPath := filepath.Join(tmpDir, "ad.sql")
    require.FileExists(t, sqlPath)

    // Read and verify content
    content, err := os.ReadFile(sqlPath)
    require.NoError(t, err)

    // Assertions on content...
}


And content assertions could be for anything: node icon/color mapping, traversability flag, or just high-level structure. You can read the SQL file into a variable for assertions like:

content, _ := os.ReadFile(filepath.Join(tmpDir, "ad.sql"))
sql := string(content)

// Verify SQL structure
assert.Contains(t, sql, "DELETE FROM schema_extensions WHERE name = 'AD'")
assert.Contains(t, sql, "INSERT INTO schema_extensions")
assert.Contains(t, sql, "INSERT INTO schema_node_kinds")
assert.Contains(t, sql, "INSERT INTO schema_edge_kinds")
assert.Contains(t, sql, "blah blah ")

@LawsonWillard

Just a heads up: with BED-7067 incoming, we'll need to insert the node and edge kinds into the DAWGS kinds table before inserting them into their respective schema tables.
