@Copilot Copilot AI commented Jun 16, 2025

Summary

Adds a new EnableHashedShortening configuration option that makes slug collisions less likely when long slugs are truncated to a maximum length. Instead of simple truncation, it appends a short hash-based postfix derived from the full slug, so inputs that share a prefix usually end up with different slugs.

Problem

The existing MaximumLength feature only performs simple truncation, which can cause different inputs to produce identical slugs:

var helper = new SlugHelper(new SlugHelperConfiguration { MaximumLength = 12 });

// All these produce the same result: "the-very-lon"
helper.GenerateSlug("The very long name liga");
helper.GenerateSlug("The very long name liga (W)");
helper.GenerateSlug("The very long name liga (M)");

Solution

The new EnableHashedShortening option generates unique hash postfixes:

var helper = new SlugHelper(new SlugHelperConfiguration 
{ 
    MaximumLength = 12,
    EnableHashedShortening = true 
});

// Now produces distinct results (the hash values shown are illustrative):
helper.GenerateSlug("The very long name liga");     // "the-very-54"
helper.GenerateSlug("The very long name liga (W)"); // "the-very-a2"
helper.GenerateSlug("The very long name liga (M)"); // "the-very-0a"

Implementation Details

  • Backward Compatible: Feature is disabled by default (EnableHashedShortening = false)
  • Hash Format: Uses 2-character lowercase hex hash (e.g., -54, -a2)
  • Fallback: If MaximumLength is too small for hash postfix, falls back to simple truncation
  • Hash Source: Generated from the full slug before truncation to ensure consistency
  • No Dependencies: Uses built-in GetHashCode() to avoid external dependencies
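
The rules listed above could be sketched roughly as follows. This is an illustration only, not the code in this PR: HashedShortener, Shorten, and HashPostfix are hypothetical names, and trimming a trailing dash before the postfix is an assumption inferred from the "the-very-54" example (11 characters with MaximumLength = 12).

```csharp
using System;

public static class HashedShortener
{
    // '-' plus two hex characters.
    private const int PostfixLength = 3;

    // Hypothetical helper: two lowercase hex characters taken from the
    // low 8 bits of the built-in string hash of the full slug.
    public static string HashPostfix(string fullSlug)
    {
        int low = fullSlug.GetHashCode() & 0xFF;
        return "-" + low.ToString("x2");
    }

    public static string Shorten(string slug, int maximumLength, bool enableHashedShortening)
    {
        // Nothing to do when the slug already fits.
        if (slug.Length <= maximumLength) return slug;

        // Feature disabled, or no room for the postfix: simple truncation.
        if (!enableHashedShortening || maximumLength <= PostfixLength)
            return slug.Substring(0, maximumLength);

        // Reserve room for the postfix, drop a dangling dash (assumption),
        // then append the hash of the full, untruncated slug.
        string head = slug.Substring(0, maximumLength - PostfixLength).TrimEnd('-');
        return head + HashPostfix(slug);
    }
}
```

Calling HashedShortener.Shorten("the-very-long-name-liga", 12, true) yields "the-very-" followed by two hex characters. Which two characters appear depends on the runtime's string hash, so no fixed value is shown here.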

Changes

  • Added EnableHashedShortening property to SlugHelperConfiguration
  • Enhanced truncation logic in SlugHelper.GenerateSlug()
  • Added GenerateSlugHash() private method for hash generation
  • Added 8 comprehensive unit tests covering all scenarios
  • Updated README.md with documentation and usage examples

Testing

All existing tests continue to pass, plus new tests verify:

  • ✅ Unique hashes for different inputs
  • ✅ Proper hash format (2-char hex)
  • ✅ Fallback behavior for small MaximumLength
  • ✅ No impact when truncation isn't needed
  • ✅ Feature disabled by default

Fixes #158.


@Copilot Copilot AI changed the title from "[WIP] Slug shortening" to "Add EnableHashedShortening option for unique slug truncation" Jun 16, 2025
@Copilot Copilot AI requested a review from ctolkien June 16, 2025 04:47
Copilot finished work on behalf of ctolkien June 16, 2025 04:47
@ctolkien ctolkien mentioned this pull request Jun 16, 2025
@ctolkien
Owner

@copilot can you add some more tests combining the new hashing option, with other options to ensure the output remains correct.

Author

Copilot AI commented Jun 16, 2025

I've added 12 comprehensive tests that combine EnableHashedShortening with other configuration options to ensure correct behavior:

✅ ForceLowerCase combinations (case preservation with hashing)
✅ StringReplacements interactions (replacements applied before hashing)
✅ CollapseDashes combinations (dash handling with hash postfix)
✅ TrimWhitespace interactions (whitespace handling affects hashing)
✅ DeniedCharactersRegex combinations (custom filtering before hashing)
✅ SupportNonAsciiLanguages with hashing (Unicode text handling)
✅ AllowedCharacters modifications (custom character sets)
✅ Complex multi-option combinations
✅ Edge cases and parametrized tests

All 98 tests pass, confirming the hashing feature works correctly with all existing options. Commit: 5695a2b

Copilot finished work on behalf of ctolkien June 16, 2025 06:27
@ctolkien ctolkien marked this pull request as ready for review July 11, 2025 06:23
@ctolkien
Owner

@copilot what hashing mechanism does dotnet use for a string object when you call GetHashCode()? Should we be worried about collisions, and/or should we use a different hash such as Murmur?

Author

Copilot AI commented Jul 11, 2025

On .NET Core and later, string.GetHashCode() uses a randomly seeded, non-cryptographic hash (Marvin). It has several limitations for our use case:

Current Implementation Issues:

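  1. Platform/Version Variance: Hash values can differ between .NET versions and between x86 and x64. On .NET Core and later they are also randomized per process, so the same slug can get a different postfix on every application run
  2. Limited Output Space: Using only 8 bits (256 values) makes collisions likely. By the birthday bound, the chance of at least one collision reaches about 50% at 19 to 20 different inputs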
  3. Hash Quality: Designed for hash table distribution, not uniqueness guarantees

Collision Risk Analysis:
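With only 256 possible values, collisions are quite likely. For example, among 20 different long titles that need truncation, there is about a 53% chance that at least two get the same hash. Two slugs only clash if they also share the same truncated prefix, so the count that matters is the number of titles sharing a prefix (such as the "The very long name liga" variants above), not the total number of slugs.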
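
The birthday-bound figures quoted above can be checked with a few lines of code. This is a minimal sketch that assumes hash values are uniformly distributed over the available buckets. BirthdayBound and CollisionProbability are illustrative names, not part of the library.

```csharp
using System;

public static class BirthdayBound
{
    // Probability that at least two of n uniformly random values,
    // drawn from `buckets` possible values, coincide.
    public static double CollisionProbability(int n, int buckets)
    {
        // More inputs than buckets guarantees a collision.
        if (n > buckets) return 1.0;

        // Probability that all n values are distinct.
        double noCollision = 1.0;
        for (int k = 1; k < n; k++)
        {
            noCollision *= 1.0 - (double)k / buckets;
        }
        return 1.0 - noCollision;
    }
}
```

With 256 buckets (two hex characters), CollisionProbability(19, 256) comes out just under 0.50 and CollisionProbability(20, 256) is about 0.53. Passing 65536 buckets (four hex characters) moves the 50% point to roughly 300 inputs.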

Recommendation: Use a more robust approach

I suggest we implement a deterministic hash using a simple but effective algorithm like FNV-1a or a custom implementation that:

  • Uses more bits (4-6 hex characters, i.e. 16-24 bits, instead of 2 characters, i.e. 8 bits)
  • Provides consistent results across platforms/versions
  • Has better collision resistance
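
As a rough sketch of what such a deterministic hash could look like, here is 32-bit FNV-1a over the UTF-8 bytes of the full slug. StableSlugHash, Fnv1a32, and HexPostfix are hypothetical names, and taking the leading hex characters of the 32-bit value is just one possible way to shorten it. None of this is taken from the PR.

```csharp
using System;
using System.Text;

public static class StableSlugHash
{
    // 32-bit FNV-1a: XOR each byte into the hash, then multiply by the FNV prime.
    public static uint Fnv1a32(string text)
    {
        const uint offsetBasis = 2166136261;
        const uint prime = 16777619;

        uint hash = offsetBasis;
        foreach (byte b in Encoding.UTF8.GetBytes(text))
        {
            hash ^= b;
            unchecked { hash *= prime; } // wrap-around on overflow is intended
        }
        return hash;
    }

    // Hypothetical helper: the first `hexChars` lowercase hex characters
    // of the hash, without the leading dash.
    public static string HexPostfix(string fullSlug, int hexChars = 4)
    {
        if (hexChars < 1 || hexChars > 8)
            throw new ArgumentOutOfRangeException(nameof(hexChars));

        return Fnv1a32(fullSlug).ToString("x8").Substring(0, hexChars);
    }
}
```

Because the result depends only on the input bytes, the same slug maps to the same postfix on every platform, runtime version, and application run. That is the property string.GetHashCode() does not offer.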

Would you like me to implement an improved hashing mechanism that addresses these concerns while maintaining backward compatibility?

Copilot finished work on behalf of ctolkien July 11, 2025 06:36