Implement 64-bit trace ID system with double-buffered storage and liveness tracking #262

Open
wants to merge 6 commits into
base: main
from

Conversation

jbachorik
Collaborator

@jbachorik jbachorik commented Aug 18, 2025

What does this PR do?:

This PR implements a liveness-aware double-buffered call trace storage system with several key improvements:

  1. Liveness-aware trace management with selective preservation across JFR dumps
  2. Contention handling with dropped trace visibility in JFR output
  3. Double-buffered storage with active/standby hash table instances
  4. 64-bit trace ID system with instance-based collision avoidance
  5. Modular hash table architecture with dedicated CallTraceHashTable class

Motivation:

The changes address critical issues in call trace management:

  • Intermittent CI test failures: lock contention caused samples to be dropped without any visibility
  • Liveness tracking requirements for preserving traces of live objects across garbage collection
  • Trace ID stability needed for consistent liveness tracking across storage swaps
  • Performance and modularity improvements through specialized hash table implementation

Additional Notes:

Key Features:

Liveness-Aware Storage:

  • Callback-based liveness checker registration system
  • Selective trace preservation during storage transitions
  • Coordinated trace collection through processTraces() method
  • Support for multiple concurrent liveness checkers
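
The callback-based registration described above could look roughly like the following. This is a minimal sketch, not the PR's actual API: `LivenessRegistry`, `registerChecker`, and `collectPreserved` are hypothetical names, and the sketch is single-threaded for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical names throughout; the real interface lives in the PR's headers.
using TraceId = std::uint64_t;
using LivenessChecker = std::function<void(std::unordered_set<TraceId>&)>;

class LivenessRegistry {
public:
    // Each interested subsystem registers one callback.
    void registerChecker(LivenessChecker checker) {
        _checkers.push_back(std::move(checker));
    }

    // Ask every checker which trace IDs must survive the next storage transition.
    std::unordered_set<TraceId> collectPreserved() const {
        std::unordered_set<TraceId> preserve;
        for (const auto& checker : _checkers) {
            checker(preserve);  // checkers add IDs; duplicates collapse in the set
        }
        return preserve;
    }

private:
    std::vector<LivenessChecker> _checkers;
};
```

Collecting into a set means several checkers can report the same trace without it being preserved twice.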

Contention Handling:

  • Special dropped trace with reserved ID (1ULL) for contention visibility
  • <dropped due to contention> shown in JFR stack traces instead of null entries
  • Platform-specific ASGCT_CallFrame alignment using LP64_ONLY macro
  • BCI_ERROR routing for proper native method resolution
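
To illustrate the dropped-trace idea, here is a small sketch that returns the reserved ID when a lock cannot be taken immediately. Only the value `1ULL` comes from the description above; `TryLock`, `putOrDrop`, and the counter are assumptions made for the example.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr std::uint64_t DROPPED_TRACE_ID = 1ULL;  // reserved ID from the PR description

// Hypothetical minimal try-lock; the real storage uses its own lock type.
class TryLock {
public:
    bool tryAcquire() { return !_locked.exchange(true, std::memory_order_acquire); }
    void release() { _locked.store(false, std::memory_order_release); }

private:
    std::atomic<bool> _locked{false};
};

// Returns candidate_id when the lock is free, or the reserved dropped ID
// (and bumps a counter) when another thread holds it.
std::uint64_t putOrDrop(TryLock& lock, std::uint64_t candidate_id,
                        std::atomic<std::uint64_t>& dropped) {
    if (!lock.tryAcquire()) {
        dropped.fetch_add(1, std::memory_order_relaxed);
        return DROPPED_TRACE_ID;
    }
    // ... the trace would be inserted here ...
    lock.release();
    return candidate_id;
}
```

Because the caller always gets back a valid ID, the writer side can presumably map `DROPPED_TRACE_ID` to the `<dropped due to contention>` frame instead of emitting a null entry.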

Double-Buffered Architecture:

  • Active/standby hash table pattern for lock-free JFR operations
  • Instance-based trace IDs: (instance_id << 32) | slot preventing collisions
  • Atomic storage swapping with minimal profiling overhead
  • Thread-safe instance ID generation across storage transitions
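
The `(instance_id << 32) | slot` layout can be sketched as below. The helper names are hypothetical; only the bit layout is taken from the description.

```cpp
#include <cassert>
#include <cstdint>

// Upper 32 bits: hash table instance; lower 32 bits: slot within that instance.
constexpr std::uint64_t makeTraceId(std::uint32_t instance_id, std::uint32_t slot) {
    return (static_cast<std::uint64_t>(instance_id) << 32) | slot;
}

constexpr std::uint32_t instanceOf(std::uint64_t trace_id) {
    return static_cast<std::uint32_t>(trace_id >> 32);
}

constexpr std::uint32_t slotOf(std::uint64_t trace_id) {
    return static_cast<std::uint32_t>(trace_id & 0xFFFFFFFFULL);
}

// The same slot in two different instances yields two different IDs.
static_assert(makeTraceId(1, 5) != makeTraceId(2, 5), "instances must not collide");
static_assert(makeTraceId(3, 7) == 0x0000000300000007ULL, "layout check");
```

Under this layout the reserved ID `1ULL` decodes to instance 0, slot 1, so starting instance IDs at 1 would be one way to keep it out of the regular ID space; the description does not say whether the PR does this.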

Hash Table Improvements:

  • Extracted dedicated CallTraceHashTable class (441 lines)
  • Concurrent table expansion with proper synchronization
  • Overflow trace handling for hash table limits
  • Lock-free put operations with retry-based contention handling
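
A lock-free put with retries is commonly built on compare-and-swap plus probing. The sketch below shows that general technique on a fixed-size table; it is not the `CallTraceHashTable` code, and `TinyTraceTable` and `kNoSlot` are made-up names. Table expansion is left out, so a full table simply reports `kNoSlot`.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::int64_t kNoSlot = -1;  // hypothetical "table full" sentinel

// Hypothetical fixed-size table; 0 marks an empty slot, so hashes must be non-zero.
template <std::size_t Capacity>
class TinyTraceTable {
public:
    TinyTraceTable() {
        for (auto& key : _keys) key.store(0, std::memory_order_relaxed);
    }

    // Returns the slot index that holds `hash`, or kNoSlot if every slot is taken.
    std::int64_t put(std::uint64_t hash) {
        std::size_t slot = hash % Capacity;
        for (std::size_t probes = 0; probes < Capacity; ++probes) {
            std::uint64_t expected = 0;
            // Claim an empty slot, or discover that this hash already owns it.
            if (_keys[slot].compare_exchange_strong(expected, hash,
                                                    std::memory_order_acq_rel) ||
                expected == hash) {
                return static_cast<std::int64_t>(slot);
            }
            slot = (slot + 1) % Capacity;  // owned by another hash: probe onward
        }
        return kNoSlot;
    }

private:
    std::array<std::atomic<std::uint64_t>, Capacity> _keys;
};
```

Two threads inserting the same hash converge on the same slot, because the loser of the CAS sees the winner's key in `expected`.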

Implementation Details:

Core Storage Refactoring:

  • CallTraceStorage reduced from 265→142 lines through hash table extraction
  • Dual active/standby storage instances with atomic swapping
  • Liveness preservation system integrated with JFR dump cycles
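
The active/standby swap might be sketched as follows. `Table` and `DoubleBuffer` are hypothetical stand-ins, and the sketch ignores writers that are still inside the retired table at swap time, which a real implementation has to account for.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for one hash table instance.
struct Table {
    std::uint32_t instance_id = 0;
    int entries = 0;
};

class DoubleBuffer {
public:
    DoubleBuffer() : _active(&_tables[0]), _standby(&_tables[1]) {
        _tables[0].instance_id = nextInstanceId();
        _tables[1].instance_id = nextInstanceId();
    }

    Table* active() { return _active.load(std::memory_order_acquire); }

    // Publish the standby table as active and hand back the retired one,
    // which the dump can then read without blocking new samples.
    Table* swap() {
        Table* retired = _active.exchange(_standby, std::memory_order_acq_rel);
        _standby = retired;
        return retired;
    }

    // After the dump: clear the table and give it a fresh instance ID so
    // IDs issued from it later cannot collide with the ones just written out.
    void recycle(Table* table) {
        table->entries = 0;
        table->instance_id = nextInstanceId();
    }

private:
    std::uint32_t nextInstanceId() {
        return _next_id.fetch_add(1, std::memory_order_relaxed);
    }

    Table _tables[2];
    std::atomic<Table*> _active;
    Table* _standby;  // touched only by the (single) dumping thread in this sketch
    std::atomic<std::uint32_t> _next_id{1};
};
```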

64-bit Trace ID Migration:

  • Updated all profiling interfaces: recordJVMTISample(), recordSample(), recordDeferredSample()
  • Modified LivenessTracker for 64-bit trace ID handling
  • JFR integration updated for 64-bit trace ID constant pool support
  • Instance-based ID generation preventing cross-storage collisions

Platform Compatibility:

  • COMMA macro factored to arch_dd.h for consistent designated initializer syntax
  • LP64_ONLY macro usage for proper 64-bit platform struct alignment
  • Cross-platform ASGCT_CallFrame structure handling
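
For readers unfamiliar with the two macros, the sketch below shows how they are conventionally defined and used. The definitions and the `CallFrame` layout are assumptions modelled on the HotSpot-style convention, not copied from `arch_dd.h`, and it uses positional rather than designated initialization to stay portable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed definitions; the project's own versions live in its headers.
#if defined(_LP64) || defined(__LP64__)
#define LP64_ONLY(code) code
#else
#define LP64_ONLY(code)
#endif
#define COMMA ,

// Stand-ins for jint and jmethodID so the sketch compiles without jni.h.
typedef std::int32_t jint_t;
typedef void* jmethodID_t;

struct CallFrame {
    jint_t bci;
    LP64_ONLY(jint_t padding;)  // explicit padding exists only on 64-bit targets
    jmethodID_t method_id;
};

static_assert(sizeof(CallFrame) == 2 * sizeof(void*), "frame is two pointer-sized words");

// COMMA lets the optional initializer carry its trailing comma through the
// macro as a single argument; on 32-bit targets the whole item disappears.
inline CallFrame makeFrame(jint_t bci, jmethodID_t method) {
    CallFrame frame = {bci, LP64_ONLY(0 COMMA) method};
    return frame;
}
```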

New Files:

  • callTraceHashTable.{h,cpp} - Dedicated hash table implementation (441 lines)
  • test_callTraceStorage.cpp - Comprehensive unit tests with liveness scenarios (387 lines)
  • LivenessTrackingTest.java - Java integration test for end-to-end validation (246 lines)
  • ContendedCallTraceStorageTest.java - Contention measurement and validation test (249 lines)

Modified Files:

  • Core profiler: profiler.{h,cpp}, objectSampler.cpp, wallClock.{h,cpp} - 64-bit trace ID adoption
  • Storage: callTraceStorage.{h,cpp} - major refactoring with liveness integration
  • JFR: flightRecorder.{h,cpp} - 64-bit trace ID support and dropped trace handling
  • Liveness: livenessTracker.{h,cpp} - 64-bit trace ID migration
  • Architecture: arch_dd.h - COMMA macro consolidation

How to test the change?:

# Run comprehensive test suite
./gradlew testDebug

# C++ unit tests for storage and liveness
./gradlew gtestDebug  

# Build verification across configurations
./gradlew buildDebug buildRelease

# Code formatting
./gradlew spotlessApply

The implementation includes extensive test coverage:

  • 9 C++ unit tests for CallTraceStorage liveness scenarios
  • Java integration tests for end-to-end liveness tracking
  • Contention measurement and validation tests
  • Platform-specific compatibility tests

For Datadog employees:

  • If this PR touches code that signs or publishes builds or packages, or handles credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
  • This PR doesn't touch any of that.
  • JIRA: PROF-12316

Summary: +1689 lines, -447 lines (net +1242 lines)

This implementation provides a robust foundation for liveness-aware profiling with clear visibility into contention issues while maintaining high performance through lock-free operations and efficient storage management.

@dd-octo-sts

dd-octo-sts bot commented Aug 18, 2025

CppCheck Report

Errors (2)

Warnings (8)

Style Violations (299)

@dd-octo-sts

dd-octo-sts bot commented Aug 20, 2025

CppCheck Report

Errors (1)

Warnings (8)

Style Violations (299)

@dd-octo-sts

dd-octo-sts bot commented Aug 21, 2025

CppCheck Report

Errors (3)

Warnings (8)

Style Violations (299)

@jbachorik jbachorik marked this pull request as ready for review August 21, 2025 17:24

@Copilot Copilot AI left a comment

Pull Request Overview

This PR implements a comprehensive 64-bit trace ID system with double-buffered storage and liveness tracking for call traces. The changes enhance call trace management by adding liveness-aware preservation across JFR dumps, contention handling with visibility into dropped traces, and a modular hash table architecture with instance-based collision avoidance.

  • Adds liveness-aware double-buffered storage with selective trace preservation
  • Implements 64-bit trace ID system with instance-based collision avoidance
  • Introduces contention handling with dropped trace visibility in JFR output
  • Extracts dedicated CallTraceHashTable class for improved modularity

Reviewed Changes

Copilot reviewed 21 out of 22 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| LivenessTrackingTest.java | Comprehensive Java integration test for end-to-end liveness tracking validation |
| TagContextTest.java | Enhanced test with dropped sample tracking and counter validation improvements |
| ContendedCallTraceStorageTest.java | New test for measuring and validating contention in CallTraceStorage operations |
| test_callTraceStorage.cpp | Extensive C++ unit tests covering liveness scenarios and concurrent operations |
| wallClock.h/cpp | Updated copyright headers and 64-bit trace ID migration |
| thread.h | Updated trace ID fields and methods to use 64-bit values |
| profiler.h/cpp | Major refactoring for 64-bit trace IDs and liveness checker integration |
| objectSampler.cpp | Updated copyright and 64-bit trace ID adoption |
| livenessTracker.h/cpp | Enhanced with 64-bit trace IDs and self-registration with profiler |
| flightRecorder.h/cpp | Updated for 64-bit trace ID support and improved trace processing |
| counters.h | Added new counter for tracking dropped traces |
| callTraceStorage.h/cpp | Major refactoring with double-buffering and liveness integration |
| callTraceHashTable.h/cpp | New dedicated hash table implementation extracted from storage |
| arch_dd.h | Added COMMA macro consolidation |
| CLAUDE.md | New documentation file with project guidance and architecture details |
Comments suppressed due to low confidence (1)

ddprof-lib/src/main/cpp/callTraceStorage.cpp:125

  • Copying the entire unordered_set could be expensive for large sets. Consider using std::move or passing the set by reference to avoid the copy overhead, especially since this is in the critical processTraces path.
        preserve_set = _preserve_set; // Copy the set for lock-free processing
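
One way to avoid the copy flagged here is to swap the set out under the lock, which is O(1). The sketch below assumes the member does not need to keep its contents once processing starts, which the excerpt does not confirm; `PreserveSetHolder` and `take` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <unordered_set>

class PreserveSetHolder {
public:
    void add(std::uint64_t trace_id) {
        std::lock_guard<std::mutex> guard(_lock);
        _preserve_set.insert(trace_id);
    }

    // Take ownership of the current contents without copying any elements,
    // leaving an empty set behind for later additions.
    std::unordered_set<std::uint64_t> take() {
        std::unordered_set<std::uint64_t> local;
        std::lock_guard<std::mutex> guard(_lock);
        local.swap(_preserve_set);
        return local;
    }

private:
    std::mutex _lock;
    std::unordered_set<std::uint64_t> _preserve_set;
};
```

If the member must retain its contents, the copy is unavoidable and the only remaining lever is keeping the locked region short.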


jbachorik and others added 6 commits August 22, 2025 15:59
Major architectural changes:
- Replace monolithic CallTraceStorage with double-buffered hash table design
- Add CallTraceHashTable with lock-free concurrent access and instance-based trace IDs
- Implement liveness tracking system to preserve active traces across JFR dumps
- Add dropped trace handling for lock contention with proper JFR integration

Key features:
- 64-bit trace IDs combining instance ID and slot for collision avoidance
- Split-lock strategy minimizing exclusive lock time during trace collection
- Platform-specific ASGCT_CallFrame alignment using LP64_ONLY macro
- Comprehensive test coverage including contention and liveness scenarios

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>