@pditommaso pditommaso commented Sep 4, 2025

Summary

This PR implements a new feature that allows direct execution of individual Nextflow processes without requiring explicit workflow definitions. Users can now run processes directly from the command line with automatic parameter mapping.

Key Features

  • Automatic Single Process Execution: Scripts with one process run automatically
  • Automatic Multi-Process Execution: Scripts with multiple processes run the first process automatically
  • Command-Line Parameter Mapping: Automatic mapping of --param value arguments to process inputs
  • Full Input Type Support: Supports val, path, env, tuple, and each input types

Usage Examples

Single Process Script

Create a script with one process (auto-execution):

#!/usr/bin/env nextflow

process analyzeData {
    debug true
    
    input:
    val sampleName
    path inputFile
    
    script:
    """
    echo "Analyzing sample: ${sampleName}"
    echo "Input file: ${inputFile}"
    wc -l ${inputFile}
    """
}

Execute with:

nextflow run analyze.nf --sampleName "sample1" --inputFile "data.txt"

Multi-Process Script (Auto-runs First Process)

Scripts with multiple processes automatically execute the first process defined:

#!/usr/bin/env nextflow

process preprocessData {
    debug true
    input: 
    path inputFile
    val quality
    script: """
    echo "Preprocessing ${inputFile} with quality ${quality}"
    head -n 5 "${inputFile}"
    """
}

process analyzeResults {
    debug true
    input: 
    val experimentId
    path resultsFile
    script: """
    echo "Analyzing experiment ${experimentId}"
    tail -n 3 "${resultsFile}"
    """
}

Execute with:

# Automatically runs preprocessData (first process)
nextflow run multi.nf --inputFile "data.txt" --quality "high"

Comprehensive Input Types Example

#!/usr/bin/env nextflow

process complexProcess {
    debug true
    
    input:
    val sampleId
    path inputFile
    env DATABASE_URL
    tuple val(batchId), val(priority) 
    each item
    
    script:
    """
    echo "Sample: ${sampleId}"
    echo "File: ${inputFile}"  
    echo "Database: ${DATABASE_URL}"
    echo "Batch: ${batchId}, Priority: ${priority}"
    echo "Item: ${item}"
    """
}

Execute with:

nextflow run complex.nf \
  --sampleId "S001" \
  --inputFile "/path/to/data.txt" \
  --DATABASE_URL "postgres://localhost:5432/db" \
  --batchId "batch1" \
  --priority "high" \
  --item "item1,item2,item3"

Parameter Type Handling

| Input Type  | Example Usage      | Channel Creation                  | Notes                     |
|-------------|--------------------|-----------------------------------|---------------------------|
| val name    | --name "value"     | Channel.of(paramValue)            | Direct value              |
| path file   | --file "input.txt" | Channel.of(Paths.get(paramValue)) | File path conversion      |
| env var     | --var "VALUE"      | Channel.of(paramValue)            | Environment variable      |
| tuple items | --items "a,b,c"    | Channel.of([paramValue])          | Collection wrapping       |
| each item   | --item "a,b,c"     | Channel.fromIterable(split(','))  | Comma-separated iteration |
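
The conversions in the table can be pictured with a small sketch. It is written in Python rather than Groovy purely for illustration. `channel_items` is a hypothetical helper, not part of this PR, and it returns the list of items the corresponding channel would emit under the rules listed above.

```python
from pathlib import Path


def channel_items(input_type, raw):
    """Return the items a channel would emit for one command-line value.

    Hypothetical helper mirroring the table above; not the PR's Groovy code.
    """
    if input_type in ("val", "env"):
        return [raw]              # Channel.of(paramValue)
    if input_type == "path":
        return [Path(raw)]        # Channel.of(Paths.get(paramValue))
    if input_type == "tuple":
        return [[raw]]            # Channel.of([paramValue]): one wrapped item
    if input_type == "each":
        return raw.split(",")     # one item per comma-separated element
    raise ValueError(f"unsupported input type: {input_type}")
```

Under these assumptions, `channel_items("each", "a,b,c")` yields three separate items, while `channel_items("tuple", "a,b,c")` yields a single item that wraps the raw string.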

Error Handling

Missing Required Parameters

nextflow run script.nf
# Error: Missing required parameter: --sampleName

No Processes Found

nextflow run empty.nf
# Error: No processes found for auto-execution
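
As a rough illustration of the missing-parameter check, the sketch below (Python, for illustration only) resolves each declared input name against the parsed command-line parameters. `resolve_inputs` and `MissingParameterError` are hypothetical names; only the error message text is taken from the example above.

```python
class MissingParameterError(Exception):
    """Raised when a process input has no matching command-line parameter."""


def resolve_inputs(input_names, params):
    """Return parameter values in process-input order.

    Hypothetical sketch: fails on the first input without a matching --param.
    """
    values = []
    for name in input_names:
        if name not in params:
            raise MissingParameterError(f"Missing required parameter: --{name}")
        values.append(params[name])
    return values
```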

Testing Instructions

Test Single Process Auto-Execution

Create test_single.nf:

#!/usr/bin/env nextflow

process testSingle {
    debug true
    
    input:
    val message
    
    script:
    """
    echo "Single process test: ${message}"
    """
}

Run: nextflow run test_single.nf --message "Hello World"

Test Multi-Process Auto-Execution (First Process)

Create test_multi.nf:

#!/usr/bin/env nextflow

process processA {
    debug true
    input: val name
    script: "echo 'A: ${name}'"
}

process processB {
    debug true
    input: path file
    script: "echo 'B: processing file ${file}'"
}

Run:

# Automatically runs processA (first process)
nextflow run test_multi.nf --name "test"

Test Different Input Types

Create test_types.nf:

#!/usr/bin/env nextflow

process testTypes {
    debug true
    
    input:
    val stringVal
    path inputFile
    each item
    
    script:
    """
    echo "String: ${stringVal}"
    echo "File: ${inputFile}"
    echo "Item: ${item}"
    """
}

Run:

echo "test content" > input.txt
nextflow run test_types.nf \
  --stringVal "hello" \
  --inputFile "input.txt" \
  --item "a,b,c"

Test running an nf-core module

nextflow run nf-core/modules -main-script modules/nf-core/muscle/main.nf --meta.id foo --fasta tests/data/prot.fa

nextflow.config

wave.enabled = true
wave.strategy = 'conda'
docker.enabled = true

Implementation Details

Architecture

The implementation extends BaseScript.groovy with a comprehensive process entry execution system:

  1. Auto-Execution Detection: When no workflows are defined, automatically check for processes
  2. Process Selection: Single process scripts run that process; multi-process scripts run the first process
  3. Parameter Extraction: Parse process bodies to extract input parameter definitions using internal Nextflow method interception
  4. Channel Mapping: Convert command-line arguments to appropriate Nextflow channels
  5. Workflow Generation: Create synthetic workflows for process execution
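
Steps 1 and 2 above can be sketched as a small selection function. This is an illustrative Python sketch, not the Groovy implementation; `select_auto_process` is a hypothetical name, and the error text is the one shown under Error Handling.

```python
def select_auto_process(process_names, workflow_names):
    """Pick the process to auto-execute, or None if a workflow is defined.

    Hypothetical sketch of the selection rule: auto-execution only applies
    when the script defines no workflows, and the first process wins.
    """
    if workflow_names:
        return None  # explicit workflows keep the traditional behaviour
    if not process_names:
        raise ValueError("No processes found for auto-execution")
    return process_names[0]  # a single process, or the first of several
```

For the multi-process example earlier, `select_auto_process(["preprocessData", "analyzeResults"], [])` returns `"preprocessData"`.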

Key Components

  • ProcessEntryHandler: Main class handling process execution logic
  • createAutoProcessWorkflow(): Auto-execution for process-only scripts
  • parseProcessInputStructures(): Extract input definitions by intercepting _in_val, _in_path, etc. calls
  • getProcessInputArguments(): Map command-line parameters to process inputs
  • parseComplexParameters(): Handle dot notation for complex parameters
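
To make the dot-notation handling concrete, here is a minimal Python sketch of how flat keys such as `meta.id` could be folded into nested maps. It is an assumption about the behaviour, not a transcription of `parseComplexParameters()`; the function name and the rule that a nested key overrides a conflicting scalar are both hypothetical.

```python
def parse_complex_parameters(flat_params):
    """Fold dotted keys ("meta.id") into nested dictionaries.

    Hypothetical sketch; if a scalar and a dotted key collide on the same
    prefix, the nested form replaces the scalar.
    """
    nested = {}
    for key, value in flat_params.items():
        parts = key.split(".")
        node = nested
        for part in parts[:-1]:
            child = node.get(part)
            if not isinstance(child, dict):
                child = {}
                node[part] = child
            node = child
        node[parts[-1]] = value
    return nested
```

With the nf-core example from the testing section, `{"meta.id": "foo", "fasta": "tests/data/prot.fa"}` becomes `{"meta": {"id": "foo"}, "fasta": "tests/data/prot.fa"}`.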

BaseScript Integration

Enhanced BaseScript.groovy to detect standalone processes:

// Check if we have standalone processes that can be executed automatically
if( meta.hasExecutableProcesses() ) {
    // Create a workflow to execute the process (single process or first of multiple)
    def handler = new ProcessEntryHandler(this, session, meta)
    entryFlow = handler.createAutoProcessWorkflow()
}

ScriptMeta Extensions

  • hasExecutableProcesses(): Detect processes without explicit workflows
  • getLocalProcessNames(): Get list of defined processes

Backward Compatibility

This feature is fully backward compatible:

  • Existing workflows continue to work unchanged
  • Traditional -entry workflowName syntax remains supported
  • No changes to existing Nextflow DSL or behavior
  • Only activates for scripts without explicit workflow definitions

Performance Considerations

  • Minimal overhead for traditional workflow execution
  • Process body parsing only occurs when needed
  • Efficient parameter mapping and channel creation
  • Lazy evaluation prevents unnecessary work

🤖 Generated with Claude Code

netlify bot commented Sep 4, 2025

Deploy Preview for nextflow-docs-staging canceled.

Name Link
🔨 Latest commit 10ab2f3
🔍 Latest deploy log https://app.netlify.com/projects/nextflow-docs-staging/deploys/68c2ff4bc31ea3000843acf6

Add support for executing Nextflow processes directly without explicit workflow definitions.

Key Features:
- Single process scripts run automatically: `nextflow run script.nf --param value`
- Multi-process scripts use entry selection: `nextflow run script.nf -entry process:name --param value`
- Automatic command-line parameter mapping to process input channels
- Support for all standard input types: val, path, env, tuple, each
- Comprehensive error handling with helpful suggestions

Implementation:
- Enhanced BaseScript with process entry workflow generation
- Added parameter mapping pipeline with input definition extraction
- Created specialized delegates for parsing compiled process bodies
- Added ScriptMeta methods for single/multi-process detection
- Comprehensive documentation and test coverage

This feature bridges the gap between command-line tools and workflow orchestration,
making Nextflow processes more accessible for direct execution scenarios.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Paolo Di Tommaso <[email protected]>
@pditommaso pditommaso force-pushed the feature/process-entry-execution branch from ff372df to fb7e788 September 4, 2025 20:15
@pditommaso pditommaso marked this pull request as draft September 4, 2025 20:15
pditommaso and others added 2 commits September 4, 2025 22:44
Signed-off-by: Paolo Di Tommaso <[email protected]>
- Rename ProcessEntryHelper.groovy to ProcessEntryHandler.groovy for clearer naming
- Update all class references in BaseScript.groovy to use ProcessEntryHandler
- Clean separation of process entry execution feature from main BaseScript class
- All functionality preserved: single process auto-execution and multi-process entry selection

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Paolo Di Tommaso <[email protected]>
christopher-hakkaart and others added 5 commits September 6, 2025 17:15
---------

Signed-off-by: Rob Syme <[email protected]>
Signed-off-by: Ben Sherman <[email protected]>
Co-authored-by: Ben Sherman <[email protected]>
Add defensive null checking for sender thread and workflowId to prevent NullPointerException when onFlowComplete is called after initialization failures in onFlowCreate or onFlowBegin methods.

Signed-off-by: Rob Syme <[email protected]>
Co-authored-by: Ben Sherman <[email protected]>
Signed-off-by: Paolo Di Tommaso <[email protected]>
@pditommaso pditommaso marked this pull request as ready for review September 8, 2025 15:20
@pditommaso
Member Author

I've cleaned up the implementation, added some unit tests and, above all, added the mapping of params to process inputs.

@pditommaso
Member Author

@claude review

claude bot commented Sep 8, 2025

Claude finished @pditommaso's task.

Code Review for Process Entry Execution Implementation

I've completed a comprehensive review of this PR implementing direct process execution without explicit workflows. Here's my detailed analysis:

Todo List:

  • Review ProcessEntryHandler.groovy implementation
  • Analyze BaseScript.groovy changes
  • Review ScriptMeta.groovy modifications
  • Examine test coverage and quality
  • Check for potential issues and improvements
  • Provide comprehensive feedback

Overall Assessment

This is a well-implemented feature that significantly enhances Nextflow's usability for simple process execution scenarios. The implementation is clean, follows existing patterns, and includes comprehensive test coverage.

✅ Strengths

Architecture & Design

  • Clean separation of concerns: ProcessEntryHandler handles all process entry logic separately from BaseScript
  • Minimally invasive changes: Only 8 lines added to BaseScript.groovy with proper integration points
  • Backward compatibility: Existing workflows continue to work unchanged
  • Smart auto-execution logic: Single process executes automatically, multi-process executes first process

Implementation Quality

  • Robust parameter parsing: Handles complex dot notation (--meta.id=value) correctly in parseComplexParameters()
  • Type-safe conversions: Proper handling of different input types (val, path, env, tuple, each) in getValueForInput()
  • Comprehensive error handling: Clear error messages for missing parameters
  • Extensive logging: Good debug logging throughout for troubleshooting

Test Coverage

  • Unit tests: ProcessEntryHandlerTest covers parameter mapping extensively
  • Integration tests: ScriptProcessRunTest and practical .nf test files
  • Edge cases: Tests cover simple parameters, dot notation, tuples, and error conditions

🔍 Areas for Improvement

1. Process Body Parsing Complexity (ProcessEntryHandler.groovy:155-310)

The parseProcessInputStructures() method is quite complex and relies on intercepting internal Nextflow method calls. Consider:

// Current approach intercepts _in_val, _in_path, etc.
def _in_val(tokenVar) { 
    def varName = extractVariableName(tokenVar)
    if( varName ) inputStructures.add([type: 'val', name: varName]) 
}

Suggestion: As @bentsherman mentioned in review comments, leveraging the ProcessDef's InputsList/OutputsList when available would simplify this significantly and make it more maintainable.

2. Token Variable Name Extraction (ProcessEntryHandler.groovy:181-252)

The variable name extraction logic is fragile with multiple fallback strategies:

def extractVariableNameFromToken(token) {
    try {
        if( token.hasProperty('variable') && token.variable?.hasProperty('name') ) {
            return token.variable.name.toString()
        }
        // Multiple fallbacks...
    } catch( Exception e ) {
        log.debug "Error extracting variable name from ${token}: ${e.message}"
        return null
    }
}

Risk: This could break with internal AST changes. Consider adding integration tests that verify this parsing works correctly.

3. Error Handling in Process Execution (ProcessEntryHandler.groovy:298-307)

try {
    bodyClone.call()
} catch (Exception e) {
    // Ignore exceptions during parsing - we only want to capture input structures
}

Concern: Silently ignoring all exceptions during parsing could mask important errors. Consider logging warnings for unexpected exceptions.
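
A minimal sketch of that suggestion, in Python for illustration only: the process body runs against a recording delegate, and an exception raised after the inputs are captured is logged as a warning instead of being swallowed. `InputRecorder` and `capture_inputs` are hypothetical names; only the `_in_val` / `_in_path` method names come from the PR.

```python
import logging

log = logging.getLogger("process_entry_sketch")


class InputRecorder:
    """Records input declarations instead of building real channels."""

    def __init__(self):
        self.inputs = []

    def _in_val(self, name):
        self.inputs.append({"type": "val", "name": name})

    def _in_path(self, name):
        self.inputs.append({"type": "path", "name": name})


def capture_inputs(body):
    """Run `body` against a recorder and return the captured declarations."""
    recorder = InputRecorder()
    try:
        body(recorder)
    except Exception as exc:
        # Inputs recorded before the failure are kept, but the error is
        # surfaced rather than silently discarded.
        log.warning("Unexpected error while parsing process inputs: %s", exc)
    return recorder.inputs
```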

4. Parameter Type Validation

The current implementation doesn't validate parameter types beyond basic conversion:

case 'path':
case 'file':
    if( paramValue instanceof String ) {
        return Nextflow.file(paramValue)
    }
    return paramValue

Enhancement: Could validate file existence, path accessibility, or other constraints before execution.

🚨 Potential Issues

1. Security Considerations

  • File path parameters are converted directly without validation
  • No checks for path traversal or access to unauthorized directories
  • Consider adding basic security validation for file paths
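
One possible shape for such a check, sketched in Python for illustration: resolve the supplied path against an allowed root and reject anything that escapes it. The allowed-root policy and the name `validate_input_path` are assumptions, not something the PR defines, and the sketch only covers local file paths.

```python
from pathlib import Path


def validate_input_path(raw, allowed_root):
    """Resolve `raw` under `allowed_root` and reject paths that escape it.

    Hypothetical policy: every path input must live below a single root.
    """
    root = Path(allowed_root).resolve()
    candidate = (root / raw).resolve()
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"Path escapes allowed directory: {raw}")
    return candidate
```

Resolving both sides before comparing means that `..` segments and absolute paths are normalised first, so `"../etc/passwd"` and `"/etc/passwd"` are both rejected under this policy.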

2. Memory Usage

  • The process body is cloned and executed for parsing (bodyClone = processDef.rawBody.clone())
  • For large process definitions, this could create unnecessary memory overhead

3. Thread Safety

  • The delegate object creation in parseProcessInputStructures() creates new objects each time
  • Should be fine for single-threaded script parsing, but worth noting

📝 Minor Improvements

1. Code Documentation

  • Add more specific examples in ProcessEntryHandler.groovy javadoc
  • Document the token extraction limitations and fallback strategies

2. Logging Consistency

// Current: Mix of log levels
log.debug "Getting input arguments for process: ${processDef.name}"
log.warn "Could not parse tuple element: ${item} of class ${className}"

Suggestion: Consider consistent logging levels and more structured log messages.

3. Test Coverage Gaps

  • Missing tests for edge cases like empty process definitions
  • No tests for the auto-execution selection logic (first process selection)
  • Could add negative tests for malformed input structures

🎯 Integration with Existing Features

BaseScript Integration (BaseScript.groovy:193-198)

The integration is clean and follows existing patterns:

if( meta.hasExecutableProcesses() ) {
    def handler = new ProcessEntryHandler(this, session, meta)
    entryFlow = handler.createAutoProcessWorkflow()
}

ScriptMeta Extensions (ScriptMeta.groovy:310-320)

The hasExecutableProcesses() method correctly identifies eligible scripts:

  • Excludes modules (isModule() check)
  • Requires processes but no workflows
  • Clean and logical implementation

🏆 Recommendation

APPROVE - This is a high-quality implementation that adds significant value to Nextflow. The code is well-structured, thoroughly tested, and follows existing patterns. While there are areas for improvement (particularly around process body parsing), the current implementation is solid and the identified issues are not blockers.

Priority Actions:

  1. High: Consider the ProcessDef InputsList/OutputsList approach mentioned by @bentsherman
  2. Medium: Add more robust error handling in process body parsing
  3. Low: Enhance parameter validation and security checks

The feature successfully achieves its goals and maintains backward compatibility while providing a much-improved user experience for simple process execution scenarios.


@pditommaso
Member Author

I'm still not 100% sure about this capability; however, I'd like to try it on the road. I'm going to merge it as an undocumented feature, then we can decide to revert or improve it as suggested by #6391.

@pditommaso pditommaso merged commit 6d5a0bd into master Sep 11, 2025
22 checks passed
@pditommaso pditommaso deleted the feature/process-entry-execution branch September 11, 2025 18:48
@bentsherman
Member

I'm not sure why you need to merge it to master to try it out. You can try it out just fine from a PR build

@pditommaso
Member Author

By "on the road" I mean testing with other tools and gathering feedback from (internal) users.
