Skip to content

Conversation

@skoren
Copy link
Member

@skoren skoren commented Nov 21, 2025

No description provided.

@skoren skoren requested a review from Copilot November 21, 2025 14:56
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR merges consensus generation changes from a development branch into the master branch. The changes primarily focus on improving consensus generation performance, adding new configuration options, and refactoring logging infrastructure.

Key changes:

  • Consensus generation now supports configurable iterations, coverage limits, and quick mode for initial runs
  • Refactored logging system by removing custom logger wrapper and adopting standard Python logging
  • Updated command-line interface to support consensus-related parameters and removed deprecated options

Reviewed changes

Copilot reviewed 29 out of 29 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
src/verkko.sh Added consensus configuration parameters and modified BAM generation defaults
src/scripts/scaffolding/scaffold_graph.py Removed custom logger dependency, switched to standard logging module
src/scripts/scaffolding/match_graph.py Removed custom logger wrapper, migrated to Python logging
src/scripts/scaffolding/logger_wrap.py Deleted entire custom logger wrapper module
src/scripts/merge_layouts.py Enhanced to handle scfmap files and filter dropped contigs
src/scripts/logging_utils.py New unified logging utility module
src/scripts/launch_scaffolding.py Updated to use new logging infrastructure
src/scripts/launch_phasing.py Added logging setup and updated function signature
src/scripts/cluster.py Major refactoring: removed deprecated functions, improved phasing logic, migrated to standard logging
src/scripts/get_utig1_from_utig4.py New script for transforming unitig paths
src/scripts/get_layout_from_mbg.py Added copy count tracking for reads
src/scripts/fasta_util.py Enhanced readScfMap to return both scaffold map and names
src/scripts/check_layout_gaps.py Fixed string formatting bug
src/Snakefiles/7-generateConsensus.sm Added configurable consensus parameters and quick mode support
src/Snakefiles/7-combineConsensus.sm Enhanced BAM merging with ulimit handling and conditional generation
src/Snakefiles/6-layoutContigs.sm Updated to handle dropped contigs output
src/Snakefiles/2-processGraph.sm Modified gap insertion pass parameters
src/main.mk Updated file installation paths for new logging module
Comments suppressed due to low confidence (1)

src/Snakefiles/c2-findOverlaps.sm:1

  • [nitpick] The atomic rename pattern is good practice, but consider cleaning up the .WORKING files in case of failure. Add a trap or cleanup step to remove temporary files if the command fails.
#############################################################################

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

real_idy = self.parseIDY(idy)
#Constant not depending on genome, intervals too similar for hi-c alignment to use
if real_idy > 0.995:
if real_idy > self.FILTERED_IDY_CUTOFF:
Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The constant FILTERED_IDY_CUTOFF (0.995) is hardcoded as a class attribute but was previously embedded in code. While this is an improvement, consider making this configurable via a parameter if different identity cutoffs might be needed for different genomes or use cases.

Copilot uses AI. Check for mistakes.
if len(sys.argv) != 7:
print(f'Usage: {sys.argv[0]} graph.gfa homologous_nodes.matches hic_byread output_dir, no_rdna, uneven_depth')
if len(sys.argv) != 8:
print(f'Usage: {sys.argv[0]} graph.gfa hpc.mashmap nonhpc.mashmap hic_byread output_dir no_rdna uneven_depth')
Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The argument count validation expects 8 arguments but the usage message lists 7 parameters (plus the script name would be 8 total). This is correct but could be clearer by explicitly stating the script name is sys.argv[0].

Suggested change
print(f'Usage: {sys.argv[0]} graph.gfa hpc.mashmap nonhpc.mashmap hic_byread output_dir no_rdna uneven_depth')
print(f'Usage: python {os.path.basename(sys.argv[0])} graph.gfa hpc.mashmap nonhpc.mashmap hic_byread output_dir no_rdna uneven_depth')

Copilot uses AI. Check for mistakes.
echo " Max open files limited to \$bef, no increase possible."
fi

mem_per_core=\`(expr \( {resources.mem_gb} \* 70 \) / \( 100 \* {threads} \) | awk '{{if (\$1 < 1) print "1G"; else print \$1"G"}}') || true\`
Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The memory calculation uses magic number 70 (70% of memory). Consider defining this as a variable with a descriptive name like MEM_USAGE_PERCENT=70 to improve clarity and maintainability.

Copilot uses AI. Check for mistakes.
@skoren skoren merged commit 018d9e9 into master Nov 21, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants