Merge in consensus changes to master #341

skoren · 2025-11-21T14:56:43Z

No description provided.

update combine and consensus times due to bam generation
enable bam output by default and allow consensus options
properly process and drop contigs in layout when they have only noisy long reads (no accurate read support) when merging layouts
output layout read use counts used in bam MAPQ and consensus

when running initial consensus (for Hi-C phasing), skip bam output/iterations to save time

Copilot

Pull request overview

This PR merges consensus generation changes from a development branch into the master branch. The changes primarily focus on improving consensus generation performance, adding new configuration options, and refactoring logging infrastructure.

Key changes:

Consensus generation now supports configurable iterations, coverage limits, and quick mode for initial runs
Refactored logging system by removing custom logger wrapper and adopting standard Python logging
Updated command-line interface to support consensus-related parameters and removed deprecated options

Reviewed changes

Copilot reviewed 29 out of 29 changed files in this pull request and generated 6 comments.

File	Description
src/verkko.sh	Added consensus configuration parameters and modified BAM generation defaults
src/scripts/scaffolding/scaffold_graph.py	Removed custom logger dependency, switched to standard logging module
src/scripts/scaffolding/match_graph.py	Removed custom logger wrapper, migrated to Python logging
src/scripts/scaffolding/logger_wrap.py	Deleted entire custom logger wrapper module
src/scripts/merge_layouts.py	Enhanced to handle scfmap files and filter dropped contigs
src/scripts/logging_utils.py	New unified logging utility module
src/scripts/launch_scaffolding.py	Updated to use new logging infrastructure
src/scripts/launch_phasing.py	Added logging setup and updated function signature
src/scripts/cluster.py	Major refactoring: removed deprecated functions, improved phasing logic, migrated to standard logging
src/scripts/get_utig1_from_utig4.py	New script for transforming unitig paths
src/scripts/get_layout_from_mbg.py	Added copy count tracking for reads
src/scripts/fasta_util.py	Enhanced readScfMap to return both scaffold map and names
src/scripts/check_layout_gaps.py	Fixed string formatting bug
src/Snakefiles/7-generateConsensus.sm	Added configurable consensus parameters and quick mode support
src/Snakefiles/7-combineConsensus.sm	Enhanced BAM merging with ulimit handling and conditional generation
src/Snakefiles/6-layoutContigs.sm	Updated to handle dropped contigs output
src/Snakefiles/2-processGraph.sm	Modified gap insertion pass parameters
src/main.mk	Updated file installation paths for new logging module

Comments suppressed due to low confidence (1)

src/Snakefiles/c2-findOverlaps.sm:1

[nitpick] The atomic rename pattern is good practice, but consider cleaning up the .WORKING files in case of failure. Add a trap or cleanup step to remove temporary files if the command fails.

#############################################################################
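The cleanup suggested in this comment can be sketched in Python. This is only an illustration of the write-then-rename pattern with cleanup on failure, not the shell code in c2-findOverlaps.sm; the helper name `write_atomically` and its `suffix` parameter are assumptions made for the example.

```python
import os

def write_atomically(final_path, data, suffix=".WORKING"):
    """Write data to final_path via a temporary file and a rename.

    Hypothetical helper for illustration; not part of verkko.
    """
    tmp_path = final_path + suffix
    try:
        with open(tmp_path, "w") as handle:
            handle.write(data)
        # os.replace is atomic when both paths are on the same filesystem.
        os.replace(tmp_path, final_path)
    finally:
        # After a successful rename the temporary file no longer exists;
        # after a failure this removes the partial output.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

With this shape, a reader of `final_path` sees either the previous complete file or the new complete file, and a failed write leaves no `.WORKING` file behind.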

src/verkko.sh

Copilot · 2025-11-21T14:58:31Z

src/scripts/scaffolding/match_graph.py

        real_idy = self.parseIDY(idy)
        #Constant not depending on genome, intervals too similar for hi-c alignment to use
-        if real_idy > 0.995:
+        if real_idy > self.FILTERED_IDY_CUTOFF:


[nitpick] The constant FILTERED_IDY_CUTOFF (0.995) is hardcoded as a class attribute but was previously embedded in code. While this is an improvement, consider making this configurable via a parameter if different identity cutoffs might be needed for different genomes or use cases.
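One way to make the cutoff configurable while keeping 0.995 as the default is sketched below. The class name `MatchFilter`, the `idy_cutoff` parameter, and the `is_too_similar` method are hypothetical and do not reflect the actual structure of match_graph.py; only the constant name and its value come from the diff above.

```python
class MatchFilter:
    # Default value taken from the diff above; everything else here is hypothetical.
    FILTERED_IDY_CUTOFF = 0.995

    def __init__(self, idy_cutoff=None):
        # Fall back to the class-level default when no override is given.
        self.idy_cutoff = self.FILTERED_IDY_CUTOFF if idy_cutoff is None else idy_cutoff

    def is_too_similar(self, real_idy):
        """Return True when intervals are too similar for Hi-C alignment to use."""
        return real_idy > self.idy_cutoff
```

Callers that do not pass `idy_cutoff` keep the current behavior, so the override could be introduced without touching existing call sites.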

src/scripts/logging_utils.py

Copilot · 2025-11-21T14:58:31Z

src/scripts/cluster.py

-    if len(sys.argv) != 7:
-        print(f'Usage: {sys.argv[0]} graph.gfa homologous_nodes.matches hic_byread output_dir, no_rdna, uneven_depth')
+    if len(sys.argv) != 8:
+        print(f'Usage: {sys.argv[0]} graph.gfa hpc.mashmap nonhpc.mashmap hic_byread output_dir no_rdna uneven_depth')


[nitpick] The argument count validation expects 8 arguments but the usage message lists 7 parameters (plus the script name would be 8 total). This is correct but could be clearer by explicitly stating the script name is sys.argv[0].

Suggested change

-        print(f'Usage: {sys.argv[0]} graph.gfa hpc.mashmap nonhpc.mashmap hic_byread output_dir no_rdna uneven_depth')
+        print(f'Usage: python {os.path.basename(sys.argv[0])} graph.gfa hpc.mashmap nonhpc.mashmap hic_byread output_dir no_rdna uneven_depth')
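The relationship between the count of 8 and the seven listed parameters can be made explicit by deriving the expected length from the parameter list. The sketch below is an assumption about how this might look, not the code in cluster.py; `EXPECTED_ARGS`, `usage`, and `check_args` are names invented for the example.

```python
import os
import sys

# Positional arguments named in the usage string from the diff above.
EXPECTED_ARGS = ["graph.gfa", "hpc.mashmap", "nonhpc.mashmap", "hic_byread",
                 "output_dir", "no_rdna", "uneven_depth"]

def usage(script_path):
    """Build the usage line from the script name and the expected arguments."""
    return f"Usage: python {os.path.basename(script_path)} " + " ".join(EXPECTED_ARGS)

def check_args(argv):
    """Return the positional arguments, or exit with the usage message."""
    # argv[0] is the script name, so the expected length is len(EXPECTED_ARGS) + 1.
    if len(argv) != len(EXPECTED_ARGS) + 1:
        print(usage(argv[0]), file=sys.stderr)
        sys.exit(1)
    return argv[1:]
```

Because the check and the message share one list, adding or removing a parameter cannot leave the two out of sync.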

src/Snakefiles/functions.sm

Copilot · 2025-11-21T14:58:32Z

src/Snakefiles/7-combineConsensus.sm

+      echo "  Max open files limited to \$bef, no increase possible."
+   fi
+
+   mem_per_core=\`(expr \( {resources.mem_gb} \* 70 \) / \( 100 \* {threads} \) | awk '{{if (\$1 < 1) print "1G"; else print \$1"G"}}') || true\`


[nitpick] The memory calculation uses magic number 70 (70% of memory). Consider defining this as a variable with a descriptive name like MEM_USAGE_PERCENT=70 to improve clarity and maintainability.
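The arithmetic in that shell line, with the 70 pulled out into a named constant as the comment suggests, can be written as the following Python sketch. It assumes integer inputs, matching the integer arithmetic of `expr`; the names `MEM_USAGE_PERCENT` and `mem_per_core` are illustrative and do not appear in the Snakefile as functions.

```python
# Percentage of the job's memory to divide among threads (the "70" in the shell line).
MEM_USAGE_PERCENT = 70

def mem_per_core(mem_gb, threads, percent=MEM_USAGE_PERCENT):
    """Return the per-thread memory as a string such as '5G', with a floor of '1G'."""
    # Integer division mirrors expr; results below 1 are raised to 1, as in the awk step.
    per_core = (mem_gb * percent) // (100 * threads)
    return f"{max(per_core, 1)}G"
```

For example, 64 GB across 8 threads gives (64 * 70) // (100 * 8) = 5, so the function returns `"5G"`; 4 GB across 8 threads rounds down to 0 and is raised to `"1G"`.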

Dmitry-Antipov and others added 19 commits August 15, 2025 15:00

Phasing reworked

ccc9170

Merge remote-tracking branch 'origin/chrY_fix'

159186c

Cleanup

e074229

update MBG version

1855544

move index only on successful completion, safety in case jobs terminate early but not reported as failed by grid engine

958f22e

update gap filling criteria, disallow single read gap closing

5fe146a

add bamtools, pysam, samtools, seqtk to dependencies and fix CI test

fb5bd61

add utility script to convert from utig4- paths to utig1- paths

a795867

utig4 translation handle overlaps containing full utig1- nodes

5fbe6f4

fix for empty partitions

43aefba

fix handling of offset for repeated nodes

221f7a9

changed logic for filtering hic reads from similar regions

5e0a871

logging simplified

c1a8675

missed makefile change

70d6ce5

excessive requirement removed

613079d

fix name typo

74bbdce

increase open files while sorting bam outputs

62594e1

update canu tag

9f1f959

when running initial consensus (for Hi-C phasing), skip bam output/iterations to save time

skoren requested a review from Copilot November 21, 2025 14:56

Copilot AI reviewed Nov 21, 2025

copilot fixes

34cd469

skoren merged commit 018d9e9 into master Nov 21, 2025
2 checks passed
