-
Notifications
You must be signed in to change notification settings - Fork 33
Merge in consensus changes to master #341
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
update combine and consensus times due to bam generation enable bam output by default and allow consensus options properly process and drop contigs in layout when they have only noisy long reads no accurate read support when merging layouts output layout read use counts used in bam MAPQ and consensus
…ate early but not reported as failed by grid engine
when running initial consensus (for Hi-C phasing), skip bam output/iterations to save time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR merges consensus generation changes from a development branch into the master branch. The changes primarily focus on improving consensus generation performance, adding new configuration options, and refactoring logging infrastructure.
Key changes:
- Consensus generation now supports configurable iterations, coverage limits, and quick mode for initial runs
- Refactored logging system by removing custom logger wrapper and adopting standard Python logging
- Updated command-line interface to support consensus-related parameters and removed deprecated options
Reviewed changes
Copilot reviewed 29 out of 29 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| src/verkko.sh | Added consensus configuration parameters and modified BAM generation defaults |
| src/scripts/scaffolding/scaffold_graph.py | Removed custom logger dependency, switched to standard logging module |
| src/scripts/scaffolding/match_graph.py | Removed custom logger wrapper, migrated to Python logging |
| src/scripts/scaffolding/logger_wrap.py | Deleted entire custom logger wrapper module |
| src/scripts/merge_layouts.py | Enhanced to handle scfmap files and filter dropped contigs |
| src/scripts/logging_utils.py | New unified logging utility module |
| src/scripts/launch_scaffolding.py | Updated to use new logging infrastructure |
| src/scripts/launch_phasing.py | Added logging setup and updated function signature |
| src/scripts/cluster.py | Major refactoring: removed deprecated functions, improved phasing logic, migrated to standard logging |
| src/scripts/get_utig1_from_utig4.py | New script for transforming unitig paths |
| src/scripts/get_layout_from_mbg.py | Added copy count tracking for reads |
| src/scripts/fasta_util.py | Enhanced readScfMap to return both scaffold map and names |
| src/scripts/check_layout_gaps.py | Fixed string formatting bug |
| src/Snakefiles/7-generateConsensus.sm | Added configurable consensus parameters and quick mode support |
| src/Snakefiles/7-combineConsensus.sm | Enhanced BAM merging with ulimit handling and conditional generation |
| src/Snakefiles/6-layoutContigs.sm | Updated to handle dropped contigs output |
| src/Snakefiles/2-processGraph.sm | Modified gap insertion pass parameters |
| src/main.mk | Updated file installation paths for new logging module |
Comments suppressed due to low confidence (1)
src/Snakefiles/c2-findOverlaps.sm:1
- [nitpick] The atomic rename pattern is good practice, but consider cleaning up the .WORKING files in case of failure. Add a trap or cleanup step to remove temporary files if the command fails.
#############################################################################
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| real_idy = self.parseIDY(idy) | ||
| #Constant not depending on genome, intervals too similar for hi-c alignment to use | ||
| if real_idy > 0.995: | ||
| if real_idy > self.FILTERED_IDY_CUTOFF: |
Copilot
AI
Nov 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The constant FILTERED_IDY_CUTOFF (0.995) is hardcoded as a class attribute but was previously embedded in code. While this is an improvement, consider making this configurable via a parameter if different identity cutoffs might be needed for different genomes or use cases.
| if len(sys.argv) != 7: | ||
| print(f'Usage: {sys.argv[0]} graph.gfa homologous_nodes.matches hic_byread output_dir, no_rdna, uneven_depth') | ||
| if len(sys.argv) != 8: | ||
| print(f'Usage: {sys.argv[0]} graph.gfa hpc.mashmap nonhpc.mashmap hic_byread output_dir no_rdna uneven_depth') |
Copilot
AI
Nov 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The argument count validation expects 8 arguments but the usage message lists 7 parameters (plus the script name would be 8 total). This is correct but could be clearer by explicitly stating the script name is sys.argv[0].
| print(f'Usage: {sys.argv[0]} graph.gfa hpc.mashmap nonhpc.mashmap hic_byread output_dir no_rdna uneven_depth') | |
| print(f'Usage: python {os.path.basename(sys.argv[0])} graph.gfa hpc.mashmap nonhpc.mashmap hic_byread output_dir no_rdna uneven_depth') |
| echo " Max open files limited to \$bef, no increase possible." | ||
| fi | ||
|
|
||
| mem_per_core=\`(expr \( {resources.mem_gb} \* 70 \) / \( 100 \* {threads} \) | awk '{{if (\$1 < 1) print "1G"; else print \$1"G"}}') || true\` |
Copilot
AI
Nov 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The memory calculation uses magic number 70 (70% of memory). Consider defining this as a variable with a descriptive name like MEM_USAGE_PERCENT=70 to improve clarity and maintainability.
No description provided.