2025-05-06.build7983

@azat-badretdin azat-badretdin released this 08 May 18:41
· 18 commits to master since this release

This release is based on PGAP-6.10 and includes several significant changes to the structural annotation components of the pipeline.

New Features

  • To improve pipeline scalability and maintainability, NCBI PGAP now uses Miniprot 0.15 for protein-to-genome alignments (https://github.com/lh3/miniprot; see https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9869432/).

    • We have worked hard to minimize the adverse effects of the switch in algorithms and do not expect any disruption in the quality of our annotation calls. After extensive testing on a broad range of taxa, we conclude that PGAP 6.8 perfectly reproduces 98.6% of the protein models produced by PGAP 6.7, with the vast majority of the remaining differences confined to small changes in start site selection. On average, we expect such changes to affect approximately 40 models per assembly.
  • Introduced ORF filtering, a process that focuses prediction efforts on the ORFs most likely to correspond to the final annotation. The net effect is a significant performance improvement with no appreciable impact on annotation quality.

  • Together, these changes have resulted in a greater than 30% improvement in processing time.

  • CRISPR identification: CRISPRCasFinder has replaced the PILER-CR-based CRISPR identification.

  • Component upgrade: Rfam v.15.0

  • Component upgrade: Pfam 37.1

  • Component upgrade: CDD 3.21 in Protein Family Models
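
The two figures quoted above for the Miniprot switch (98.6% of protein models reproduced exactly, and approximately 40 changed models per assembly) can be related with simple arithmetic. The sketch below is only a rough consistency check. It assumes that the roughly 40 changed models correspond to the 1.4% of models that are not reproduced exactly, which these notes do not state explicitly. The function name is hypothetical and is not part of PGAP.

```python
def implied_models_per_assembly(changed_models: float, reproduced_pct: float) -> float:
    """Estimate the average number of protein models per assembly.

    Assumption (not stated in the release notes): the changed models are
    exactly the models that were not reproduced perfectly.
    """
    # Fraction of models that differ between the two pipeline versions.
    changed_fraction = (100.0 - reproduced_pct) / 100.0
    return changed_models / changed_fraction


# Figures taken from the release notes: ~40 changed models, 98.6% reproduced.
estimate = implied_models_per_assembly(changed_models=40, reproduced_pct=98.6)
print(f"Implied protein models per assembly: ~{estimate:.0f}")
```

Under that assumption the implied average is a little under 2,900 protein models per assembly, which is at least a plausible order of magnitude for a prokaryotic genome.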

Third Party Software versions used:

  • tRNAscan-SE 2.0.12
  • hmmer v.3.4
  • CRISPRCasFinder 4.3.2
  • AntiFam v.3.0
  • Rfam v.15.0
  • GeneMarkS2-v.1.14_1.25
  • infernal 1.1.5