Automated pipeline for PPI prediction and figure creation.
PPIFold is a tool for analyzing Protein-Protein Interactions from AlphaPulldown, with automated pre- and post-processing. It is used to generate PPI predictions for multiple systems without wasting time on generating initial files and sorting results. It predicts the best homo-oligomer for a protein and the best interface for interacting with specific proteins. This allows for the prediction of massive multimeric complexes with numerous PPIs. PPIFold is designed to naively generate a complete interactome, producing all possible interactions to help predict more accurate ones.
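To give a sense of the workload implied by naively generating a complete interactome, the sketch below enumerates every unordered pair of proteins, with each protein also paired with itself as a stand-in for the homo-oligomer search. The protein names and the `all_pairs` helper are hypothetical and are not part of PPIFold.

```python
from itertools import combinations_with_replacement

def all_pairs(proteins):
    """Return every unordered pair of proteins, self-pairs included."""
    return list(combinations_with_replacement(proteins, 2))

proteins = ["ProtA", "ProtB", "ProtC"]  # hypothetical names
pairs = all_pairs(proteins)
for first, second in pairs:
    kind = "homo" if first == second else "hetero"
    print(f"{first} - {second} ({kind})")
```

For n proteins this gives n(n+1)/2 combinations, so the number of predictions grows quadratically with the size of the input list.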
- AlphaFold database
- Conda
- SignalP5 (optional)
- Singularity and Singularity Image
Installation of the AlphaFold database :
sudo apt install aria2
git clone https://github.com/deepmind/alphafold.git
cd alphafold
scripts/download_all_data.sh /<Directory> > download.log 2> download_all.log
SignalP5 installation (optional) :
https://services.healthtech.dtu.dk/services/SignalP-5.0/9-Downloads.php
tar -xvzf signalp-5.0b.Linux.tar.gz
cd signalp-5.0b/
sudo cp bin/signalp /usr/local/bin
sudo cp -r lib/* /usr/local/lib
Note
If you don't want to use SignalP, set --use_signalP to False and don't install SignalP5.
Singularity installation :
https://docs.sylabs.io/guides/3.0/user-guide/installation.html#install-on-linux
Download the Singularity image (score generation) from Zenodo :
PPIFold installation :
conda create -n PPIFold -c omnia -c bioconda -c conda-forge python==3.11 openmm==8.0 pdbfixer==1.9 kalign2 hhsuite hmmer
conda activate PPIFold
pip install PPIFold
pip install -U "jax[cuda12]"==0.5.3
You need two initial files :
test.txt
This file needs to be a ".txt" file.
The initial file can be set up using UniProt IDs, FASTA sequences, or both.
UniProt IDs need to be on the same line, separated by commas.
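The format described above can be read with a few lines of Python. This is only a sketch of one possible reader, not PPIFold's own parser: the `parse_input` name is hypothetical, and it assumes that any non-empty line seen before the first `>` header is a comma-separated list of UniProt IDs. The sequence in the sample text is shortened for brevity.

```python
def parse_input(text):
    """Split input text into UniProt IDs and FASTA records.

    Assumption: once a '>' header has been seen, following lines are
    sequence lines; earlier non-empty lines are comma-separated IDs.
    """
    uniprot_ids = []
    fasta = {}      # record name -> sequence
    current = None  # name of the FASTA record being read
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].strip()
            fasta[current] = ""
        elif current is None:
            uniprot_ids.extend(p.strip() for p in line.split(",") if p.strip())
        else:
            fasta[current] += line
    return uniprot_ids, fasta

sample = """UniprotID1,UniprotID2,UniprotID3
>ProtName4
MFKRSGSLSL
ALMSSFCSSS
"""
ids, records = parse_input(sample)
print(ids)
print(records)
```

Running a check like this before launching the pipeline can catch a misplaced comma or a missing header early.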
Example :
UniprotID1,UniprotID2,UniprotID3
>ProtName4
MFKRSGSLSLALMSSFCSSSLATPLSSAEFDHVARKCAPSVATSTLAAIAK
VESRFDPLAIHDNTTGETLHWQDHTQATQVVRHRLDARHSLDVGLMQINSR
NFSMLGLTPDGALKACPSLSAAANMLKSRYAGGETIDEKQIALRRAISAYN
TGNFIRGFANGYVRKVETAAQSLVPALIEPPQDDHKALKSEDTWDVWGSYQ
RRSQEDGVGGSIAPQPPDQDNGKSADDNQVLFDLY
>ProtName5
MKHSLRTLWRLRVKINEFNEYIKEARSFDIDRMHGMRQRMRIAMALTVLFG
LMTIALALAVAALTPLKTVEPFVIRVDNSTGIIETVSALKETPNDYDEAIT
RYFASKYVRAREGFQLSEAEHNFRLVSLLSSPEEQSRFAKWYAGNNPESPQ
NIYQNMIATVTIKSISFLSKDLIQVRYYKTVRELNDKENISHWVSILNFSY
INAQISTQDRLINPLGFQVSEYRSDPEVIQ
conf.txt
The conf.txt file needs to contain all paths.
Path_Uniprot_ID : Path and name of the UniProt/FASTA file.
Path_AlphaFold_Data : Path to the AlphaFold database (default: ./alphadata).
Path_Singularity_Image : Path and name of the Singularity image.
Path_Pickle_Feature : Path to your feature folder (default: ./feature).
To use PPIFold, simply run the PPIFold command in the folder containing conf.txt and test.txt.
PPIFold --use_mmseq Boolean --make_multimers String --max_aa Integer --use_signalP Boolean --org String
Optional arguments
--use_mmseq Enable or disable MMseqs2 for feature generation; set to True by default
--make_multimers Set to all by default. Set it to inter to generate only inter-interactions, or to intra to generate only intra-interactions
--max_aa The maximum length, in amino acids, of a model that your GPU can generate (depends on VRAM); set to 2000 by default (24 GB)
--use_signalP Use SignalP if your proteins can be periplasmic; set to True by default
--org If you use SignalP, select the organism (gram-, gram+, arch or euk); set to gram- by default
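When PPIFold is launched from a script, the options above can be assembled programmatically. The sketch below only builds and prints the command line; the `build_command` helper is hypothetical, and the exact spelling of the Boolean and organism values (`True`, `False`, `gram-`) is assumed from the descriptions above.

```python
import shlex

def build_command(use_mmseq=True, make_multimers="all", max_aa=2000,
                  use_signalP=True, org="gram-"):
    """Assemble the PPIFold command line from the documented options."""
    if make_multimers not in ("all", "inter", "intra"):
        raise ValueError("make_multimers must be all, inter or intra")
    return [
        "PPIFold",
        "--use_mmseq", str(use_mmseq),
        "--make_multimers", make_multimers,
        "--max_aa", str(max_aa),
        "--use_signalP", str(use_signalP),
        "--org", org,
    ]

cmd = build_command(make_multimers="inter", use_signalP=False)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the folder containing conf.txt and test.txt, with the Conda environment activated.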
Tip
Save all your pickle files in the same directory.
This pipeline applies cutoffs on PAE (10), iQ-score (50) and hiQ-score (50).
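The effect of these cutoffs can be pictured with a small filter. This is a sketch under stated assumptions rather than the pipeline's actual code: it assumes a prediction is kept when its PAE is at or below the cutoff and its score is at or above it, and the prediction values are invented for illustration.

```python
PAE_CUTOFF = 10.0
IQ_CUTOFF = 50.0
HIQ_CUTOFF = 50.0

def passes_cutoffs(pae, score, homo_oligomer=False):
    """Return True if a prediction clears the PAE and score cutoffs.

    Assumption: PAE must be <= its cutoff and the score >= its cutoff.
    The hiQ-score cutoff is used for homo-oligomers, iQ-score otherwise.
    """
    score_cutoff = HIQ_CUTOFF if homo_oligomer else IQ_CUTOFF
    return pae <= PAE_CUTOFF and score >= score_cutoff

predictions = [  # hypothetical values, not real results
    {"pair": ("ProtA", "ProtB"), "pae": 4.2, "score": 71.0, "homo": False},
    {"pair": ("ProtA", "ProtC"), "pae": 12.5, "score": 80.0, "homo": False},
    {"pair": ("ProtB", "ProtB"), "pae": 6.0, "score": 35.0, "homo": True},
]
kept = [p["pair"] for p in predictions
        if passes_cutoffs(p["pae"], p["score"], p["homo"])]
print(kept)
```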
MSA depth
All aligned homologous sequences for O50333.

The y-axis represents the number of homologous sequences, the x-axis represents the positions in the sequence. The color represents the sequence identity.
Residue interaction table
Table of distances between residues of O50331 and O50333.

Chains represent different proteins. Two residues in contact are specified, along with their distances. Distances are calculated from the center of mass of the residues. The distance threshold is 10 angstroms, and the PAE threshold is 7.
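The distance criterion can be illustrated with a short calculation. The sketch below computes the center of mass of two residues from their atoms and compares the distance to the 10 angstrom threshold. The function names, the mass table and the coordinates are illustrative assumptions, the threshold is assumed to be inclusive, and the PAE criterion is not modeled here.

```python
import math

# Approximate atomic masses (daltons) for elements common in proteins.
MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}

def center_of_mass(atoms):
    """atoms: list of (element, x, y, z) tuples for one residue."""
    total = sum(MASS[el] for el, *_ in atoms)
    return tuple(
        sum(MASS[el] * coords[i] for el, *coords in atoms) / total
        for i in range(3)
    )

def in_contact(residue_a, residue_b, threshold=10.0):
    """Return (distance, is_contact) between two residue centers of mass."""
    distance = math.dist(center_of_mass(residue_a), center_of_mass(residue_b))
    return distance, distance <= threshold

# Hypothetical coordinates in angstroms.
residue_a = [("C", 0.0, 0.0, 0.0), ("C", 2.0, 0.0, 0.0)]
residue_b = [("N", 7.0, 0.0, 0.0)]
distance, contact = in_contact(residue_a, residue_b)
print(round(distance, 2), contact)
```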
Distogram
Distance map between residues of O50331 and O50333.

The x and y axes represent interacting proteins. Pixels inside the black squares represent intra-protein residue distances, while pixels outside represent inter-protein residue distances. The color represents the distance in angstroms: blue indicates a short distance between two residues, and yellow indicates a large distance.
Interaction network
Protein-protein interaction network with iQ-score and homo-oligomers (hiQ-score) predictions.

This network represents interactions between R388 proteins. Each interaction is represented by a line connecting two proteins, colored according to the corresponding iQ-score. A loop on a protein indicates its best homo-oligomer, the one with the highest hiQ-score.
iQ-Score heatmap
Heatmap of the iQ-score of each PPI.

Color represents the iQ-score, with a better iQ-score indicated by a lighter color. The black boxes represent either poor PAE, homo-oligomers, or overly large total protein length.
Protein interface
Amino acid sequence with the different interfaces used in interactions.

Each interface with a protein is represented by all contact residues, which are colored. The last interaction represents the interface used in homo-oligomerization. If two proteins use the same interface, they will have the same colors.
OOM_int.txt
A text file containing interactions that are too large, based on --max_aa.
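To anticipate which interactions may end up in this file, sequence lengths can be checked against --max_aa beforehand. The sketch below assumes that the limit applies to the summed length of the two chains in a pair; the `oversized_pairs` helper and the lengths are hypothetical.

```python
from itertools import combinations

def oversized_pairs(lengths, max_aa=2000):
    """Return pairs whose combined length exceeds max_aa.

    lengths: dict mapping protein name -> sequence length.
    Assumption: the limit applies to the summed length of both chains.
    """
    return [
        (a, b)
        for a, b in combinations(sorted(lengths), 2)
        if lengths[a] + lengths[b] > max_aa
    ]

lengths = {"ProtA": 350, "ProtB": 1200, "ProtC": 900}  # hypothetical
print(oversized_pairs(lengths))
```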
Shallow_MSA.txt
A text file containing proteins with an MSA depth lower than 100 sequences.
Warning
Results for proteins with fewer than 100 sequences in the MSA are not reliable enough to validate or invalidate predicted PPIs.
table.cyt
A file for manually generating a network in Cytoscape.
summary.signalp5
A file that summarizes the signal peptides of all proteins.
.pdb file
Model structure, with residues colored according to their interaction interface.
After completing test.txt, you need to complete the conf.txt file with all your paths.
Activate your Conda environment.
You must run the command in the directory containing conf.txt and test.txt.
Command :
PPIFold
Quentin Rouger, Emmanuel Giudice, Damien F Meyer, Kévin Macé, PPIFold: a tool for analysis of protein–protein interaction from AlphaPullDown, Bioinformatics Advances, Volume 5, Issue 1, 2025, vbaf090, https://doi.org/10.1093/bioadv/vbaf090
