Instructions for using ERGO-II, the TCR-Peptide binding predictor
1. Specify input
The input file must be in .csv format, as in the provided example. Column headers are case sensitive.
The file should be a CSV table with several columns, matching the relevant features.
The file must include the column headers 'TRA', 'TRB', 'TRAV', 'TRAJ', 'TRBV', 'TRBJ', 'T-Cell-Type', 'Peptide' and 'MHC', even if not all features are used.
Leave the field blank for any missing feature. However, the 'TRB' and 'Peptide' sequences must appear in every row.
ERGO-II expects TCR sequences to begin with Cysteine ('C') and finish with Phenylalanine ('F').
Make sure your file contains only upper-case amino-acid letters, with no other characters such as '*', 'X' or non-ASCII characters.
T-Cell-Type can be 'CD4' or 'CD8'.
Prediction may fail on input files that do not follow this format.
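Since a malformed file can make prediction fail, it may help to check the file locally before uploading it. The following is a minimal Python sketch of such a check, not part of ERGO-II itself. The function name check_rows is hypothetical, the set of 20 standard amino-acid letters is an assumption about which letters are accepted, and applying the 'C'...'F' rule to both 'TRA' and 'TRB' is likewise an assumption.

```python
import csv

REQUIRED_COLUMNS = ["TRA", "TRB", "TRAV", "TRAJ", "TRBV", "TRBJ",
                    "T-Cell-Type", "Peptide", "MHC"]
# Assumption: only the 20 standard amino-acid letters are accepted.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def check_rows(handle):
    """Return a list of (line_number, message) problems found in a CSV file object."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [(1, "missing column headers: " + ", ".join(missing))]

    problems = []
    for row in reader:
        line_no = reader.line_num
        # 'TRB' and 'Peptide' are mandatory in every row.
        for col in ("TRB", "Peptide"):
            if not (row[col] or ""):
                problems.append((line_no, f"{col} is empty"))
        # Sequences may contain only upper-case amino-acid letters.
        for col in ("TRA", "TRB", "Peptide"):
            seq = row[col] or ""
            if seq and not set(seq) <= AMINO_ACIDS:
                problems.append((line_no, f"{col} contains invalid characters"))
        # TCR sequences should start with 'C' and end with 'F'.
        for col in ("TRA", "TRB"):
            seq = row[col] or ""
            if seq and not (seq.startswith("C") and seq.endswith("F")):
                problems.append((line_no, f"{col} should start with C and end with F"))
        # T-Cell-Type is optional, but if given must be CD4 or CD8.
        if (row["T-Cell-Type"] or "") not in ("", "CD4", "CD8"):
            problems.append((line_no, "T-Cell-Type must be CD4, CD8 or blank"))
    return problems
```

To use it, open the file with open(path, newline="") and pass the file object to check_rows; an empty list means none of the checks above failed.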
When using the autoencoder-based model, note that the TCR length should be at most 28. For longer TCR sequences, the model will use only the first 28 amino acids.
There is no length limit on the peptides.
Predict up to 50,000 pairs at a time. For larger files, split the file into chunks of at most 50,000 pairs each.
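Splitting a large file can be done with a few lines of Python. The sketch below is one possible way to do it, assuming the file has a single header row and one pair per row; the function name split_csv and the output naming scheme (prefix_1.csv, prefix_2.csv, ...) are hypothetical.

```python
import csv
from itertools import islice

MAX_PAIRS = 50_000  # limit stated above


def split_csv(path, prefix, max_rows=MAX_PAIRS):
    """Split the CSV at `path` into files of at most `max_rows` data rows.

    Each output file repeats the header row. Returns the list of file names written.
    """
    names = []
    with open(path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)  # assumes the first row holds the column headers
        while True:
            rows = list(islice(reader, max_rows))
            if not rows:
                break
            name = f"{prefix}_{len(names) + 1}.csv"
            with open(name, "w", newline="") as dst:
                writer = csv.writer(dst)
                writer.writerow(header)
                writer.writerows(rows)
            names.append(name)
    return names
```

Each resulting file can then be submitted separately, and the result files concatenated afterwards.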
2. Model Configurations
Choose the model configuration for the prediction.
ERGO-II offers two distinct model architectures: the autoencoder-based model and the LSTM-based model.
You can also choose the database on which the model was trained: McPAS or VDJdb.
Additional features such as TCR alpha sequence, V and J genes, MHC and T-Cell-Type can be used.
Note that this webtool only lets you choose configurations for which a trained ERGO-II model is already available.
3. Submit for prediction
Click on the 'predict' button to get the binding prediction scores.
4. Download results
The results are returned as a CSV file in the same format as the input, with an additional score column,
which contains the model's prediction, a value between 0 and 1, for each row.
For example, see the output file in the 'Example' section.
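Once downloaded, the results file can be processed like any other CSV table. As a rough sketch, the Python function below keeps the rows whose score is at or above a chosen threshold and sorts them from highest to lowest score. The function name pairs_above is hypothetical, and the name of the score column is passed as a parameter because it should be taken from the header of the actual output file. The threshold is up to the user; no particular cutoff is recommended here.

```python
import csv


def pairs_above(handle, threshold, score_column):
    """Return the rows with score >= threshold, highest score first.

    `handle` is an open results file; `score_column` is the header of the
    score column as it appears in that file.
    """
    reader = csv.DictReader(handle)
    hits = [row for row in reader if float(row[score_column]) >= threshold]
    hits.sort(key=lambda row: float(row[score_column]), reverse=True)
    return hits
```

For example, with the results file opened as f, pairs_above(f, 0.5, "Score") would return the rows scoring at least 0.5, assuming the score column is called 'Score'.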