SignalP 6.0 - DTU Health Tech (2024)

Instructions

1. Specify the input sequences

All the input sequences must be in one-letter amino acid code. The allowed alphabet (identical to UniProt, not case sensitive) is as follows:

A C D E F G H I K L M N P Q R S T V W Y and X U B Z O (unknown/ambiguous/non-standard)

All the alphabetic symbols not in the allowed alphabet will be converted to X before processing. All the non-alphabetic symbols, including white space and digits, will be ignored.
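The cleaning rule above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not the server's code: the function name `sanitize` is hypothetical, and converting to upper case is an assumption based on the alphabet being case insensitive.

```python
# Allowed one-letter codes, as listed above (standard residues plus X U B Z O).
ALLOWED = set("ACDEFGHIKLMNPQRSTVWYXUBZO")


def sanitize(sequence):
    """Apply the input rules: drop non-alphabetic symbols, map unknown letters to X."""
    cleaned = []
    for symbol in sequence:
        if not symbol.isalpha():
            # White space, digits and punctuation are ignored.
            continue
        upper = symbol.upper()  # assumption: case is normalised to upper case
        cleaned.append(upper if upper in ALLOWED else "X")
    return "".join(cleaned)


print(sanitize("mk t1J*a"))
```

Here the letter J, which is not in the allowed alphabet, becomes X, while the space, the digit and the asterisk are dropped.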

The sequences can be input in the following two ways:

  • Paste a single sequence (just the amino acids) or a number of sequences in FASTA format into the upper window of the main server page.
  • Select a FASTA file on your local disk, either by typing the file name into the lower window or by browsing the disk.

Both ways can be employed at the same time: all the specified sequences will be processed. However, there may be no more than 5,000 sequences in one submission, and no sequence may be longer than 10,000 amino acids.
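Before submitting a large file, it can help to check it against these limits locally. The following is a minimal sketch, assuming plain multi-line FASTA input; the helper names `read_fasta` and `check_limits` are hypothetical, and the length is counted on the raw sequence text rather than on the cleaned sequence.

```python
MAX_SEQUENCES = 5000   # limit per submission, as stated above
MAX_LENGTH = 10000     # limit per sequence, in amino acids


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_limits(records):
    """Return a list of human-readable problems; an empty list means the input fits."""
    problems = []
    if len(records) > MAX_SEQUENCES:
        problems.append(f"{len(records)} sequences exceed the limit of {MAX_SEQUENCES}")
    for header, sequence in records:
        if len(sequence) > MAX_LENGTH:
            problems.append(f"{header}: {len(sequence)} residues exceed the limit of {MAX_LENGTH}")
    return problems
```

A file that passes `check_limits` with an empty list should fit within one submission; larger inputs would need to be split.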

2. Customize your run

  • Organism:
    You should specify the correct organism of origin, either Eukarya or Other. This is done to prevent the prediction of types other than Sec/SPI in eukaryotic proteins. Other includes Archaea, Gram-positive and Gram-negative bacteria.
  • Output format:
    You can choose between two output formats:
    Standard
    Appropriate for most users. Shows one plot and one summary per sequence.
    Short
    Convenient if you submit lots of sequences. Shows only one line of output per sequence and no graphics.
  • Prediction mode:
    You can choose between two prediction modes:
    Fast
    Appropriate for most users. Runs a reduced-size version of SignalP 6.0 that accurately predicts probabilities. This model was generated from the slow (full) model using model distillation.
    Slow
    Runs the full SignalP 6.0 model. This is six times slower than the fast version and should be used if accurate region border predictions are needed.

3. Submit the job

Click on the "Submit" button. The status of your job (either 'queued' or 'running') will be displayed and constantly updated until it terminates and the server output appears in the browser window.

At any time during the wait you may enter your e-mail address and simply leave the window. Your job will continue; you will be notified by e-mail when it has terminated. The e-mail message will contain the URL under which the results are stored; they will remain on the server for 24 hours for you to collect them.


Example Outputs

By default the server produces the following output for each input sequence. The example below shows the output for thioredoxin domain containing protein 4 precursor (endoplasmic reticulum protein ERp44), taken from the UniProt entry ERP44_HUMAN. The signal peptide prediction is consistent with the database annotation.

One annotation is attributed to each protein, the one that has the highest probability. The protein can have a Sec signal peptide (Sec/SPI), a Lipoprotein signal peptide (Sec/SPII), a Tat signal peptide (Tat/SPI), a Tat lipoprotein signal peptide (Tat/SPII), a Pilin signal peptide (Sec/SPIII) or No signal peptide at all (Other).
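Picking the annotation amounts to taking the class with the highest probability. A minimal sketch follows; the function name `assign_type` and the probability values are made up for illustration, and only the six class names come from the text above.

```python
def assign_type(probabilities):
    """Return the class name with the highest probability."""
    return max(probabilities, key=probabilities.get)


# Hypothetical per-class probabilities for one protein (illustrative values only).
example = {
    "Other": 0.02,
    "Sec/SPI": 0.95,
    "Sec/SPII": 0.01,
    "Tat/SPI": 0.01,
    "Tat/SPII": 0.005,
    "Sec/SPIII": 0.005,
}
print(assign_type(example))
```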

If a signal peptide is predicted, the cleavage site position is reported as well.

On the plot, marginal probabilities for signal peptide regions are reported, e.g. Sec/SPI n-region or Tat/SPII h-region. There are also marginal probabilities for residues belonging to regions of the mature protein. To keep plots clean, we exclude regions with very low probabilities. The most likely label sequence, from which the cleavage site is inferred, is also indicated. The positions of the following regions and features are predicted by the model:

  • n-region: The N-terminal region of the signal peptide. Reported for Sec/SPI, Sec/SPII, Tat/SPI and Tat/SPII. Labeled as N
  • h-region: The central hydrophobic region of the signal peptide. Reported for Sec/SPI, Sec/SPII, Tat/SPI and Tat/SPII. Labeled as H
  • c-region: The C-terminal region of the signal peptide. Reported for Sec/SPI and Tat/SPI.
  • Cysteine: The conserved cysteine in +1 of the cleavage site of Lipoproteins that is used for Lipidation. Labeled as c.
  • Twin-arginine motif: The twin-arginine motif at the end of the n-region that is characteristic for Tat signal peptides. Labeled as R.
  • Sec/SPIII: These signal peptides have no known region structure.
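Region borders can be read off a per-residue label sequence by collapsing runs of identical labels. The sketch below assumes the labels are available as a plain string with one character per residue, which is an assumption about the data rather than a description of the server output; `label_runs` is a hypothetical helper name.

```python
from itertools import groupby


def label_runs(labels):
    """Collapse a per-residue label string into (label, start, end) runs, 1-based inclusive."""
    runs = []
    position = 1
    for label, group in groupby(labels):
        length = len(list(group))
        runs.append((label, position, position + length - 1))
        position += length
    return runs


# Illustrative label string: three n-region residues followed by four h-region residues.
print(label_runs("NNNHHHH"))
```

With such runs, the end of the last signal peptide region gives the position immediately before the cleavage site.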

Example: secretory protein - standard output format

[Figure: example output in standard format]

Example: secretory protein - short output format

[Figure: example output in short format]

From the Downloads tab, the user can obtain the results of the run in various formats: JSON, Prediction summary (results for each submission, 1 line per sequence), Processed entries fasta (a FASTA sequence file containing the sequences of the proteins that had predicted signal peptides, with the signal peptide removed) and Processed entries gff3 (a file showing the signal peptide features of those proteins that had predicted signal peptides, in GFF3 format).
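To illustrate what the two "Processed entries" files contain, here is a rough sketch that removes a signal peptide from a sequence and writes a nine-column GFF3 feature line. The function names, the source column value and the score formatting are assumptions for illustration; the exact content of the server's files may differ.

```python
def strip_signal_peptide(sequence, sp_end):
    """Return the mature sequence; sp_end is the 1-based position of the last signal peptide residue."""
    if not 0 < sp_end < len(sequence):
        raise ValueError("sp_end must lie inside the sequence")
    return sequence[sp_end:]


def gff3_line(seq_id, sp_end, score):
    """Build one tab-separated GFF3 line for a signal peptide spanning residues 1..sp_end."""
    columns = [
        seq_id,
        "SignalP-6.0",       # source column: assumed value
        "signal_peptide",    # feature type
        "1",                 # start (1-based)
        str(sp_end),         # end (inclusive)
        f"{score:.4f}",      # score
        ".",                 # strand: not applicable to proteins
        ".",                 # phase: not applicable
        ".",                 # attributes: left empty in this sketch
    ]
    return "\t".join(columns)


print(strip_signal_peptide("MKKLLAAAQQQ", 8))
print(gff3_line("example_protein", 8, 0.9876))
```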

Training and testing data sets

The datasets for training and testing SignalP 6.0 can be found here. Both datasets are in 3-line FASTA format:

>Uniprot_AC|Kingdom|Type|Partition No
amino-acid sequence
annotation [S: Sec/SPI signal peptide | T: Tat/SPI or Tat/SPII signal peptide | L: Sec/SPII signal peptide | P: Sec/SPIII signal peptide | I: cytoplasm | M: transmembrane | O: extracellular]
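A 3-line FASTA file of this kind can be read with a short parser. The sketch below assumes that every record occupies exactly three non-empty lines, that the partition number is an integer, and that the annotation line has one label per residue; the accession and field values in the example record are invented.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    accession: str
    kingdom: str
    sp_type: str
    partition: int
    sequence: str
    annotation: str


def parse_three_line_fasta(text):
    """Parse header / sequence / annotation triplets into Entry records."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) % 3 != 0:
        raise ValueError("expected three lines per record")
    entries = []
    for i in range(0, len(lines), 3):
        header, sequence, annotation = lines[i:i + 3]
        if not header.startswith(">"):
            raise ValueError(f"record {i // 3 + 1} does not start with '>'")
        accession, kingdom, sp_type, partition = header[1:].split("|")
        if len(sequence) != len(annotation):
            raise ValueError(f"{accession}: sequence and annotation differ in length")
        entries.append(Entry(accession, kingdom, sp_type, int(partition), sequence, annotation))
    return entries


# Invented record for illustration; real headers come from the downloaded data sets.
sample = ">X00000|EUKARYA|SP|0\nMKLLAQ\nSSSSOO\n"
entry = parse_three_line_fasta(sample)[0]
print(entry.accession, entry.partition, entry.annotation.count("S"))
```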

SignalP 6.0 Training set: download

SignalP 5.0 Benchmark set: download

Unpartitioned dataset: download

Additional data

Predictions of SignalP 6.0 in reference proteomes from UniProt release 2021_02, as used in the manuscript.

Archaea: download

Eukarya: download

Bacteria: download

Selected reference proteomes from the paper: download

Main references:

Current version (SignalP v. 6.0)

SignalP 6.0 predicts all five types of signal peptides using protein language models.
Felix Teufel, José Juan Almagro Armenteros, Alexander Rosenberg Johansen, Magnús Halldór Gíslason, Silas Irby Pihl, Konstantinos D Tsirigos, Ole Winther, Søren Brunak, Gunnar von Heijne and Henrik Nielsen.
Nature Biotechnology (2021), doi:10.1038/s41587-021-01156-3
Signal peptides (SPs) are short amino acid sequences that control protein secretion and translocation in all living organisms. SPs can be predicted from sequence data, but existing algorithms are unable to detect all known types of SPs. We introduce SignalP 6.0, a machine learning model that detects all five SP types and is applicable to metagenomic data.
PMID: 34980915

Original method (SignalP v. 1.1)

Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites.
Henrik Nielsen, Jacob Engelbrecht, Søren Brunak and Gunnar von Heijne.
Protein Engineering, 10:1-6 (1997).

We have developed a new method for the identification of signal peptides and their cleavage sites based on neural networks trained on separate sets of prokaryotic and eukaryotic sequence. The method performs significantly better than previous prediction schemes and can easily be applied on genome-wide data sets. Discrimination between cleaved signal peptides and uncleaved N-terminal signal-anchor sequences is also possible, though with lower precision. Predictions can be made on a publicly available WWW server.

PMID: 9051728 (free full text PDF version)

Update to SignalP v. 2.0

Prediction of signal peptides and signal anchors by a hidden Markov model.
Henrik Nielsen and Anders Krogh.
Proc Int Conf Intell Syst Mol Biol. (ISMB 6), 6:122-130 (1998).

A hidden Markov model of signal peptides has been developed. It contains submodels for the N-terminal part, the hydrophobic region, and the region around the cleavage site. For known signal peptides, the model can be used to assign objective boundaries between these three regions. Applied to our data, the length distributions for the three regions are significantly different from expectations. For instance, the assigned hydrophobic region is between 8 and 12 residues long in almost all eukaryotic signal peptides. This analysis also makes obvious the difference between eukaryotes, Gram-positive bacteria, and Gram-negative bacteria. The model can be used to predict the location of the cleavage site, which it finds correctly in nearly 70% of signal peptides in a cross-validated test, almost the same accuracy as the best previous method. One of the problems for existing prediction methods is the poor discrimination between signal peptides and uncleaved signal anchors, but this is substantially improved by the hidden Markov model when expanding it with a very simple signal anchor model.

PMID: 9783217

Update to SignalP v. 3.0

Improved prediction of signal peptides: SignalP 3.0.
Jannick Dyrløv Bendtsen, Henrik Nielsen, Gunnar von Heijne and Søren Brunak.
J. Mol. Biol., 340:783-795 (2004).

We describe improvements of the currently most popular method for prediction of classically secreted proteins, SignalP. SignalP consists of two different predictors based on neural network and hidden Markov model algorithms, and both components have been updated. Motivated by the idea that the cleavage site position and the amino acid composition of the signal peptide are correlated, new features have been included as input to the neural network. This addition, together with a thorough error-correction of a new data set, have improved the performance of the predictor significantly over SignalP version 2. In version 3, correctness of the cleavage site predictions have increased notably for all three organism groups, eukaryotes, Gram negative and Gram positive bacteria. The accuracy of cleavage site prediction has increased in the range from 6–17 % over the previous version, whereas the signal peptide discrimination improvement mainly is due to the elimination of false positive predictions, as well as the introduction of a new discrimination score for the neural network. The new method has also been benchmarked against other available methods.

PMID: 15223320 doi: 10.1016/j.jmb.2004.05.028

Update to SignalP v. 4.0

SignalP 4.0: discriminating signal peptides from transmembrane regions.
Thomas Nordahl Petersen, Søren Brunak, Gunnar von Heijne and Henrik Nielsen.
Nature Methods, 8:785-786 (2011).

This is a Correspondence; it has no abstract.

doi: 10.1038/nmeth.1701
Access to the supplementary materials: nmeth.1701-S1.pdf

Update to SignalP v. 4.1

Predicting Secretory Proteins with SignalP
Henrik Nielsen.
In Kihara, D (ed): Protein Function Prediction (Methods in Molecular Biology vol. 1611) pp. 59-73, Springer 2017.
doi: 10.1007/978-1-4939-7015-5_6
PMID: 28451972
SignalP is the currently most widely used program for prediction of signal peptides from amino acid sequences. Proteins with signal peptides are targeted to the secretory pathway, but are not necessarily secreted. After a brief introduction to the biology of signal peptides and the history of signal peptide prediction, this chapter will describe all the options of the current version of SignalP and the details of the output from the program. The chapter includes a case study where the scores of SignalP were used in a novel way to predict the functional effects of amino acid substitutions in signal peptides.

Update to SignalP v. 5.0

SignalP 5.0 improves signal peptide predictions using deep neural networks.
José Juan Almagro Armenteros, Konstantinos D. Tsirigos, Casper Kaae Sønderby, Thomas Nordahl Petersen, Ole Winther, Søren Brunak, Gunnar von Heijne and Henrik Nielsen.
Nature Biotechnology, 37, 420-423, doi:10.1038/s41587-019-0036-z (2019)
Signal peptides (SPs) are short amino acid sequences in the amino terminus of many newly synthesized proteins that target proteins into, or across, membranes. Bioinformatic tools can predict SPs from amino acid sequences, but most cannot distinguish between various types of signal peptides. We present a deep neural network-based approach that improves SP prediction across all domains of life and distinguishes between three types of prokaryotic SPs.

PMID: 30778233

Other publications


Locating proteins in the cell using TargetP, SignalP, and related tools
Olof Emanuelsson, Søren Brunak, Gunnar von Heijne, Henrik Nielsen
Nature Protocols, 2:953-971 (2007).

Determining the subcellular localization of a protein is an important first step toward understanding its function. Here, we describe the properties of three well-known N-terminal sequence motifs directing proteins to the secretory pathway, mitochondria and chloroplasts, and sketch a brief history of methods to predict subcellular localization based on these sorting signals and other sequence properties. We then outline how to use a number of internet-accessible tools to arrive at a reliable subcellular localization prediction for eukaryotic and prokaryotic proteins. In particular, we provide detailed step-by-step instructions for the coupled use of the amino-acid sequence-based predictors TargetP, SignalP, ChloroP and TMHMM, which are all hosted at the Center for Biological Sequence Analysis, Technical University of Denmark. In addition, we describe and provide web references to other useful subcellular localization predictors. Finally, we discuss predictive performance measures in general and the performance of TargetP and SignalP in particular.

PMID: 17446895
Please click here to access the paper and supplementary materials.


Machine learning approaches to the prediction of signal peptides and other protein sorting signals.
Henrik Nielsen, Søren Brunak, and Gunnar von Heijne.
Protein Engineering, 12:3-9 (1999), Review.

Prediction of protein sorting signals from the sequence of amino acids has great importance in the field of proteomics today. Recently, the growth of protein databases, combined with machine learning approaches, such as neural networks and hidden Markov models, have made it possible to achieve a level of reliability where practical use in, for example, automatic database annotation is feasible. In this review, we concentrate on the present status and future perspectives of SignalP, our neural network-based method for prediction of the most well-known sorting signal: the secretory signal peptide. We discuss the problems associated with the use of SignalP on genomic sequences, showing that signal peptide prediction will improve further if integrated with predictions of start codons and transmembrane helices. As a step towards this goal, a hidden Markov model version of SignalP has been developed, making it possible to discriminate between cleaved signal peptides and uncleaved signal anchors. Furthermore, we show how SignalP can be used to characterize putative signal peptides from an archaeon, Methanococcus jannaschii. Finally, we briefly review a few methods for predicting other protein sorting signals and discuss the future of protein sorting prediction in general.

PMID: 10065704


A neural network method for identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites.
Henrik Nielsen, Jacob Engelbrecht, Søren Brunak and Gunnar von Heijne.
Int. J. Neural Sys., 8:581-599 (1997).

We have developed a new method for the identification of signal peptides and their cleavage sites based on neural networks trained on separate sets of prokaryotic and eukaryotic sequences. The method performs significantly better than previous prediction schemes, and can easily be applied to genome-wide data sets. Discrimination between cleaved signal peptides and uncleaved N-terminal signal-anchor sequences is also possible, though with lower precision. Predictions can be made on a publicly available WWW server: http://www.cbs.dtu.dk/services/SignalP/.

PMID: 10065837


Defining a similarity threshold for a functional protein sequence pattern: the signal peptide cleavage site.
Henrik Nielsen, Jacob Engelbrecht, Gunnar von Heijne and Søren Brunak.
Proteins, 24:165-77 (1996).

When preparing data sets of amino acid or nucleotide sequences it is necessary to exclude redundant or homologous sequences in order to avoid overestimating the predictive performance of an algorithm. For some time methods for doing this have been available in the area of protein structure prediction. We have developed a similar procedure based on pair-wise alignments for sequences with functional sites. We show how a correlation coefficient between sequence similarity and functional homology can be used to compare the efficiency of different similarity measures and choose a nonarbitrary threshold value for excluding redundant sequences. The impact of the choice of scoring matrix used in the alignments is examined. We demonstrate that the parameter determining the quality of the correlation is the relative entropy of the matrix, rather than the assumed (PAM or identity) substitution mode. Results are presented for the case of prediction of cleavage sites in signal peptides. By inspection of the false positives, several errors in the database were found. The procedure presented may be used as a general outline for finding a problem-specific similarity measure and threshold value for analysis of other functional amino acid or nucleotide sequence patterns.

PMID: 8820484


From sequence to sorting: Prediction of signal peptides.
Henrik Nielsen.
Ph.D. thesis, defended at Department of Biochemistry, Stockholm University, Sweden, May 25, 1999.

In the present age of genome sequencing, a vast number of predicted genes are initially known only by their putative nucleotide sequence. The newly established field of bioinformatics is concerned with the computational prediction of structural and functional properties of genes and the proteins they encode, based on their nucleotide and amino acid sequences.
Since one of the crucial properties of a protein is its subcellular location, prediction of protein sorting is an important question in bioinformatics. A fundamental distinction in protein sorting is that between secretory and non-secretory proteins, determined by a cleavable N-terminal sorting signal, the secretory signal peptide.
The main part of this thesis, including four of the six papers, concerns prediction of secretory signal peptides in both eukaryotic and bacterial data using two machine learning techniques: artificial neural networks and hidden Markov models. A central result is the SignalP prediction method, which has been made available as a World Wide Web server and is very widely used.
Two additional prediction methods are also included, with one paper each. ChloroP predicts chloroplast transit peptides, another cleavable N-terminal sorting signal; while NetStart predicts start codons in eukaryotic genes. For prediction of all N-terminal signals, the assignment of correct start codon can be critical, which is why prediction of translation initiation from the nucleotide sequence is also important for protein sorting prediction.
This thesis comprises a detailed review of the molecular biology of protein secretion, a short introduction to the most important machine learning algorithms in bioinformatics, and a critical review of existing methods for protein sorting prediction. In addition, it contains a general treatment of the principles of data set construction and performance evaluation for prediction methods in bioinformatics.

Access to the thesis (without the six included papers): PhDthesis.pdf; PhDthesis-cover.pdf

Changes from version 5 to 6

— What's new?

Please see the version history page.

— What happened to the organism group selection?

SignalP 6.0 is based on a protein language model, which makes it capable of understanding the phylogenomic context of a protein directly from its amino acid sequence. The model no longer requires the organism information for prediction.

— What are the fast and slow model modes?

The protein language model on which SignalP 6.0 is built is computationally very expensive. To enable a prediction speed comparable to previous versions, we created a model of reduced size that emulates the output of the larger (slow) model. We recommend the fast model for most applications, such as predicting SPs in a large number of unknown sequences. For detailed analysis of SP regions the slow model should be used. The creation of the fast model is described in the supplementary material of the manuscript.

Changes from version 4 to 5

— What's new?

Please see the version history page.

— What happened to the C-, S- and Y-scores?

The output layer of SignalP 5.0 is a conditional random field (CRF) which yields marginal probabilities, just like the HMM module did in SignalP versions 2 and 3. Since the CRF is a grammatical method which is aware that there can only be one cleavage site in a given signal peptide, there is no need for the post-processing of the network output that was represented by the Y-score.

Changes from version 4.0 to 4.1

— What's new?

Please see the version history page.

— Why do you present a choice between two cutoff settings? Can't you just decide on one?

The optimal cutoff really depends on what you want to use the method for. If it is important to find all signal peptides, use the sensitive cutoff. If you want an estimate of the number of signal peptides in a genome, use the default cutoff.

— Why have you imposed a minimum length?

Because we believe that predictions of signal peptides shorter than ten residues made by SignalP 4.1 are false. The shortest known signal peptides are 11 residues long (with one exception, SP23_TENMO, which does not look like a signal peptide at all). Click here for an updated list of experimentally confirmed signal peptides from UniProt of length 11 or shorter.

— What happened to the Background page?

It's here! The important material from the Background page has been integrated into this FAQ; we hope you like the new format.

Changes from version 3 to 4

— What's new?

Please see the version history page.

— What happened to the HMM part?

While making SignalP 4.0, we did retrain the Hidden Markov Model (HMM) part of SignalP. However, we found that it did not perform better than the neural networks in any of the performance parameters we tested. Therefore, we decided not to include it. If the HMM output is important for you, you can still use SignalP 3.0.

— Why is my favourite signal peptide no longer predicted correctly? SignalP 3.0 could do it!

As explained on the performance page, SignalP 4 with the default cutoff has a lower sensitivity than SignalP 3. Please try again with the new "Sensitive" setting.

— What happened to the Yes/No answers for max C score etc.?

SignalP 3.0 provided five Yes/No answers for the NN part. We found that this was confusing for users and obscured the fact that the D-score is the best score for discriminating between signal peptides and non-signal peptides.

Biological background, signal peptides

— What are signal peptides?

The term "signal peptide" is used with two meanings: In the broad sense (used in many textbooks), a signal peptide is any sorting signal embedded in the amino acid sequence of a protein. In the narrow sense (used in most of the scientific literature), a signal peptide is an N-terminal signal that directs the protein across the ER membrane in eukaryotes and across the plasma membrane in prokaryotes. Signal peptides in the narrow sense are also known as ER signal peptides or secretory signal peptides. Read more in UniProt, in Wikipedia, and in the Sequence feature ontology.

It is important to emphasize that SignalP predicts signal peptides in the narrow sense only.

— Are signal peptides always N-terminal?

In the narrow sense: Yes, by definition. In the broad sense: No, there are several sorting signals that are C-terminal (e.g. the PTS1 signal for peroxisomal import) or internal (e.g. the nuclear localization signal).

— Are signal peptides (in the narrow sense) always cleaved?

No, there are rare cases of uncleaved signal peptides. For an updated list of such proteins annotated in UniProt, click here. These should not be confused with signal anchors, see below.

— Which protease is responsible for signal peptide (Sec/SPI) cleavage?

In bacteria, it is Signal Peptidase I (SPase I), also known as Leader Peptidase (Lep). In eukaryotes, it is the signal peptidase complex (SPC), which consists of four subunits in yeast and five in mammals. Read more in MEROPS.

— My protein has a signal peptide. Can I then safely conclude that it is secreted?

No. You can only conclude that it enters the secretory pathway.

In eukaryotes, there are several opportunities for a protein with a signal peptide to escape secretion. It could:

  • be retained in the endoplasmic reticulum (ER). Soluble ER-resident proteins have a C-terminal retention signal with the consensus sequence KDEL, see PROSITE.
  • be retained in the Golgi apparatus,
  • be directed to the lysosome (vacuole in plants and fungi),
  • have one or more transmembrane helices and therefore be retained in either the plasma membrane, or one of the membranes of the secretory pathway (ER, Golgi, lysosome/vacuole), or
  • have a signal for GPI-anchoring, a C-terminal cleaved peptide which functions as a signal for attachment of a glycophosphatidylinositol (GPI) group that anchors the protein to the outer face of the plasma membrane.
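As a quick first check for the ER retention case listed above, one can test whether a sequence ends in the KDEL consensus. This is a deliberately crude sketch with a hypothetical helper name: it matches only the exact four-residue consensus, whereas the PROSITE pattern also covers variants, and it says nothing about the other retention mechanisms.

```python
def has_kdel_signal(sequence):
    """Return True if the sequence ends with the C-terminal KDEL consensus."""
    return sequence.strip().upper().endswith("KDEL")


print(has_kdel_signal("MAAKDEL"))   # consensus at the C-terminus
print(has_kdel_signal("MKDELAA"))   # KDEL present but not C-terminal
```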

In Gram-positive bacteria and Archaea, a protein with a signal peptide could:

  • have one or more transmembrane helices, or
  • be attached to the cell wall.

In Gram-negative bacteria, a protein with a signal peptide could:

  • have one or more transmembrane helices,
  • be retained in the periplasm, or
  • be inserted into the outer membrane as a β-barrel transmembrane protein.

— Does SignalP predict signal peptides of bacterial and archaeal lipoproteins?

Yes. Bacterial lipoproteins have special signal peptides (Sec/SPII) which are cleaved by Signal Peptidase II (SPase II), also known as Lipoprotein signal peptidase (Lsp). A diacylglyceryl group is attached to a Cysteine residue in position +1 relative to the cleavage site, which bears no resemblance to the SPase I cleavage site. See also MEROPS and PROSITE.

— Does SignalP predict Tat (Twin-arginine translocation) signal peptides?

Yes. Bacterial and archaeal Tat signal peptides (Tat/SPI), which direct their proteins through an alternative translocon (TatABC instead of SecYEG), have a special motif, usually containing two Arginines, in the n-region. Additionally, they are in general longer and less hydrophobic than "normal" (Sec) signal peptides. See also PROSITE and InterPro.

Biological background, other sorting signals

— What are signal anchors?

A signal anchor is a transmembrane helix located close to the N-terminus of a protein with an N-in orientation (i.e. the N-terminus is on the cytoplasmic side of the membrane). It functions much like a signal peptide since it is recognized by the Signal Recognition Particle (SRP) and inserted into the translocon; but instead of being cleaved and degraded it remains in the membrane and anchors the protein to it. Proteins anchored in this way are known as Type II transmembrane proteins.

[Figure: Signal peptides (above) versus signal anchors (below)]

It is important to realize that the difference between signal peptides and signal anchors is not a question of presence or absence of a cleavage site. Instead, the most important difference seems to be the length of the hydrophobic domain. It has been shown experimentally that it is possible to convert a cleaved signal peptide to a signal anchor merely by lengthening the h-region, without altering the cleavage site (Chou & Kendall 1990; Nilsson, Whitley, & von Heijne 1994).

The introduction of the Hidden Markov Model (HMM) method in SignalP version 2 made it possible to some extent to distinguish signal peptides from signal anchors (in that version, only in eukaryotes). However, SignalP 4 (based entirely on the Neural Network (NN) method) does a better job, since its negative set is not confined only to transmembrane helices annotated as signal anchors, but includes all types of transmembrane segments close to the N-terminus.

— What should I use for predicting signal peptides in the broad sense?

For mitochondrial and plastid import signals, also known as transit peptides, we recommend TargetP. For general prediction of subcellular location in eukaryotes, we recommend DeepLoc.

— What should I use for predicting non-classical (leaderless) secreted proteins?

Not all secretory proteins carry signal peptides. Some proteins enter a non-classical secretory pathway without any currently known sequence motif. In eukaryotes, these proteins are mostly growth factors and extracellular matrix binding proteins. In Gram-negative bacteria, the type I, III, IV and VI secretion systems function without signal peptides. For prediction of such proteins we recommend the SecretomeP server.

Biological background, organism groups

— Which version should I use for vira and bacteriophages?

You should use the version corresponding to the host organism. There aresome indications that viral signal peptides differ from those of thehost organism, but SignalP currently does not take that into account.

— Which version should I use for Tenericutes/Mollicutes (Mycoplasma and related genera)?

You shouldn't use SignalP at all for these organisms, since they seem to lack a type I signal peptidase completely!

— Which version should I use for metagenomic sequences of unknown origin?

This is an unsolved question. Please use all four versions to search for signal peptides in such data.

— Is one version enough for all eukaryotic organisms, or are there differences within the eukaryotes?

It is known that some yeast signal peptides are not recognized by mammalian cells (Bird et al., 1987 and 1990). Therefore, it would be natural to assume that separate SignalP versions for yeast and Mammalia would provide better predictions than a common eukaryotic version. While developing SignalP 4.0 we tried dividing the eukaryotic data into animals, fungi, and plants and training separate methods for these three groups. However, this did not give any improvement, and performance for all three groups was better when using the method trained on all eukaryotic sequences together.

— Are two versions enough for all bacteria, or are there differences within the Gram-positive/Gram-negative bacterial groups?

The Gram-negative version of SignalP is almost certainly biased towards E. coli and other γ-proteobacteria, since these constitute the bulk of the experimentally annotated bacterial proteins in UniProt. Unpublished results suggest that some bacteria have very divergent cleavage site motifs. Future versions of SignalP might therefore divide the Gram-negative bacteria into several classes, if data are available.

Gram-positive bacteria probably constitute a more homogeneous group, but it is an open question whether there are differences in signal peptides between Actinobacteria (high G+C Gram-positive bacteria) and Firmicutes (low G+C Gram-positive bacteria). More data on Actinobacteria are needed before that can be answered.

History

— How are the various versions of SignalP related?

Please see the version history page.

— Was there ever a Nobel prize awarded for signal peptides?

Yes, for signal peptides in the broad sense. The importance of signal peptides was emphasized in 1999 when Günter Blobel received the Nobel Prize in Physiology or Medicine for the discovery that "proteins have intrinsic signals that govern their transport and localization in the cell". See the press release.

— Was SignalP the first signal peptide predictor?

No, but it was, to our knowledge, the first to be implemented as a web server (in 1996). Among the earlier methods were McGeoch (1985) and von Heijne (1986), both of which have been included in PSORT.

— How many times have the SignalP papers been cited?

This information is available on Henrik Nielsen's ResearcherID, Scopus, and Google Scholar pages.

Version history

Please click on the version number to activate the corresponding server where available.

6.0 The current server. New in this version:
  • Model architecture: SignalP 6.0 is based on a transformer language model, trained on a massive dataset of unlabeled protein sequences. Pretraining on unlabeled protein sequences before learning to detect signal peptides leads to better prediction performance, especially for SP types where the number of known signal peptide sequences is very small.
  • Tat lipoprotein signal peptides: SignalP 6.0 can differentiate between "standard" Tat signal peptides cleaved by signal peptidase I (Tat/SPI) and lipoprotein Tat signal peptides cleaved by signal peptidase II (Tat/SPII) in Bacteria and Archaea.
  • Pilin and Pilin-like signal peptides: SignalP 6.0 can predict the signal peptides of Pilins and Pilin-like proteins that are translocated by Sec and cleaved by signal peptidase III (Sec/SPIII) in Bacteria and Archaea.
  • Signal peptide regions: SignalP 6.0 is capable of predicting the positions of the biochemical regions of all signal peptide types.
  • Metagenomic data: SignalP 6.0 no longer needs to know the organism group of origin for prokaryotes (Gram-positive, Gram-negative and Archaea). It can thus be used on metagenomic data where the origin of the sequences is unclear.
Main publication:
  • SignalP 6.0 predicts all five types of signal peptides using protein language models.
    Felix Teufel, José Juan Almagro Armenteros, Alexander Rosenberg Johansen, Magnús Halldór Gíslason, Silas Irby Pihl, Konstantinos D Tsirigos, Ole Winther, Søren Brunak, Gunnar von Heijne and Henrik Nielsen.
    Nature Biotechnology (2021), doi:10.1038/s41587-021-01156-3
5.0 New in this version:
  • Deep learning: SignalP 5.0 is based on convolutional and recurrent (LSTM) neural networks. The deep recurrent neural network architecture is better suited to recognizing sequence motifs of varying length, such as signal peptides, than traditional feed-forward neural networks (as used in SignalP 1-4).
  • Conditional random field: The neural networks of SignalP 5.0 are combined with a conditional random field (CRF). The CRF imposes a defined grammar on the prediction and obviates the need for the post-processing step (Y- and D-scores) used in earlier versions of SignalP.
  • Transfer learning: Instead of training separate networks for each organism group, SignalP 5.0 exploits the fact that signal peptides from different domains of life are to some degree similar. Thus, SignalP 5.0 is trained on data from all groups while an extra input unit informs the network about the origin of the sequences.
  • Archaeal option: Thanks to transfer learning, SignalP 5.0 is able to make predictions also of signal peptides from Archaea, even though the data set is limited.
  • Lipoprotein signal peptides: SignalP 5.0 can now differentiate between "standard" signal peptidase I-cleaved signal peptides (Sec/SPI) and signal peptidase II-cleaved lipoprotein signal peptides (Sec/SPII) in Bacteria and Archaea. Previously, we referred to the LipoP server for this prediction.
  • Tat signal peptides: SignalP 5.0 can now differentiate between "standard" signal peptides translocated by the Sec translocon (Sec/SPI) and "Tat" (Twin-Arginine Translocation) signal peptides translocated by the Tat translocon (Tat/SPI) in Bacteria and Archaea. Previously, we referred to the TatP server for this prediction. However, SignalP 5.0 cannot predict lipoprotein signal peptides translocated by the Tat translocon (Tat/SPII) since we did not find any confirmed examples of these while constructing the data sets.
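
The idea of a grammar imposed on the prediction, as described in the conditional random field bullet above, can be illustrated with constrained Viterbi decoding over per-position scores. This is a simplified stand-in and not SignalP's actual model: the two-state label set ("S" for signal peptide, "M" for mature protein), the transition rules and the example probabilities are all hypothetical.

```python
import math

# Hypothetical two-state label set: S = signal peptide, M = mature protein.
STATES = ["S", "M"]
# Grammar: once the mature protein has started, the path may not return to S.
ALLOWED = {("S", "S"), ("S", "M"), ("M", "M")}


def constrained_viterbi(emissions):
    """Return the best-scoring label path that respects ALLOWED.

    `emissions` is a list with one dict per position, mapping each state
    to a log-score for that position.
    """
    if not emissions:
        return []
    score = {s: emissions[0][s] for s in STATES}
    back = []  # back[t][cur] = best previous state for position t + 1
    for em in emissions[1:]:
        new_score, pointers = {}, {}
        for cur in STATES:
            candidates = [
                (score[prev], prev) for prev in STATES if (prev, cur) in ALLOWED
            ]
            best_prev_score, best_prev = max(candidates)
            new_score[cur] = best_prev_score + em[cur]
            pointers[cur] = best_prev
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    last = max(STATES, key=lambda s: score[s])
    path = [last]
    for pointers in reversed(back):
        last = pointers[last]
        path.append(last)
    path.reverse()
    return path


# Made-up per-position probabilities of being in the signal peptide.
p_signal = [0.9, 0.8, 0.4, 0.7, 0.2, 0.1]
emissions = [{"S": math.log(p), "M": math.log(1.0 - p)} for p in p_signal]

# Labelling each position independently can violate the grammar (M followed by S).
independent = ["S" if p > 0.5 else "M" for p in p_signal]
print("".join(independent))
print("".join(constrained_viterbi(emissions)))
```

Because the decoder only ever considers paths that follow the grammar, no separate post-processing step is needed to repair inconsistent per-position labels.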
Main publication:
  • SignalP 5.0 improves signal peptide predictions using deep neural networks
    José Juan Almagro Armenteros, Konstantinos D. Tsirigos, Casper Kaae Sønderby, Thomas Nordahl Petersen, Ole Winther, Søren Brunak, Gunnar von Heijne and Henrik Nielsen.
    Nature Biotechnology, 37:420-423, 2019.
4.1 New in this version:
  • For the web page, an option to set the D-score cutoff values so that the sensitivity is the same as that of SignalP 3.0.
  • An option to set the minimum cleavage site position (i.e. the Ymax position); the default value is 10.
  • For the signalp package an option has been included to specify a temporary directory (-T dir).
  • For the signalp package an option has been included to show signalp version (-V).
  • Documentation rewritten.

Main publication:

  • SignalP 4.0: discriminating signal peptides from transmembrane regions
    Thomas Nordahl Petersen, Søren Brunak, Gunnar von Heijne and Henrik Nielsen.
    Nature Methods, 8:785-786, 2011.
4.0 New in this version:
  • Improved discrimination between signal peptides and transmembrane regions.
  • No HMM method - only one prediction.

Main publication:

  • SignalP 4.0: discriminating signal peptides from transmembrane regions
    Thomas Nordahl Petersen, Søren Brunak, Gunnar von Heijne and Henrik Nielsen.
    Nature Methods, 8:785-786, 2011.
3.0 New in this version:
  • D-score. Improved quality of prediction.

Main publication:

  • Improved prediction of signal peptides: SignalP 3.0.
    Jannick Dyrløv Bendtsen, Henrik Nielsen, Gunnar von Heijne and Søren Brunak.
    J. Mol. Biol., 340:783-795, 2004.
2.0 New in this version:
  • Incorporation of a hidden Markov model version: SignalP V2.0 comprises two signal peptide prediction methods, SignalP-NN (based on neural networks, corresponding to SignalP V1.1) and SignalP-HMM (based on hidden Markov models). For eukaryotic data, SignalP-HMM has a substantially improved discrimination between signal peptides and uncleaved signal anchors, but it has a slightly lower accuracy in predicting the precise location of the cleavage site. The user can choose whether to run SignalP-NN, SignalP-HMM, or both.
  • Retraining of the neural networks: SignalP-NN in SignalP V2.0 is trained on a newer data set derived from SWISS-PROT rel. 35 (instead of rel. 29 as in SignalP V1.1).
  • Graphics integrated in the output: SignalP V2.0 shows signal peptide and cleavage site scores for each position as plots in GIF format on the output page. The plots provide more information than the prediction summary, e.g. about possible cleavage sites other than the strongest prediction.
  • Signal peptide region assignment: SignalP-HMM provides not only a prediction of the presence of a signal peptide and the position of the cleavage site, but also an approximate assignment of n-, h- and c-regions within the signal peptide. These are shown in the graphical output as probabilities for each position being in one of these three regions.
  • Automatic truncation: in SignalP V1.1, we recommended that you should submit only the N-terminal part of each protein, not more than 50-70 amino acids. SignalP V2.0 now offers to truncate your sequences automatically.
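
The truncation described in the last bullet was performed by the server. A comparable local preprocessing step could be sketched as follows. The function names are hypothetical, and the 70-residue default is simply the upper end of the 50-70 range mentioned above.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def truncate_records(records, max_length=70):
    """Keep only the N-terminal `max_length` residues of each sequence."""
    return [(header, seq[:max_length]) for header, seq in records]


def to_fasta(records, width=60):
    """Format (header, sequence) pairs as FASTA text, wrapped at `width`."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"


# Made-up input: one short and one long sequence.
example = ">p1 test\nMKT\nAAA\n>p2\n" + "A" * 100 + "\n"
truncated = truncate_records(read_fasta(example))
print(to_fasta(truncated))
```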

Main publication:

  • Prediction of signal peptides and signal anchors by a hidden Markov model.
    Henrik Nielsen and Anders Krogh.
    Proceedings of the Sixth International Conference on Intelligent Systems for Molecular Biology (ISMB 6), AAAI Press, Menlo Park, California, pp. 122-130, 1998.
1.1 The original server: the method based on artificial neural networks.

Main publication:

  • Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites.
    Henrik Nielsen, Jacob Engelbrecht, Søren Brunak and Gunnar von Heijne.
    Protein Engineering, 10:1-6, 1997.

Portable version

Would you prefer to run SignalP at your own site? SignalP 6.0 is available as a Python package, with the same functionality as this service. There is a download page for academic users; other users are requested to contact DTU Health Technology Software Package Manager at

Software Downloads

  • Version 6.0h
    • slow_sequential
    • fast
  • Version 5.0b
    • Linux
    • Darwin
  • Version 4.1g
    • Linux
    • IRIX64
    • Darwin
    • CYGWIN
  • Version 3.0
    • SunOS
    • OSF1
    • Linux
    • IRIX
    • AIX
  • Version 2.0
    • SunOS
    • OSF1
    • Linux
    • IRIX
    • AIX
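
Once the SignalP 6.0 Python package is installed, a run can be scripted. The sketch below only assembles a command line. The executable name `signalp6` and the flags `--fastafile`, `--organism`, `--output_dir`, `--format` and `--mode` are given here as assumptions that should be checked against `signalp6 --help` for the installed version; the helper function and the file names are hypothetical.

```python
import shlex


def build_signalp6_command(fasta_path, output_dir, organism="other",
                           mode="fast", output_format="txt"):
    """Assemble an argument list for a SignalP 6.0 run.

    The executable name and all flag names are assumptions; verify them
    with `signalp6 --help` before use.
    """
    return [
        "signalp6",
        "--fastafile", str(fasta_path),
        "--organism", organism,
        "--output_dir", str(output_dir),
        "--format", output_format,
        "--mode", mode,
    ]


# Hypothetical file and directory names.
cmd = build_signalp6_command("proteins.fasta", "signalp_out")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where the package is installed.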