Running InterProScan

Quickstart

The typical command for running InterProScan is as follows:

nextflow run ebi-pf-team/interproscan6 \
  -r 6.0.1 \ # (1)!
  -profile docker \ # (2)!
  --input /path/to/sequences.faa \ # (3)!
  --datadir data \ # (4)!
  --interpro latest # (5)!

(1) Run InterProScan v6.0.1
(2) Execute tasks in Docker containers
(3) Path to your input FASTA file of protein sequences
(4) Directory in which to download and store the reference files
(5) Use the latest version of InterPro

Info

Nextflow options use a single dash (e.g. -profile, -r), while InterProScan parameters use double dashes (e.g. --input, --datadir).
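As a quick illustration of this rule, the following sketch classifies an option by its prefix. The function name `classify_option` is hypothetical and is not part of Nextflow or InterProScan; it only mirrors the single-dash versus double-dash convention described above.

```shell
# Hypothetical helper: classify a command-line token by its leading dashes.
classify_option() {
  case "$1" in
    --*) echo "InterProScan parameter" ;;  # double dash: pipeline parameter
    -*)  echo "Nextflow option" ;;         # single dash: Nextflow itself
    *)   echo "value" ;;                   # anything else is an argument value
  esac
}

# Print the classification of a few options used in the quickstart command.
for opt in -r -profile --input --datadir; do
  printf '%s\t%s\n' "$opt" "$(classify_option "$opt")"
done
```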

Tip

--interpro <VERSION> is optional and defaults to latest, but we strongly recommend pinning a specific version (e.g. --interpro 108.0) for reproducibility.
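One way to keep runs reproducible is to record both pinned versions in a small wrapper. The sketch below is only an illustration: the function name `run_pinned_interproscan` and the variable names are hypothetical, and the version numbers are the examples used on this page.

```shell
# Pinned versions, kept in one place so they can be recorded with the results.
IPS_RELEASE="6.0.1"        # pipeline release, passed to Nextflow's -r
INTERPRO_VERSION="108.0"   # InterPro data release, passed to --interpro

# Hypothetical wrapper: $1 is the input FASTA file; any remaining
# arguments are passed through to the pipeline unchanged.
run_pinned_interproscan() {
  input="$1"
  shift
  nextflow run ebi-pf-team/interproscan6 \
    -r "$IPS_RELEASE" \
    -profile docker \
    --input "$input" \
    --datadir data \
    --interpro "$INTERPRO_VERSION" \
    "$@"
}

# Usage: run_pinned_interproscan /path/to/sequences.faa
```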

Selecting specific analyses

By default, InterProScan runs all analyses except the deep-learning-based analyses.

You can control the analyses to be executed with the following parameters:

--applications <LIST>: run only selected analyses
--skip-applications <LIST>: exclude analyses
--run-ml: enable deep-learning-based analyses

<LIST> is a comma-separated list of analysis names, e.g. AntiFam,CATH-Gene3D,Pfam.
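If the analysis names are kept one per line in a script, a small helper can assemble the comma-separated value. This is a minimal sketch; `join_analyses` is a hypothetical name, not an InterProScan utility.

```shell
# Hypothetical helper: join all arguments with commas.
# The body runs in a subshell, so changing IFS does not affect the caller.
join_analyses() (
  IFS=','
  printf '%s\n' "$*"
)

# Build the value for --applications or --skip-applications.
analyses="$(join_analyses AntiFam CATH-Gene3D Pfam)"
echo "$analyses"
```

The resulting string can then be passed as `--applications "$analyses"`.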

See Analyses for the full list of supported analyses.

Warning

--applications is mutually exclusive with --skip-applications and --run-ml.
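A wrapper script can catch this conflict before launching a run. The sketch below is an assumption about how such a pre-flight check might look; `check_analysis_flags` is a hypothetical function, it only recognises the option names exactly as written on this page, and it does not reproduce InterProScan's own validation.

```shell
# Hypothetical pre-flight check: fail if --applications is combined with
# --skip-applications or --run-ml in the given argument list.
check_analysis_flags() {
  has_apps=0
  has_conflict=0
  for arg in "$@"; do
    case "$arg" in
      --applications) has_apps=1 ;;
      --skip-applications|--run-ml) has_conflict=1 ;;
    esac
  done
  if [ "$has_apps" -eq 1 ] && [ "$has_conflict" -eq 1 ]; then
    echo "error: --applications cannot be combined with --skip-applications or --run-ml" >&2
    return 1
  fi
  return 0
}

# Usage: check_analysis_flags "$@" && nextflow run ebi-pf-team/interproscan6 "$@"
```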

nextflow run ebi-pf-team/interproscan6 \
  -r 6.0.1 \
  -profile docker \
  --datadir data \
  --input /path/to/sequences.faa \
  --applications Pfam,MobiDB-lite # (1)!

(1) Only Pfam and MobiDB-lite will be executed

nextflow run ebi-pf-team/interproscan6 \
  -r 6.0.1 \
  -profile docker \
  --datadir data \
  --input /path/to/sequences.faa \
  --skip-applications CDD,HAMAP,PANTHER # (1)!

(1) CDD, HAMAP, and PANTHER will not be executed

Tip

Analysis names are case-insensitive, and hyphens and underscores are ignored: CATH-Gene3D, cathgene3d, and CATH_GENE3D are all valid.
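The matching rule can be pictured as a normalisation step applied to each name. The following sketch mimics the behaviour described above; `normalize_analysis_name` is a hypothetical function and not InterProScan's actual implementation.

```shell
# Hypothetical helper: lowercase a name and drop hyphens and underscores.
normalize_analysis_name() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -d '_-'
}

# All three spellings reduce to the same key.
for name in CATH-Gene3D cathgene3d CATH_GENE3D; do
  normalize_analysis_name "$name"
done
```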

Warning

Some analyses depend on separately licensed third-party applications. See Licensed applications.

Annotating nucleotide sequences

By default, InterProScan expects protein sequences. To annotate nucleotide sequences, add the --nucleic flag:

nextflow run ebi-pf-team/interproscan6 \
  -r 6.0.1 \
  -profile docker \
  --datadir data \
  --input /path/to/contigs.fna \ # (1)!
  --nucleic # (2)!

(1) Path to a FASTA file of nucleic acid sequences
(2) InterProScan will perform a six-frame translation, then annotate the translated ORFs

InterProScan uses Easel esl-translate with its default settings. By default, it reports ORFs longer than 20 amino acids, does not require a specific start codon, and allows any amino acid to begin an ORF. This helps with sequence fragments, genes with introns, and other incomplete coding regions.

Warning

Because one nucleotide sequence can produce multiple ORFs, nucleotide FASTA files often take much longer to analyse than protein FASTA files of the same size. To improve performance, use stricter ORF filtering before running InterProScan.
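One possible approach, sketched below, is to translate the sequences yourself, keep only ORFs above a chosen length, and submit the resulting protein FASTA file as a regular protein input (without --nucleic). The sketch covers only the filtering step and assumes the ORFs are already in a protein FASTA file; the function name `filter_orfs_by_length`, the file names, and the threshold of 100 are hypothetical choices, not InterProScan recommendations.

```shell
# Hypothetical helper: keep only FASTA records whose sequence has at least
# the given number of residues. Handles sequences wrapped over several lines.
# $1: minimum length, $2: protein FASTA file. Output is written to stdout,
# with each kept sequence on a single line.
filter_orfs_by_length() {
  awk -v min="$1" '
    # Print the buffered record if it is long enough.
    function emit() {
      if (header != "" && length(seq) >= min) {
        print header
        print seq
      }
    }
    /^>/ { emit(); header = $0; seq = ""; next }   # new record starts
    { gsub(/[ \t\r]/, ""); seq = seq $0 }          # accumulate sequence lines
    END { emit() }                                 # flush the last record
  ' "$2"
}

# Usage: filter_orfs_by_length 100 orfs.faa > orfs.filtered.faa
```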

Including GO terms and pathway annotations

InterPro groups related signatures from member databases such as Pfam, PIRSF, and PANTHER into curated InterPro entries. These entries capture protein families, domains, and functional sites, and can carry cross-references such as Gene Ontology (GO) terms and pathway mappings.

When a matched signature is integrated into an InterPro entry, InterProScan can add that entry's GO terms and pathway annotations to the output. Enable them with the --goterms and --pathways flags:

nextflow run ebi-pf-team/interproscan6 \
  -r 6.0.1 \
  -profile docker \
  --datadir data \
  --input /path/to/sequences.faa \
  --goterms \ # (1)!
  --pathways # (2)!

(1) Includes GO terms associated with matched InterPro entries
(2) Includes pathway annotations derived from matched InterPro entries

GO terms are curator-assigned to InterPro entries and describe conserved molecular functions, biological processes, or cellular locations. Pathway mappings are inferred from relationships between InterPro entries and reviewed UniProtKB proteins with EC numbers.