Select a row to see which lineages the domain architecture is present in
MolEvolvR allows users to start with protein(s) of interest and perform the full analysis (1+3+4), only protein characterization (1+3), or only homology searches (1+4). Users can also start with external outputs from BLAST or InterProScan for further analysis, summarization, and visualization (2+3+4). MolEvolvR is interactive, queryable, and customizable.
Studying proteins through the lens of evolution can help identify conserved features and lineage-specific variants, and consequently, their functions. MolEvolvR is a web-app that enables researchers to run a general-purpose computational workflow for characterizing the molecular evolution and phylogeny of their proteins of interest. The web-app accepts inputs in multiple formats: protein/domain sequences (FASTA/AccNum), homologous proteins (e.g., BLAST output, MSA), or motif/domain scans (e.g., InterProScan output). MolEvolvR returns detailed data about homologs along with dynamic graphical summaries such as multiple sequence alignment, phylogenetic trees, domain architectures, domain proximity networks, phyletic spreads, and co-occurrence patterns across lineages. Thus, MolEvolvR provides a powerful, easy-to-use interface for computationally characterizing proteins.
MolEvolvR: A web-app for characterizing proteins using molecular evolution and phylogeny
Joseph T Burke*, Samuel Z Chen*, Lo M Sosinski*, John B Johnson, Janani Ravi. [*Co-primary] bioRxiv 2022. doi: https://doi.org/10.1101/2022.02.18.461833; web-app: http://jravilab.org/molevolvr
This page provides a summary of your analysis; please explore the other tabs to view the complete analysis
View your processed query/output data here.
This page will provide you with instructions on how to use the web-app to its fullest potential. The video tutorials will demonstrate how to navigate the web-app after retrieving your results.
Characterization of proteins is crucial to understanding the molecular basis of fundamental cellular processes. Several computational tools have been developed separately to characterize protein sequence, structure, function, and phylogeny. We have created a workflow that lets users analyze proteins of interest at all of these levels. Analyzing the function, evolution, and phylogeny of proteins together provides better insight into the lineage, structure, and biology of homologous proteins.
For example, the phage shock proteins (Psp) were analyzed using the workflow in this web-app. The analysis characterized the homology, domain architectures, and phylogeny of these proteins (and more), showing their prevalence in other organisms and detailing how variations of the phage shock stress response system are present across many lineages.
Users can input or upload a list of accession numbers or FASTA files for their proteins of interest to identify homologs, determine domain architectures, and delineate phyletic spreads. These analyses provide insight into the role of the protein(s) of interest within organisms and detail how they have evolved. Together, they give an overview of the proteins' importance to a particular biological process or to the survival of the organisms themselves.
MolEvolvR's functionality comprises four types of analysis:
To begin the full analysis, enter the accession numbers associated with your proteins of interest into the "Upload" tab. They should be given as a comma-separated list; they may be copy/pasted or uploaded as a .csv file. Then click the "Generate FASTA" button, and the FASTA sequences are retrieved from the protein database at NCBI. Alternatively, you may paste FASTA sequences into the box provided or upload them as a .fa or .fasta file from NCBI.
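The comma-separated input described above can be tidied before it is pasted or uploaded. The sketch below is only an illustration, not MolEvolvR's own code: the function name `parse_accessions` and the accession strings are hypothetical, and it assumes that commas or line breaks separate the entries.

```python
import re


def parse_accessions(text: str) -> list[str]:
    """Split comma- or newline-separated accessions into a clean, ordered list."""
    seen = set()
    result = []
    # Accept both commas and line breaks so pasted text and .csv content work.
    for token in re.split(r"[,\n]", text):
        acc = token.strip()
        # Skip empty entries (e.g., a trailing comma) and repeated accessions.
        if acc and acc not in seen:
            seen.add(acc)
            result.append(acc)
    return result


# Illustrative accession strings only; replace them with your own.
raw_input_text = "ABC12345.1, XYZ_000001.1,\nABC12345.1, "
print(parse_accessions(raw_input_text))
```

Removing duplicates and empty entries up front avoids retrieving the same sequence twice.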
Alternatively, if the user already has a list of homologous proteins, they may enter the workflow after the homology search step. Their protein data, in the form of accession numbers or FASTA sequences, are run through our molecular characterization and phylogenetic analysis processes. The MolEvolvR web-app can use Clustal Omega, ClustalW, or MUSCLE to perform multiple sequence alignments on the proteins. If the user prefers a different alignment algorithm, they may instead provide FASTA sequences that have already been aligned with the algorithm of their choice.
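Before supplying pre-aligned FASTA sequences, it can help to confirm that the file really is an alignment, that is, that every sequence has the same length once gap characters are included. The following is a minimal sketch of such a check; the helper names `read_fasta` and `is_aligned` are hypothetical and not part of MolEvolvR.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {sequence_id: sequence}, joining wrapped lines."""
    records: dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word after '>' as the sequence identifier.
            parts = line[1:].split()
            header = parts[0] if parts else ""
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def is_aligned(records: dict[str, str]) -> bool:
    """Return True if there is at least one sequence and all lengths match."""
    lengths = {len(seq) for seq in records.values()}
    return len(records) > 0 and len(lengths) == 1


example = ">seqA\nMKT-LLV\n>seqB\nMKTALLV\n"
print(is_aligned(read_fasta(example)))
```

Equal lengths are a necessary condition for a valid alignment, not proof that the alignment is biologically sensible.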
This option allows you to search for homologous proteins related to the specific domains found within your query proteins. It enables a broader search that can discover remote homologs that a standard search would not detect. Phylogenetic analysis, domain architecture analysis, and characterization are then performed.
A user can start the analysis after conducting their own BLAST search and obtaining homologs. Combining various tools can then produce phyletic spreads, domain architectures and characterization, or both. Beginning with homologs, MolEvolvR can cluster the proteins by similarity and perform phylogenetic analyses, with the option to add the domain architecture analysis if desired.
Selecting this will only perform the phylogenetic analyses on the query protein(s) provided. No other analysis will be performed unless explicitly selected.
Selecting this option allows for only a homology search to be performed with the protein(s) of interest, unless otherwise specified.
This option performs only the domain architecture analysis for the query proteins provided. No other analysis will be performed unless explicitly selected.
By default, the web-app runs the full set of analyses. The phylogeny, homology, and domain architecture analyses will all be performed on the given proteins (FASTA/AccNums).
DELTA-BLAST/PSI-BLAST + parameter options
To begin the homolog search, a protein's FASTA sequence or accession number is given to the application.
If an accession number is given, the web-app will retrieve the corresponding FASTA sequence. The FASTA file is then run through either DELTA-BLAST or PSI-BLAST, as chosen by the user. Both are variations of BLASTP that differ in how they build their scoring matrix: PSI-BLAST runs a BLASTP search and creates a Position-Specific Scoring Matrix (PSSM) from the hits, which is then used to search the BLAST databases for more matches; DELTA-BLAST also uses a PSSM, but derives it by first searching the query against the pre-constructed PSSMs of the Conserved Domain Database (CDD). Once the BLAST homology search completes, MolEvolvR clusters the resulting homolog sequences with BLASTClust. Clustering is based on the sequence similarity (similar amino acids) and identity (exact amino acid matches) of the homologs, so that the proteins most similar to one another are placed in the same cluster. Each cluster is then named after the most prominent domain in its group of proteins and the number of proteins in the cluster.
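The cluster-naming step described above can be illustrated with a small sketch. This is not MolEvolvR's implementation: the function `name_cluster`, the `domain_count` name format, the assumption that each protein is represented by a single domain label, and the example labels are all hypothetical.

```python
from collections import Counter


def name_cluster(domains: list[str]) -> str:
    """Name a cluster after its most frequent domain and its size.

    `domains` holds one domain label per protein in the cluster
    (an assumption made for this sketch).
    """
    if not domains:
        raise ValueError("a cluster must contain at least one protein")
    # most_common(1) returns [(label, count)] for the most frequent label.
    most_common_domain, _ = Counter(domains).most_common(1)[0]
    return f"{most_common_domain}_{len(domains)}"


# Hypothetical domain labels for a three-protein cluster.
print(name_cluster(["PspA", "PspA", "Snf7"]))
```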
MolEvolvR allows for customization of the analyses on your proteins of interest. Several different approaches can be taken to fit your needs.
The full analysis allows the user to begin with accession numbers or FASTA files for protein(s) of interest. It then compiles a comprehensive set of homologs, which can then be used to determine evolution, phylogeny, and domain architectures of all homologs. Users have the option to perform only phylogenetic analysis or domain architecture if both are not required.
Additionally, users may load results from NCBI BLAST or InterProScan and begin the analysis at that stage. Web-BLAST results allow the user to determine homolog similarity, domain architecture, and/or phylogeny, whereas uploading InterProScan results allows for domain characterization and, if desired, phylogeny.
Users may also enter the workflow with data obtained from a previous BLAST run. This data can be run through BLASTClust to cluster similar sequences among the retrieved homologs. The phylogenetic analysis and domain architecture components can then be applied.
MolEvolvR allows you to input BLAST results that have been run externally on the BLAST web-server for your protein(s) of interest. To help us help you, below are instructions and useful parameters to modify prior to setting up your BLAST runs. Additionally, we'll help you with identifying the right format to download these results in to ensure compatibility with our web-app.
Here are some instructions on uploading BLASTP results to our "Upload" tab:
First, enter your Accession Number(s) or FASTA sequence(s) into the "Enter Query Sequence" box. Creating a job title for the run is optional and for your personal use. Choose either the non-redundant protein database (nr) or the reference sequence collection (refseq_protein), and choose which algorithm to run. If you would like to filter your results by lineage (e.g., species, genus, family, kingdom), enter the name/taxID accordingly and toggle the box to include or exclude those results in your search. BLASTP is well suited to a standard protein BLAST when you know very little about your protein. If you are interested in identifying remote homologs, we suggest an iterative search using PSI-BLAST. DELTA-BLAST works very well if your protein has domains of interest.
Choose the maximum number of target sequences to look up. Our recommendation is 5,000 total target sequences, or hits, to ensure maximum inclusion of homologs. Next, go down to the PSI/PHI/DELTA-BLAST box at the bottom and choose your threshold value, or e-value. We recommend using 1e-5 (also written as 1 x 10^-5 or 0.00001). Double check your parameters in the box at the bottom of the page, then click the BLAST button.
Once the BLAST algorithm has finished running, there will be an option on the upper left of the screen to download results. Click on the "Download" button and select the "Hit Table (text)" option. Once these results are downloaded, you can directly upload these text files to the MolEvolvR web-app.
NCBI has updated the BLAST site so that a description table of your protein homologs can also be downloaded, which we encourage you to look at. However, our app requires the Hit Table for the analysis to run correctly and completely.
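To inspect a downloaded Hit Table before uploading it, a short script along the following lines may help. It is a sketch under stated assumptions rather than MolEvolvR's own parser: it assumes the file is tab-separated, that comment lines start with "#", and that the first twelve columns follow the standard BLAST tabular order (query, subject, % identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score). The column names, the function `read_hit_table`, and the sample rows are hypothetical.

```python
import csv
import io

# Assumed column order of the standard 12-column BLAST tabular output.
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def read_hit_table(text: str, max_evalue: float = 1e-5) -> list[dict]:
    """Parse hit-table text and keep hits with e-value <= max_evalue."""
    hits = []
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    for row in reader:
        # Skip blank lines, comment lines, and rows with too few columns.
        if not row or row[0].startswith("#") or len(row) < len(COLUMNS):
            continue
        hit = dict(zip(COLUMNS, row))  # extra trailing columns are ignored
        hit["pident"] = float(hit["pident"])
        hit["evalue"] = float(hit["evalue"])
        if hit["evalue"] <= max_evalue:
            hits.append(hit)
    return hits


# Made-up rows for illustration only.
sample = (
    "# Fields: query, subject, % identity, ...\n"
    "Q1\tS1\t98.5\t200\t3\t0\t1\t200\t1\t200\t2e-100\t400\n"
    "Q1\tS2\t30.0\t80\t50\t2\t5\t84\t10\t90\t0.5\t30\n"
)
for hit in read_hit_table(sample):
    print(hit["sseqid"], hit["pident"], hit["evalue"])
```

The default cutoff of 1e-5 mirrors the e-value threshold recommended above; hits above it are dropped.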