Types of BLAST, applications, Algorithm & interpretation of results

Types of BLAST

Nucleotide BLAST (BLASTn): compares a nucleotide query (DNA/RNA) against a nucleotide sequence database. Used for searching gene transcripts, primers, SNPs, and other short DNA sequences.
Protein BLAST (BLASTp): compares a protein query against a protein sequence database. Used for finding similar proteins, examining functional domains, and exploring protein evolution.
BLASTx: compares a nucleotide query, translated in all six reading frames, against a protein sequence database. Used when a nucleotide sequence is suspected to code for a protein.
tBLASTn: compares a protein query against a nucleotide database translated in all six reading frames. Used for finding genes in unfinished genomes based on protein homologies.
tBLASTx: translates the nucleotide query in all six reading frames and compares it against a nucleotide database that is also translated in all six reading frames. Used for finding similarities among genes that may contain frameshift sequencing errors.
Protein-protein BLAST (blastp): the program behind protein BLAST. Given a protein query, it returns the most similar protein sequences from the protein database that the user specifies.
Position-Specific Iterative BLAST (PSI-BLAST) (blastpgp in legacy BLAST, psiblast in BLAST+): This program is used to find distant relatives of a protein.
Large numbers of query sequences (megablast): When comparing large numbers of input sequences via the command-line BLAST, “megablast” is much faster than running BLAST multiple times. It is intended for nucleotide sequences that are highly similar to their database matches.
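The five basic programs in the list above differ only in the type of query and the type of database they accept. A minimal Python sketch of that pairing follows; the table and the function name `candidate_programs` are illustrative and not part of BLAST itself, and the specialised variants (PSI-BLAST, megablast) are left out.

```python
# Hypothetical lookup table: (query type, database type) -> BLAST programs.
# The pairings follow the descriptions in the list above.
PROGRAMS = {
    ("nucleotide", "nucleotide"): ["blastn", "tblastx"],
    ("protein", "protein"): ["blastp"],
    ("nucleotide", "protein"): ["blastx"],
    ("protein", "nucleotide"): ["tblastn"],
}


def candidate_programs(query_type, db_type):
    """Return the basic BLAST programs that fit a query/database combination."""
    try:
        return PROGRAMS[(query_type, db_type)]
    except KeyError:
        raise ValueError(
            f"unsupported combination: {query_type!r} vs {db_type!r}"
        ) from None
```

For a nucleotide query against a nucleotide database, two programs qualify: blastn compares the sequences directly, while tblastx compares their translations, which is slower but can detect similarity that is only visible at the protein level.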

Applications of the different BLAST types

BLASTn – Used for finding homologous gene transcripts, detecting gene families, identifying promoters, confirming cDNA clones, and designing probes/primers.
BLASTp – Widely used for protein function prediction, domain identification, inferring orthologs/paralogs, and constructing phylogenetic trees. Also used to identify protein toxins, allergens, or antimicrobial peptides.
BLASTx – Useful when only a DNA sequence is available and protein homology needs to be checked. Allows putative protein translation products to be compared.
tBLASTn – Used to predict genes in unfinished genomes and for chromosome walking. Allows translated nucleotide databases to be searched using a protein query.
tBLASTx – Helpful for analyzing EST libraries, viral genomes, metagenomes, and comparisons between closely related species. Translation in all frames accommodates sequencing errors.
PSI-BLAST – Specialized version of BLASTp for iteratively searching a database using a profile of previously identified sequences. Useful for discovering distant evolutionary relationships.
PHI-BLAST – Searches a protein database using a protein query together with a pattern specified as a motif or regular expression. Used for identifying new members of a protein family.
RPS-BLAST – Searches a protein query against a database of precomputed profiles, such as those in the CDD (Conserved Domain Database). Used to identify conserved domains in protein sequences.
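Several of the applications above (BLASTx, tBLASTn, tBLASTx) depend on translating a nucleotide sequence in all reading frames: three on the forward strand and three on the reverse complement. The sketch below shows what that translation looks like in Python, assuming the standard genetic code and an unambiguous DNA alphabet (A, C, G, T). The function names are illustrative; BLAST performs this step internally.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(dna):
    """Map frame labels (+1, +2, +3, -1, -2, -3) to protein strings."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames
```

Because a single inserted or deleted base shifts the reading frame, a gene with such an error will show its similarity in two different frames, which is why translated searches can still recover it.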

Algorithm of BLAST

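The BLAST algorithm is fast, web-accessible, and accurate enough for most similarity searches. It is a heuristic: rather than aligning the query against every database sequence in full, it aligns only the regions around short matches, which makes it faster than exhaustive alignment methods. The algorithm involves multiple steps and many parameters.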

The BLAST algorithm works in three main stages:

Seed generation – From the query sequence, BLAST selects small subsequences of fixed length called “words” or “seeds”. Matches to these words in the database sequences act as starting points for finding sequence matches.
Extension – BLAST extends each seed word match in both directions along the query and database sequences, using a defined substitution matrix to score each extension, and stops when the score falls a set amount below the best score seen so far. This generates High-scoring Segment Pairs (HSPs), each representing a local alignment between the query and a database sequence.
Evaluation – The HSPs are evaluated based on statistical significance. BLAST calculates the expectation value (E-value) of each alignment: the number of alignments with that score or better expected to occur by chance in a database search. Alignments with significant E-values are reported as hits.
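The seed and extension stages can be illustrated with a deliberately simplified Python sketch. It assumes nucleotide sequences, exact word matches, ungapped extension, and made-up scoring values (match +1, mismatch -2, drop-off 3); the function names are hypothetical. The real program adds refinements not shown here, such as gapped extension and, for proteins, scoring with a substitution matrix.

```python
from collections import defaultdict


def index_words(seq, w):
    """Map every length-w word in seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - w + 1):
        index[seq[i:i + w]].append(i)
    return index


def extend(query, subject, q, s, step, match, mismatch, x_drop):
    """Extend in one direction; return (best score gain, residues kept)."""
    score = best = best_len = n = 0
    while 0 <= q < len(query) and 0 <= s < len(subject):
        score += match if query[q] == subject[s] else mismatch
        n += 1
        if score > best:
            best, best_len = score, n
        elif best - score > x_drop:
            break  # score fell too far below the best seen: stop extending
        q += step
        s += step
    return best, best_len


def find_hsps(query, subject, w=4, match=1, mismatch=-2, x_drop=3):
    """Return (score, query_start, subject_start, length), best first."""
    hsps = set()
    words = index_words(subject, w)
    for q in range(len(query) - w + 1):
        for s in words.get(query[q:q + w], []):
            right, r_len = extend(query, subject, q + w, s + w, 1,
                                  match, mismatch, x_drop)
            left, l_len = extend(query, subject, q - 1, s - 1, -1,
                                 match, mismatch, x_drop)
            score = w * match + left + right
            hsps.add((score, q - l_len, s - l_len, w + l_len + r_len))
    return sorted(hsps, reverse=True)
```

Several seeds on the same diagonal extend to the same segment pair, so the results are collected in a set to report each HSP once.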

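Key parameters in the BLAST algorithm:

Word size – Length of the initial seed words. Shorter words make the search more sensitive but slower.
Scoring matrix – Rewards and penalties for matches, mismatches, and (for proteins) each amino acid substitution
E-value threshold – Cut-off for reporting statistically significant hits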
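To make the E-value threshold concrete: for a bit score S', the expected number of chance hits is E = m × n × 2^(−S'), where m is the query length and n is the total database length. The sketch below applies that relation directly. It is an approximation, since BLAST uses effective lengths that are slightly shorter than the raw ones, and the function names are illustrative.

```python
def expected_hits(bit_score, query_len, db_len):
    """Approximate E-value for a bit score: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


def passes_threshold(bit_score, query_len, db_len, max_evalue=1e-3):
    """True if the approximate E-value is at or below the cut-off."""
    return expected_hits(bit_score, query_len, db_len) <= max_evalue
```

Two consequences follow from the formula: each additional bit of score halves the E-value, and the same alignment becomes less significant as the database grows.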

BLAST Steps overview

Specifying a sequence of interest
Selecting a BLAST program
Selecting a database
Selecting optional parameters
Selecting formatting parameters
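On the command line, the same steps map onto options of the BLAST+ programs: `-query` for the sequence file, the program name itself, `-db` for the database, optional parameters such as `-evalue` and `-word_size`, and `-outfmt` and `-out` for formatting. A small Python sketch that assembles such a command is shown below; the helper name `build_blast_command` and the file names in the usage note are placeholders, and the command is only built, not executed.

```python
def build_blast_command(program, query_path, database, evalue=1e-3,
                        outfmt=6, word_size=None, out_path=None):
    """Assemble a BLAST+ command line as a list of arguments."""
    cmd = [
        program,                  # step 2: the BLAST program, e.g. "blastn"
        "-query", query_path,     # step 1: FASTA file with the query
        "-db", database,          # step 3: database name or path
        "-evalue", str(evalue),   # step 4: optional search parameters
        "-outfmt", str(outfmt),   # step 5: output format (6 = tabular)
    ]
    if word_size is not None:
        cmd += ["-word_size", str(word_size)]
    if out_path is not None:
        cmd += ["-out", out_path]
    return cmd
```

For example, `build_blast_command("blastn", "query.fasta", "nt", word_size=11, out_path="hits.tsv")` returns a list that could be passed to `subprocess.run(cmd, check=True)` on a machine where BLAST+ and the named database are installed.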

Interpretation of BLAST Results

Key aspects of analyzing BLAST results:

E-value – The number of alignments with an equal or better score expected by chance in a database of that size. A lower E-value means a more significant alignment. As a rule of thumb, an E-value < 0.001 indicates a good hit.
Bit score – A normalized alignment score that can be compared across searches. A higher bit score indicates a better alignment.
Query coverage – The percentage of the query sequence covered by the alignment. Higher coverage is better.
Max identity – The highest identity between the query and the hit, given as a percentage. Higher identity indicates a closer match.
Alignments – Review the aligned sequences between the query and subject sequences for mismatches, gaps, and matched regions.
Subject description – The annotation of the database hit provides information on the putative identification and known function.
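When BLAST+ is run with tabular output (`-outfmt 6`), these measures can be checked programmatically. The sketch below assumes the default twelve columns of that format and filters hits by E-value and by a per-alignment query coverage computed from the query start and end positions. The function names and the default cut-offs are illustrative choices, and the query length has to be supplied by the caller because the default columns do not include it.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
INT_FIELDS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")
FLOAT_FIELDS = ("pident", "evalue", "bitscore")


def parse_hits(text):
    """Parse tab-separated BLAST output into a list of dictionaries."""
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        for key in INT_FIELDS:
            hit[key] = int(hit[key])
        for key in FLOAT_FIELDS:
            hit[key] = float(hit[key])
        hits.append(hit)
    return hits


def significant_hits(hits, query_len, max_evalue=1e-3, min_coverage=50.0):
    """Keep hits passing both cut-offs, sorted by ascending E-value."""
    kept = []
    for hit in hits:
        span = abs(hit["qend"] - hit["qstart"]) + 1
        coverage = 100.0 * span / query_len
        if hit["evalue"] <= max_evalue and coverage >= min_coverage:
            kept.append({**hit, "coverage": coverage})
    return sorted(kept, key=lambda h: h["evalue"])
```

The coverage computed here is per alignment, so it can be lower than the query coverage shown on the web results page when a subject has several separate aligned regions. Numeric filters are a first pass only; the alignments and subject descriptions still need to be reviewed by eye.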
