The Stanford Glossary may also be useful!
BLAST is a very rapid sequence searching method. The original BLAST did not allow gaps in the sequence matches. WU-BLAST and the current version of BLAST (BLAST2) overcome that problem.
PSI-BLAST is an enhancement in which searches are iterated using a position-specific scoring matrix. The matrix used in any iteration is computed from the significant alignments found in the previous iteration. Success depends on the quality of the matrix, which in turn depends on whether the sequences that match the query better than some BLAST E-value threshold are genuinely homologous. The sequences used to generate the matrix are weighted according to Henikoff S and Henikoff JG 1994 J. Mol. Biol. 243:574-578, so that highly similar sequences are weighted lower than the more divergent sequences.
See:
Goodman L 1997 More blast for the buck. Genome
Research. 7:858-859
Altschul SF et al. 1997 Gapped BLAST and PSI-BLAST: a new generation
of protein database search programs. Nucleic Acids Res.
25:3389-3402.
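The sequence weighting mentioned in the PSI-BLAST entry above can be illustrated with a minimal sketch of a position-based weighting scheme. This is not the PSI-BLAST implementation; the function name, the toy alignment and the decision to treat gap characters as ordinary symbols are all assumptions made for illustration. In each column, a sequence receives 1/(k*n), where k is the number of distinct residues in the column and n is the number of sequences sharing that sequence's residue, so residues shared by many sequences contribute less.

```python
from collections import Counter


def position_based_weights(alignment):
    """Return one normalized weight per sequence of an equal-length alignment.

    Assumption: gap characters, if present, are counted like any other symbol.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    weights = [0.0] * len(alignment)
    for col in range(length):
        column = [seq[col] for seq in alignment]
        counts = Counter(column)   # how many sequences share each residue
        k = len(counts)            # number of distinct residues in this column
        for i, residue in enumerate(column):
            weights[i] += 1.0 / (k * counts[residue])

    total = sum(weights)           # equals the number of columns
    return [w / total for w in weights]


# Hypothetical toy alignment; the third sequence is the most similar to the rest.
toy_alignment = ["GYVGS", "GFDGF", "GYDGF", "GYQGG"]
weights = position_based_weights(toy_alignment)
```

In this toy case the third sequence, which shares the most residues with the others, ends up with the lowest weight, which is the behaviour the entry describes.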
Blocks are multiply aligned ungapped segments corresponding to the most highly conserved regions of proteins. Block Searcher, Get Blocks and Block Maker are aids to detection and verification of protein sequence homology.
The Blocks database is made automatically by looking for the most highly conserved regions in groups of proteins documented in the Prosite Database.
See: http://www.blocks.fhcrc.org/
Clusters of Orthologous Groups
Typically a COG database is built by pairwise comparisons of all proteins from a set of complete genomes. For each protein, the best hit (BeT) in each of the other genomes is identified. A COG is then defined by a triangular relationship of BeTs.
This database is used by BLASTing an unknown sequence against the set of all genomes in the COGs database, and looking for the case in which the unknown sequence has BeTs to more than one member of the COG.
See:
Tatusov RL, Koonin EV, and Lipman DJ. (1997) A genomic perspective
on protein families. Science. 278:631-637.
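The triangle-of-BeTs idea described above can be sketched in a few lines. This is a simplified illustration, not the published COG procedure: the data structures (`genome_of`, `best_hit`), the protein and genome names, and the rule that two proteins are linked if either one is the other's BeT are assumptions, and a full pipeline involves further steps not shown here.

```python
from itertools import combinations


def find_bet_triangles(genome_of, best_hit):
    """Find triples of proteins from three different genomes linked by BeTs.

    genome_of: dict mapping protein name -> genome name
    best_hit:  dict mapping (protein, other_genome) -> best-hit protein
    Assumption: two proteins are linked if either is the other's BeT.
    """
    def linked(p, q):
        return (best_hit.get((p, genome_of[q])) == q
                or best_hit.get((q, genome_of[p])) == p)

    triangles = []
    for a, b, c in combinations(sorted(genome_of), 3):
        # A triangle must span three distinct genomes.
        if len({genome_of[a], genome_of[b], genome_of[c]}) < 3:
            continue
        if linked(a, b) and linked(b, c) and linked(a, c):
            triangles.append((a, b, c))
    return triangles


# Hypothetical toy data: A2 hits B1 but has no consistent link to C1.
genome_of = {"A1": "gA", "A2": "gA", "B1": "gB", "C1": "gC"}
best_hit = {
    ("A1", "gB"): "B1", ("A1", "gC"): "C1",
    ("B1", "gA"): "A1", ("B1", "gC"): "C1",
    ("C1", "gA"): "A1", ("C1", "gB"): "B1",
    ("A2", "gB"): "B1",
}
triangles = find_bet_triangles(genome_of, best_hit)
```

The exhaustive search over triples is cubic in the number of proteins, which is fine for a toy example but would need indexing by genome for real data.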
COILS is a program for finding coiled coils.
See:
Lupas, A (1996) Prediction and analysis of coiled coil
structures. Methods Enzymol 266:513-525.
Isologs are duplicated genes within a single organism that have the same activity. Diversification of function during the evolution of isologs leads to paralogs.
Kyoto Encyclopedia of Genes and Genomes (KEGG) is an effort to computerize current knowledge of molecular and cellular biology in terms of the information pathways that consist of interacting molecules or genes and to provide links from the gene catalogs produced by genome sequencing projects.
Tutorial: From Pathway to Genes and Molecules
Orthologs are homologous sequences from different genomes with the same function. Strictly, two genes are orthologs only if they had a common ancestral gene in the most recent common ancestral species. Defined by Fitch, W.M. (1970) Syst. Zool. 19:99-110
Paralogs are homologous (i.e. they have an evolutionary relationship). Two definitions have been used: 1) similar sequences with different functions which have arisen through duplication prior to diversification; 2) similar sequences in a single organism (in some cases these might better be named isologs). Originally the term was used to imply that functional differences would evolve after duplication. Without biochemical data, one cannot prove that the functions will be different so the word is often used in a loose sense to describe similar proteins thought not to be orthologs. Defined by Fitch, W.M. (1970) Syst. Zool. 19:99-110
PHD is a neural-network based program for predicting secondary structure.
See:
Rost, B (1996) PHD: predicting one-dimensional protein structure by
profile-based neural networks. Methods Enzymol
266:525-539.
The ProDom protein domain database consists of an automatic compilation of homologous domains.
Radiation hybrid (RH) mapping is a technique for identifying landmarks every 100kb in the human genome, used as part of the human genome mapping project. An RH map simply shows the order of a set of landmarks, with the distances between neighbours plus an indication of the level of support for the ordering.
A set of overheads on RH mapping is available from http://www.nhgri.nih.gov/COURSE99/Pdf/matise.pdf and a set of useful links is at http://linkage.rockefeller.edu/tara/rhmap/
System for Easy Analysis of Lots of Sequences
Designed to let large-scale bioinformatics research projects rapidly implement standard sequence analysis protocols and design new investigations.
See:
Walker, DR, and Koonin, EV (1997) SEALS: A System for Easy Analysis
of Lots of Sequences. Intelligent Systems for Molecular Biology
5:333-339
SEG is a masking program used by BLAST to identify low-complexity regions. It runs automatically as part of BLAST but may be downloaded in a standalone version from ftp://ncbi.nlm.nih.gov/pub/seg/seg/. Note that the automatic BLAST version will only mask your probe sequence and not the database itself.
See:
Wootton, JC, and Federhen, S (1996) Analysis of compositionally
biased regions in sequence databases. Methods Enzymol
266:554-571.
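To give a feel for what masking low-complexity regions means, here is a minimal sketch that masks every window whose Shannon entropy falls at or below a threshold. It is not the SEG algorithm itself; the function names, the window length of 12, the threshold of 2.2 bits and the mask character `X` are illustrative assumptions.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy (in bits) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mask_low_complexity(seq, window=12, threshold=2.2, mask_char="X"):
    """Replace residues in every low-entropy window with mask_char.

    Assumption: a window counts as low-complexity when its entropy is at or
    below `threshold`; sequences shorter than `window` are returned unchanged.
    """
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) <= threshold:
            for i in range(start, start + window):
                masked[i] = mask_char
    return "".join(masked)


# Hypothetical query with a poly-serine run in the middle.
query = "MKTAYIAKQR" + "S" * 12 + "DELVQGHWNC"
masked_query = mask_low_complexity(query)
```

Because windows overlap, a few residues flanking the repetitive run are masked as well. The masked query would then be used as the probe, which matches the note above that only the probe sequence is masked, not the database.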
SignalP is a neural-network based program for finding signal peptides.
See:
Nielsen H, Engelbrecht J, Brunak S, and von Heijne G (1997)
Identification of prokaryotic and eukaryotic signal peptides and
prediction of their cleavage sites. Protein Engineering
10:1-6.
For a review of signal prediction methods, see:
Claros MG, Brunak S, and von Heijne G (1997) Prediction of
N-terminal protein sorting signals. Current Opinion in Structural
Biology 7:394-398.