
Bioinformatics Reviews: 2001

(175 References)

Achard, F., G. Vaysseix, et al. (2001). "XML, bioinformatics and data integration." Bioinformatics 17(2): 115-25.

            Motivation: The eXtensible Markup Language (XML) is an emerging standard for structuring documents, notably for the World Wide Web. In this paper, the authors present XML and examine its use as a data language for bioinformatics. In particular, XML is compared to other languages, and some of the potential uses of XML in bioinformatics applications are presented. The authors propose to adopt XML for data interchange between databases and other sources of data. Finally the discussion is illustrated by a test case of a pedigree data model in XML. Contact: Emmanuel.Barillot@infobiogen.fr
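
As a rough illustration of the kind of pedigree test case the abstract mentions, the sketch below parses a small XML pedigree with Python's standard library. The element and attribute names (`pedigree`, `individual`, `father`, `mother`) are assumptions made for this example; they are not the data model proposed by Achard et al.

```python
import xml.etree.ElementTree as ET

# Hypothetical pedigree document: tag and attribute names are invented
# for illustration and are not taken from the cited paper.
PEDIGREE_XML = """
<pedigree id="family1">
  <individual id="I1" sex="male"/>
  <individual id="I2" sex="female"/>
  <individual id="I3" sex="female" father="I1" mother="I2"/>
</pedigree>
""".strip()

root = ET.fromstring(PEDIGREE_XML)

# Map each individual to its (father, mother) pair; missing parents are None.
parents = {
    person.get("id"): (person.get("father"), person.get("mother"))
    for person in root.findall("individual")
}

# Founders are individuals with no recorded parents.
founders = sorted(
    pid for pid, (father, mother) in parents.items()
    if father is None and mother is None
)
```

Because the structure is carried by the markup itself, any XML-aware tool can read the same file, which is the interchange argument the abstract makes.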

 

Adam, B. L., A. Vlahou, et al. (2001). "Proteomic approaches to biomarker discovery in prostate and bladder cancers." Proteomics 1(10): 1264-70.

            Proteomic technologies, including high resolution two-dimensional electrophoresis (2-DE), antibody/protein arrays, and advances in mass spectrometry (MS), are providing the tools needed to discover and identify disease associated biomarkers. Although application of these technologies to search for potential diagnostic/prognostic biomarkers associated with prostate and bladder cancer has been somewhat limited to date, proteins either overexpressed or underexpressed have been detected in both these urological cancers. Recent advances in mass spectrometry, especially platforms that permit rapid "fingerprint" profiling of multiple biomarkers, and tandem mass spectrometers for protein identification, will most assuredly enhance the discovery, identification, and characterization of potential cancer associated biomarkers. Furthermore, application of laser capture microdissection microscopes has provided a rapid and reproducible approach to procure pure populations of cells. This technology coupled to 2-DE and MS has significantly aided the elucidation of the differential expression profiles between disease, benign and normal prostate and bladder cell populations. Finally, development and application of learning algorithms and bioinformatics to the data generated by these proteomic technologies will be essential in determining the clinical potential of a protein biomarker. The purpose of this review is to provide the reader with an overview of the application of these technologies in the search and identification of potential diagnostic/prognostic biomarkers for prostate and bladder cancers.

 

Akutsu, T. (2001). "[Algorithms for inferring genetic networks]." Tanpakushitsu Kakusan Koso 46(16 Suppl): 2505-9.

           

Alix, A. J. (2001). "[A turning point in the knowledge of the structure-function-activity relations of elastin]." J Soc Biol 195(2): 181-93.

            This review presents the latest results from our research group on the molecular structures (at the atomic level) of tropoelastin, elastin and elastin-derived peptides, studied mainly with bioinformatics methods (theoretical predictions and molecular modelling) linked to experimental circular dichroism spectroscopic studies. We had already characterized both the local secondary structure and some parts of the tertiary structure of the tropoelastin and elastin molecules (human, bovine...) by using theoretical predictions (local secondary structure, linear epitopes...) and/or experimental data (optical spectroscopic methods: Raman scattering, infrared absorption, circular dichroism). Except for the cross-linking regions, which are in helical conformations, the whole tropoelastin structure displays many beta-reverse turns, which usually belong to irregular structures in proteins. These turns play a key role in the orientation of regular structures (alpha-helix, beta-strand), and are thus very important in the native protein 3D architecture. This is particularly true for human tropoelastin, because its sequence is rich in glycines and prolines, and these residues are frequently found in beta-turns (a beta-turn is made of four consecutive residues which are stabilized by a hydrogen bond). Several types of beta-turns can be defined from the values of the dihedral angles phi and psi of the two central residues. Thus, by using a very recently updated set of propensities for the amino acid residues to belong to given types of reverse beta-turns (extracted from a reference set of known 3-D structures of globular proteins), we have determined (using our in-house software COUDES), for all possible tetrapeptides of the human tropoelastin sequence, the distribution and characterization of the possible types of turns. It is shown that the locations and/or the types of these reverse beta-turns reveal a regularity and are not all random. This confirms our hypothesis that intra-molecular elasticity of tropoelastin could be explained by the possibility of transitions between conformations involving short beta-strands and beta-turns. This result is of great interest in the construction (by using molecular biology) of elastic biomaterials derived from the elastin sequence (particularly, the elastin-derived peptides corresponding to the sequence exon 21--(exon 24--exon 24...)). Our study also makes it possible to predict the conformations of specific elastin-derived peptides which could have interesting biological activity. Peptides resulting from the degradation of elastin, the insoluble polymer of tropoelastin responsible for the elasticity of vertebrate tissues, can induce biological effects, notably the regulation of matrix metalloproteinase (MMP) activity. Recently, it was proposed that some elastin-derived hexapeptides resulting from circular permutations of VGVAPG (a three-fold repetition sequence in exon 24 of human tropoelastin) possess MMP-1 production and activation regulation properties. This effect depends on the presence of the tropoelastin-specific membrane receptor, the 67 kDa EBP (Elastin Binding Protein). Our results obtained by using both circular dichroism spectroscopy and linear predictions confirmed the hypothesis of a structure-dependent mechanism, with a possibly occurring type VIII beta-turn on the first four residues of the GXXPG consensus sequence, which is present only among the active peptides. Thus, we have performed extensive molecular dynamics studies, in both implicit and explicit solvent, on these active and inactive elastin-derived hexapeptides. Using our own pattern recognition method to analyse the types of beta-reverse-turns adopted during the molecular dynamics trajectory, we found that active and inactive peptides effectively form two clearly distinct conformational groups, in which active peptides preferentially adopt a conformation close to a type VIII GXXP beta-reverse-turn. The structural role of the C-terminal G residue could also be explained. Additional molecular simulations on (VGVAPG)2 and (VGVAPG)3 show the formation of two or three GXXP tetrapeptides adopting a structure close to a type VIII beta-reverse-turn, suggesting a local conformational preference for this motif. This observation of a specific structural single and/or repeated motif is in agreement with the circular dichroism spectra of the involved (VGVAPG)1, (VGVAPG)2 and (VGVAPG)3 peptides, and it can therefore be proposed that their biological activities have to be linear. The final aim of this type of work is to understand more about the sequence/structure/function/activity relationships of those structured peptides in order to propose specific sequences (corresponding to specific structures) for best biological activity results.

 

Andrade, M. A., C. Petosa, et al. (2001). "Comparison of ARM and HEAT protein repeats." J Mol Biol 309(1): 1-18.

            ARM and HEAT motifs are tandemly repeated sequences of approximately 50 amino acid residues that occur in a wide variety of eukaryotic proteins. An exhaustive search of sequence databases detected new family members and revealed that at least 1 in 500 eukaryotic protein sequences contain such repeats. It also rendered the similarity between ARM and HEAT repeats, believed to be evolutionarily related, readily apparent. All the proteins identified in the database searches could be clustered by sequence similarity into four groups: canonical ARM-repeat proteins and three groups of the more divergent HEAT-repeat proteins. This allowed us to build improved sequence profiles for the automatic detection of repeat motifs. Inspection of these profiles indicated that the individual repeat motifs of all four classes share a common set of seven highly conserved hydrophobic residues, which in proteins of known three-dimensional structure are buried within or between repeats. However, the motifs differ at several specific residue positions, suggesting important structural or functional differences among the classes. Our results illustrate that ARM and HEAT-repeat proteins, while having a common phylogenetic origin, have since diverged significantly. We discuss evolutionary scenarios that could account for the great diversity of repeats observed.

 

Baba, Y. (2001). "Development of novel biomedicine based on genome science." Eur J Pharm Sci 13(1): 3-4.

            Towards the post genomic sequencing era, conventional drug discovery is being drastically improved by genomic technologies and computational advances. The completion of the entire genome sequence of many experimental organisms, as well as of the human genome, allows us to compare several genomic sequences (comparative genomics) to obtain valuable information for gene discovery and functional genomics. Pharmacogenomic studies and chemical genomic investigations are quickly becoming fundamental techniques for genomic drug discovery. Additionally, progress in microchip and microarray technology has been stimulating genomic drug discovery studies. This paper reviews recent progress in human genome research, basic elements in the new strategy for drug discovery based on genome science, and future perspectives for the bio and pharmaceutical industries.

 

Barbier-Brygoo, H., F. Gaymard, et al. (2001). "Strategies to identify transport systems in plants." Trends Plant Sci 6(12): 577-85.

            Since the first molecular structures of plant transporters were discovered over a decade ago, considerable advances have been made in the study of plant membrane transport, but we still do not understand transport regulation. The genes encoding the transport systems in the various cell membranes are still to be identified, as are the physiological roles of most transport systems. A wide variety of complementary strategies are now available to study transport systems in plants, including forward and reverse genetics, proteomics, and in silico exploitation of the huge amount of information contained in the completely known genomic sequence of Arabidopsis.

 

Barratt, M. D. and R. A. Rodford (2001). "The computational prediction of toxicity." Curr Opin Chem Biol 5(4): 383-8.

            Recent developments in the prediction of toxicity from chemical structure have been reviewed. Attention has been drawn to some of the problems that can be encountered in the area of predictive toxicology, including the need for a multi-disciplinary approach and the need to address mechanisms of action. Progress has been hampered by the sparseness of good quality toxicological data. Perhaps too much effort has been devoted to exploring new statistical methods rather than to the creation of data sets for hitherto uninvestigated toxicological endpoints and/or classes of chemicals.

 

Bartlett, J. (2001). "Technology evaluation: SAGE, Genzyme molecular oncology." Curr Opin Mol Ther 3(1): 85-96.

            Genzyme Molecular Oncology (GMO) is using its SAGE (Serial Analysis of Gene Expression) combinatorial chemistry technology to screen compound libraries. SAGE is a high-throughput, high-efficiency method to simultaneously detect and measure the expression levels of genes expressed in a cell at a given time, including rare genes. SAGE can be used in a wide variety of applications to identify disease-related genes, to analyze the effect of drugs on tissues and to provide insights into disease pathways. It works by isolating short fragments of genetic information from the expressed genes that are present in the cell being studied. These short sequences, called SAGE tags, are linked together for efficient sequencing. The sequence data are then analyzed to identify each gene expressed in the cell and the levels at which each gene is expressed. This information forms a library that can be used to analyze the differences in gene expression between cells [293437]. By December 1999, GMO had identified a set of 40 genes from 3.5 million transcripts that were expressed at elevated levels in all cancer tissue but not seen in normal tissue. The company hopes these may provide diagnostic markers or therapeutic targets. The studies also provided data furthering the understanding of the way cells use their genome [349968]. GMO has signed a collaborative agreement with the National Cancer Institute (NCI) to search for new drug candidates in the field of cancer chemotherapy. The collaboration combines GMO's SAGE technology with the NCI's extensive array of 60 cell-based cancer screens. Under the agreement, the NCI will evaluate Genzyme's library consisting of one million compounds against selected cancer screens to identify compounds with anticancer properties [255082]. Xenometrix granted a license agreement for gene expression profiling to GMO in February 1999, giving the company access to claims covered in issued US and European patents. 
The license is non-exclusive and covers the collection of gene expression profiles utilizing all methods including high-density microarrays [315329]. Ontogeny (now Curis Inc) and GMO have entered into a collaboration to study genes for the potential discovery of therapeutic products. GMO will use its SAGE technology to produce libraries of RNA supplied by Ontogeny. The libraries will be put through Ontogeny's screening program [279417]. Under an agreement made in August 1998, Bayer will use SAGE technology to identify genes and thus potential therapeutics [317452]. GMO and Hexagen signed an agreement in March 1998 on the use of SAGE technology in Hexagen's disease gene discovery programs. The first phase of the collaboration will focus on the use of SAGE in studies within Hexagen's type II diabetes gene discovery program. Hexagen has designed these studies to discover susceptibility genes for diabetes and to provide gene expression information for genes associated with type II diabetes [280012]. GMO signed a five-year agreement with Johns Hopkins University School of Medicine (JHU) in July 1997 for research leading to the identification of cancer-related genes. Under the terms of the agreement, JHU researchers will use the SAGE technology to identify and analyze gene expression in cancer. The power of SAGE in finding rare genes was confirmed in a study of gastrointestinal cancer by JHU researchers published in the May 27, 1997 issue of Science. The study showed that of almost 50,000 genes expressed in normal gastrointestinal cells and gastrointestinal tumor cells, 86% of the genes were present at five or fewer copies per cell. Only 51% of those low-abundancy genes were recorded in the GenBank database of known genes in the human genome [257128].
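
The tag-counting step described in this abstract (identify each expressed gene from its SAGE tag and estimate its level from how often the tag is seen) can be sketched roughly as follows. This is a simplified illustration only: the tag sequences, the gene names and the `TAG_TO_GENE` lookup are invented, and the sketch assumes the tags have already been extracted from the sequenced concatenated fragments.

```python
from collections import Counter

# Hypothetical tag-to-gene lookup; real analyses match tags against a
# reference database, which is not modelled here.
TAG_TO_GENE = {
    "AAACCCGGGT": "geneA",
    "TTTGGGCCCA": "geneB",
}

# Hypothetical tags observed in one sequenced library.
observed_tags = [
    "AAACCCGGGT", "TTTGGGCCCA", "AAACCCGGGT", "GGGGGGGGGG", "AAACCCGGGT",
]

# Count how often each tag occurs, then roll the counts up to genes.
tag_counts = Counter(observed_tags)
gene_counts = Counter()
for tag, count in tag_counts.items():
    gene_counts[TAG_TO_GENE.get(tag, "unmatched")] += count

# Relative abundance of each gene within this library.
total = sum(gene_counts.values())
gene_fraction = {gene: count / total for gene, count in gene_counts.items()}
```

Tags with no entry in the lookup are kept under "unmatched" rather than discarded, which mirrors the abstract's point that many low-abundance transcripts had no match in GenBank.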

 

Baxter, S. M. and J. S. Fetrow (2001). "Sequence- and structure-based protein function prediction from genomic information." Curr Opin Drug Discov Devel 4(3): 291-5.

            Existing functional annotation transfer is fraught with inaccuracies that may hinder forward interpretation and mining of genomic data. Hand-curation of the annotation placed into databases is not practical. In lieu of experimental evidence, computational biological approaches offer high-throughput tools to predict function accurately; however, these methods are still notably deficient in defining and describing the complexity of protein function. Enriching genomic sequences obtained from sequencing efforts and expression array methods with protein function information and classification will be an efficient first step for incorporating genomic data into drug discovery programs.

 

Bonneau, R., J. Tsai, et al. (2001). "Functional inferences from blind ab initio protein structure predictions." J Struct Biol 134(2-3): 186-90.

            Ab initio protein structure prediction methods have improved dramatically in the past several years. Because these methods require only the sequence of the protein of interest, they are potentially applicable to the open reading frames in the many organisms whose sequences have been and will be determined. Ab initio methods cannot currently produce models of high enough resolution for use in rational drug design, but there is an exciting potential for using the methods for functional annotation of protein sequences on a genomic scale. Here we illustrate how functional insights can be obtained from low-resolution predicted structures using examples from blind ab initio structure predictions from the third and fourth critical assessment of structure prediction (CASP3, CASP4) experiments.

 

Bornholdt, S. (2001). "Modeling genetic networks and their evolution: a complex dynamical systems perspective." Biol Chem 382(9): 1289-99.

            After finishing the sequence of the human genome, a functional understanding of genome dynamics is the next major step on the agenda of the biosciences. New approaches, such as microarray techniques, and new methods of bioinformatics provide powerful tools aiming in this direction. In the last few years, important parts of genome organization and dynamics in a number of model organisms have been determined. However, an integrated view of gene regulation on a genomic scale is still lacking. Here, genome function is discussed from a complex dynamical systems perspective: which dynamical properties can a large genomic system exhibit in principle, given the local mechanisms governing the small subsystems that we know today? Models of artificial genetic networks are used to explore dynamical principles and possible emergent dynamical phenomena in networks of genetic switches. One observes evolution of robustness and dynamical self-organization in large networks of artificial regulators that are based on the dynamic mechanism of transcriptional regulators as observed in biological gene regulation. Possible biological observables and ways of experimental testing of global phenomena in genome function and dynamics are discussed. Models of artificial genetic networks provide a tool to address questions in genome dynamics and their evolution and allow simulation studies in evolutionary genomics.

 

Brazma, A. and J. Vilo (2001). "Gene expression data analysis." Microbes Infect 3(10): 823-9.

            Microarrays are one of the latest breakthroughs in experimental molecular biology, which allow monitoring of gene expression for tens of thousands of genes in parallel and are already producing huge amounts of valuable data. Analysis and handling of such data is becoming one of the major bottlenecks in the utilization of the technology. The raw microarray data are images, which have to be transformed into gene expression matrices, tables where rows represent genes, columns represent various samples such as tissues or experimental conditions, and numbers in each cell characterize the expression level of the particular gene in the particular sample. These matrices have to be analyzed further if any knowledge about the underlying biological processes is to be extracted. In this paper we concentrate on discussing bioinformatics methods used for such analysis. We briefly discuss supervised and unsupervised data analysis and its applications, such as predicting gene function classes and cancer classification as well as some possible future directions.
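
The gene expression matrix described above (rows as genes, columns as samples, each cell an expression level) can be sketched in a few lines of Python. The gene names, sample names and values are invented for illustration, and Pearson correlation is used here only as one common similarity measure for unsupervised analysis, not as the method of Brazma and Vilo.

```python
from math import sqrt

# Hypothetical toy matrix: rows are genes, columns are samples.
genes = ["geneA", "geneB", "geneC"]
samples = ["tissue1", "tissue2", "tissue3", "tissue4"]
matrix = [
    [1.0, 2.0, 3.0, 4.0],  # geneA
    [2.0, 4.0, 6.0, 8.0],  # geneB
    [4.0, 3.0, 2.0, 1.0],  # geneC
]


def expression(gene, sample):
    """Look up the expression level of one gene in one sample."""
    return matrix[genes.index(gene)][samples.index(sample)]


def pearson(x, y):
    """Pearson correlation between two expression profiles of equal length."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Genes with similar profiles across samples correlate strongly.
corr_ab = pearson(matrix[0], matrix[1])
corr_ac = pearson(matrix[0], matrix[2])
```

In this toy data geneA and geneB rise together across the samples while geneC falls, so a clustering step built on this similarity would group the first two genes and separate the third.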

 

Brenner, S. E. (2001). "A tour of structural genomics." Nat Rev Genet 2(10): 801-9.

            Structural genomics projects aim to provide an experimental or computational three-dimensional model structure for all of the tractable macromolecules that are encoded by complete genomes. To this end, pilot centres worldwide are now exploring the feasibility of large-scale structure determination. Their experimental structures and computational models are expected to yield insight into the molecular function and mechanism of thousands of proteins. The pervasiveness of this information is likely to change the use of structure in molecular biology and biochemistry.

 

Brizuela, L., P. Braun, et al. (2001). "FLEXGene repository: from sequenced genomes to gene repositories for high-throughput functional biology and proteomics." Mol Biochem Parasitol 118(2): 155-65.

            The vast amount of information generated by the human genome sequencing project and related projects has given rise to a new paradigm in experimental biology. This new paradigm invokes the experimentation and data analysis at genome-wide scales, as well as the generation of new technologies and resources that take full advantage of the available sequence information. The Institute of Proteomics at Harvard Medical School is building a comprehensive, characterized, arrayed and flexible gene repository that will allow full exploitation of the genomic information by enabling functional genomics as well as protein expression, purification and analysis at genome wide scale. The FLEXGene repository (Full Length EXpression-ready) will contain clones representing the complete set of open reading frames (ORFs) of different organisms including H. sapiens and several pathogens and model organisms. The clones are constructed using recombination-based cloning technology so that hundreds or thousands of coding regions can be transferred into any expression vector in a parallel and timely mode, allowing the broadest variety of experiments to be carried out.

 

Brookes, A. J. (2001). "Rethinking genetic strategies to study complex diseases." Trends Mol Med 7(11): 512-6.

            Understanding the genetic basis of complex diseases is turning out to be difficult, prompting a widespread (re-)evaluation of the relevant issues. 'Forward' and 'reverse' genetics strategies have been applied arguably in a manner only suitable for much simpler diseases. It would now be beneficial to pay detailed attention to experimental design, and to increase study scales dramatically. Ultimately, this would lead to completely hypothesis-free, truly comprehensive, multi-platform investigations. Such studies would maximize the chances of finding data patterns indicative of real etiology, although many aspects of complex disease causation might simply be too intricate and inconsistent to ever be deciphered. Therefore, considerable technology development is an immediate priority, along with parallel advances in bioinformatics and biostatistics systems aimed at discriminating between marginal signals and background noise within extremely large, diverse and complex data sets. Community standards and open data sharing will be essential ingredients for success in this exciting 21st-century challenge.

 

Brosch, R., A. S. Pym, et al. (2001). "The evolution of mycobacterial pathogenicity: clues from comparative genomics." Trends Microbiol 9(9): 452-8.

            Comparative genomics, and related technologies, are helping to unravel the molecular basis of the pathogenesis, host range, evolution and phenotypic differences of the slow-growing mycobacteria. In the highly conserved Mycobacterium tuberculosis complex, where single-nucleotide polymorphisms are rare, insertion and deletion events (InDels) are the principal source of genome plasticity. InDels result from recombinational or insertion sequence (IS)-mediated events, expansion of repetitive DNA sequences, or replication errors based on repetitive motifs that remove blocks of genes or contract coding sequences. Comparative genomic analyses also suggest that loss of genes is part of the ongoing evolution of the slow-growing mycobacterial pathogens and might also explain how the vaccine strain BCG became attenuated.

 

Califano, A. (2001). "Advances in sequence analysis." Curr Opin Struct Biol 11(3): 330-3.

            In its early days, the entire field of computational biology revolved almost entirely around biological sequence analysis. Over the past few years, however, a number of new non-sequence-based areas of investigation have become mainstream, from the analysis of gene expression data from microarrays, to whole-genome association discovery, and to the reverse engineering of gene regulatory pathways. Nonetheless, with the completion of private and public efforts to map the human genome, as well as those of other organisms, sequence data continue to be a veritable mother lode of valuable biological information that can be mined in a variety of contexts. Furthermore, the integration of sequence data with a variety of alternative information is providing valuable and fundamentally new insight into biological processes, as well as an array of new computational methodologies for the analysis of biological data.

 

Cho, Y. and V. Walbot (2001). "Computational methods for gene annotation: the Arabidopsis genome." Curr Opin Biotechnol 12(2): 126-30.

            Since the structure of the DNA molecule was identified half a century ago, the complete genome sequence has been determined for 37 prokaryotes and several eukaryotes. With the exponential growth of genetic information, bioinformatics has attempted to predict gene locations and functions in cyberspace prior to experimental confirmation at the bench.

 

Claverie, J. M. (2001). "[Transcriptome analysis in cancerology: bioinformatics aspects]." Bull Cancer 88(3): 269-76.

            Recent technological advances (e.g. various DNA arrays and chips) allow the measurement of expression level (mRNA abundance) for thousands of genes simultaneously, over multiple conditions or time. Initially developed and tested on model systems such as yeast or in vitro cell line cultures, these techniques have recently begun to be applied to the analysis of human cancers. Initial results are promising, and large-scale gene expression profiling is now expected to become a clinical tool for better tumour identification, prognosis, and optimal treatment design. It is thus important that clinicians become familiar with the theoretical principles underlying the interpretation of gene expression profiles as used in three different contexts: gene discovery, tumour class prediction, and molecular diagnosis. This is the purpose of the present article.

 

Coppel, R. L. (2001). "Bioinformatics and the malaria genome: facilitating access and exploitation of sequence information." Mol Biochem Parasitol 118(2): 139-45.

            The torrent of sequence information unleashed by the various genome sequencing projects, including that of Plasmodium falciparum, will lead to an unprecedented increase in the data available for research purposes. The scientific community is struggling to develop ways to assimilate this information and ensure that it is fully analysed in a way that enables rapid development of new therapeutic and diagnostic advances. This is particularly so for the field of tropical medicine, where many of the scientists have had limited training in bioinformatics and may be further hampered by poor access to the sequence data. A number of collections of malaria genome sequence are available, each with its own advantages and disadvantages; however, further improvements in these information resources are needed. In particular, there would be great benefit in integrating genomic sequence and functional genomics results with the large amount of pre-existing knowledge related to parasite biology and immunological interactions with the host. Attempts to achieve this include the PlasmoDB database, and the lessons learned in this effort could be of great utility to other organism-specific databases.

 

Cowman, A. F. (2001). "Functional analysis of drug resistance in Plasmodium falciparum in the post-genomic era." Int J Parasitol 31(9): 871-8.

            Malaria has plagued humans throughout recorded history and results in the death of over 2 million people per year. The protozoan parasite Plasmodium falciparum causes the most severe form of malaria in humans. Chemotherapy has become one of the major control strategies for this parasite; however, the development of drug resistance to virtually all of the currently available drugs is causing a crisis in the use and deployment of these compounds for prophylaxis and treatment of this disease. The genome sequence of P. falciparum is providing the informational base for the use of whole-genome strategies such as bioinformatics, microarrays and genetic mapping. These approaches, together with the availability of a high-resolution genome linkage map consisting of hundreds of microsatellite markers and the advanced technologies of transfection and proteomics, will facilitate an integrated approach to address important biological questions. In this review we will discuss strategies to identify novel genes involved in the molecular mechanisms used by the parasite to circumvent the lethal effect of current chemotherapeutic agents.

 

Dahl, S. G., O. Edvardsen, et al. (2001). "Bioinformatics and receptor mechanisms of psychotropic drugs." Biotechnol Annu Rev 7: 165-77.

            One important aspect in biotechnology is gene discovery and target validation for drug discovery. Information from the human genome (HUGO) project may be used to deduce the amino acid sequence of all proteins produced in the human body. However, knowing the amino acid sequence of a protein is not the same as knowing its function. Identification of novel molecular targets for discovery of new, safer and more efficient therapeutic drugs from the human genome sequences requires multidisciplinary research efforts, including proteomics, structural biology and bioinformatics. In addition to possible effects on gene expression, most of the currently used therapeutic drugs either have enzymes or membrane proteins as their molecular targets of action. These membrane proteins include transporters of small molecules across cell membranes, ion channels, or receptors that convey signals from one side of a membrane to the other. Our research group as well as others have used computational techniques, along with biotechnology, molecular biology and other experimental techniques, to construct detailed 3-dimensional models of transporter proteins and G-protein coupled receptors (GPCRs), which are the molecular targets of action of psychotropic drugs. The models have been used to simulate the molecular dynamics and study the ligand binding and signal transduction mechanisms of these receptors. The use of bioinformatics, as exemplified in our modelling of GPCRs, is only one of the key factors for success in post-genomic research for new targets for therapeutic drugs.

 

Danielsen, M. (2001). "Bioinformatics of nuclear receptors." Methods Mol Biol 176: 3-22.

           

Davidson, D. and R. Baldock (2001). "Bioinformatics beyond sequence: mapping gene function in the embryo." Nat Rev Genet 2(6): 409-17.

            The spatio-temporal expression pattern of a gene during development is a valuable piece of information. But there is no way to compare precisely the patterns of expression of different genes, or the way the patterns are changed in a mutant. One way to solve this problem is to construct digital reference images of development (a bioinformatics framework), to which expression patterns can be mapped and stored, then compared. Such frameworks are under active development in several model systems. They will form the basis of powerful and integrated gene expression databases, which facilitate comparisons between genes, tissues and species.

 

Davis, D. R., J. B. McAlpine, et al. (2001). "Enterococcus faecalis multi-drug resistance transporters: application for antibiotic discovery." J Mol Microbiol Biotechnol 3(2): 179-84.

            Using bioinformatics approaches, 34 potential multidrug resistance (MDR) transporter sequences representing 4 different transporter families were identified in the unannotated Enterococcus faecalis database (TIGR). A functional genomics campaign generating single-gene insertional disruptions revealed several genes whose absence confers significant hypersensitivities to known antimicrobials. We constructed specific strains, disrupted in a variety of previously unpublished, putative MDR transporter genes, as tools to improve the success of whole-cell antimicrobial screening and discovery. Each of the potential transporters was inactivated at the gene level and then phenotypically characterized, both with single disruption mutants and with 2-gene mutants built upon a delta norA deleted strain background.

 

De Groot, A. S., A. Bosma, et al. (2001). "From genome to vaccine: in silico predictions, ex vivo verification." Vaccine 19(31): 4385-95.

            Bioinformatics tools enable researchers to move rapidly from genome sequence to vaccine design. EpiMer and EpiMatrix are computer-driven pattern-matching algorithms that identify T cell epitopes. Conservatrix, BlastiMer, and Patent-Blast permit the analysis of protein sequences for highly conserved regions, for homology with other known proteins, and for homology with previously patented epitopes, respectively. Two applications of these tools to epitope-driven vaccine design are described in this review. Using Conservatrix and EpiMatrix, we analyzed more than 10000 HIV-1 sequences and identified peptides that were potentially immunostimulatory and highly conserved across HIV-1 clades. MHC binding assays and CTL assays have been carried out: 50 (69%) of the 72 candidate epitopes bound in assays with cell lines expressing the corresponding MHC molecule; 15 of the 24 B7 peptides (63%) stimulated gamma-interferon release in ELISpot assays. These results lend support to the bioinformatics approach to selecting novel, conserved, HIV-1 CTL epitopes. EpiMatrix was also applied to the entire 'proteome' derived from two Mycobacterium tuberculosis (Mtb) genomes. Using EpiMatrix, BlastiMer, and Patent-Blast, we narrowed the list of putative Mtb epitopes to be tested in vitro from 1600000 to 3000, a 99.8% reduction. The pace of vaccine design will accelerate when these and other bioinformatics tools are systematically applied to whole genomes and used in combination with in vitro methods for screening and confirming epitopes.
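
            As a quick arithmetic check, the percentages quoted in this abstract can be recomputed from the counts it reports. The sketch below is illustrative only: the helper name percent is an assumption of this note and not part of the cited work, and the only inputs are the counts given in the abstract.

```python
# Recompute the percentages quoted in the abstract from its reported counts.
# The helper name `percent` is hypothetical and used only for illustration.

def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole

# 50 of 72 candidate epitopes bound in MHC binding assays.
mhc_binding = percent(50, 72)

# 15 of 24 B7 peptides stimulated gamma-interferon release.
elispot = percent(15, 24)

# Putative Mtb epitopes narrowed from 1600000 to 3000.
reduction = 100.0 - percent(3000, 1600000)

print(f"MHC binding: {mhc_binding:.1f}%")      # 69.4%
print(f"ELISpot response: {elispot:.1f}%")     # 62.5%
print(f"Candidate reduction: {reduction:.1f}%")  # 99.8%
```

            The recomputed values agree with the figures in the abstract once 62.5% is rounded half up to 63%.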

 

De Luca, V. and P. Laflamme (2001). "The expanding universe of alkaloid biosynthesis." Curr Opin Plant Biol 4(3): 225-33.

            Characterization of many of the major gene families responsible for the generation of central intermediates and for their decoration, together with the development of large genomics and proteomics databases, has revolutionized our capability to identify exotic and interesting natural-product pathways. Over the next few years, these tools will facilitate dramatic advances in our knowledge of the biosynthesis of alkaloids, which will far surpass that which we have learned in the past 50 years. These tools will also be exploited for the rapid characterization of regulatory genes, which control the development of specialized cell factories for alkaloid biosynthesis.

 

de Vos, W. M. (2001). "Advances in genomics for microbial food fermentations and safety." Curr Opin Biotechnol 12(5): 493-8.

            The exponentially growing collection of genomic sequence information, the high-throughput analysis of expression products, and the ability to order this information using advanced bioinformatics are expected to affect biotechnology and life sciences in a profound and unprecedented way. These developments offer many possibilities to improve the functionality of fermentations by food-grade microorganisms and to increase the microbial safety of foods. It will be necessary to combine functional studies with comparative genomics approaches to provide effective strategies for improving the functionality and safety of foods.

 

Degrave, W. M., S. Melville, et al. (2001). "Parasite genome initiatives." Int J Parasitol 31(5-6): 532-6.

            During 1993-1994, scientists from developing and developed countries planned and initiated a number of parasite genome projects and several consortiums for the mapping and sequencing of these medium-sized genomes were established, often based on already ongoing scientific collaborations. Financial and other support came from WHO/TDR, Wellcome Trust and other funding agencies. Thus, the genomes of Plasmodium falciparum, Schistosoma mansoni, Trypanosoma cruzi, Leishmania major, Trypanosoma brucei, Brugia malayi and other pathogenic nematodes are now under study. From an initial phase of network formation, mapping efforts and resource building (EST, GSS, phage, cosmid, BAC and YAC library constructions), sequencing was initiated in gene discovery projects but soon also on a small chromosome, and now on a fully fledged genome scale. Proteomics, functional analysis, genetic manipulation and microarray analysis are ongoing to different degrees in the respective genome initiatives, and as the funding for the whole genome sequencing becomes secured, most of the participating laboratories, apart from larger sequencing centres, become oriented to post-genomics. Bioinformatics networks are being expanded, including in developing countries, for data mining, annotation and in-depth analysis.

 

Dhiman, N., R. Bonilla, et al. (2001). "Gene expression microarrays: a 21st century tool for directed vaccine design." Vaccine 20(1-2): 22-30.

            DNA microarray technology is a new and powerful tool that allows the simultaneous analysis of a large number of nucleic acid hybridization experiments in a rapid and efficient fashion. The development of the DNA microarray chip has been driven by modern techniques of microelectronic fabrication, miniaturization and integration to produce what is referred to as "laboratory-on-chip" devices. The application of DNA chip technology includes the comprehensive analysis of multiple gene mutations and expressed sequences with regard to newer drug designs, host-pathogen interactions and the design of new vaccines. An advantage of microarray technology is that it can assist researchers to better define and understand the expression profile of a given genotype associated with disease, adverse effects from exposure to certain stimuli, or the ability to understand or predict immune responses to specific antigens. This paper briefly reviews DNA microarray technology and its implications with special reference to vaccine design. The technical aspects comprising array manufacturing and design, array hybridization, formatting, scanning and data handling are also briefly discussed.

 

Dixon, D. M. (2001). "US-Japan workshops in medical mycology: past, present and future." Nippon Ishinkin Gakkai Zasshi 42(2): 75-80.

            The Extramural Mycology Program of the National Institutes of Health (NIH), National Institute of Allergy and Infectious Diseases (NIAID) has organized and implemented a five workshop series in medical mycology during a critical period in the evolution of contemporary medical mycology (1992 to 2000; http://www.niaid.nih.gov/research/dmid.htm). The goals of the workshop series were to: initiate interactions; build collaborations; identify research needs; turn needs into opportunities; stimulate molecular research in medical mycology; and summarize recommendations emerging from the workshop proceedings. A recurring recommendation in the series was to foster communications within and beyond the field of medical mycology. US-Japan interactions were noted as one specific example of potential information exchange for mutual benefit. The first formal action directed at this recommendation was the workshop Emergence and Recognition of Fungal Diseases convened under the auspices of the US-Japan Cooperative Medical Science Program (USJCMSP; http://www.niaid.nih.gov/dmid/us%5Fjapan/default.htm) in Bethesda, Maryland USA on 30 June 1999 (D.M. Dixon & T. Matsumoto, co-chairs). A major goal of the workshop was to present contemporary medical mycology to the Joint Committee of the USJCMSP through representative research presentations in order to make the Committee aware of current status in the field, and the potential for scientific interactions. The second formal action is the workshop, under the auspices of the Japanese Society for Medical Mycology Medical Perspectives of Fungal Genome Studies scheduled for 28 November 2000 in Tokyo, Japan (T. Matsumoto & D.M. Dixon, co-chairs). The NIAID Mycology Workshop series recommended interactions between the following groups: academic and pharmaceutical; medical and molecular (model systems); medical and plant pathogens; basic and clinical; mycologists and immunologists. The first two US-Japan workshops can be viewed as consistent with these recommendations, and serve as a Western/Eastern gateway for exchange. The focus of the second US-Japan workshop on genome projects for the medically important fungi provides an excellent model for international communications. Given the tsunami of information that is flowing from genomics and bioinformatics, it is clear that global interactions will be essential in managing and interpreting the data.

 

Edwards, Y. J. and A. Cottage (2001). "Prediction of protein structure and function by using bioinformatics." Methods Mol Biol 175: 341-75.

           

Edwards, Y. J. and S. M. Brocklehurst (2001). "Finding genes in genomic nucleotide sequences by using bioinformatics." Methods Mol Biol 175: 235-47.

           

Emanuelsson, O. and G. von Heijne (2001). "Prediction of organellar targeting signals." Biochim Biophys Acta 1541(1-2): 114-9.

            The subcellular location of a protein is an important characteristic with functional implications, and hence the problem of predicting subcellular localization from the amino acid sequence has received a fair amount of attention from the bioinformatics community. This review attempts to summarize the present state of the art in the field.

 

Engels, M. F. and P. Venkatarangan (2001). "Smart screening: approaches to efficient HTS." Curr Opin Drug Discov Devel 4(3): 275-83.

            Faced with the prospect of a rising number of potential drug targets and given almost unlimited access to internal and external chemistry resources, the 'brute-force' approach to high-throughput screening (HTS) is becoming increasingly unattractive. Pharmaceutical companies realize that they have both to increase the scope of screening experiments and improve on the efficiency of the screening process per se. In acknowledging this development, hybrid screening strategies have been suggested that unite in silico and in vitro screening in one integrated process. The partnering of both screening approaches in one process is believed to exploit the potential of HTS in much smarter and more cost-efficient ways. This review will describe some recent applications of this new screening paradigm and discuss the impact of integrating these novel strategies in the drug discovery process.

 

Fagerlund, T. H. and O. Braaten (2001). "No pain relief from codeine...? An introduction to pharmacogenomics." Acta Anaesthesiol Scand 45(2): 140-9.

            Drug treatment remains a mainstay of medicine. In some situations a drug unexpectedly has no effect, or unforeseen serious side effects occur. For the patient this represents a dangerous and potentially life-threatening situation. It certainly is a distressing experience for the doctor. At the societal level, adverse drug reactions represent a leading cause of disease and death. Genetic variation often underlies these unexpected situations. Pharmacogenetics is the term used about genetically determined variability in the metabolism of drugs. Pharmacogenomics usually refers to drug discovery based on knowledge of genes, but it is a discipline that offers insight into aetiologic mechanisms, and possible prevention and treatment. There is a trend towards a definition of pharmacogenomics that includes both pharmacogenetics and pharmacogenomics as defined above. Our article is an introduction to pharmacogenomics, using the broader definition. Biotechnological methods cannot be understood without a grasp of basic medical genetics, and we provide a brush-up on the fundamentals. We then outline pharmacogenetics, giving examples of genetically based variation in drug metabolising enzymes, drug receptors and drug transporting proteins. Modern biotechnology would be unthinkable without the aid of computers, and we briefly touch upon the field of bioinformatics. Finally, we give an overview of pharmacogenomics in the narrower sense. The rapidly growing field of pharmacogenomics is going to influence our everyday practice of medicine in the immediate future.

 

Fenselau, C. and P. A. Demirev (2001). "Characterization of intact microorganisms by MALDI mass spectrometry." Mass Spectrom Rev 20(4): 157-71.

            The application of MALDI mass spectrometry to desorb protein biomarkers from intact viruses, bacteria, fungus, and spores is the focus of this review. Instrumentation, sample collection, sample preparation, and algorithms for data analysis are summarized. Optimally these analyses should be carried out in less than five minutes. Successful applications are discussed from biotechnology, cell biology, and the pharmaceutical industry.

 

Fey, S. J. and P. M. Larsen (2001). "2D or not 2D. Two-dimensional gel electrophoresis." Curr Opin Chem Biol 5(1): 26-33.

            2D gel electrophoresis is the technology that everyone loves to hate: it requires manual dexterity and precision to reproduce precisely and is thus not well-suited as a high-throughput technology. Although almost everyone would like to replace it, the resolution and sensitivity it offers are exquisite and unsurpassed if one wants a global view of cellular activity. There have been several recent developments, for example, the detection of low abundance proteins, and the resolution possible with narrow-range IPG gels.

 

Foster, J. A. (2001). "Evolutionary computation." Nat Rev Genet 2(6): 428-36.

            Evolution does not require DNA, or even living organisms. In computer science, the field known as 'evolutionary computation' uses evolution as an algorithmic tool, implementing random variation, reproduction and selection by altering and moving data within a computer. This harnesses the power of evolution as an alternative to the more traditional ways to design software or hardware. Research into evolutionary computation should be of interest to geneticists, as evolved programs often reveal properties - such as robustness and non-expressed DNA - that are analogous to many biological phenomena.
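
            The loop of random variation, reproduction and selection described in this abstract can be sketched in a few lines. The example below is a minimal illustration, not taken from the cited review: the bit-string genome, the count-the-ones fitness function, the truncation selection scheme and all parameter values are assumptions chosen for brevity.

```python
import random

# Toy evolutionary algorithm: evolve bit strings towards all ones.
# Every name and parameter here is hypothetical and chosen for illustration.

def fitness(genome):
    """Fitness is the number of 1 bits in the genome."""
    return sum(genome)

def mutate(genome, rate, rng):
    """Random variation: flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in genome]

def evolve(length=20, pop_size=30, generations=60, rate=0.05, seed=0):
    rng = random.Random(seed)  # seeded so that runs are repeatable
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    history = []  # best fitness observed at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        # Selection: keep the fitter half of the population unchanged.
        parents = population[: pop_size // 2]
        # Reproduction with variation: children are mutated copies of parents.
        children = [mutate(rng.choice(parents), rate, rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    population.sort(key=fitness, reverse=True)
    history.append(fitness(population[0]))
    return population[0], history

best, history = evolve()
print("best fitness:", fitness(best), "of", len(best))
```

            Because the selected parents are carried over unmutated, the best fitness in this sketch can never decrease from one generation to the next.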

 

Frantz, G. D., T. Q. Pham, et al. (2001). "Detection of novel gene expression in paraffin-embedded tissues by isotopic in situ hybridization in tissue microarrays." J Pathol 195(1): 87-96.

            Correlating altered gene expression patterns with particular disease states is a critical step in understanding disease processes and developing treatment strategies. Many thousands of novel gene sequences have recently been annotated in public and private databases and are now available for analysis. Tissue-specific expression patterns of these sequences can be evaluated physically on DNA arrays and other high throughput assays, or virtually by bioinformatics mining of expressed sequence tag (EST) databases. As a secondary screening tool, in situ hybridisation (ISH) not only confirms tissue specificity, but also reveals what is often valuable information about cell-type expression patterns of novel sequences. Due to their availability and long-term stability at room temperature, formalin-fixed paraffin-embedded clinical specimens provide an invaluable resource for evaluating expression patterns of novel human genes. We describe a high-throughput approach for identifying and quantifying the expression of novel genes in paraffin-embedded human tissues using isotopic in situ hybridisation and tissue microarrays (TMA).

 

Gedeck, P. and P. Willett (2001). "Visual and computational analysis of structure--activity relationships in high-throughput screening data." Curr Opin Chem Biol 5(4): 389-95.

            Novel analytic methods are required to assimilate the large volumes of structural and bioassay data generated by combinatorial chemistry and high-throughput screening programmes in the pharmaceutical and agrochemical industries. Recent work in visualisation and data mining has been used to develop structure--activity relationships from such chemical-biological datasets.

 

Gentle, C. R., W. Z. Golinski, et al. (2001). "Computational studies of 'whiplash' injuries." Proc Inst Mech Eng [H] 215(2): 181-9.

            The term 'whiplash' was initially used to describe injuries to the neck caused by the head being forced backwards during a rear-end collision in cars without head restraints. The addition of head restraints in the 1970s was expected to solve this problem by preventing excessive extension of the neck but experience suggests the problem still exists. This paper reviews available experimental studies of whiplash and uses the data to construct a finite element model which is capable of dynamically simulating whiplash collisions and predicting the forces in all the relevant neck ligaments. For the first time, it is shown that trauma occurs long before the head hits the head restraint as a result of displacement between the head and the torso caused by the head's inertia leading to markedly different acceleration histories. It is concluded that experimental and computational studies must be used together to produce progress in biomechanical studies.

 

Glassbrook, N. and J. Ryals (2001). "A systematic approach to biochemical profiling." Curr Opin Plant Biol 4(3): 186-90.

            Sequencing of the Arabidopsis thaliana genome is complete. The analytical tools for determining gene function by altering and monitoring gene expression are relatively well developed, and are generating large volumes of valuable data. Recent advances in techniques for the analysis of small molecules allow researchers to apply biochemical profiling as another powerful approach to functional genomics and metabolic research.

 

Glynne, R. J. and S. R. Watson (2001). "The immune system and gene expression microarrays--new answers to old questions." J Pathol 195(1): 20-30.

            The recent increase in availability of gene expression technologies has the potential to dramatically expand our understanding of cellular immunology in molecular detail. Expression levels of tens of thousands of genes can be measured in dozens of samples in only a few days, and this data can be integrated with sequence informatics to tentatively assign some (limited) functional information to a majority of these genes. In this review we discuss some initial applications of these new tools to the fields of lymphocyte and monocyte differentiation pathways, the tolerance or immunity decision process, and B cell transformation. These examples illustrate the power of unbiased, 'wide-net', approaches both to drive immunological research in previously unexpected directions and to confirm classic tenets of immunology.

 

Goldgar, D. E. (2001). "Major strengths and weaknesses of model-free methods." Adv Genet 42: 241-51.

            This chapter discusses some of the principal advantages and disadvantages inherent in the use of model-free (MF) methods. The principal advantage is that one does not need to specify, a priori, a genetic model for the trait of interest, which often is not known for many complex phenotypes of interest. On the other hand, as with all nonparametric approaches, use of model-free methods results in reduced power for detection of linkage compared with model-based methods when the model is correctly specified. The MF methods also have a potential for computational simplicity and are ideally suited for analysis of specific relative sets such as affected sibpairs. The MF methods are ideally suited to the analysis of quantitative traits for which finding and implementing a suitable genetic model for use in a parametric linkage analysis may be cumbersome. On the other hand, for discrete traits, most model-free methods allow for only a simple definition of "affected," making it difficult to consider such factors as age at onset, diagnostic accuracy of phenotype, or sex-specific disease risks. A factor that can be viewed as both a strength and weakness of MF methods is the large number of statistical approaches and implementation options of model-free methods; while providing a number of choices for the more sophisticated users, such variety also may lead to the risk of overanalysis of the data by selecting the approach that gives the desired result. In the end, the choice between model-free and model-based methods will largely depend on the nature of the phenotype under study and the existing knowledge base about its underlying mode of inheritance.

 

Golub, T. R. (2001). "Genomic approaches to the pathogenesis of hematologic malignancy." Curr Opin Hematol 8(4): 252-61.

            Recent advances in genome technologies and computational biology have facilitated genome-wide views of hematologic malignancy. In particular, comparative gene expression methods using DNA microarrays have allowed for the analysis of gene expression patterns in both primary patient material and model systems of hematopoietic development. This review provides an overview of the basic technologies underlying these approaches and provides a summary of recent progress in the genome-wide molecular classification of human acute leukemias and lymphomas and of initial attempts to define oncogene-mediated transcriptional programs using DNA microarrays.

 

Grandi, G. (2001). "Antibacterial vaccine design using genomics and proteomics." Trends Biotechnol 19(5): 181-8.

            After 200 years of practice, vaccinology has proved to be very effective in preventing infectious diseases. However, several human and animal pathogens exist for which vaccines have not yet been discovered. As for other fields of medical sciences, it is expected that vaccinology will greatly benefit from the emerging genomics technologies such as bioinformatics, proteomics and DNA microarrays. In this article the potential of these technologies applied to bacterial pathogens is analyzed, taking into account the few existing examples of their application in vaccine discovery.

 

Grant, S. G. and W. P. Blackstock (2001). "Proteomics in neuroscience: from protein to network." J Neurosci 21(21): 8315-8.

            Proteomic tools offer a new platform for studies of complex biological functions involving large numbers and networks of proteins. Intracellular networks of proteins perform key functions in neurons and glia. The unicellular eukaryote Saccharomyces cerevisiae has been the prototype for eukaryotic proteomic studies, and when combined with genomics, microarrays, genetics, and pharmacology, new insights into the integrated function of the cell emerge. The anatomical complexity of the nervous system both in cell types and in the vast number of synapses introduces novel technical and biological issues regarding the subcellular organization of protein networks. Here we will discuss the technology of proteomics and its applications to the nervous system.

 

Grant, W. N. and M. E. Viney (2001). "Post-genomic nematode parasitology." Int J Parasitol 31(9): 879-88.

            The future direction of post-genomic nematode parasitology should focus on the function of the genes that are defined by large-scale expressed sequence tag sequencing and on broader questions about the genetic basis of parasitism. Functional characterisation will require the application of high throughput technologies that have been developed in other fields, including genome mapping strategies and DNA microarray analysis. These will be greatly aided by the development and application of appropriate model organisms. It is crucial that the field make the transition from a narrow focus on one or a few genes at a time to a focus on whole genomes in order to fully realise the potential of the expressed sequence tag and other genomic projects currently under way.

 

Gras, R. and M. Muller (2001). "Computational aspects of protein identification by mass spectrometry." Curr Opin Mol Ther 3(6): 526-32.

            Recent developments in proteomics and genomics provide huge quantities of data to analyze. Automatic interpretation of mass spectrometry data has become essential for high-throughput processes aiming to study complete proteomes. There exist two main sources of mass spectrometric data: peptide mass fingerprint and fragmentation spectra, both of which require specific bioinformatic algorithms. We present a survey of these algorithms and discuss the efficiency of the different approaches and the possible improvements that may lead to a complete automatic high-throughput identification process.

 

Gray, S. G. and T. J. Ekstrom (2001). "The human histone deacetylase family." Exp Cell Res 262(2): 75-83.

            Since the identification of the first histone deacetylase (Taunton et al., Science 272, 408-411), several new members have been isolated. They can loosely be separated into entities on the basis of their similarity to various yeast histone deacetylases. The first class is represented by its closeness to the yeast Rpd3-like proteins, and the second most recently discovered class has similarities to yeast Hda1-like proteins. However, due to the fact that several different research groups isolated the Hda1-like histone deacetylases independently, there have been various different nomenclatures used to describe the various members, which can lead to confusion in the interpretation of this family's functions and interactions. With the discovery of another novel murine histone deacetylase, homologous to yeast Sir2, the number of members of this family is set to increase, as 7 human homologues of this gene have been isolated. In the light of these recent discoveries, we have examined the literature data and conducted a database analysis of the isolated histone deacetylases and potential candidates. The results obtained suggest that the number of histone deacetylases within the human genome may be as high as 17 and are discussed in relation to their homology to the yeast histone deacetylases.

 

Guillouzo, A. (2001). "Applications of biotechnology to pharmacology and toxicology." Cell Mol Biol (Noisy-le-grand) 47(8): 1301-8.

            Strategies for the development of new more efficient drugs at a lower cost and for the evaluation of the effects of chemicals and metals on tissue and cell function are changing considerably. This is made possible by recent progress in various areas, particularly biotechnology and bioinformatics. The recent sequencing of the human genome and the design of more and more sophisticated technologies will largely influence the fields of pharmacology and toxicology. Thus, identification of new molecular targets, development of more powerful cell models, design of miniaturized and automated tests for high throughput screening of thousands of compounds synthesized by combinatorial chemistry and progress in genomic and proteomic technologies that permit simultaneous analysis of thousands of genes and their products, offer new investigative ways that will still widely be extended in the next future.

 

Hamadeh, H. K., P. Bushel, et al. (2001). "Discovery in toxicology: mediation by gene expression array technology." J Biochem Mol Toxicol 15(5): 231-42.

            Toxicogenomics is a term that represents the merging of toxicology with novel genomics techniques. Data generated in the new-age era of toxicology is relatively complex, requires new bioinformatics tools for adequate interpretation, and allows for the rapid generation of testable hypotheses. Hazard identification and risk assessment processes will advance from the use of genomics techniques, which will lead to greater understanding of mechanism(s) of action of toxicants, development of novel biomarkers of exposure and effect, and better identification of sensitive subpopulations.

 

Hamilton, B. A. and W. N. Frankel (2001). "Of mice and genome sequence." Cell 107(1): 13-6.

            Availability of the mouse genome sequence will have a major impact on the study of vertebrate evolution, mammalian biology, and animal models of human disease. Resources to explore genome biology in mice will maximize the effect of this watershed event.

 

Hansch, C., A. Kurup, et al. (2001). "Chem-bioinformatics and QSAR: a review of QSAR lacking positive hydrophobic terms." Chem Rev 101(3): 619-72.

           

Hastie, N. (2001). "Future perspectives." Essays Biochem 37: 121-7.

            Transcription is coupled to splicing and other post-transcriptional processes. The importance of transcription factors in developmental biology and disease is underlined by genetic analysis in flies and humans. The genome project is identifying large numbers of novel transcription factors and RNA-binding proteins. Proteins may have multiple functions, acting at the transcriptional and post-transcriptional levels. The vast amount of novel biological information requires new, high-throughput approaches and bioinformatics.

 

Hasty, J., D. McMillen, et al. (2001). "Computational studies of gene regulatory networks: in numero molecular biology." Nat Rev Genet 2(4): 268-79.

            Remarkable progress in genomic research is leading to a complete map of the building blocks of biology. Knowledge of this map is, in turn, setting the stage for a fundamental description of cellular function at the DNA level. Such a description will entail an understanding of gene regulation, in which proteins often regulate their own production or that of other proteins in a complex web of interactions. The implications of the underlying logic of genetic networks are difficult to deduce through experimental techniques alone, and successful approaches will probably involve the union of new experiments and computational modelling techniques.

 

Hondermarck, H., A. S. Vercoutter-Edouart, et al. (2001). "Proteomics of breast cancer for marker discovery and signal pathway profiling." Proteomics 1(10): 1216-32.

            Breast cancer is the most common form of cancer among women and the identification of markers to discriminate tumorigenic from normal cells, as well as the different stages of this pathology, is of critical importance. Two-dimensional electrophoresis has been used before for studying breast cancer, but the progressive completion of human genomic sequencing and the introduction of mass spectrometry, combined with advanced bioinformatics for protein identification, have considerably increased the possibilities for characterizing new markers and therapeutic targets. Breast cancer proteomics has already identified markers of potential clinical interest (such as the molecular chaperone 14-3-3 sigma) and technological innovations such as large scale and high throughput analysis are now driving the field. Methods in functional proteomics have also been developed to study the intracellular signaling pathways that underlie the development of breast cancer. As illustrated with fibroblast growth factor-2, a mitogen and motogen factor for breast cancer cells, proteomics is a powerful approach to identify signaling proteins and to decipher the complex signaling circuitry involved in tumor growth. Together with genomics, proteomics is well on the way to molecularly characterizing the different types of breast tumor, and thus defining new therapeutic targets for future treatment.

 

Ikeo, K. (2001). "[Getting the sequence world: How to use multiple alignment software]." Tanpakushitsu Kakusan Koso 46(9): 1299-305.

           

Imai, E., M. Takenaka, et al. (2001). "[Gene therapy and tissue engineering in nephrology and renal transplantation]." Nippon Rinsho 59(1): 65-71.

            The Human Genome Project will be completed in 2003, and we will soon have the whole DNA sequence of the human genome. This should affect the therapy of progressive renal diseases, for which we have no effective remedy. Gene therapy, renal engineering and the generation of new drugs can be achieved by using human genome information. In this context, we describe our recent endeavors concerning gene therapy of the transplanted kidney, the search for renal stem cells and reprogramming factors, and the exploration of genes related to renal fibrosis. Completion of bioinformatics can facilitate the above post-genome projects.

 

Imanishi, T. and S. Miyazaki (2001). "[Comparison with other sequences: sequence similarity searches]." Tanpakushitsu Kakusan Koso 46(7): 856-62.

           

Imura, H. (2001). "[Perspectives on postgenome medicine in the 21st century]." Nippon Rinsho 59(1): 7-10.

            Since the human genome project was almost completed in 2000, 2001 is the first year of the postgenomic era. A variety of postgenome studies will be done in the next decade, including functional, comparative and structural genomics. These studies may open new areas in medicine, because disease susceptibility and drug metabolism could be predicted from the genetic characteristics of individuals. Genome studies may also shed light on cell biology, brain research and regenerative medicine and promote these fields. Bioinformatics will become a basis of postgenome biology and medicine.

 

Ishikawa, K. and G. Tsujimoto (2001). "[New strategy on medical research after completion of genome sequencing]." Nippon Yakurigaku Zasshi 118(3): 170-6.

            Real advances in biotechnology made it possible to complete human whole genome sequencing within a short duration. Although the genome includes a huge amount of information about biological functions and interest is now directed to studies using genomic information, the genomic strategy is not clearly understood. The following 4 studies on strategy after the completion of the genomic sequence were therefore presented and discussed at the 74th Annual Meeting of the Japanese Pharmacological Society: 1) Asthma and atopic dermatitis: models for genetic and genomic investigations of complex genetic diseases, by W.C.O. Cookson (University of Oxford, Asthma Genetics Group, Wellcome Trust Centre for Human Genetics); 2) Molecular classification by global gene expression profiling: application on oncogenomic research, by H. Aburatani (Genome Science Division, Research Center for Advanced Science and Technology, University of Tokyo); 3) Functional genomic search of disease-related genes using microarrays with normalized rat cDNA library, by G. Tsujimoto, et al. (Department of Molecular, Cell Pharmacology, National Children's Medical Research Center); and 4) Acute ischemic change of mRNA expression in the hippocampus by GeneChip array analysis: a starting point for post-genome strategy, by S. Asai, et al.

 

Ito, T., T. Chiba, et al. (2001). "Exploring the protein interactome using comprehensive two-hybrid projects." Trends Biotechnol 19(10 Suppl): S23-7.

            Large-scale two-hybrid projects were used in an approach to examine protein-protein interactions. Despite the various limitations of this approach, these projects revealed a wealth of novel interactions, and the protein interactome may be much larger than expected.

 

Jones, D. T. (2001). "Protein structure prediction in genomics." Brief Bioinform 2(2): 111-25.

            As the number of completely sequenced genomes rapidly increases, including now the complete Human Genome sequence, the post-genomic problems of genome-scale protein structure determination and the issue of gene function identification become ever more pressing. In fact, these problems can be seen as interrelated in that experimentally determining or predicting the structure of proteins encoded by genes of interest is one possible means to glean subtle hints as to the functions of these genes. The applicability of this approach to gene characterisation is reviewed, along with a brief survey of the reliability of large-scale protein structure prediction methods and the prospects for the development of new prediction methods.

 

Jung, D. R., R. Kapur, et al. (2001). "Topographical and physicochemical modification of material surface to enable patterning of living cells." Crit Rev Biotechnol 21(2): 111-54.

            Precise control of the architecture of multiple cells in culture and in vivo via precise engineering of the material surface properties is described as cell patterning. Substrate patterning by control of the surface physicochemical and topographic features enables selective localization and phenotypic and genotypic control of living cells. In culture, control over spatial and temporal dynamics of cells and heterotypic interactions draws inspiration from in vivo embryogenesis and haptotaxis. Patterned arrays of single or multiple cell types in culture serve as model systems for exploration of cell-cell and cell-matrix interactions. More recently, the patterned arrays and assemblies of tissues have found practical applications in the fields of biosensors and cell-based assays for drug discovery. Although the field of cell patterning has its origins early in this century, an improved understanding of cell-substrate interactions and the use of microfabrication techniques borrowed from the microelectronics industry have enabled significant recent progress. This review presents the important early discoveries and emphasizes results of recent state-of-the-art cell patterning methods. The review concludes by illustrating the growing impact of cell patterning in the areas of bioelectronic devices and cell-based assays for drug discovery.

 

Jungblut, P. R. (2001). "Proteome analysis of bacterial pathogens." Microbes Infect 3(10): 831-40.

            Combining two-dimensional electrophoresis with mass spectrometry resulted in a powerful technology ideally suited to recognize and identify proteins of pathogenic microorganisms. This classical proteome analysis is now complemented by capillary chromatography/mass spectrometry combinations, miniaturization by chip technology and protein interaction investigations. Comparative proteomics is used to reveal vaccine candidates and pathogenicity factors. Immunoproteomics identifies specific and nonspecific antigens. For managing the huge amounts of data, bioinformatics is a valuable instrument for the construction of complex protein databases.

 

Kallioniemi, O. P. (2001). "Biochip technologies in cancer research." Ann Med 33(2): 142-7.

            Development of high-throughput 'biochip' technologies has dramatically enhanced our ability to study biology and explore the molecular basis of disease. Biochips enable massively parallel molecular analyses to be carried out in a miniaturized format with a very high throughput. This review will highlight applications of the various biochip technologies in cancer research, including analysis of 1) disease predisposition by using single-nucleotide polymorphism (SNP) microarrays, 2) global gene expression patterns by cDNA microarrays, 3) concentrations, functional activities or interactions of proteins with proteomic biochips, and 4) cell types or tissues as well as clinical endpoints associated with molecular targets by using tissue microarrays. One can predict that individual cancer risks can, in the future, be estimated accurately by a microarray profile of multiple SNPs in critical genes. Diagnostics of cancer will be facilitated by biochip readout of activity levels of thousands of genes and proteins. Biochip diagnostics coupled with informatics solutions will form the basis of individualized treatment decisions for cancer patients.

 

Kellam, P. (2001). "Post-genomic virology: the impact of bioinformatics, microarrays and proteomics on investigating host and pathogen interactions." Rev Med Virol 11(5): 313-29.

            Post-genomic research encompasses many diverse aspects of modern science. These include the two broad subject areas of computational biology (bioinformatics) and functional genomics. Laboratory based functional genomics aims to measure and assess either the messenger RNA (mRNA) levels (transcriptome studies) or the protein content (proteome studies) of cells and tissues. All of these methods have been applied recently to the study of host and pathogen interactions for both bacteria and viruses. A basic overview of the technology is given in this review together with approaches to data analysis. The wealth of information produced from even these preliminary studies has shown the generalities, subtleties and specificities of host-pathogen interactions. Such research should ultimately result in new methods for diagnosing and treating infectious diseases.

 

Kidera, A. (2001). "[Knowing similarity in protein 3D structure]." Tanpakushitsu Kakusan Koso 46(15): 2198-204.

           

Kim, J. (2001). "Descartes' fly: the geometry of genomic annotation." Funct Integr Genomics 1(4): 241-9.

            The completion of the Drosophila melanogaster genome marks another significant milestone in the growth of sequence information. But it also contributes to the ever-widening gap between sequence information and biological knowledge. One important approach to reducing this gap is theoretical inference through computational technologies. Many computer programs have been designed to annotate genomic sequence information with biologically relevant information. Here, I suggest that all of these methods have a common structure in which the sequence fragments are "coordinated" by some method of description such as Hidden Markov models. The key to the algorithms lies in constructing the most efficient set of coordinates that allow extrapolation and interpolation from existing knowledge. Efficient extrapolation and interpolation are produced if the sequence fragments acquire a natural geometrical structure in the coordinated description. Finding such a coordinate frame is an inductive problem with no algorithmic solution. The greater part of the problem of genomic annotation lies in biological modeling of the data rather than in algorithmic improvements.

 

Kobayashi, K. (2001). "[Getting functional information from your sequence by the use of protein signature databases]." Tanpakushitsu Kakusan Koso 46(14): 2098-103.

           

Kondo, S. (2001). "[Computer simulation as a tool to study complex phenomena of biology]." Tanpakushitsu Kakusan Koso 46(16 Suppl): 2461-7.

           

Korenberg, M. J., R. David, et al. (2001). "Parallel cascade identification and its application to protein family prediction." J Biotechnol 91(1): 35-47.

            Parallel cascade identification is a method for modeling dynamic systems with possibly high order nonlinearities and lengthy memory, given only input/output data for the system gathered in an experiment. While the method was originally proposed for nonlinear system identification, two recent papers have illustrated its utility for protein family prediction. One strength of this approach is the capability of training effective parallel cascade classifiers from very little training data. Indeed, when the amount of training exemplars is limited, and when distinctions between a small number of categories suffice, parallel cascade identification can outperform some state-of-the-art techniques. Moreover, the unusual approach taken by this method enables it to be effectively combined with other techniques to significantly improve accuracy. In this paper, parallel cascade identification is first reviewed, and its use in a variety of different fields is surveyed. Then protein family prediction via this method is considered in detail, and some particularly useful applications are pointed out.

 

Krawetz, S. A. and D. D. Womble (2001). "Design and implementation of an introductory course for computer applications in molecular genetics. A case study." Mol Biotechnol 17(1): 27-41.

            Formal training in computational biology was initiated at Wayne State University in 1990 to meet the needs of the faculty. This was still at a time when the molecular databases and analysis tools could be housed in what is now equivalent to a modern but dated desktop computer. In 1995 the course was expanded to include graduate students to provide these senior students with a foundation in computational biology. This course has armed our students with a requisite set of basic skills that are necessary for a successful career in molecular genetics. It is now an integral component of the graduate program of the Center for Molecular Medicine and Genetics and our experiences in course delivery have been detailed (BioInformatics Methods and Protocols, S. Misener and S. A. Krawetz, eds., Humana Press, Totowa, NJ, 2000). The course was expanded to a campus-wide unlimited enrollment program for the summer of 2000 to address the needs of our student body. In this review we present our experience with delivering a multidisciplinary campus-wide computational biology course to a new and widely diverse student body.

 

Kurella, M., L. L. Hsiao, et al. (2001). "DNA microarray analysis of complex biologic processes." J Am Soc Nephrol 12(5): 1072-8.

            DNA microarrays, or gene chips, allow surveys of gene expression (i.e., mRNA expression) in a highly parallel and comprehensive manner. The pattern of gene expression produced, known as the expression profile, depicts the subset of gene transcripts expressed in a cell or tissue. At its most fundamental level, the expression profile can address qualitatively which genes are expressed in disease states. However, with the aid of bioinformatics tools such as cluster analysis, self-organizing maps, and principal component analysis, more sophisticated questions can be answered. Microarrays can be used to characterize the functions of novel genes, identify genes in a biologic pathway, analyze genetic variation, and identify therapeutic drug targets. Moreover, the expression profile can be used as a tissue or disease "fingerprint." This review details the fabrication of arrays, data management tools, and applications of microarrays to the field of renal research and the future of clinical practice.

 

Kuroda, Y., E. Chikayama, et al. (2001). "[A protein domain selection system for high-throughput structural genomics]." Tanpakushitsu Kakusan Koso 46(14): 2066-72.

           

Kusunoki, M. (2001). "[Acquisition of structural data of biological macromolecules: how to utilize PDB]." Tanpakushitsu Kakusan Koso 46(13): 2003-8.

           

Legrain, P., J. Wojcik, et al. (2001). "Protein--protein interaction maps: a lead towards cellular functions." Trends Genet 17(6): 346-52.

            The availability of complete genome sequences now permits the development of tools for functional biology on a proteomic scale. Several experimental approaches or in silico algorithms aim at clustering proteins into networks with biological significance. Among those, the yeast two-hybrid system is the technology of choice to detect protein-protein interactions. Recently, optimized versions were applied at a genomic scale, leading to databases on the web. However, as with any other 'genetic' assay, yeast two-hybrid assays are prone to false positives and false negatives. Here we discuss these various technologies, their general limitations and the potential advances they make possible, especially when in combination with other functional genomics or bioinformatics analyses.

 

Lesch, K. P. (2001). "Molecular foundation of anxiety disorders." J Neural Transm 108(6): 717-46.

            Genetic epidemiology has assembled convincing evidence that anxiety and related disorders are influenced by genetic factors and that the genetic component is highly complex, polygenic, and epistatic. Although several genes which may contribute to the genetic variance of anxiety-related traits or modify the phenotypic expression of pathologic anxiety are currently under investigation, molecular genetics has so far failed to identify a genomic variation that consistently contributes to susceptibility to anxiety disorders. Investigation of gene-gene and gene-environment interactions in humans and nonhuman primates as well as gene inactivation studies in mice further intensify the identification of genes that are essential for development and adult plasticity of the brain related to complex anxiety responses. Because the modes of inheritance of anxiety disorders are complex, it has been concluded that multiple genes of small effect, in interaction with each other and with nongenetic neurodevelopmental events, produce vulnerability to the disorder. Future research directions will take advantage of the completion of the sequencing of the human and mouse genomes coinciding with the revolution in bioinformatics. More than 1.4 million single nucleotide polymorphisms (SNPs) in the human genome have been identified. This collection should allow the initiation of genome-wide linkage disequilibrium mapping of the genes influencing anxiety in the human population. Integration of these emerging tools and technologies for genetic analysis will provide the groundwork for an advanced stage of gene identification and functional studies in anxiety and related disorders.

 

Luo, Z. and D. H. Geschwind (2001). "Microarray applications in neuroscience." Neurobiol Dis 8(2): 183-93.

            Advances in all facets of technology from molecular biology to imaging and computational biology offer unprecedented opportunities for improving our understanding of the brain in health and disease. Oligonucleotide and cDNA microarray analysis, using a variety of "DNA chips," is a recently developed high-throughput technique that allows for tour-de-force analysis of gene expression. We review this powerful technique, developed in genetics laboratories, with reference to applications in neurologic diseases in humans and the use of animal models. The typical microarray experiment is multistaged and includes preparation or purchase of arrays, preparation of target DNA and probe, target DNA hybridization, microarray scanning, and image analysis. The power and pitfalls of this technology are discussed in the context of neuroscience paradigms. Since unprecedented amounts of data are produced from microarray experiments, bioinformatics and modeling expertise are increasingly becoming critical components of this approach.

 

Lupas, A. N., C. P. Ponting, et al. (2001). "On the evolution of protein folds: are similar motifs in different protein folds the result of convergence, insertion, or relics of an ancient peptide world?" J Struct Biol 134(2-3): 191-203.

            This paper presents and discusses evidence suggesting how the diversity of domain folds in existence today might have evolved from peptide ancestors. We apply a structure similarity detection method to detect instances where localized regions of different protein folds contain highly similar sequences and structures. Results of performing an all-on-all comparison of known structures are described and compared with other recently published findings. The numerous instances of local sequence and structure similarities within different protein folds, together with evidence from proteins containing sequence and structure repeats, argue in favor of the evolution of modern single polypeptide domains from ancient short peptide ancestors (antecedent domain segments (ADSs)). In this model, ancient protein structures were formed by self-assembling aggregates of short polypeptides. Subsequently, and perhaps concomitantly with the evolution of higher fidelity DNA replication and repair systems, single polypeptide domains arose from the fusion of ADS genes. Thus modern protein domains may have a polyphyletic origin.

 

Luscombe, N. M., D. Greenbaum, et al. (2001). "What is bioinformatics? A proposed definition and overview of the field." Methods Inf Med 40(4): 346-58.

            BACKGROUND: The recent flood of data from genome sequences and functional genomics has given rise to a new field, bioinformatics, which combines elements of biology and computer science. OBJECTIVES: Here we propose a definition for this new field and review some of the research that is being pursued, particularly in relation to transcriptional regulatory systems. METHODS: Our definition is as follows: Bioinformatics is conceptualizing biology in terms of macromolecules (in the sense of physical-chemistry) and then applying "informatics" techniques (derived from disciplines such as applied maths, computer science, and statistics) to understand and organize the information associated with these molecules, on a large-scale. RESULTS AND CONCLUSIONS: Analyses in bioinformatics predominantly focus on three types of large datasets available in molecular biology: macromolecular structures, genome sequences, and the results of functional genomics experiments (e.g. expression data). Additional information includes the text of scientific papers and "relationship data" from metabolic pathways, taxonomy trees, and protein-protein interaction networks. Bioinformatics employs a wide range of computational techniques including sequence and structural alignment, database design and data mining, macromolecular geometry, phylogenetic tree construction, prediction of protein structure and function, gene finding, and expression data clustering. The emphasis is on approaches integrating a variety of computational methods and heterogeneous data sources. Finally, bioinformatics is a practical discipline. We survey some representative applications, such as finding homologues, designing drugs, and performing large-scale censuses. Additional information pertinent to the review is available over the web at http://bioinfo.mbb.yale.edu/what-is-it.

 

Ma, D. (2001). "Applications of yeast in drug discovery." Prog Drug Res 57: 117-62.

            The yeast Saccharomyces cerevisiae is perhaps the best-studied eukaryotic organism. Its experimental tractability, combined with the remarkable conservation of gene function throughout evolution, makes yeast the ideal model genetic organism. Yeast is a non-pathogenic model of fungal pathogens used to identify antifungal targets suitable for drug development and to elucidate mechanisms of action of antifungal agents. As a model of fundamental cellular processes and metabolic pathways of the human, yeast has improved our understanding and facilitated the molecular analysis of many disease genes. The completion of the Saccharomyces genome sequence helped launch the post-genomic era, focusing on functional analyses of whole genomes. Yeast paved the way for the systematic analysis of large and complex genomes by serving as a test bed for novel experimental approaches and technologies, tools that are fast becoming the standard in drug discovery research.

 

Maecker, B., M. von Bergwelt-Baildon, et al. (2001). "Linking genomics to immunotherapy by reverse immunology--'immunomics' in the new millennium." Curr Mol Med 1(5): 609-19.

            The disclosure of the human genome sequence and rapid advances in genomic expression profiling have revolutionized our knowledge about molecular changes in malignant diseases. Rapidly growing gene expression databases and improvements in bioinformatics tools set the stage for new approaches using large-scale molecular information to develop specific therapeutics in cancer. On one hand, the ability to detect clusters of genes differentially expressed in normal and malignant tissue may lead to widely applicable targeting of defined molecular structures. On the other hand, analyzing the 'molecular fingerprint' of an individual tumor raises the possibility of developing customized therapeutics. One approach to use the emerging new datasets for the development of novel therapeutics is to identify genes that are specifically expressed in tumors as targets for immune intervention. This review will focus on the process from in silico analysis of expression databases and screening of potential candidate genes by bioinformatics to the in vitro and in vivo analysis to determine the immunogenicity of candidate tumor antigens. Basic biological principles of 'reverse immunology' as well as technical advantages and difficulties will be addressed.

 

Maggio, E. T. and K. Ramnarayan (2001). "Recent developments in computational proteomics." Trends Biotechnol 19(7): 266-72.

            The mapping of the human genome was completed earlier this year and efforts are underway to understand the role of gene products (i.e. proteins) in biological pathways and human disease and to exploit their functional roles to derive protein therapeutics and protein-based drugs. A key component to the next revolution in the 'post-genomic' era will be the increasingly widespread use of protein structure in rational experimental design. Improvements in quality, availability and utility of large-scale 3D and 4D protein structural information are enabling a revolution in rational design, having particular impact on drug discovery and optimization. New computational methodologies now yield modeled structures that are, in many cases, quantitatively comparable with crystal structures, at a fraction of the cost.

 

Mahalingam, S., K. Clark, et al. (2001). "Antiviral potential of chemokines." Bioessays 23(5): 428-35.

            In the past few years, a large number of new chemokines (chemotactic cytokines) and chemokine receptors have been discovered. The growth in knowledge about these molecules has been achieved largely through advances in bioinformatics and the expansion of expressed sequence tag (EST) databases. It is now clear that chemokines are crucial in controlling both the development and functioning of leukocytes and that their role is not restricted to cell attraction, as originally assumed. In particular, recent findings provide strong support for the idea that chemokines and their receptors are especially important in the control of viral infection and replication. Thus, specific chemokines are now known to enhance the cytotoxic activity of infected cells, thus inhibiting further virus replication. In addition, some chemokines orchestrate the recruitment of activated leukocytes to foci of infection to aid viral clearance. Viruses, in turn, have evolved various defences against chemokines. These range from the production of proteins that inhibit biological activity of the host chemokine to the hijacking of the chemokine system, whereby certain viruses utilize chemokine receptors for their entry. The latter viral defence can itself be blocked by chemokines. Altogether, these findings illustrate the central role of chemokines in many different phases of the immune response, particularly those aspects involving antiviral defence, a variety and versatility that was not fully appreciated even a few years ago.

 

Manfredi-Romanini, M. G. (2001). "The year of encroaching genomics." Eur J Histochem 45(1): 5-6.

           

Mano, H. (2001). "[DNA chip analysis of hematological disorders]." Rinsho Ketsueki 42(9): 671-9.

           

Masihi, K. N. (2001). "Fighting infection using immunomodulatory agents." Expert Opin Biol Ther 1(4): 641-53.

            The last decade has seen the emergence of immunomodulators as promising therapeutic agents in infectious diseases. A diverse array of recombinant, synthetic and natural immunomodulatory preparations for prophylaxis and treatment of various infections are available today. Some of these substances, such as granulocyte colony-stimulating factor (G-CSF), interferons, imiquimod and bacterial-derived preparations are already licensed for use in patients. Others including IL-12, various chemokines, synthetic cytosine phosphate-guanosine (CpG) oligodeoxynucleotides and glucans are being investigated extensively in clinical and preclinical studies. Immunomodulatory regimens offer an attractive approach as an adjunct modality for control of microbial diseases in the era of antibiotic resistance. Practical application of the advances in molecular biology, bioinformatics, genomic mining and high-throughput peptide synthesis should foster future discovery and development of novel immunomodulators contingent upon scientific evidence rather than dictates of discursive empiricism.

 

Maughan, N. J., F. A. Lewis, et al. (2001). "An introduction to arrays." J Pathol 195(1): 3-6.

            DNA microarrays are a new technology that allows the analysis of large numbers of genes at a high resolution by the hybridization of labelled DNA, which may be reverse-transcribed from mRNA, to a substrate containing thousands of spotted cDNAs or oligonucleotides. The amount of hybridized target is analysed, giving information on gene expression, polymorphisms or mutations present and allowing the gene profiling of different subtypes of disease. This technique has massive implications for the further understanding of the complicated genetic alterations involved in tumourigenesis and other disease processes and also for the generation of accurate prognostic information and optimization of treatment in these situations.

 

McCulloch, A. D. and R. Mazhari (2001). "Regional myocardial mechanics: integrative computational models of flow-function relations." J Nucl Cardiol 8(4): 506-19.

            Many cardiac disorders result in regionally altered myocardial mechanics. Although myocardial strain distributions can be measured experimentally and clinically, regional wall stresses must be computed from computational models. Combining these approaches can provide insight into the structural basis of regional dysfunction under conditions such as acute myocardial infarction and ischemia-reperfusion. Recently, 3-dimensional computational models have helped to elucidate the structural basis of the functional border zone adjacent to acutely ischemic myocardium. They have also shown that heterogeneous dysfunction in ischemic-reperfused stunned myocardium does not necessarily imply heterogeneous myofilament injury. Now that computational models are able to reproduce many complex features of the 3-dimensional patterns of regional myocardial deformation observed experimentally, we suggest possible roles for such integrative models in clinical diagnosis.

 

Mendonca, E. A., J. J. Cimino, et al. (2001). "Accessing heterogeneous sources of evidence to answer clinical questions." J Biomed Inform 34(2): 85-98.

            The large and rapidly growing number of information sources relevant to health care, and the increasing amounts of new evidence produced by researchers, are improving the access of professionals and students to valuable information. However, seeking and filtering useful, valid information can still be very difficult. An online information system that conducts searches based on individual patient data can have a beneficial influence on the particular patient's outcome and educate the healthcare worker. In this paper, we describe the underlying model for a system that aims to facilitate the search for evidence based on clinicians' needs. This paper reviews studies of information needs of clinicians, describes principles of information retrieval, and examines the role that standardized terminologies can play in the integration between a clinical system and literature resources, as well as in the information retrieval process. The paper also describes a model for a digital library system that supports the integration of clinical systems with online information sources, making use of information available in the electronic medical record to enhance searches and information retrieval. The model builds on several different, previously developed techniques to identify information themes that are relevant to specific clinical data. Using a framework of evidence-based practice, the system generates well-structured questions with the intent of enhancing information retrieval. We believe that by helping clinicians to pose well-structured clinical queries and including in them relevant information from individual patients' medical records, we can enhance information retrieval and thus can improve patient care.

 

Miller, W. (2001). "Comparison of genomic DNA sequences: solved and unsolved problems." Bioinformatics 17(5): 391-7.

            MOTIVATION: The DNA sequences of entire genomes are being determined at a rapid rate. Whereas initial genome sequencing efforts were for organisms chosen to be widely spaced in the tree of life, there is a growing emphasis on projects to sequence a species that is sufficiently similar to an already-sequenced species to allow direct comparison of those two DNA sequences. This and other changes in genome sequencing strategies have created a strong need for new methods to compare genomic sequences. RESULTS: We sketch the current state of software for comparing genomic DNA sequences and outline research directions that we believe are likely to result in important advances in practice.

 

Miyachi, H. (2001). "[The present status and future prospect of the molecular diagnostic tests]." Rinsho Byori 49(2): 139-49.

            Assays for DNA or RNA sequences to diagnose infectious, neoplastic and genetic diseases have come into wide use through recent progress in molecular biology and biotechnology, and are now essential in advanced patient care because they permit earlier and more accurate diagnosis. Automated systems have been developed for amplification and detection of nucleic acid sequences of infectious agents, using various nucleic acid amplification technologies such as PCR. A fully automated PCR system and automated extraction of specific sequences of infectious agents, such as hepatitis C virus RNA, have been developed. These automated systems have improved not only assay efficiency but also quality control of the tests, and have contributed to their standardization. The importance of developing systems for quality assessment and laboratory accreditation has been emphasized, particularly for tests that are still performed with manual methods. Based on the genome sequence information produced by the human genome project, the functions of genes and proteins have been studied by post-genomic approaches such as expression profiling using DNA microarrays, proteomics, and single nucleotide polymorphism analysis, coupled with bioinformatics. Along with advances in pharmacogenomics, these studies have raised the prospect of tests for individualized medicine based on genetic information, such as tests predicting individual susceptibility to diseases for prevention and responsiveness to drugs for choice of treatment. For the practice of such medicine, each item of genetic information and each test for it must be carefully evaluated to determine whether it is appropriate for cost-effective medicine, that is, whether it contributes to efficient decision-making on patient care for prevention or avoidance of disease and thus to cost savings.

 

Mori, H., T. Horiuchi, et al. (2001). "[Post sequence genome analysis of Escherichia coli]." Tanpakushitsu Kakusan Koso 46(13): 1977-85.

           

Morishita, S. and J. Sese (2001). "[Computational analysis of gene expression patterns]." Tanpakushitsu Kakusan Koso 46(16 Suppl): 2575-9.

           

Nakai, K. (2001). "[Computational analyses of signal information encoded within genome data]." Tanpakushitsu Kakusan Koso 46(16 Suppl): 2544-9.

           

Nakai, K. (2001). "[An opinion on where bioinformatics should proceed]." Tanpakushitsu Kakusan Koso 46(11 Suppl): 1488-95.

           

Nakai, K. (2001). "Review: prediction of in vivo fates of proteins in the era of genomics and proteomics." J Struct Biol 134(2-3): 103-16.

            Even after a nascent protein emerges from the ribosome, its fate is still controlled by its own amino acid sequence information. Namely, it may be co-/posttranslationally modified (e.g., phosphorylated, N-/O-glycosylated, and lipidated); it may be inserted into the membrane, translocated to an organelle, or secreted to the outside milieu; it may be processed for maturation or selective degradation; finally, its fragment may be presented on the cell surface as an antigen. Here, prediction methods of such protein fates from their amino acid sequences are reviewed. In many cases, artificial neural network techniques have been effectively used. The prediction of in vivo fates of proteins will be useful for characterizing newly identified candidate genes in a genome or for interpreting multiple spots in proteome analyses.

 

Naruya, S. (2001). "[In search of gene evolution from sequences]." Tanpakushitsu Kakusan Koso 46(10): 1410-3.

           

Norton, S. M., P. Huyn, et al. (2001). "Data mining of spectroscopic data for biomarker discovery." Curr Opin Drug Discov Devel 4(3): 325-31.

            The goals of precise diagnosis, prevention and treatment of disease can be realized through the discovery of biological markers. Spectroscopic tools can simultaneously detect and quantify multiple small molecule and macromolecular components of biological samples, and are therefore ideal methods for the discovery of previously uncharacterized markers. However, the identification of meaningful spectral features is complicated by the lack of foreknowledge of the molecular nature of a disease, spectral noise and biological variability that is uncorrelated with the disease state. Pattern recognition techniques, both statistical and machine-learning, have been increasingly used in recent years with spectroscopic data to identify markers and classify patients into disease subsets. This review summarizes recent developments, limitations and future prospects in the use of data mining techniques with magnetic resonance spectroscopy, mass spectrometry and optical spectroscopy for the discovery of biomarkers.

 

Novatchkova, M. and F. Eisenhaber (2001). "Can molecular mechanisms of biological processes be extracted from expression profiles? Case study: endothelial contribution to tumor-induced angiogenesis." Bioessays 23(12): 1159-75.

            Whereas the genome contains all potential developmental programs, expression profiles permit the determination of genes that are actively transcribed under defined physiological conditions. In this article, the idea of extracting biological mechanisms from expression data is tested. Molecular processes of the endothelial contribution to angiogenesis are derived from recently published expression profiles. The analysis reveals the sensitivity limits of experimental detection of transcriptional changes and how sequence-analytic techniques can help to identify the function of genes in question. We conclude that the transcripts (http://mendel.imp.univie.ac.at/SEQUENCES/TEMS/) found to be up-regulated in angiogenesis are involved in extracellular matrix remodeling, cellular migration, adhesion, cell-cell communication rather than in angiogenesis initiation or integrative control. Comparison with tissue-specific patterns of EST occurrence shows that, indeed, the presumptive tumor-specific endothelial markers are more generally expressed by cell types involved in migration and matrix remodeling processes. This exemplary study demonstrates how bioinformatics approaches can be helpful in deriving mechanistic information from diverse sources of experimental data.

 

Okamoto, M. (2001). "[System for the inference of genetic networks]." Tanpakushitsu Kakusan Koso 46(16 Suppl): 2515-20.

           

Olsson, T. and T. I. Oprea (2001). "Cheminformatics: a tool for decision-makers in drug discovery." Curr Opin Drug Discov Devel 4(3): 308-13.

            Cheminformatics is a tool that aims at facilitating the decision-making process across various preclinical stages of drug discovery. Access to biological and chemical data, but not the data themselves, is an integral part of cheminformatics. Emerging tools that allow storage of, and access to, chemical, structural-chemical and biological information are only now beginning to reach maturity. Recent advances in cheminformatics include virtual library analysis without enumeration and novel methods to investigate global chemical similarity and diversity voids. The most important task for cheminformatics is to constantly reevaluate itself and its utility in the area of drug discovery, in order to provide probabilistic, rather than categorical predictions.

 

Papac, D. I. and Z. Shahrokh (2001). "Mass spectrometry innovations in drug discovery and development." Pharm Res 18(2): 131-45.

            This review highlights the many roles mass spectrometry plays in the discovery and development of new therapeutics by both the pharmaceutical and the biotechnology industries. Innovations in mass spectrometer source design, improvements to mass accuracy, and implementation of computer-controlled automation have accelerated the purification and characterization of compounds derived from combinatorial libraries, as well as the throughput of pharmacokinetics studies. The use of accelerator mass spectrometry, chemical reaction interface-mass spectrometry and continuous flow-isotope ratio mass spectrometry are promising alternatives for conducting mass balance studies in man. To meet the technical challenges of proteomics, discovery groups in biotechnology companies have led the way to development of instruments with greater sensitivity and mass accuracy (e.g., MALDI-TOF, ESI-Q-TOF, Ion Trap), the miniaturization of separation techniques and ion sources (e.g., capillary HPLC and nanospray), and the utilization of bioinformatics. Affinity-based methods coupled to mass spectrometry are allowing rapid and selective identification of both synthetic and biological molecules. With decreasing instrument cost and size and increasing reliability, mass spectrometers are penetrating both the manufacturing and the quality control arenas. The next generation of technologies to simplify the investigation of the complex fate of novel pharmaceutical entities in vitro and in vivo will be chip-based approaches coupled with mass spectrometry.

 

Patel, V. L., J. F. Arocha, et al. (2001). "Methods of cognitive analysis to support the design and evaluation of biomedical systems: the case of clinical practice guidelines." J Biomed Inform 34(1): 52-66.

            This article provides a theoretical and methodological framework for the use of cognitive analysis to support the representation of biomedical knowledge and the design of clinical systems, using clinical-practice guidelines (CPGs) as an example. We propose that propositional and semantic analyses, when used as part of the system-development process, can improve the validity, usability, and comprehension of the resulting biomedical applications. The framework we propose is based on a large body of research on the study of how people mentally represent information and subsequently use it for problem solving. This research encompasses many areas of psychology, but the more important ones are the study of memory and the study of comprehension. Of particular relevance is research devoted to investigating the comprehension and memory of language, expressed verbally or in text. In addition, research on how contextual variables affect performance is informative because these psychological processes are influenced by situational variables (e.g., setting, culture). One important factor limiting the acceptance and use of clinical-practice guidelines (CPGs) may be the mismatch between a guideline's recommended actions and the physician-user's mental models of what seems appropriate in a given case. Furthermore, CPGs can be semantically complex, often composed of elaborate collections of prescribed procedures with logical gaps or contradictions that can promote ambiguity and hence frustration on the part of those who attempt to use them. An improved understanding of the semantics and structure of CPGs may help to improve such matching, and ultimately the comprehensibility and usability of CPGs. Cognitive methods of analysis can help guideline designers and system builders throughout the development process, from the conceptual design of a computer-based system to its implementation phases. 
By studying how guideline creators and developers represent guidelines, both mentally and in text, and how end-users understand and make decisions with such guidelines, we can inform the development of technologies that seek to improve the match between the representations of experts and practitioners. We urge informaticians to recognize the potential relevance of cognitive analysis methods and to begin more extensive experimentation with their use in biomedical informatics research.

 

Paterson, A. H., T. H. Lan, et al. (2001). "Brassica genomics: a complement to, and early beneficiary of, the Arabidopsis sequence." Genome Biol 2(3): REVIEWS1011.

            Those studying the genus Brassica will be among the early beneficiaries of the now-completed Arabidopsis sequence. The remarkable morphological diversity of Brassica species and their relatives offers valuable opportunities to advance our knowledge of plant growth and development, and our understanding of rapid phenotypic evolution.

 

Pavelic, K. and K. Gall-Troselj (2001). "Recent advances in molecular genetics of breast cancer." J Mol Med 79(10): 566-73.

            Breast cancer is among the most common tumors affecting women. It is characterized by a number of genetic aberrations. Some 5-10% of cases are thought to be inherited. The hereditary breast and ovarian cancer syndrome includes genetic alterations of various susceptibility genes, particularly BRCA1 and BRCA2. Breast tumors of patients with germ-line mutations in the BRCA1 and BRCA2 genes have more genetic defects than sporadic breast tumors. Here we review new findings on BRCA1 gene function. Accumulation of somatic genetic changes during tumor progression may follow a specific and more aggressive pathway of chromosome damage in these individuals. A major BRCA1 downstream target gene is the DNA damage-responsive gene GADD45. Induction of BRCA1 triggers apoptosis by activation of c-Jun N-terminal kinase/stress-activated protein kinase (JNK/SAPK). BRCA1 interacts with SWI/SNF, a chromatin remodeling complex important in gene expression. Recent advances in genomics and bioinformatics, particularly in DNA-sequencing approaches and DNA-chip technology, are expected to improve identification of small molecules, which might be druggable targets. New knowledge about the genetic portrait of breast tumor is coming from differential gene expression profiling using microarrays. Human genome studies, as well as development of "DNA chips," provide a window for observing patterns of gene activity in cells, which will contribute to more accurate cancer classification. However, substantial work connected with analytical and statistical tools must still be carried out to confirm the function of differentially expressed genes. Knowledge of the molecular characteristics of breast tumor has already started to make possible the identification of breast cancer patients who could benefit from therapies that target those features. 
Progress in basic research into signaling provides the opportunity to attack at least some signal-transduction targets involved in proliferation, survival, invasion, angiogenesis, metastasis, and resistance. Exciting knowledge in breast cancer biology is rapidly accumulating in parallel with recent developments in rational selection and validation of relevant targets that provide unique opportunities for development of "intelligent" therapeutics.

 

Peale, F. V., Jr. and M. E. Gerritsen (2001). "Gene profiling techniques and their application in angiogenesis and vascular development." J Pathol 195(1): 7-19.

            The analysis of gene expression in specific tissues and physiological processes has evolved over the last 20 years from the painstaking identification of selected genes to the relatively efficient and open-ended surveying of potentially all genes expressed in a tissue. Current art for gene discovery includes the use of large-scale arrays of cDNA sequences or oligonucleotides, and molecular 'tagging' techniques such as GeneCalling and SAGE. Common to each of these techniques is a reliance on the increasingly comprehensive databases of human and mouse EST and full-length gene sequences. Early efforts to characterize candidate genes were limited by their narrow scope, while current efforts are confounded by the enormous volume of data returned. Sophisticated software tools are an integral part of the analysis, helping to organize information into coherent groups with temporal or functional similarity. These techniques, in conjunction with the continued analysis of human genetic syndromes, transgenic, and knockout mice, have driven genetic analysis of angiogenesis and vascular development from describing which individual genes are involved to defining the outlines of regulatory networks.

 

Pearl, G. M., S. Livingston-Carr, et al. (2001). "Integration of computational analysis as a sentinel tool in toxicological assessments." Curr Top Med Chem 1(4): 247-55.

            Computational toxicity modeling can have significant impact in the drug discovery process, especially when utilized as a sentinel filter for common drug safety liabilities, such as mutagenicity, carcinogenicity and teratogenicity. This review will focus on the strengths and limitations of the current computational models for predicting these drug safety liabilities, and the various strategies for incorporating these predictive models into the drug discovery process.

 

Pellegrini, M., M. Thompson, et al. (2001). "Computational method to assign microbial genes to pathways." J Cell Biochem Suppl Suppl 37: 106-9.

            We present techniques that mine fully sequenced microbial genomes for functional relationships between genes. We show that genes related by one of four techniques are more likely to belong to the same cellular pathways. Furthermore, we demonstrate that the pathway of an uncharacterized gene may be inferred from those of its functionally related partners. Therefore, we are now able to assign most of the genes within bacteria to cellular pathways.

 

Pellegrini, M. (2001). "Computational methods for protein function analysis." Curr Opin Chem Biol 5(1): 46-50.

            Two recent advances have had the greatest impact on protein function analysis so far: the complete sequences of genomes and mRNA expression level profiles. The former has spurred the development of novel techniques to study protein function: phylogenetic profiles and gene clusters. The latter has introduced a method, not based on sequence homology, that enables one to group together functionally related genes.

 

Pepperkok, R., J. C. Simpson, et al. (2001). "Being in the right location at the right time." Genome Biol 2(9): REVIEWS1024.

            Taking each coding sequence from the human genome in turn and identifying the subcellular localization of the corresponding protein would be a significant contribution to understanding the function of each of these genes and to deciphering functional networks. This article highlights current approaches aimed at achieving this goal.

 

Phair, R. D. and T. Misteli (2001). "Kinetic modelling approaches to in vivo imaging." Nat Rev Mol Cell Biol 2(12): 898-907.

            The ability to visualize protein dynamics and biological processes by in vivo microscopy is revolutionizing many areas of biology. These methods generate large, kinetically complex data sets, which often cannot be intuitively interpreted. The combination of dynamic imaging and computational modelling is emerging as a powerful tool for the quantitation of biophysical properties of molecules and processes. The new discipline of computational cell biology will be essential in uncovering the pathways, mechanisms and controls of biological processes and systems as they occur in vivo.

 

Planet, P. J., R. DeSalle, et al. (2001). "Systematic analysis of DNA microarray data: ordering and interpreting patterns of gene expression." Genome Res 11(7): 1149-55.

           

Ponting, C. P. (2001). "Issues in predicting protein function from sequence." Brief Bioinform 2(1): 19-29.

            Identifying homologues, defined as genes that arose from a common evolutionary ancestor, is often a relatively straightforward task, thanks to recent advances made in estimating the statistical significance of sequence similarities found from database searches. The extent by which homologues possess similarities in function, however, is less amenable to statistical analysis. Consequently, predicting function by homology is a qualitative, rather than quantitative, process and requires particular care to be taken. This review focuses on the various approaches that have been developed to predict function from the scale of the atom to that of the organism. Similarities in homologues' functions differ considerably at each of these different scales and also vary for different domain families. It is argued that due attention should be paid to all available clues to function, including orthologue identification, conservation of particular residue types, and the co-occurrence of domains in proteins. Pitfalls in database searching methods arising from amino acid compositional bias and database size effects are also discussed.

 

Quackenbush, J. (2001). "Computational analysis of microarray data." Nat Rev Genet 2(6): 418-27.

            Microarray experiments are providing unprecedented quantities of genome-wide data on gene-expression patterns. Although this technique has been enthusiastically developed and applied in many biological contexts, the management and analysis of the millions of data points that result from these experiments has received less attention. Sophisticated computational tools are available, but the methods that are used to analyse the data can have a profound influence on the interpretation of the results. A basic understanding of these computational tools is therefore required for optimal experimental design and meaningful data analysis.

 

Read, S. J., A. A. Parsons, et al. (2001). "Stroke genomics: approaches to identify, validate, and understand ischemic stroke gene expression." J Cereb Blood Flow Metab 21(7): 755-78.

            Sequencing of the human genome is nearing completion and biologists, molecular biologists, and bioinformatics specialists have teamed up to develop global genomic technologies to help decipher the complex nature of pathophysiologic gene function. This review will focus on differential gene expression in ischemic stroke. It will discuss inheritance in the broader stroke population, how experimental models of spontaneous stroke might be applied to humans to identify chromosomal loci of increased risk and ischemic sensitivity, and also how the gene expression induced by stroke is related to the poststroke processes of brain injury, repair, and recovery. In addition, we discuss and summarise the literature of experimental stroke genomics and compare several approaches of differential gene expression analyses. These include a comparison of representational difference analysis we have provided using an experimental stroke model that is representative of stroke evolution observed most often in man, and a summary of available data on stroke differential gene expression. Issues regarding validation of potential genes as stroke targets, the verification of message translation to protein products, the relevance of the expression of neuroprotective and neurodestructive genes and their specific timings, and the emerging problems of handling novel genes that may be discovered during differential gene expression analyses will also be addressed.

 

Rehm, B. H. (2001). "Bioinformatic tools for DNA/protein sequence analysis, functional assignment of genes and protein classification." Appl Microbiol Biotechnol 57(5-6): 579-92.

            The development of efficient DNA sequencing methods has led to the achievement of the DNA sequence of entire genomes from (to date) 55 prokaryotes, 5 eukaryotic organisms and 10 eukaryotic chromosomes. Thus, an enormous amount of DNA sequence data is available and even more will be forthcoming in the near future. Analysis of this overwhelming amount of data requires bioinformatic tools in order to identify genes that encode functional proteins or RNA. This is an important task, considering that even in the well-studied Escherichia coli more than 30% of the identified open reading frames are hypothetical genes. Future challenges of genome sequence analysis will include the understanding of gene regulation and metabolic pathway reconstruction including DNA chip technology, which holds tremendous potential for biomedicine and the biotechnological production of valuable compounds. The overwhelming volume of information often confuses scientists. This review intends to provide a guide to choosing the most efficient way to analyze a new sequence or to collect information on a gene or protein of interest by applying current publicly available databases and Web services. Recently developed tools that allow functional assignment of genes, mainly based on sequence similarity of the deduced amino acid sequence, using the currently available and increasing biological databases will be discussed.

 

Reidhaar-Olson, J. F., B. K. Rhees, et al. (2001). "Genomics approaches to drug discovery." J Cell Biochem Suppl Suppl 37: 110-9.

            New approaches to drug discovery have come about in recent years as a result of important advances in genomics and bioinformatics. The availability of genome-scale sequence data, the development of new tools for high-throughput gene expression monitoring, and improvements in the ability to analyze large data sets have revolutionized the field. In this article, we discuss three applications of genomics data in the drug discovery process: target discovery, prodrug strategies, and vaccine development.

 

Rifai, A., L. D. Dworkin, et al. (2001). "Genomic approaches to elucidating the pathophysiology of renal diseases." Zhonghua Yi Xue Za Zhi (Taipei) 64(10): 555-62.

            The physiological and pathological processes of the kidney as a whole can now be analyzed with a molecular precision at a genomic-scale. Using massively parallel cDNA microarray technology, the mRNA expression of thousands of genes can be quantified simultaneously. The advantages of microarray analyses include the ability to examine the interaction of several genes or the entire genome in a single experiment. Bioinformatics approaches such as data mining through mathematical condensation of the massive gene expression profiles are essential for elucidating molecular and biological logic underlying gene expression programs. Genes that encode similar protein components are often coordinately regulated. Recent application of gene expression profiling to the normal human renal cortical tissue, experimental in vitro and in vivo models has shown that cellular activation is accompanied by changes of hundreds of genes in parallel. The databases of gene expression emerging from these studies will be used to interpret the pathological changes in gene expression that accompany a variety of human renal diseases.

 

Riggins, G. J. (2001). "Using Serial Analysis of Gene Expression to identify tumor markers and antigens." Dis Markers 17(2): 41-8.

            Tumor markers and antigens are normally highly expressed in malignant tissue, but not in the surrounding normal tissue. Serial Analysis of Gene Expression (SAGE) is a technology that counts mRNA transcripts and can be used to find those genes most highly induced in malignant tissues. SAGE produces a comprehensive profile of gene expression and can be used to search for tumor biomarkers in a limited number of samples. Public sources of SAGE data, in particular through the Cancer Genome Anatomy Project, increase the value of this technology by making a large source of information on many tumors and normal tissues available for comparison. Although the perfect tumor-specific gene does not exist, the differences in gene expression between tumor and normal can be exploited for therapeutic or diagnostic purposes.

 

Rodriguez-Tome, P. (2001). "EBI databases and services." Mol Biotechnol 18(3): 199-212.

            The EMBL Outstation-European Bioinformatics Institute (EBI) is a center for research and services in bioinformatics. It serves researchers in molecular biology, genetics, medicine, and agriculture from academia, and the agricultural, biotechnology, chemical, and pharmaceutical industries. The Institute manages and makes available databases of biological data including nucleic acid, protein sequences, and macromolecular structures. It provides to this community bioinformatics services relevant to molecular biology free of charge over the Internet. Some of these databases and services are described in this review.

 

Rossignol, M. (2001). "Analysis of the plant proteome." Curr Opin Biotechnol 12(2): 131-4.

            For many years the analysis of plant proteomes has been restricted to the construction of descriptive catalogues or the search for markers. The analysis of plant proteomes is now gaining a functional dimension, however, because the focus has shifted onto well-defined plant-specific tissues and organelles, the simultaneous mining of proteomic and physiological data and specific methodological efforts.

 

Rost, B. (2001). "Review: protein secondary structure prediction continues to rise." J Struct Biol 134(2-3): 204-18.

            Methods predicting protein secondary structure improved substantially in the 1990s through the use of evolutionary information taken from the divergence of proteins in the same structural family. Recently, the evolutionary information resulting from improved searches and larger databases has again boosted prediction accuracy by more than four percentage points to its current height of around 76% of all residues predicted correctly in one of the three states, helix, strand, and other. The past year also brought successful new concepts to the field. These new methods may be particularly interesting in light of the improvements achieved through simple combining of existing methods. Divergent evolutionary profiles contain enough information not only to substantially improve prediction accuracy, but also to correctly predict long stretches of identical residues observed in alternative secondary structure states depending on nonlocal conditions. An example is a method automatically identifying structural switches and thus finding a remarkable connection between predicted secondary structure and aspects of function. Secondary structure predictions are increasingly becoming the work horse for numerous methods aimed at predicting protein structure and function. Is the recent increase in accuracy significant enough to make predictions even more useful? Because the recent improvement yields a better prediction of segments, and in particular of beta strands, I believe the answer is affirmative. What is the limit of prediction accuracy? We shall see.

 

Rusnak, J. M., R. M. Kisabeth, et al. (2001). "Pharmacogenomics: a clinician's primer on emerging technologies for improved patient care." Mayo Clin Proc 76(3): 299-309.

            Pharmacogenomics is a term recently coined to embody the concept of individualized and rational drug selection based on the genotype of a particular patient. Customization of drug therapy offers the potential for optimal safety and efficacy in an individual patient. Such a process contrasts with current prescribing practices, which use medications shown to be safe and effective in patient populations or based on anecdotal experiences. Within patient populations, medications vary in their efficacy among individual patients. More importantly, a medication that is safe and effective in one patient may be ineffective or even harmful in another. Underlying many of these phenotypic differences are genotypic variants (polymorphisms) of key enzymes and proteins that affect the safety and efficacy of a drug in an individual patient. An understanding of these polymorphisms has the potential to enhance patient care by allowing physicians to customize the selection of medication to meet individual patient needs. Pharmacogenomics may also lead to improved compliance and shorter time to optimal disease management, thereby reducing morbidity and mortality. Significant cost savings could result from reductions in polypharmacy as well as from fewer physician encounters and hospitalizations for exacerbations of underlying illness and because of adverse drug reactions.

 

Ryan, B. M., T. J. Dougherty, et al. (2001). "Efflux in bacteria: what do we really know about it?" Expert Opin Investig Drugs 10(8): 1409-22.

            Efflux is the process by which bacteria transport potentially toxic compounds, such as drugs or other chemicals, out of the cell. Efflux pumps can be identified not only by biochemical, microbiological, or molecular means but also, with the availability of microbial genomic sequences, by bioinformatics analysis of DNA sequences for key conserved structure motifs. Efflux has been identified as a relevant contributor to bacterial resistance in the clinic and is now recognised as one of the most important causes of intrinsic antibiotic resistance in bacteria, especially in Pseudomonas aeruginosa. With the recognition of efflux as a major factor in bacterial resistance, several companies have invested in the identification and development of bacterial efflux pump inhibitors. Among those, Microcide, Pfizer, Paratek and several academic laboratories are in the process of exploring efflux pump inhibitors from synthetic compounds, natural products and peptidomimetics. Inhibiting bacterial efflux with a non-antibiotic inhibitor would restore activity of an antibiotic subject to efflux (similar to the use of beta-lactamase inhibitors to combat beta-lactamase production by bacteria). The feasibility of such an approach has been experimentally demonstrated in vitro and in vivo for efflux reversal of levofloxacin.

 

Sauer, U. (2001). "Evolutionary engineering of industrially important microbial phenotypes." Adv Biochem Eng Biotechnol 73: 129-69.

            The tremendous complexity of dynamic interactions in cellular systems often impedes practical applications of metabolic engineering that are largely based on available molecular or functional knowledge. In contrast, evolutionary engineering follows nature's 'engineering' principle by variation and selection. Thus, it is a complementary strategy that offers compelling scientific and applied advantages for strain development and process optimization, provided a desired phenotype is amenable to direct or indirect selection. In addition to simple empirical strain development by random mutation and direct selection on plates, evolutionary engineering also encompasses recombination and continuous evolution of large populations over many generations. Two distinct evolutionary engineering applications are likely to gain more relevance in the future: first, as an integral component in metabolic engineering of strains with improved phenotypes, and second, to elucidate the molecular basis of desired phenotypes for subsequent transfer to other hosts. The latter will profit from the broader availability of recently developed methodologies for global response analysis at the genetic and metabolic level. These methodologies facilitate identification of the molecular basis of evolved phenotypes. It is anticipated that, together with novel analytical techniques, bioinformatics, and computer modeling of cellular functions and activities, evolutionary engineering is likely to find its place in the metabolic engineer's toolbox for research and strain development. This review presents evolutionary engineering of whole cells as an emerging methodology that draws on the latest advances from a wide range of scientific and technical disciplines.

 

Schaffer, A. A., L. Aravind, et al. (2001). "Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements." Nucleic Acids Res 29(14): 2994-3005.

            PSI-BLAST is an iterative program to search a database for proteins with distant similarity to a query sequence. We investigated over a dozen modifications to the methods used in PSI-BLAST, with the goal of improving accuracy in finding true positive matches. To evaluate performance we used a set of 103 queries for which the true positives in yeast had been annotated by human experts, and a popular measure of retrieval accuracy (ROC) that can be normalized to take on values between 0 (worst) and 1 (best). The modifications we consider novel improve the ROC score from 0.758 +/- 0.005 to 0.895 +/- 0.003. This does not include the benefits from four modifications we included in the 'baseline' version, even though they were not implemented in PSI-BLAST version 2.0. The improvement in accuracy was confirmed on a small second test set. This test involved analyzing three protein families with curated lists of true positives from the non-redundant protein database. The modification that accounts for the majority of the improvement is the use, for each database sequence, of a position-specific scoring system tuned to that sequence's amino acid composition. The use of composition-based statistics is particularly beneficial for large-scale automated applications of PSI-BLAST.

 

Schultze, J. L. and R. H. Vonderheide (2001). "From cancer genomics to cancer immunotherapy: toward second-generation tumor antigens." Trends Immunol 22(9): 516-23.

            Clinically successful specific cancer immunotherapy depends on the identification of tumor-rejection antigens (Ags). Historically, tumor Ags have been identified by analyzing either T-cell or antibody responses of cancer patients against the autologous cancer cells. The unveiling of the sequence of the human genome, improved bioinformatics tools and optimized immunological analytical tools have made it possible to screen any given protein for immunogenic epitopes. Overexpressed genes in cancer can be identified by gene-expression profiling; immunogenic epitopes can be predicted based on HLA-binding motifs; candidate peptides can be identified by mass spectrometry of tumor-cell-derived HLA molecules; and peptide-specific T cells can be qualitatively and quantitatively analyzed at the single-cell level using ELISPOT and tetramer technologies. Here, we suggest that, based on these advancements, a new class of tumor Ags can be identified by directly linking cancer genomics to cancer immunology and immunotherapy.

 

Scott, H. S. and R. Chrast (2001). "Global transcript expression profiling by Serial Analysis of Gene Expression (SAGE)." Genet Eng (N Y) 23: 201-19.

           

Shirai, H. and K. Mizuguchi (2001). "[Genome analysis on the basis of protein structures]." Tanpakushitsu Kakusan Koso 46(11 Suppl): 1496-503.

           

Simpson, M. L., G. S. Sayler, et al. (2001). "Whole-cell biocomputing." Trends Biotechnol 19(8): 317-23.

            The ability to manipulate systems on the molecular scale naturally leads to speculation about the rational design of molecular-scale machines. Cells might be the ultimate molecular-scale machines and our ability to engineer them is relatively advanced when compared with our ability to control the synthesis and direct the assembly of man-made materials. Indeed, engineered whole cells deployed in biosensors can be considered one of the practical successes of molecular-scale devices. However, these devices explore only a small portion of cellular functionality. Individual cells or self-organized groups of cells perform extremely complex functions that include sensing, communication, navigation, cooperation and even fabrication of synthetic nanoscopic materials. In natural systems, these capabilities are controlled by complex genetic regulatory circuits, which are only partially understood and not readily accessible for use in engineered systems. Here, we focus on efforts to mimic the functionality of man-made information-processing systems within whole cells.

 

Sobral, B. W., H. Mangalam, et al. (2001). "Bioinformatics for rice resources." Novartis Found Symp 236: 59-81; discussion 81-4.

            The distinguishing feature of the 'new biology' is that it is information intensive. Not only does it demand access to and assimilation of vast data sets accumulated by engineered laboratory processes, but it also demands a previously unimaginable level of data integration across data types and sources. There are various information resources available for rice. In addition, there are various information resources that are not focused on rice but that contain rice data. The challenge for rice researchers and breeders is to access this wealth of data meaningfully. This challenge will grow significantly as international efforts aimed at sequencing the entire rice genome come into full swing. Only through concerted efforts in bioinformatics will the power of these public data be brought to bear on the needs of rice researchers and breeders worldwide. These efforts will need to focus on two large but distinct areas: (1) development of an effective bioinformatics infrastructure (hardware systems, software systems, and software engineers and support staff) and (2) computational biology research in visualization and analysis of very large, complex data sets, such as those that will be developed using high-throughput expression technologies, large-scale insertional mutagenesis, and biochemical profiling of various types. In the midst of the large flow of high-throughput data that the international rice genome sequencing efforts will produce, it is also imperative that integration of those data with unique germplasm data held in trust by the CGIAR be a part of the informatics infrastructure. This paper will focus on the state of rice information resources, the needs of the rice community, and some proposed bioinformatics activities to support these needs.

 

Southan, C. (2001). "A genomic perspective on human proteases." FEBS Lett 498(2-3): 214-8.

            Over 400 human proteases documented in secondary databases can already be delineated in genomic sequence. A Genome Ontology annotation of 30585 sequences in the provisional human proteome set recognises 498 proteases, i.e. 1.6%. Homology searches against finished sequence and comparisons between mouse and zebrafish are likely to increase this total. However, the data already indicate that the mechanistic class, sequence family and domain distribution of the genomic complement of proteases is unlikely to shift significantly from that already observed in the transcript data. Genomically derived novel sequences will require bioinformatic analysis and biochemical verification. The increasing availability of annotated genomic data will enable studies of splice variants, transcriptional control, polymorphisms, pseudogenes, inactive homologues and evolution. Comparative work on complete human protease families should produce a more integrated picture of their biochemistry and physiology. Genomic data will also lead to the identification of new protease involvement in disease processes and their evaluation as drug targets.

 

Sreekumar, K. R., L. Aravind, et al. (2001). "Computational analysis of human disease-associated genes and their protein products." Curr Opin Genet Dev 11(3): 247-57.

            The complete genome sequences for human, Drosophila melanogaster and Arabidopsis thaliana have been reported recently. With the availability of complete sequences for many bacteria and archaea, and five eukaryotes, comparative genomics and sequence analysis are enabling us to identify counterparts of many human disease genes in model organisms, which in turn should accelerate the pace of research and drug development to combat human diseases. Continuous improvement of specialized protein databases, together with sensitive computational tools, has enhanced the power and reliability of computational prediction of protein function.

 

Srinivas, P. R., S. Srivastava, et al. (2001). "Proteomics in early detection of cancer." Clin Chem 47(10): 1901-11.

            Early detection is critical in cancer control and prevention. Biomarkers help in this process by providing valuable information about the status of a cell at any given point in time. As a cell transforms from nondiseased to neoplastic, distinct changes occur that could be potentially detected through the identification of the appropriate biomarkers. Biomarker research has benefited from advances in technology such as proteomics. We discuss here ongoing research in this field, focusing on proteomic technologies. The advances in two-dimensional electrophoresis and mass spectrometry are discussed in light of their contribution to biomarker research. Chip-based techniques, such as surface-enhanced laser desorption and ionization, and emerging methods, such as tissue and antibody arrays, are also discussed. The development of bioinformatic tools that have been and are being developed in parallel to proteomics is also addressed. This report brings into focus the efforts of the Early Detection Research Network at the National Cancer Institute in harnessing scientific expertise from leading institutions to identify and validate biomarkers for early detection and risk assessment.

 

Strausberg, R. L., S. F. Greenhut, et al. (2001). "In silico analysis of cancer through the Cancer Genome Anatomy Project." Trends Cell Biol 11(11): S66-71.

            The Cancer Genome Anatomy Project (CGAP) was designed and implemented to provide public datasets, material resources and informatics tools to serve as a platform to support the elucidation of the molecular signatures of cancer. This overview of CGAP describes the status of this effort to develop resources based on gene expression, polymorphism identification and chromosome aberrations, and we describe a variety of analytical tools designed to facilitate in silico analysis of these datasets.

 

Strausberg, R. L. (2001). "The Cancer Genome Anatomy Project: new resources for reading the molecular signatures of cancer." J Pathol 195(1): 31-40.

            The Cancer Genome Anatomy Project (CGAP) has built informational, technological, and physical resources to interface genomics with basic and clinical cancer research. The CGAP web site (http://cgap.nci.nih.gov) provides informatics tools for in silico analysis of the CGAP datasets as well as information for accessing each of the CGAP resources. Published in 2001 by John Wiley & Sons, Ltd.

 

Suhnel, J. (2001). "Beyond nucleic acid base pairs: from triads to heptads." Biopolymers 61(1): 32-51.

            Hydrogen-bonded base pairs are an important determinant of nucleic acid structure and function. However, other interactions such as base-base stacking, base-backbone, and backbone-backbone interactions as well as effects exerted by the solvent and by metal or NH(4)(+) ions also have to be taken into account. In addition, hydrogen-bonded base complexes involving more than two bases can occur. With the rapidly increasing number and structural diversity of nucleic acid structures known at atomic detail higher-order hydrogen-bonded base complexes, base polyads, have attracted much interest. This review provides an overview on the occurrence of base polyads in nucleic acid structures and describes computational studies on these nucleic acid building blocks.

 

Sumner-Smith, M. (2001). "Beginning to manage drug discovery and development knowledge." Curr Opin Drug Discov Devel 4(3): 319-24.

            Knowledge management approaches and technologies are beginning to be implemented by the pharmaceutical industry in support of new drug discovery and development processes aimed at greater efficiencies and effectiveness. This trend coincides with moves to reduce paper, coordinate larger teams with more diverse skills that are distributed around the globe, and to comply with regulatory requirements for electronic submissions and the associated maintenance of electronic records. Concurrently, the available technologies have implemented web-based architectures with a greater range of collaborative tools and personalization through portal approaches. However, successful application of knowledge management methods depends on effective cultural change management, as well as proper architectural design to match the organizational and work processes within a company.

 

Tang, C. M. and E. R. Moxon (2001). "The impact of microbial genomics on antimicrobial drug development." Annu Rev Genomics Hum Genet 2: 259-69.

            There is an urgent need to develop novel classes of antibiotics to counter the threat of the spread of multiply resistant bacterial pathogens. The availability of the complete genome sequence of many pathogenic microbes provides information on every potential drug target and is an invaluable resource in the search for novel compounds. Here, we review the approaches being taken to exploit the genome databases through a combination of bioinformatics, transcriptional analysis, and a further understanding of the molecular basis of the disease process. The emphasis is changing from compound screening to target hunting, as the latter offers flexible ways to design and optimize the next generation of broad-spectrum antibiotics.

 

Terstappen, G. C. and A. Reggiani (2001). "In silico research in drug discovery." Trends Pharmacol Sci 22(1): 23-6.

            Target and lead discovery constitute the main components of today's early pharmaceutical research. The aim of target discovery is the identification and validation of suitable drug targets for therapeutic intervention, whereas lead discovery identifies novel chemical molecules that act on those targets. With the near completion of the human genome sequencing, bioinformatics has established itself as an essential tool in target discovery and the in silico analysis of gene expression and gene function are now an integral part of it, facilitating the selection of the most relevant targets for a disease under study. In lead discovery, advances in chemoinformatics have led to the design of compound libraries in silico that can be screened virtually. Moreover, computational methods are being developed to predict the drug-likeness of compounds. Thus, drug discovery is already on the road towards electronic R&D.

 

Thornton, J. M. (2001). "From genome to function." Science 292(5524): 2095-7.

           

Toda, T. (2001). "Proteome and proteomics for the research on protein alterations in aging." Ann N Y Acad Sci 928: 71-8.

            Functional decline of tissues in aged animals is a result of cellular aging. Though any process of somatic cell aging basically depends on genomic instructions, phenotypes of aged cells are expressed in a given internal environment of each cell type that was made with translated proteins and post-translationally modified products. Therefore, research on age-dependent protein alterations in each cell type is very important in clarifying mechanisms of aging. The novel term "proteome" is a compound of "protein" and "genome," which means constitutive whole proteins including post-translationally modified products in a cell type. Proteomics is a novel strategy for analyzing proteomes. In proteomics, high resolution two-dimensional electrophoresis is exclusively performed for isolation of proteins followed by mass spectrometry for identification of proteins and determination of modifications. Thus, proteomics is becoming appreciated as a powerful tool to find out proteins responsible for cellular aging, symptoms of senility and geriatric diseases.

 

Triche, T. J., D. Schofield, et al. (2001). "DNA microarrays in pediatric cancer." Cancer J 7(1): 2-15.

            Childhood cancer, like all cancer, is at heart a genetic disease. Consequently, fundamental understanding of the oncogenic process is likely to be beneficially addressed by genetic methodology. Current methods have largely focused on single-gene defects, like chimeric genes, which are present in many sarcomas and leukemias. Real understanding is more likely to derive from a genome-wide analysis of these malignancies. Recent technologic advances have made it possible to simultaneously assess the entire expressed gene profile, or transcriptome, of a given cancer. Foremost among these methods is gene expression profiling using DNA microarrays. Two basic approaches predominate: spotted arrays and photolithography arrays. Regardless of the method, the resulting information can be used to create disease profiles, but only if appropriate bioinformatic solutions are employed. Common analytic approaches include two-way expression comparisons, or scatter analyses; outlier gene analysis, to identify significantly dysregulated genes; dendrogram analyses, as pioneered by Eisen; cluster analyses to identify diagnostic or biologic groups; and various forms of functional analyses to identify relevant genes and biologic pathways. Studies of both adult and pediatric cancer have demonstrated the feasibility of such analyses to identify both diagnostic and prognostic groups of tumors. Acute childhood leukemias have been grouped into myelogenous and lymphoid, and even B- and T-cell subsets. Breast cancer prognostic groups have been identified on the basis of a small subset of expressed genes. In addition, preliminary data on childhood sarcomas appear to identify both diagnostic and prognostic subsets. Specifically, embryonal rhabdomyosarcoma could be distinguished from alveolar rhabdomyosarcoma, and even morphologically mixed embryonal and alveolar rhabdomyosarcoma showed similar gene expression profiles in both histologies. Further, collaborative studies using clustering analyses appear to identify prognostic groups of diverse sarcomas. Larger institutional and cooperative group studies are currently underway to validate these preliminary findings.

 

Tsuda, M. (2001). "[Molecular science of the living organism: the case of G-proteins]." Yakugaku Zasshi 121(7): 523-34.

            The concept of Molecular Science of the Living Organism was described, where the living state is explained as the purposive flows of the quantum mechanically controlled chemical reaction systems which support the homeostasis of the living organism. In the 21st century, the post genomic sequence era, the concept may be a self-evident truth. Molecular Science of the Living Organism was presented in the case of G-proteins: i.e., the atomically controlled mechanism of 1. the carcinogenesis which originates from the point mutation of ras p21, 2. the activation of a receptor protein at the cell membrane, especially in the case of bacteriorhodopsin, 3. the activation of an inactive G-protein by the activated receptor protein.

 

Tsujimoto, G. (2001). "['Millennium Project' of MHLW]." Nippon Rinsho 59(10): 1884-8.

            The Human Genome Project (information and technology) provides insights so profound that it has the ability to change the way we understand, predict, prevent, diagnose, and treat disease. Because SNPs (single nucleotide polymorphisms) are the most common type of polymorphism, they can have significant effects on both susceptibility to disease and drug response. In Japan, the 'Millennium Project' based on SNPs started in 2000, and the National Research Centers of MHLW are performing genome-wide association studies on five common diseases (dementia, cancer, diabetes mellitus, hypertension, and asthma/allergy). The National Children's Hospital group studies asthma/allergy and its drug therapy. Here, an overview of the 'Millennium Project' and the actual research framework of the National Children's Hospital group are reviewed.

 

Tucker, C. L., J. F. Gera, et al. (2001). "Towards an understanding of complex protein networks." Trends Cell Biol 11(3): 102-6.

            Large-scale two-hybrid screens have generated a wealth of information describing potential protein--protein interactions. When compiled with data from systematic localizations of proteins, mutant screens and other functional tests, a network of interactions among proteins and between proteins and other components of eukaryotic cells can be deduced. These networks can be viewed as maps of the cell, depicting potential signaling pathways and interactive complexes. Most importantly, they provide potential clues to the function of previously uncharacterized proteins. Focusing on recent experiments, we explore these protein-interaction studies and the maps derived from such efforts.

 

van Pelt, J., A. van Ooyen, et al. (2001). "The need for integrating neuronal morphology databases and computational environments in exploring neuronal structure and function." Anat Embryol (Berl) 204(4): 255-65.

            Neurons connect to each other through a myriad of dendritic and axonal arborisations. Dendritic structures provide the substrate for integration of postsynaptic potentials and control of action potential generation. Axonal structures provide the substrate for action potential dissemination and signalling to target neurons. The morphological complexity of dendritic arborisations is assumed to play a critical role in the transformation of spatio-temporal patterns of postsynaptic potentials into time-structured series of action potentials. Although these transformations lie at the basis of information processing in the brain, it is still far from understood how their details are influenced by dendritic shape. To facilitate research in this area, it is necessary that data on both the morphology and electrical properties of neurons, as well as computational tools for analysis, become available in an integrated way. This requires a combined effort from the fields of informatics and neurosciences (together called neuroinformatics) in order to create data acquisition, databasing and computational tools. Focusing on neuronal morphology, this chapter will give a brief review of the current neuroinformatics developments in both reconstruction techniques, morphological quantification, modeling of morphological complexity, modeling of function and the need for databasing neuronal morphologies. Additionally, one of the dendritic modeling approaches is described in more detail in the Appendix.

 

van Wijk, K. J. (2001). "Challenges and prospects of plant proteomics." Plant Physiol 126(2): 501-8.

           

Verma, M., G. L. Wright, Jr., et al. (2001). "Proteomic approaches within the NCI early detection research network for the discovery and identification of cancer biomarkers." Ann N Y Acad Sci 945: 103-15.

            In the postgenome era, proteomics provides a powerful approach for the analysis of normal and transformed cell functions, for the identification of disease-specific targets, and for uncovering novel endpoints for the evaluation of chemoprevention agents and drug toxicity. Unfortunately, the genomic information that has greatly expounded the genetic basis of cancer does not allow an accurate prediction of what is actually occurring at the protein level within a given cell type at any given time. The gene expression program of a given cell is affected by numerous factors in the in vivo environment resulting from tissue complexity and organ system orchestration, with cells acting in concert with each other and responding to changes in their microenvironment. Repositories of genomic information can be considered master "inventory lists" of genes and their maps, which need to be supplemented with protein-derived information. The National Cancer Institute's Early Detection Research Network is employing proteomics, or "protein walking", in the discovery and evaluation of biomarkers for cancer detection and for the identification of high-risk subjects. Armed with microdissection techniques, including the use of Laser Capture Microdissection (LCM) to procure pure populations of cells directly from human tissue, the Network is facilitating the development of technologies that can overcome the problem of tissue heterogeneity and address the need to identify markers in easily accessible biological fluids. Proteomic approaches complement plasma-based assays of circulating DNA for cancer detection and risk assessment. LCM, coupled with downstream proteomics applications, such as two-dimensional polyacrylamide gel electrophoresis and SELDI (surface enhanced laser desorption ionization) separation followed by mass spectrometry (MS) analysis, may greatly facilitate the characterization and identification of protein expression changes that track normal and disease phenotypes. We highlight recent work from Network investigators to demonstrate the potential of proteomics to identify proteins present in cancer tissues and body fluids that are relevant for cancer screening.

 

Vidal, M. (2001). "A biological atlas of functional maps." Cell 104(3): 333-9.

           

Voigt, C. A., S. L. Mayo, et al. (2001). "Computationally focusing the directed evolution of proteins." J Cell Biochem Suppl 37: 58-63.

            Directed evolution has proven to be a successful strategy for the modification of enzyme properties. To date, the preferred experimental procedure has been to apply mutations or crossovers randomly throughout the gene. With the emergence of powerful computational methods, it has become possible to develop focused combinatorial searches, guided by computer algorithms. Here, we describe several computational methods that have emerged to aid the optimization of mutant libraries, the targeting of specific residues for mutagenesis, and the design of recombination experiments.

 

Walke, D. W., C. Han, et al. (2001). "In vivo drug target discovery: identifying the best targets from the genome." Curr Opin Biotechnol 12(6): 626-31.

            A vast number of genes of unknown function threaten to clog drug discovery pipelines. To develop therapeutic products from novel genomic targets, it will be necessary to correlate biology with gene sequence information. Industrialized mouse reverse genetics is being used to determine gene function in the context of mammalian physiology and to identify the best targets for drug development.

 

Wallace, B. A. and R. W. Janes (2001). "Synchrotron radiation circular dichroism spectroscopy of proteins: secondary structure, fold recognition and structural genomics." Curr Opin Chem Biol 5(5): 567-71.

            Recent developments in instrumentation and bioinformatics show that the technique of synchrotron radiation circular dichroism spectroscopy can provide novel information on protein secondary structures and folding motifs, and has the potential to play an important role in structural genomics studies, both as a means of target selection and as a high-throughput, low-sample-requiring screening method. This is possible because of the additional information content in the low-vacuum ultraviolet wavelength data obtainable with intense synchrotron radiation light sources, compared with that present in spectra from conventional lab-based circular dichroism instruments.

 

Wang, J. Z. (2001). "Wavelets and imaging informatics: a review of the literature." J Biomed Inform 34(2): 129-41.

            Modern medicine is a field that has been revolutionized by the emergence of computer and imaging technology. It is increasingly difficult, however, to manage the ever-growing enormous amount of medical imaging information available in digital formats. Numerous techniques have been developed to make the imaging information more easily accessible and to perform analysis automatically. Among these techniques, wavelet transforms have proven prominently useful not only for biomedical imaging but also for signal and image processing in general. Wavelet transforms decompose a signal into frequency bands, the widths of which are determined by a dyadic scheme. This particular way of dividing frequency bands matches the statistical properties of most images very well. During the past decade, there has been active research in applying wavelets to various aspects of imaging informatics, including compression, enhancements, analysis, classification, and retrieval. This review represents a survey of the most significant practical and theoretical advances in the field of wavelet-based imaging informatics.

 

Watkins, S. M., B. D. Hammock, et al. (2001). "Individual metabolism should guide agriculture toward foods for improved health and nutrition." Am J Clin Nutr 74(3): 283-6.

            Genomics and bioinformatics have the vast potential to identify genes that cause disease by investigating whole-genome databases. Comparison of an individual's genotype with a genomic database will allow the prescription of drugs to be tailored to an individual's genotype. This same bioinformatic approach, applied to the study of human metabolites, has the potential to identify and validate targets to improve personalized nutritional health and thus serve to define the added value for the next generation of foods and crops. Advances in high-throughput analytic chemistry and computing technologies make the creation of a vast database of metabolites possible for several subsets of metabolites, including lipids and organic acids. In creating integrative databases of metabolites for bioinformatic investigation, the current concept of measuring single biomarkers must be expanded to 3 dimensions to 1) include a highly comprehensive set of metabolite measurements (a profile) by multiparallel analyses, 2) measure the metabolic profile of individuals over time rather than simply in the fasted state, and 3) integrate these metabolic profiles with genomic, expression, and proteomic databases. Application of the knowledge of individual metabolism will revolutionize the ability of nutrition to deliver health benefits through food in the same way that knowledge of genomics will revolutionize individual treatment of disease with pharmaceuticals.

 

Weir, M., M. Swindells, et al. (2001). "Insights into protein function through large-scale computational analysis of sequence and structure." Trends Biotechnol 19(10 Suppl): S61-6.

            Functional genomic and proteomic technologies are producing biological data relating to hundreds, or even thousands of proteins per experiment. Rapid and accurate computational analysis of the molecular function of these proteins is therefore crucial in order to interpret these data and prioritize further experiments.

 

Werner, T. (2001). "Cluster analysis and promoter modelling as bioinformatics tools for the identification of target genes from expression array data." Pharmacogenomics 2(1): 25-36.

            Expression arrays yield enormous amounts of data linking genes, via their cDNA sequences, to gene expression patterns. This now allows the characterisation of gene expression in normal and diseased tissues, as well as the response of tissues to the application of therapeutic reagents. Expression array data can be analysed with respect to the underlying protein sequences, which facilitates the precise determination of when and where certain groups of genes are expressed. More recent developments of clustering algorithms take additional parameters of the experimental set-up into account, focusing more directly on co-regulated sets of genes. However, the information concerning transcriptional regulatory networks responsible for the observed expression patterns is not contained within the cDNA sequences used to generate the arrays. Regulation of expression is determined to a large extent by the promoter sequences of the individual genes (and/or enhancers). The complete sequence of the human genome now provides the molecular basis for the identification of many regulatory regions. Promoter sequences for specific cDNAs can be obtained reliably from genomic sequences by exon mapping. In the many cases in which cDNAs are 5'-incomplete, high quality promoter prediction tools can be used to locate promoters directly in the genomic sequence. Once sufficient numbers of promoter sequences have been obtained, a comparative promoter analysis of the co-regulated genes and groups of genes can be applied in order to generate models describing the higher order levels of transcription factor binding site organisation within these promoter regions. Such modules represent the molecular mechanisms through which regulatory networks influence gene expression, and candidates can be determined solely by bioinformatics. This approach also provides a powerful alternative for elucidating the functional features of genes with no detectable sequence similarity, by linking them to other genes on the basis of their common promoter structures.

 

Werner, T. (2001). "Target gene identification from expression array data by promoter analysis." Biomol Eng 17(3): 87-94.

            DNA microchips and expression arrays yield enormous amounts of data linking cDNA sequences to gene expression patterns. This now allows the characterization of gene expression in normal and diseased tissues as well as the response of tissues to the application of therapeutic reagents. Software currently exists to analyze DNA array/chip data with respect to corresponding mRNA sequences, which facilitates the precise determination of when and where certain groups of genes are expressed. The information concerning transcriptional regulatory networks responsible for the observed expression patterns is not contained within the cDNA sequences used to generate the arrays, but resides often within the promoter sequences of the individual genes (and/or enhancers). The complete sequence of the human genome will provide the molecular basis for the identification of such regulatory regions. Promoter sequences for specific cDNAs can be obtained reliably from genomic sequences simply by exon mapping. Promoter prediction tools can also be used to locate promoters directly in the genomic sequence in many cases in which cDNAs are 5'-incomplete. Once sufficient numbers of promoter sequences have been obtained, the comparative promoter analysis of the co-regulated genes and groups of genes can be applied in order to generate models describing the higher order levels of the transcription factor binding site organization within these promoter regions. As evident from several examples, this approach can identify promoter modules responsible for the common regulation of promoters solely by the application of bioinformatics methods. Such modules represent the molecular mechanisms through which regulatory networks influence gene expression. Another advantage of this approach is that it also provides a powerful alternative for elucidating functional features of genes with no detectable sequence similarity, by linking them to other genes on the basis of their common promoter structures.

 

West, M. J. (2001). "Design based stereological methods for estimating the total number of objects in histological material." Folia Morphol (Warsz) 60(1): 11-9.

            The principle that formed the basis of the most popular "assumption based" stereological methods for counting cells that were available prior to the advent of the more recently developed "design based" methods will be described in general terms. The major weaknesses inherent in the older methods will be described, along with how they have been eliminated by the design based methods.

 

Wishart, D. S. and S. Fortin (2001). "The BioTools Suite. A comprehensive suite of platform-independent bioinformatics tools." Mol Biotechnol 19(1): 59-77.

            The BioTools Suite is a set of three comprehensive, platform-independent software packages (PepTool, GeneTool, and ChromaTool) developed for sequence assembly and analysis. In addition to supporting a large number of standard bioinformatics functions, these programs also incorporate a number of useful innovations including uniform graphical-user interface (GUI) design, direct internet connectivity, a novel approach to feature annotation, and a variety of enhanced algorithms for large scale proteome and genome analysis. This article describes the key features, recent changes, and general operation of all three programs.

 

Wishart, M. J., G. S. Taylor, et al. (2001). "PTEN and myotubularin phosphoinositide phosphatases: bringing bioinformatics to the lab bench." Curr Opin Cell Biol 13(2): 172-81.

            Phosphoinositides play an integral role in a diverse array of cellular signaling processes. Although considerable effort has been directed toward characterizing the kinases that produce inositol lipid second messengers, the study of phosphatases that oppose these kinases remains limited. Current research is focused on the identification of novel lipid phosphatases such as PTEN and myotubularin, their physiologic substrates, signaling pathways and links to human diseases. The use of bioinformatics in conjunction with genetic analyses in model organisms will be essential in elucidating the roles of these enzymes in regulating phosphoinositide-mediated cellular signaling.

 

Woolfson, D. N. (2001). "Core-directed protein design." Curr Opin Struct Biol 11(4): 464-71.

            For various reasons, it seems sensible to redesign or design proteins from the inside out. Past approaches in this field have involved iterations of mutagenesis and characterisation to 'evolve' designs. Increasingly, combinatorial approaches are being taken to select 'fit' sequences from libraries of variant proteins. In particular, in silico methods have been used to good effect. More recently, experimental methods have been developed and improved. We are now in a position to redesign stability and function into natural protein frameworks confidently and to attempt de novo designs for more ambitious targets.

 

Wu, T. D. (2001). "Analysing gene expression data from DNA microarrays to identify candidate genes." J Pathol 195(1): 53-65.

            Microarray data analysis can be divided into two tasks: grouping of genes to discover broad patterns of biological behaviour, and filtering of genes to identify specific genes of interest. Whereas the gene-grouping task is largely addressed by cluster analysis, the gene-filtering task relies primarily on hypothesis testing. This review article surveys analytical methods for the gene-filtering task. Various types of data analysis are discussed for four basic types of experimental protocols: a comparison of two biological samples; a comparison of two biological conditions, each represented by a set of replicate samples; a comparison of multiple biological conditions; and analysis of covariate information.

 

Yagil, Y. and C. Yagil (2001). "Genetic models of hypertension in experimental animals." Exp Nephrol 9(1): 1-9.

            Genetic animal models are central to ongoing efforts to elucidate the pathophysiology and genetic basis of hypertension. The rat is the leading species in experimental hypertension. Several rat models of hypertension are available for research, including inbred strains, congenic lines, transgenic animals and recombinant inbred strains. Each of these models has been designed to express different phenotypes, including spontaneous hypertension, salt sensitivity, stress sensitivity and susceptibility to end-organ damage. All these models have been extremely useful in the search for the physiological mechanisms that underlie hypertension, but some of them have been specifically designed for detecting the hypertension genes. This latter task is extremely complex in spontaneous hypertension, but genetic animal models may simplify the task by enabling to focus on specific phenotypes. Despite intensive efforts over nearly 3 decades, the genetic basis of hypertension has not been unveiled so far in the rat or in other species. Recent dense mapping of the rat genome, the development of new strategies and technologies in molecular genetics including differential gene expression, expressed sequence tags and DNA biochips render hope that the formidable task of identification of new candidate genes in hypertension will move another major step forward. Once these genes are identified, their function and role in hypertension will have to be determined, utilizing functional genomic strategies and bioinformatics. Finally, the findings in genetic animal models of hypertension will have to be extrapolated to humans by homology and syntenic mapping strategies.

 

Yamamoto, M. and M. Nakao (2001). "[Bioinformatics and physiology--measurement, analysis, and interpretation of biological data]." Nippon Seirigaku Zasshi 63(1): 13-6.

           

Yao, T. (2001). "[Recent movements on the bioinformatics in USA: from the genome informatics to the systems biology]." Tanpakushitsu Kakusan Koso 46(12): 1886-92.

           

Yaspo, M. L. (2001). "Taking a functional genomics approach in molecular medicine." Trends Mol Med 7(11): 494-501.

            The elucidation of genetic components of human diseases at the molecular level provides crucial information for developing future causal therapeutic intervention. High-throughput genome sequencing and systematic experimental approaches are fuelling strategic programs designed to investigate gene function at the biochemical, cellular and organism levels. Bioinformatics is one important tool in functional genomics, although showing clear limitations in predicting ab initio gene structures, gene function and protein folds from raw sequence data. Systematic large-scale data-set generation, using the same type of experiments that are used to decipher the function of single genes, are being applied on entire genomes. Comparative genomics, establishment of gene catalogues, and investigation of cellular and tissue molecular profiles are providing essential tools for understanding gene function in complex biological networks.

 

Yoshie, O., T. Imai, et al. (2001). "Chemokines in immunity." Adv Immunol 78: 57-110.

            Chemokines are a superfamily of small, heparin-binding cytokines that induce directed migration of various types of leukocytes through interactions with a group of seven-transmembrane G protein-coupled receptors. At present, over 40 members have been identified in humans. Until a few years ago, chemokines were mainly known as potent attractants for leukocytes such as neutrophils and monocytes, and were thus mostly regarded as the mediators of acute and chronic inflammatory responses. They had highly complex ligand-receptor relationships and their genes were regularly mapped on chromosomes 4 and 17 in humans. Recently, novel chemokines have been identified in rapid succession, mostly through application of bioinformatics on expressed sequence tag databases. A number of surprises have followed the identification of novel chemokines. They are constitutively expressed in lymphoid and other tissues with individually characteristic patterns. Most of them turned out to be highly specific for lymphocytes and dendritic cells. They have much simpler ligand-receptor relationships, and their genes are mapped to chromosomal loci different from the traditional chemokine gene clusters. Thus, the emerging chemokines are functionally and genetically quite different from the classical "inflammatory chemokines" and may be classified as "immune (system) chemokines" because of their profound importance in the genesis, homeostasis and function of the immune system. The emergence of immune chemokines has brought about a great deal of impact on the current immunological research, leading us to a better understanding on the fine traffic regulation of lymphocytes and dendritic cells. The immune chemokines and their receptors are also likely to be important future targets for therapeutic intervention of our immune responses.

 

Zagursky, R. J. and D. Russell (2001). "Bioinformatics: use in bacterial vaccine discovery." Biotechniques 31(3): 636, 638, 640, passim.

            Bioinformatics has now become a common laboratory name for groups studying genomic sequences. It is composed of many different, yet interrelated scientific fields such as genomics, proteomics, and transcriptional profiling. The availability of complete genomic sequences, especially prokaryotic organisms, allows one to rapidly identify, analyze, and clone genes of interest. For bacterial vaccine discovery, one can "mine" the genomic sequence for potential surface targets using various algorithms, characterize these gene targets, and produce primers for cloning, all before one enters the wet laboratory. This review will focus on various genomic mining tools/algorithms available for predicting open reading frames and their associated annotation (if known), physical and functional characterization, and cellular localization. Finally, examples are given of how all of this is being used for the identification of potential bacterial vaccine candidates.

 

Zavolan, M. and T. B. Kepler (2001). "Statistical inference of sequence-dependent mutation rates." Curr Opin Genet Dev 11(6): 612-5.

            Several lines of research are now converging towards an integrated understanding of mutational mechanisms and their evolutionary implications. Experimentally, crystal structures reveal the effect of sequence context on polymerase fidelity; large-scale sequencing projects generate vast amounts of sequence polymorphism data; and locus-specific databases are being constructed. Computationally, software and analytical tools have been developed to analyze mutational data, to identify mutational hot spots, and to compare the signatures of mutagenic agents.

  
