Neuroinformation Bioinformatics Reviews: 2002 (187 References)
Altman, R. B. and T. E. Klein (2002). "Challenges for biomedical informatics and pharmacogenomics." Annu Rev Pharmacol Toxicol 42: 113-33. Pharmacogenomics requires the integration and analysis of genomic, molecular, cellular, and clinical data, and it thus offers a remarkable set of challenges to biomedical informatics. These include infrastructural challenges such as the creation of data models and databases for storing these data, the integration of these data with external databases, the extraction of information from natural language text, and the protection of databases with sensitive information. There are also scientific challenges in creating tools to support gene expression analysis, three-dimensional structural analysis, and comparative genomic analysis. In this review, we summarize the current uses of informatics within pharmacogenomics and show how the technical challenges that remain for biomedical informatics are typical of those that will be confronted in the postgenomic era.
Andrews, P. D., I. S. Harper, et al. (2002). "To 5D and beyond: quantitative fluorescence microscopy in the postgenomic era." Traffic 3(1): 29-36. Digital fluorescence microscopy is now a standard technology for assaying molecular localisation in cells and tissues. The choice of laser scanning (LSM) and wide-field microscopes (WFM) largely depends on the type of sample, with LSMs performing best on thick samples and WFMs performing best on thin ones. These systems are increasingly used to collect large multidimensional datasets. We propose a unified image structure that considers space, time, and fluorescence wavelength as integral parts of the image. Moreover, the application of fluorescence imaging to large-scale screening means that large datasets are now routinely acquired. We propose that analysis of these data requires querying tools based on relational databases and describe one such system.
Aravind, L. and L. M. Iyer (2002). "Intraproteomic networks: new forays into predicting interaction partners." Genome Res 12(8): 1156-8.
Bains, W., R. Gilbert, et al. (2002). "Evolutionary computational methods to predict oral bioavailability QSPRs." Curr Opin Drug Discov Devel 5(1): 44-51. This review discusses evolutionary and adaptive methods for predicting oral bioavailability (OB) from chemical structure. Genetic Programming (GP), a specific form of evolutionary computing, is compared with some other advanced computational methods for OB prediction. The results show that classifying drugs into 'high' and 'low' OB classes on the basis of their structure alone is solvable, and initial models are already producing output that would be useful for pharmaceutical research. The results also suggest that quantitative prediction of OB will be tractable. Critical aspects of the solution will involve the use of techniques that can: (i) handle problems with a very large number of variables (high dimensionality); (ii) cope with 'noisy' data; and (iii) implement binary choices to sub-classify molecules with behavior that are qualitatively different. Detailed quantitative predictions will emerge from more refined models that are hybrids derived from mechanistic models of the biology of oral absorption and the power of advanced computing techniques to predict the behavior of the components of those models in silico.
Bancroft, I. (2002). "Insights into cereal genomes from two draft genome sequences of rice." Genome Biol 3(6): REVIEWS1015. Draft genome sequences have been reported for two subspecies of rice. The drafts include the sequences of an estimated 99% of all rice genes and provide major advances in our understanding of the content and complexity of cereal genomes in general and the rice genome in particular.
Baron, M. (2002). "Manic-depression genes and the new millennium: poised for discovery." Mol Psychiatry 7(4): 342-58. Manic-depressive illness is a common psychiatric disorder with complex etiology that likely involves multiple genes and non-genetic influences. The uncertain path to gene discovery has spurred considerable debate over genetic findings and gene-finding strategies. In this article, I review the main findings, with a focus on: (1) putative linked loci on chromosomes 1q31-32, 4p16, 6pter-p24, 10p14, 10q21-26, 12q23-24, 13q31-32, 18p11, 18q21-23, 21q22, 22q11-13, and Xq24-28; and (2) association studies with candidate genes, dynamic mutations, mitochondrial mutations, and chromosomal aberrations. Although no gene has been identified, promising findings are emerging. I then discuss the challenges and opportunities ahead, with special emphasis on gene-finding methods, in particular questions pertaining to phenotype definition, linkage and association mapping, gene markers, sampling, study population, multigene systems, lessons from other disorders, animal models, and bioinformatics. The progress to date, together with rapid advances in genomics, analytical and computational methods, and bioinformatics, holds promise for new insights into the genetics of manic-depression in the new millennium.
Barratt, C. L., D. C. Hughes, et al. (2002). "Functional genomics in reproductive medicine." Hum Fertil (Camb) 5(1): 3-5. The British Fertility Society organised a workshop on Functional Genomics in Reproductive Medicine at the University of Birmingham on 13-14 September 2001. The primary aim was to inform delegates about the power of the technology that has been made available after completion of the sequencing of the human genome, and to stimulate debate about using functional genomics to address both clinical and scientific questions in reproductive medicine. Three specific areas were addressed: proteomics, gene expression and bioinformatics. Although the sophistication and plethora of techniques available were obvious, major limitations in the technology were also discussed. The future promises to be very challenging indeed.
Bayat, A. (2002). "Science, medicine, and the future: Bioinformatics." BMJ 324(7344): 1018-22.
Beutler, B. and M. Rehli (2002). "Evolution of the TIR, tolls and TLRs: functional inferences from computational biology." Curr Top Microbiol Immunol 270: 1-21. The mammalian toll-like receptors (TLRs) are products of an evolutionary process that began prior to the separation of plants and animals. The most conserved protein motif within the TLRs is the TIR, which denotes Toll, the Interleukin-1 receptor, and plant disease Resistance genes. To trace the ancestry of the TLRs, it is desirable to draw upon the sequences of TIR domains from TLRs of diverse vertebrate species, including species with known dates of divergence (i.e., representatives of Mammalia and Aves) in order to establish a relationship between time and genetic divergence. It appears that a gene ancestral to modern TLRs 1 and 6 duplicated approximately 130 million years ago, only shortly before the speciation event that led to humans and mice. Though it is not represented in mice, TLR10 split from the TLR[1/6] precursor about 300 million years ago. The origins of other TLRs are more ancient, dating to the origins of vertebrate life, and some present-day vertebrate species appear to have many more TLRs than others. Moreover, the patterns of TLR expression are quite variable at the level of tissues, even among closely related species. A given TLR in species that are related by descent from a common ancestor may acquire different duties within each descendant line, so that some microbial inducers are avidly recognized in one species but not in others; likewise the intensity and the anatomic location of an innate immune response may vary considerably. In this review, we discuss the computational methods used to analyze divergence of the TIR, and the conclusions that may be safely drawn.
Bevan, M. (2002). "Genomics and plant cells: application of genomics strategies to Arabidopsis cell biology." Philos Trans R Soc Lond B Biol Sci 357(1422): 731-6. In this review I seek to describe how the complete catalogue of plant genes and proteins, revealed by genome sequencing, can provide novel insights into cell biology. Many new analytical methods have been developed to digest the flood of genome sequence data, including analysis of the transcriptome, proteome and metabolites. High-throughput analysis of protein targeting and other methods will ascribe new information to proteins and create important links with other large datasets. To fulfil the potential revealed by this genomic information, many challenges have to be met. Among these are organizational changes needed to create common datasets accessible to all scientists, and bioinformatics solutions to capture and integrate diverse datasets. Once harnessed, these new strategies will irrevocably change the way we conduct plant science.
Bickmore, W. A. and H. G. Sutherland (2002). "Addressing protein localization within the nucleus." EMBO J 21(6): 1248-54. Bridging the gap between the number of gene sequences in databases and the number of gene products that have been functionally characterized in any way is a major challenge for biology. A key characteristic of proteins, which can begin to elucidate their possible functions, is their subcellular location. A number of experimental approaches can reveal the subcellular localization of proteins in mammalian cells. However, genome databases now contain predicted sequences for a large number of potentially novel proteins that have yet to be studied in any way, let alone have their subcellular localization determined. Here we ask whether using bioinformatics tools to analyse the sequence of proteins whose subnuclear localizations have been determined can reveal characteristics or signatures that might allow us to predict localization for novel protein sequences.
Blaschke, C., L. Hirschman, et al. (2002). "Information extraction in molecular biology." Brief Bioinform 3(2): 154-65. Information extraction has become a very active field in bioinformatics recently and a number of interesting papers have been published. Most of the efforts have been concentrated on a few specific problems, such as the detection of protein-protein interactions and the analysis of DNA expression arrays, although it is obvious that there are many other interesting areas of potential application (document retrieval, protein functional description, and detection of disease-related genes to name a few). Paradoxically, these exciting developments have not yet crystallised into general agreement on a set of standard evaluation criteria, such as the ones developed in fields such as protein structure prediction, which makes it very difficult to compare performance across these different systems. In this review we introduce the general field of information extraction, we outline the status of the applications in molecular biology, and we then discuss some ideas about possible standards for evaluation that are needed for the future development of the field.
Bloom, G. C., P. Gieser, et al. (2002). "Linking image quantitation and data analysis." Methods Mol Biol 184: 15-27.
Bornberg-Bauer, E. and N. W. Paton (2002). "Conceptual data modelling for bioinformatics." Brief Bioinform 3(2): 166-80. Current research in the biosciences depends heavily on the effective exploitation of huge amounts of data. These are in disparate formats, remotely dispersed, and based on the different vocabularies of various disciplines. Furthermore, data are often stored or distributed using formats that leave implicit many important features relating to the structure and semantics of the data. Conceptual data modelling involves the development of implementation-independent models that capture and make explicit the principal structural properties of data. Entities such as a biopolymer or a reaction, and their relations, eg catalyses, can be formalised using a conceptual data model. Conceptual models are implementation-independent and can be transformed in systematic ways for implementation using different platforms, eg traditional database management systems. This paper describes the basics of the most widely used conceptual modelling notations, the ER (entity-relationship) model and the class diagrams of the UML (unified modelling language), and illustrates their use through several examples from bioinformatics. In particular, models are presented for protein structures and motifs, and for genomic sequences.
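As a rough illustration of the entity and relationship idea summarised in the abstract above, the following minimal Python sketch maps two entities (a biopolymer and a reaction) and a many-to-many "catalyses" relationship onto one possible implementation. It is not taken from the paper: the class names, attribute names, and the `Model` container are assumptions made for illustration, and the sequence strings are placeholders rather than real sequences.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Biopolymer:
    """Entity: a biopolymer such as a protein (attributes are assumed)."""
    name: str
    sequence: str


@dataclass(frozen=True)
class Reaction:
    """Entity: a biochemical reaction (attributes are assumed)."""
    name: str


@dataclass
class Model:
    """Hypothetical container for the many-to-many 'catalyses' relationship."""
    # Each element is a (Biopolymer, Reaction) pair.
    catalyses: set = field(default_factory=set)

    def add_catalysis(self, enzyme: Biopolymer, reaction: Reaction) -> None:
        self.catalyses.add((enzyme, reaction))

    def catalysts_of(self, reaction: Reaction) -> list:
        """Names of all biopolymers that catalyse the given reaction."""
        return sorted(e.name for e, r in self.catalyses if r == reaction)

    def reactions_of(self, enzyme: Biopolymer) -> list:
        """Names of all reactions catalysed by the given biopolymer."""
        return sorted(r.name for e, r in self.catalyses if e == enzyme)


model = Model()
# Sequence strings are placeholders, not real protein sequences.
hexokinase = Biopolymer("hexokinase", "PLACEHOLDER1")
glucokinase = Biopolymer("glucokinase", "PLACEHOLDER2")
phosphorylation = Reaction("glucose phosphorylation")

model.add_catalysis(hexokinase, phosphorylation)
model.add_catalysis(glucokinase, phosphorylation)

print(model.catalysts_of(phosphorylation))
print(model.reactions_of(hexokinase))
```

The same conceptual model could equally be mapped to relational tables (one table per entity plus a join table for "catalyses"), which is the kind of implementation independence the abstract describes.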
Braam, G. B., H. A. Bluyssen, et al. (2002). "[Gene-expression analysis using DNA microarrays]." Ned Tijdschr Geneeskd 146(40): 1867-73. Parallel to the efforts to unravel the human genome code, techniques are currently being developed to analyse the activity of all genes and proteins in a cell population or tissue. The most advanced of these functional genomic techniques is that used to study gene expression using DNA microarrays, also known as 'DNA chips'. This allows the expression of thousands of different genes to be compared in two different samples (for example, one from a sick person and one from a healthy one). Bioinformatics is essential in this technique. The expression profiles obtained in this way can be used to characterise complex biological situations (e.g., cell division and apoptosis) and diseases. There have already been reports on the opportunities in the diagnostic work-up for leukaemias and breast cancer. There are also applications on the more basic level, such as discovering precisely how the transcription apparatus works, and finding new genes and identifying their role. The use of microarrays in medicine is still in its infancy. It is anticipated that this and similar genome-wide analysis techniques will help in the elucidation of pathophysiological mechanisms, in making diagnoses and prognoses, and in monitoring treatment. The justifiable enthusiasm should, however, be accompanied by quality control, international standardisation and a critical approach towards the interpretation of results.
Breinbauer, R., I. R. Vetter, et al. (2002). "From protein domains to drug candidates-natural products as guiding principles in the design and synthesis of compound libraries." Angew Chem Int Ed Engl 41(16): 2879-90. In the continuing effort to find small molecules that alter protein function and ultimately might lead to new drugs, combinatorial chemistry has emerged as a very powerful tool. Contrary to original expectations that large libraries would result in the discovery of many hit and lead structures, it has been recognized that the biological relevance, design, and diversity of the library are more important. As the universe of conceivable compounds is almost infinite, the question arises: where is a biologically validated starting point from which to build a combinatorial library? Nature itself might provide an answer: natural products have been evolved to bind to proteins. Recent results in structural biology and bioinformatics indicate that the number of distinct protein families and folds is fairly limited. Often the same structural domain is used by many proteins in a more or less modified form created by divergent evolution. Recent progress in solid-phase organic synthesis has enabled the synthesis of combinatorial libraries based on the structure of complex natural products. It can be envisioned that natural-product-based combinatorial synthesis may permit hit or lead compounds to be found with enhanced probability and quality.
Brendel, V. and W. Zhu (2002). "Computational modeling of gene structure in Arabidopsis thaliana." Plant Mol Biol 48(1-2): 49-58. Computational gene identification by sequence inspection remains a challenging problem. For a typical Arabidopsis thaliana gene with five exons, at least one of the exons is expected to have at least one of its borders predicted incorrectly by ab initio gene finding programs. More detailed analysis for individual genomic loci can often resolve the uncertainty on the basis of EST evidence or similarity to potential protein homologues. Such methods are part of the routine annotation process. However, because the EST and protein databases are constantly growing, in many cases original annotation must be re-evaluated, extended, and corrected on the basis of the latest evidence. The Arabidopsis Genome Initiative is undertaking this task on the whole-genome scale via its participating genome centers. The current Arabidopsis genome annotation provides an excellent starting point for assessing the protein repertoire of a flowering plant. More accurate whole-genome annotation will require the combination of high-throughput and individual gene experimental approaches and computational methods. The purpose of this article is to discuss tools available to an individual researcher to evaluate gene structure prediction for a particular locus.
Brive, L. and R. Abagyan (2002). "Computational structural proteomics." Ernst Schering Res Found Workshop(38): 149-66.
Brizuela, L., A. Richardson, et al. (2002). "The FLEXGene repository: exploiting the fruits of the genome projects by creating a needed resource to face the challenges of the post-genomic era." Arch Med Res 33(4): 318-24. Thanks to the results of the multiple completed and ongoing genome sequencing projects and to the newly available recombination-based cloning techniques, it is now possible to build gene repositories with no precedent in their composition, formatting, and potential. This new type of gene repository is necessary to address the challenges imposed by the post-genomic era, i.e., experimentation on a genome-wide scale. We are building the FLEXGene (Full Length EXpression-ready) repository. This unique resource will contain clones representing the complete ORFeome of different organisms, including Homo sapiens as well as several pathogens and model organisms. It will consist of a comprehensive, characterized (sequence-verified), and arrayed gene repository. This resource will allow full exploitation of the genomic information by enabling genome-wide scale experimentation at the level of functional/phenotypic assays as well as at the level of protein expression, purification, and analysis. Here we describe the rationale and construction of this resource and focus on the data obtained from the Saccharomyces cerevisiae project.
Buchanan, S. G. (2002). "Structural genomics: bridging functional genomics and structure-based drug design." Curr Opin Drug Discov Devel 5(3): 367-81. Considerable advances in structural genomics have been witnessed in the last year. Several pilot studies have begun to report their initial results, and new centers have been funded to join the endeavor. The legacies of the genome sequencing efforts, namely high-throughput molecular biology and whole-organism genome sequences, have been integrated as front-end modules for structural genomics pipelines. Impressive advances have been made in NMR spectroscopy and X-ray crystallography. New methods in structural bioinformatics and computational chemistry have been published that provide the means to exploit the wealth of new information in drug discovery. Not surprisingly, the biopharmaceutical industry has been quick to recognize the benefits of these new developments and has begun to adopt them. This article reviews recent results from structural genomics initiatives and the potential applications of new information and technologies in the drug discovery process.
Buchanan, S. G., J. M. Sauder, et al. (2002). "The promise of structural genomics in the discovery of new antimicrobial agents." Curr Pharm Des 8(13): 1173-88. Structural Genomics stands out among the emerging fields of proteomics since it influences the drug discovery process at so many points. Recent developments in protein expression technologies, x-ray crystallography and NMR spectroscopy provide the essential elements for high-throughput structure determination platforms. Bioinformatics methods to interrogate the resulting data will provide comprehensive, genome-wide databases of protein structure. Genomic sequencing and methods for high-throughput expression and protein purification are furthest advanced for microbial genes and so these have been the early targets for structural genomics initiatives. The information will be invaluable in understanding gene function, designing broad-spectrum small molecule inhibitors and in better understanding drug-host interactions.
Cacabelos, R. (2002). "Pharmacogenomics in Alzheimer's disease." Mini Rev Med Chem 2(1): 59-84. Alzheimer's disease (AD) is a complex disorder associated with multiple genetic defects either mutational or of susceptibility. Information available on AD genetics does not explain in full the etiopathogenesis of AD, suggesting that environmental factors and/or epigenetic phenomena may also contribute to AD pathology and phenotypic expression of dementia. The genomics of AD is still in its infancy, but is helping to understand novel aspects of the disease including genetic epidemiology, multifactorial risk factors, pathogenic mechanisms associated with genetic networks and genetically-regulated metabolic cascades. AD genomics is also helping to develop new strategies in pharmacogenomic research and prevention. Functional genomics, proteomics, pharmacogenomics, high-throughput methods, combinatorial chemistry and modern bioinformatics will greatly contribute to accelerate drug development for AD and other complex disorders. Main genes involved in AD include mutational loci (APP, PS1, PS2, TAU) and multiple susceptibility loci (APOE, A2M, AACT, LRP1, IL1A, TNF, ACE, BACE, BCHE, CST3, MTHFR, GSK3B, NOS) distributed across the human genome. Genomic associations integrate bigenic, trigenic, tetragenic or polygenic matrix models to investigate the genomic organization of AD in comparison to the control population. Similar genetic models are used in pharmacogenomics to elucidate genotype-specific responses of AD patients to a particular drug or combination of drugs. Using APOE-related monogenic models it has been demonstrated that the therapeutic response to drugs in AD is genotype-specific. A multifactorial therapy combining 3 different drugs yielded positive results during the 6-12 months in approximately 60% of the patients. With this therapeutic strategy, APOE-4/4 carriers were the worst responders, and patients with the APOE-3/4 genotype were the best responders. 
In bigenic and trigenic models it was possible to differentiate the influential effect of PS1 and PS2 polymorphic variants on mental performance in response to multifactorial therapy. The application of functional genomics to AD can be a suitable strategy for harmonization in molecular diagnosis and drug clinical trials. Furthermore, the pharmacogenomics of AD may contribute in the future to optimise drug development and therapeutics, increasing efficacy and safety, and reducing side-effects and unnecessary costs.
Cariou, A., J. D. Chiche, et al. (2002). "The era of genomics: impact on sepsis clinical trial design." Crit Care Med 30(5 Suppl): S341-8. OBJECTIVE: This article aims to address the predictable impact of genetics on the design of clinical trials in the field of critical care medicine, with emphasis on the pathophysiology of sepsis and its treatment. DATA SOURCES: Published articles reporting studies on sepsis and septic shock or assessing the influence of genetics and pharmacogenomics in the treatment of critical illnesses. DATA ANALYSIS: Because most common diseases including sepsis have been shown to be influenced by inherited differences in our genes, completion of the Human Genome Project and the concomitant publication of the human single nucleotide polymorphism map both contribute to change our approach to medicine. Advances in genotyping techniques and bioinformatics enabling detection of single nucleotide polymorphisms have caused an explosion in pharmacogenomics, the research dealing with the interactions of an individual's genotype and the outcome of a drug therapy. Pharmacogenomics will undoubtedly be used to improve future health care and clinical research in different ways. Whereas treatment allocation has been based mainly on phenotype, genetic characterization will help researchers to identify suitable subjects for clinical trials, to facilitate interpretation of the results of clinical trials, and to identify novel targets for future drugs or new markets for current products. As interindividual variability in drug response is a substantial clinical problem, the second major objective of pharmacogenomic research is to decrease adverse responses to therapy through determination of adequate therapeutic targets and genetic polymorphisms that alter drug specificity and toxicity. Ultimately, genetic information will be used to select the most effective therapeutic agent and the optimal dosage to elicit the expected drug response for a given individual. 
Implementation of genetic criteria for stratification of patient populations and individual assessment of treatment risks and benefits emerges as a major challenge to the pharmaceutical industry. CONCLUSIONS: In the future, technologies such as gene chip array will enhance genetic medicine and provide novel insights into a patient's susceptibility to disease, enabling a better assessment of prognostic risk factors, quicker diagnosis, and accurate prediction of individual responsiveness to drugs. The predictable consequences of such an approach on the prevention and treatment of diseases could revolutionize medicine.
Chakravarti, D. N. (2002). "From the decline and fall of protein chemistry to proteomics." Biotechniques Suppl: 2-3.
Chakravarti, D. N., B. Chakravarti, et al. (2002). "Informatic tools for proteome profiling." Biotechniques Suppl: 4-10, 12-5. In recent years, the practice of proteomics research has experienced a dramatic shift within the pharmaceutical and biotechnology industry with the widespread implementation of novel applications. The areas of interest extend all the way from discovery of novel drug, vaccine, and diagnostic targets, characterization of protein-based products, toxicology, and identification of surrogate markers of activity in clinical research, to the ability to provide information on the mechanisms of drug action. The power of two-dimensional gel electrophoresis as well as advances in mass spectrometric techniques combined with sequence database correlation have enabled speed and accuracy in identification of proteins in complex mixtures. This article surveys currently available software and informatic tools related to these methods for proteome profiling. The broad acceptance of these technologies, however, has not been accompanied by significant advances in the informatics and software tools necessary to support the analysis and management of the massive amounts of data generated in the process. In this context, this article also discusses the importance of relational databases for protein identification data management.
Chance, M. R., A. R. Bresnick, et al. (2002). "Structural genomics: a pipeline for providing structures for the biologist." Protein Sci 11(4): 723-38.
Chen, C. W., C. H. Huang, et al. (2002). "Once the circle has been broken: dynamics and evolution of Streptomyces chromosomes." Trends Genet 18(10): 522-9. Chromosomal instability has been a hallmark of Streptomyces genetics. Deletions and circularization often occur in the less-conserved terminal sequences of the linear chromosomes, which contain swarms of transposable elements and other horizontally transferred elements. Intermolecular recombination involving these regions also generates gross exchanges, resulting in terminal inverted repeats of heterogeneous size and context. The structural instability is evidently related to evolution of the Streptomyces chromosomes, which is postulated to involve linearization of hypothetical circular progenitors via integration of a linear plasmid. This scenario is supported by several bioinformatic analyses.
Chiu, W., M. L. Baker, et al. (2002). "Deriving folds of macromolecular complexes through electron cryomicroscopy and bioinformatics approaches." Curr Opin Struct Biol 12(2): 263-9. Intermediate-resolution (7-9 Å) structures of large macromolecular complexes can be obtained by electron cryomicroscopy. This structural information, combined with bioinformatics data for the individual protein components or domains, can lead to a fold model for the entire complex. Such approaches have been demonstrated with the 6.8 Å structure of the rice dwarf virus to derive models for the major capsid shell proteins.
Clark, D. E. and P. D. Grootenhuis (2002). "Progress in computational methods for the prediction of ADMET properties." Curr Opin Drug Discov Devel 5(3): 382-90. This review surveys recent progress in the development and application of computational techniques for the prediction of absorption, distribution, metabolism, elimination and toxicity (ADMET) properties, including intestinal permeability, blood-brain barrier penetration, active transport/efflux, aqueous solubility, metabolism and toxicity. While much effort continues to be expended in this field with some success on existing datasets, perhaps the most pressing need at this time is for larger, high-quality sets of experimental data to provide a sound basis for model building.
Cole, J. and F. Isik (2002). "Human genomics and microarrays: implications for the plastic surgeon." Plast Reconstr Surg 110(3): 849-58. The Human Genome Project was launched in 1989 in an effort to sequence the entire span of human DNA. Although coding sequences are important in identifying mutations, the static order of DNA does not explain how a cell or organism may respond to normal and abnormal biological processes. By examining the mRNA content of a cell, researchers can determine which genes are being activated in response to a stimulus. Traditional methods in molecular biology generally work on a "one gene: one experiment" basis, which means that the throughput is very limited and the "whole picture" of gene function is hard to obtain. To study each of the 60,000 to 80,000 genes in the human genome under each biological circumstance is not practical. Recently, microarrays (also known as gene or DNA chips) have emerged; these allow for the simultaneous determination of expression for thousands of genes and analysis of genome-wide mRNA expression. The purpose of this article is twofold: first, to provide the clinical plastic surgeon with a working knowledge and understanding of the fields of genomics, microarrays, and bioinformatics and second, to present a case to illustrate how these technologies can be applied in the study of wound healing.
Cole, S. T. (2002). "Comparative and functional genomics of the Mycobacterium tuberculosis complex." Microbiology 148(Pt 10): 2919-28.
Croston, G. E. (2002). "Functional cell-based uHTS in chemical genomic drug discovery." Trends Biotechnol 20(3): 110-5. The availability of genomic information significantly increases the number of potential targets available for drug discovery, although the function of many targets and their relationship to disease is unknown. In a chemical genomic research approach, ultra-high throughput screening (uHTS) of genomic targets takes place early in the drug discovery process, before target validation. Target-selective modulators then provide drug leads and pharmacological research tools to validate target function. Effective implementation of a chemical genomic strategy requires assays that can perform uHTS for large numbers of genomic targets. Cell-based functional assays are capable of the uHTS throughput required for chemical genomic research, and their functional nature provides distinct advantages over ligand-binding assays in the identification of target-selective modulators.
Dandekar, T. and R. Sauerborn (2002). "Comparative genome analysis and pathway reconstruction." Pharmacogenomics 3(2): 245-56. Pathway reconstruction builds on genome and biochemical data with the aim of reconstructing higher level interactions between identified enzymes in a specific genome, in particular the different enzyme pathways (species or individual/patient). Metabolite flow in a pathway is analyzed by different tools, such as elementary mode analysis. This reveals key enzymes and pharmacological targets in the enzyme network. An overview of bioinformatic tools and algorithms for these tasks, application examples and recent results from these techniques are presented. Target selection, drug development and optimization can all be sped up using these approaches.
Davidson, E. H., J. P. Rast, et al. (2002). "A genomic regulatory network for development." Science 295(5560): 1669-78. Development of the body plan is controlled by large networks of regulatory genes. A gene regulatory network that controls the specification of endoderm and mesoderm in the sea urchin embryo is summarized here. The network was derived from large-scale perturbation analyses, in combination with computational methodologies, genomic data, cis-regulatory analysis, and molecular embryology. The network contains over 40 genes at present, and each node can be directly verified at the DNA sequence level by cis-regulatory analysis. Its architecture reveals specific and general aspects of development, such as how given cells generate their ordained fates in the embryo and why the process moves inexorably forward in developmental time.
De Groot, A. S., H. Sbai, et al. (2002). "Immuno-informatics: Mining genomes for vaccine components." Immunol Cell Biol 80(3): 255-69. The complete genome sequences of more than 60 microbes have been completed in the past decade. Concurrently, a series of new informatics tools, designed to harness this new wealth of information, have been developed. Some of these new tools allow researchers to select regions of microbial genomes that trigger immune responses. These regions, termed epitopes, are ideal components of vaccines. When the new tools are used to search for epitopes, this search is usually coupled with in vitro screening methods; an approach that has been termed computational immunology or immuno-informatics. Researchers are now implementing these combined methods to scan genomic sequences for vaccine components. They are thereby expanding the number of different proteins that can be screened for vaccine development, while narrowing this search to those regions of the proteins that are extremely likely to induce an immune response. As the tools improve, it may soon be feasible to skip over many of the in vitro screening steps, moving directly from genome sequence to vaccine design. The present article reviews the work of several groups engaged in the development of immuno-informatics tools and illustrates the application of these tools to the process of vaccine discovery.
de Jong, H. (2002). "Modeling and simulation of genetic regulatory systems: a literature review." J Comput Biol 9(1): 67-103. In order to understand the functioning of organisms on the molecular level, we need to know which genes are expressed, when and where in the organism, and to which extent. The regulation of gene expression is achieved through genetic regulatory systems structured by networks of interactions between DNA, RNA, proteins, and small molecules. As most genetic regulatory networks of interest involve many components connected through interlocking positive and negative feedback loops, an intuitive understanding of their dynamics is hard to obtain. As a consequence, formal methods and computer tools for the modeling and simulation of genetic regulatory networks will be indispensable. This paper reviews formalisms that have been employed in mathematical biology and bioinformatics to describe genetic regulatory systems, in particular directed graphs, Bayesian networks, Boolean networks and their generalizations, ordinary and partial differential equations, qualitative differential equations, stochastic equations, and rule-based formalisms. In addition, the paper discusses how these formalisms have been used in the simulation of the behavior of actual regulatory systems.
Dougherty, E. R., J. Barrera, et al. (2002). "Inference from clustering with application to gene-expression microarrays." J Comput Biol 9(1): 105-26. There are many algorithms to cluster sample data points based on nearness or a similarity measure. Often the implication is that points in different clusters come from different underlying classes, whereas those in the same cluster come from the same class. Stochastically, the underlying classes represent different random processes. The inference is that clusters represent a partition of the sample points according to which process they belong. This paper discusses a model-based clustering toolbox that evaluates cluster accuracy. Each random process is modeled as its mean plus independent noise, sample points are generated, the points are clustered, and the clustering error is the number of points clustered incorrectly according to the generating random processes. Various clustering algorithms are evaluated based on process variance and the key issue of the rate at which algorithmic performance improves with increasing numbers of experimental replications. The model means can be selected by hand to test the separability of expected types of biological expression patterns. Alternatively, the model can be seeded by real data to test the expected precision of that output or the extent of improvement in precision that replication could provide. In the latter case, a clustering algorithm is used to form clusters, and the model is seeded with the means and variances of these clusters. Other algorithms are then tested relative to the seeding algorithm. Results are averaged over various seeds. Output includes error tables and graphs, confusion matrices, principal-component plots, and validation measures. Five algorithms are studied in detail: K-means, fuzzy C-means, self-organizing maps, hierarchical Euclidean-distance-based and correlation-based clustering. 
The toolbox is applied to gene-expression clustering based on cDNA microarrays using real data. Expression profile graphics are generated and error analysis is displayed within the context of these profile graphics. A large amount of generated output is available over the web.
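The evaluation loop summarized in the Dougherty, Barrera, et al. abstract above (model each class as a mean plus independent noise, generate points, cluster them, count the points clustered incorrectly) can be illustrated with a minimal sketch. This is not the authors' toolbox: the Gaussian noise model, the plain K-means implementation, and all function names and parameter values below are assumptions chosen for illustration.

```python
import random
from itertools import permutations


def simulate(means, sigma, n_per_class, rng):
    """Generate points as class mean plus independent Gaussian noise (assumed noise model)."""
    points, labels = [], []
    for k, mu in enumerate(means):
        for _ in range(n_per_class):
            points.append([m + rng.gauss(0.0, sigma) for m in mu])
            labels.append(k)
    return points, labels


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd-style K-means; returns one cluster index per point."""
    centers = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                new_centers.append([sum(c) / len(members) for c in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return assign


def clustering_error(true_labels, assign, k):
    """Number of misclustered points under the best cluster-to-class matching."""
    best = len(true_labels)
    for perm in permutations(range(k)):
        errors = sum(1 for t, a in zip(true_labels, assign) if perm[a] != t)
        best = min(best, errors)
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical model means for two expression patterns.
    pts, lab = simulate([[0.0, 0.0], [10.0, 10.0]], 0.5, 20, rng)
    print("misclustered points:", clustering_error(lab, kmeans(pts, 2, rng), 2))
```

Repeating the run over several seeds and noise levels, and averaging the error, mirrors the abstract's point that algorithm performance should be judged as a function of process variance and replication. The exhaustive matching in `clustering_error` is only practical for a small number of clusters.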
Dougherty, T. J., J. F. Barrett, et al. (2002). "Microbial genomics and novel antibiotic discovery: new technology to search for new drugs." Curr Pharm Des 8(13): 1119-35. The process of prokaryotic drug discovery has been a model of success for over fifty years, yet the number of exploited bacterial targets is a mere fraction, less than 0.1% of the potential targets (based on total number of bacterial genes identified by gene sequence projects). To better understand the potential for drug intervention, multiple paradigms have been established in the pharmaceutical industry, all with some semblance of commonality and uniqueness to provide proprietary positioning, yet no company has been successful to date in taking a genomics approach to the finish line of having a genomics-based drug on the market. Within this overview, we provide a strategic overview of a sample process for the identification, validation and exploitation of novel antibacterial targets ascertained through a bioinformatics-based genomics drug discovery program.
Doytchinova, I. A. and D. R. Flower (2002). "Quantitative approaches to computational vaccinology." Immunol Cell Biol 80(3): 270-9. This article reviews the newly released JenPep database and two new powerful techniques for T-cell epitope prediction: (i) the additive method; and (ii) a 3D-Quantitative Structure Activity Relationships (3D-QSAR) method, based on Comparative Molecular Similarity Indices Analysis (CoMSIA). The JenPep database is a family of relational databases supporting the growing need of immunoinformaticians for quantitative data on peptide binding to major histocompatibility complexes and to the Transporters associated with Antigen Processing (TAP). It also contains an annotated list of T-cell epitopes. The database is available free via the Internet (http://www.jenner.ac.uk/JenPep). The additive prediction method is based on the assumption that the binding affinity of a peptide depends on the contributions from each amino acid as well as on the interactions between the adjacent and every second side-chain. In the 3D-QSAR approach, the influence of five physicochemical properties (steric bulk, electrostatic potential, local hydrophobicity, hydrogen-bond donor and hydrogen-bond acceptor abilities) on the affinity of peptides binding to MHC molecules was considered. Both methods were exemplified through their application to the well-studied problem of peptides binding to the human class I MHC molecule HLA-A*0201.
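The additive assumption stated in the Doytchinova and Flower abstract above (affinity as a sum of per-residue contributions plus interactions between adjacent and every-second side chains) can be written as a short scoring function. The sketch below is only an illustration of that structure: the function name, the dictionary layout, and every coefficient value are hypothetical and are not taken from the paper or from JenPep.

```python
def additive_score(peptide, const, single, adjacent, second):
    """Sum a constant, per-position residue terms, and pairwise interaction terms.

    single   maps (position, residue) to a contribution
    adjacent maps (position, residue_i, residue_i_plus_1) to a contribution
    second   maps (position, residue_i, residue_i_plus_2) to a contribution
    Missing keys contribute 0.0. All coefficient tables are hypothetical.
    """
    score = const
    n = len(peptide)
    for i, aa in enumerate(peptide):
        score += single.get((i, aa), 0.0)
    for i in range(n - 1):  # adjacent side-chain interactions
        score += adjacent.get((i, peptide[i], peptide[i + 1]), 0.0)
    for i in range(n - 2):  # every-second side-chain interactions
        score += second.get((i, peptide[i], peptide[i + 2]), 0.0)
    return score


if __name__ == "__main__":
    # Toy coefficients, invented purely to exercise the function.
    single = {(1, "L"): 0.5, (8, "V"): 0.25}
    adjacent = {(0, "L", "L"): 0.125}
    second = {(6, "V", "V"): -0.0625}
    print(additive_score("LLFGYPVYV", 5.0, single, adjacent, second))
```

In practice the coefficient tables would be fitted to measured binding data, which this sketch does not attempt.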
Eddy, S. R. (2002). "Computational genomics of noncoding RNA genes." Cell 109(2): 137-40. The number of known noncoding RNA genes is expanding rapidly. Computational analysis of genome sequences, which has been revolutionary for protein gene analysis, should also be able to address questions of the number and diversity of noncoding RNA genes. However, noncoding RNAs present computational genomics with a new set of challenges.
Egan, W. J. and G. Lauri (2002). "Prediction of intestinal permeability." Adv Drug Deliv Rev 54(3): 273-89. This review focuses on computational methods for the prediction of passive intestinal permeability. Existing computational models are surveyed and assessed in terms of descriptors, model type/complexity, speed of computation, predictive performance, and interpretability. Challenges to the successful computational prediction of intestinal permeability, i.e. data quantity, measurement imprecision, confounding factors such as solubility, metabolism, or active efflux, and the need for robust statistical methods, are also discussed.
Escribano, J. and M. Coca-Prados (2002). "Bioinformatics and reanalysis of subtracted expressed sequence tags from the human ciliary body: Identification of novel biological functions." Mol Vis 8: 315-32. PURPOSE: The ciliary body is largely known for its major roles in the regulation of aqueous humor secretion, intraocular pressure, and accommodation of the lens. In this review article we applied bioinformatics to re-examine hundreds of expressed sequence tags (ESTs) previously isolated by subtractive hybridization from a human ciliary body library [1]. The DNA sequences of these clones have been recently added to the web site of NEIBank. METHODS: DNA sequence comparisons of subtracted ESTs were performed against all entries in the last available release of the non-redundant database containing GenBank, EMBL, DDBJ and PDB sequences using the BlastN program accessed through NCBI's BLAST services on the internet (NCBI). Sequences were also compared and mapped using the Blast search program provided through the Internet by the Human Genome Project (UCSC). RESULTS: A total of 284 independent ESTs were classified in 17 functional groups. Analysis of their relationships allowed us to define the expression of five major groups of known genes: (i) protein synthesis, folding, secretion and degradation (20%); (ii) energy supply and biosynthesis (12%); (iii) contractility and cytoskeleton structure (6%); (iv) cellular signaling and cell cycle regulation (7%); and (v) nerve cell related tasks (2%), including neuropeptide processing and putative non-visual phototransduction and circadian rhythm control. The largest group contains unidentified sequences, a total of 105 sequences, accounting for 37% of ESTs. The unidentified sequences show similarity to genomic non-coding regions, or genes of unknown function. CONCLUSIONS: The most highly represented EST corresponds to myocilin, a gene involved in glaucoma. 
The data also confirm the secretory functions of the ciliary epithelium and its high metabolism, as well as the presence of a neuroendocrine peptidergic system presumably involved in the regulation of the intraocular pressure and/or aqueous humor secretion. Additional genes may be related to a non-visual phototransduction cascade and/or to circadian rhythms. Overall, this initial group of subtracted ESTs can help to uncover novel physiological functions of the ciliary body in health and in disease, as well as novel candidate genes for ocular diseases.
Fabrega, S., P. Durand, et al. (2002). "[The active site of human glucocerebrosidase: structural predictions and experimental validations]." J Soc Biol 196(2): 151-60. Gaucher disease is a lysosomal storage disorder caused by a deficiency in glucocerebrosidase which cleaves the beta-glucosidic linkage of glucosylceramide, a normal intermediate in glycolipid metabolism. Glucocerebrosidase belongs to the clan GH-A of glycoside hydrolases, a large group of enzymes which function with retention of the anomeric configuration at the hydrolysis site. Accurate three-dimensional (3D) structure data for glucocerebrosidase should help to better understand the molecular bases of Gaucher disease. As such 3D structure data were not available, we used the two-dimensional hydrophobic cluster analysis (HCA) method to make structure predictions for the catalytic domains of clan GH-A glycoside hydrolases. We found that all the enzymes of clan GH-A may share a similar catalytic domain consisting of an (alpha/beta)8 barrel with the critical acid/base and nucleophile residues located at the C-terminal ends of strands beta 4 and beta 7, respectively. In the case of glucocerebrosidase, Glu 235 was predicted to be the putative acid/base catalyst whereas the nucleophile was located at Glu 340. Next, in order to obtain experimental evidence supporting these HCA-based predictions, we used retroviral vectors to express, in murine null cells, E235A and E340A mutant proteins, in which alanine residues unable to participate in the enzymatic reaction replace the presumed critical glutamic acid residues. Both mutants were found to be catalytically inactive although they were correctly folded/processed and sorted to the lysosome. Thus, Glu 235 and Glu 340 do indeed play key roles in the active site of human glucocerebrosidase as predicted by the HCA analysis. 
In a broader perspective, our work points out that bioinformatics approaches may be highly useful for generating structure-function predictions based on sequence-structure interrelationships, especially in the context of a rapid increase in protein sequence information through genome sequencing.
Fairlamb, A. H. (2002). "Metabolic pathway analysis in trypanosomes and malaria parasites." Philos Trans R Soc Lond B Biol Sci 357(1417): 101-7. Identification of novel drug targets is required for the development of new classes of drugs to overcome drug resistance and replace less efficacious treatments. In theory, knowledge of the entire genome of a pathogen identifies every potential drug target in any given microbe. In practice, the sheer complexity and the inadequate or inaccurate annotation of genomic information makes target identification and selection somewhat more difficult. Analysis of metabolic pathways provides a useful conceptual framework for the identification of potential drug targets and also for improving our understanding of microbial responses to nutritional, chemical and other environmental stresses. A number of metabolic databases are available as tools for such analyses. The strengths and weaknesses of this approach are discussed.
Fielden, M. R., J. B. Matthews, et al. (2002). "In silico approaches to mechanistic and predictive toxicology: an introduction to bioinformatics for toxicologists." Crit Rev Toxicol 32(2): 67-112. Bioinformatics, or in silico biology, is a rapidly growing field that encompasses the theory and application of computational approaches to model, predict, and explain biological function at the molecular level. This information rich field requires new skills and new understanding of genome-scale studies in order to take advantage of the rapidly increasing amount of sequence, expression, and structure information in public and private databases. Toxicologists are poised to take advantage of the large public databases in an effort to decipher the molecular basis of toxicity. With the advent of high-throughput sequencing and computational methodologies, expressed sequences can be rapidly detected and quantitated in target tissues by database searching. Novel genes can also be isolated in silico, while their function can be predicted and characterized by virtue of sequence homology to other known proteins. Genomic DNA sequence data can be exploited to predict target genes and their modes of regulation, as well as identify susceptible genotypes based on single nucleotide polymorphism data. In addition, highly parallel gene expression profiling technologies will allow toxicologists to mine large databases of gene expression data to discover molecular biomarkers and other diagnostic and prognostic genes or expression profiles. This review serves to introduce to toxicologists the concepts of in silico biology most relevant to mechanistic and predictive toxicology, while highlighting the applicability of in silico methods using select examples.
Frank, A. O., P. W. Walsh, et al. (2002). "Computational fluid dynamics and stent design." Artif Organs 26(7): 614-21. Stents are small, usually metallic tubes that are intended to prop open arteries blocked with atherosclerotic plaques. While stents have been used successfully in recent years, they still suffer from failure due to the development of new tissue in the stented segment (restenosis). Variations in the failure rates associated with different stent designs have led researchers to investigate the role of near-wall flow patterns. While there is no direct evidence yet, the patterns of flow stagnation as the blood flows past the stent struts may affect the restenosis process. Computational fluid dynamics (CFD) approaches are well suited for obtaining detailed information on stent flow patterns. Many CFD simulations make use of a two-dimensional model. The strong dependence of flow stagnation on stent strut spacing has been clearly demonstrated. These results have been employed to interpret the results of in vitro experiments designed to elucidate the mechanisms of restenosis.
Friedman, N. and N. Kaminski (2002). "Statistical methods for analyzing gene expression data for cancer research." Ernst Schering Res Found Workshop(38): 109-31.
Frishman, D., A. Kaps, et al. (2002). "Online genomics facilities in the new millennium." Pharmacogenomics 3(2): 265-71. The review begins by providing a brief typology of biological databases on the Internet, illustrated by examples of the most influential resources of each kind. We then take an insider look at one typical on-line genomic resource -- the yeast genome database hosted at the Munich Information Center for Protein Sequences (MIPS) -- and explain how and why it has evolved from a basic sequence repository to a multidomain knowledge base. The role of community efforts in curating and annotating genome data is discussed. The crucial role of data integration and interoperability in developing next-generation genomic facilities is underscored.
Fryer, R. M., J. Randall, et al. (2002). "Global analysis of gene expression: methods, interpretation, and pitfalls." Exp Nephrol 10(2): 64-74. Over the past 15 years, global analysis of mRNA expression has emerged as a powerful strategy for biological discovery. Using the power of parallel processing, robotics, and computer-based informatics, a number of high-throughput methods have been devised. These include DNA microarrays, serial analysis of gene expression, quantitative RT-PCR, differential-display RT-PCR, and massively parallel signature sequencing. Each of these methods has inherent advantages and disadvantages, often related to expense, technical difficulty, specificity, and reliability. Further, the ability to generate large data sets of gene expression has led to new challenges in bioinformatics. Nonetheless, this technological revolution is transforming disease classification, gene discovery, and our understanding of regulatory gene networks.
Fukami-Kobayashi, K. and N. Saito (2002). "[How to make good use of CLUSTALW]." Tanpakushitsu Kakusan Koso 47(9): 1237-9.
Gabius, H. J., S. Andre, et al. (2002). "The sugar code: functional lectinomics." Biochim Biophys Acta 1572(2-3): 165-77. Analysis of the genome and proteome assumes the focus of attention in efforts to relate biochemical coding with cell functionality. Among other chores in energy metabolism, the talents of carbohydrates to establish a high-density coding system give reason for a paradigmatic shift. The sequence complexity of glycans and glycan-processing enzymes (glycosyltransferases, glycosidases and enzymes introducing substituents such as sulfotransferases), the growing evidence for the importance of glycans from transgenic and knock-out animal models and the correlation of defects in glycosylation with diseases are substantial assets to portray oligosaccharides as code words in their own right. Matching the pace of progress in the work on glycoconjugates, the increasing level of refinement of our knowledge about lectins (definition of this term: carbohydrate-binding proteins, excluding sugar-specific antibodies, receptors of free mono- or disaccharides for transport or chemotaxis and enzymes modifying the bound carbohydrate) epitomizes the sphere of action of the sugar code (functional lectinomics). It encompasses, among other activities, intra- and intercellular transport processes, sensor branches of innate immunity, regulation of cell-cell (matrix) adhesion or migration and positive/negative growth control with implications for differentiation and malignancy. The Q & A approach taken in this review lists a series of arguments in a stepwise manner to make the reader wonder why it is only a rather recent process that the concept of the sugar code has taken root in deciphering the mechanistic versatility of biological information storage and transfer.
Gaucher, E. A., X. Gu, et al. (2002). "Predicting functional divergence in protein evolution by site-specific rate shifts." Trends Biochem Sci 27(6): 315-21. Most modern tools that analyze protein evolution allow individual sites to mutate at constant rates over the history of the protein family. However, Walter Fitch observed in the 1970s that, if a protein changes its function, the mutability of individual sites might also change. This observation is captured in the "non-homogeneous gamma model", which extracts functional information from gene families by examining the different rates at which individual sites evolve. This model has recently been coupled with structural and molecular biology to identify sites that are likely to be involved in changing function within the gene family. Applying this to multiple gene families highlights the widespread divergence of functional behavior among proteins to generate paralogs and orthologs.
Gendel, S. M. (2002). "Sequence analysis for assessing potential allergenicity." Ann N Y Acad Sci 964: 87-98. Sequence analysis plays an important role in assessing the potential allergenicity of proteins used in transgenic foods, particularly for proteins that have not previously been part of the food supply. Sequence comparisons are used to indicate potential unexpected cross reactivity to existing allergens and to assess the potential for developing new sensitivities. Although the concept of using sequence analysis is straightforward, implementing a bioinformatic analysis that is accurate and complete can be complex. Several factors need to be considered, including the design and content of the sequence database, the analysis strategy, and the criteria for evaluating the results.
Gentzel, M., T. Kocher, et al. (2002). "Proteomics in biological research: the challenge to make proteins speak." Ernst Schering Res Found Workshop(38): 167-89.
Gerhold, D. L., R. V. Jensen, et al. (2002). "Better therapeutics through microarrays." Nat Genet 32 Suppl: 547-51. DNA microarrays are an integral part of the process for therapeutic discovery, optimization and clinical validation. At an early stage, investigators use arrays to prioritize a few genes as potential therapeutic targets on the basis of various criteria. Subsequently, gene expression analysis assists in drug discovery and toxicology by eliminating poor compounds and optimizing the selection of promising leads. Integral to this process is the use of sophisticated statistics, mathematics and bioinformatics to define statistically valid observations and to deduce complex patterns of phenotypes and biological pathways. In short, microarrays are redefining the drug discovery process by providing greater knowledge at each step and by illuminating the complex workings of biological systems.
Gerlai, R. (2002). "Phenomics: fiction or the future?" Trends Neurosci 25(10): 506-9. The ease with which genetic mutations can be induced in or introduced into mammalian organisms, such as the mouse, has created a significant need for phenotypic analysis. Developments in computer technology, instrumentation and bioinformatics, as well as in numerous neuroscience disciplines, will help to meet the demands set by the molecular revolution. As a result, the field of 'phenomics' is being born. This will integrate multidisciplinary research, with the goal of understanding the complex phenotypic consequences of genetic mutations at the level of the organism. This paper focuses on one of the disciplines that show promising developments, behavioral science.
Gieser, P., G. C. Bloom, et al. (2002). "Introduction to microarray experimentation and analysis." Methods Mol Biol 184: 29-49.
Gohil, K. and L. Packer (2002). "Bioflavonoid-rich botanical extracts show antioxidant and gene regulatory activity." Ann N Y Acad Sci 957: 70-7. Reactive oxygen and nitrogen metabolites are obligatory and essential products of metabolism. Unregulated increase in their production is associated with a number of chronic illnesses. Diets rich in fruits, vegetables, and wines are implicated in the prevention of chronic diseases. Molecular mechanisms by which fruits and vegetables confer their disease-preventive actions are poorly defined. However, recent developments in the fields of genomics and bioinformatics provide powerful tools to investigate the mechanisms by which botanicals affect cellular functions. This monograph illustrates the potential of large-scale messenger RNA analysis to unravel the role of transcription in mediating the effects of botanical extracts with antioxidant properties. The application of microarrays and oligonucleotide arrays shows multiple effects of antioxidant extracts on the expression of a broad spectrum of genes.
Goldsmith, L. J. (2002). "Power and sample size considerations in molecular biology." Methods Mol Biol 184: 111-30.
Goodman, N. (2002). "Biological data becomes computer literate: new advances in bioinformatics." Curr Opin Biotechnol 13(1): 68-71. Bioinformatics is an art and science concerned with the use of computing in biological research areas such as genomics, transcriptomics, proteomics, genetics, and evolution. This review paints a broad picture of bioinformatics, drawing examples from genomic sequencing and microarray analysis. I highlight the role of bioinformatics at multiple points along the path from high-tech data generation to biological discovery.
Goto, S. (2002). "[Mastering Web-based analysis of gene networks]." Tanpakushitsu Kakusan Koso 47(5): 635-41.
Grass, G. M. and P. J. Sinko (2002). "Physiologically-based pharmacokinetic simulation modelling." Adv Drug Deliv Rev 54(3): 433-51. Drug selection is now widely viewed as an important and relatively new, yet largely unsolved, bottleneck in the drug discovery and development process. In order to achieve an efficient selection process, high quality, rapid, predictive and correlative ADME models are required in order for them to be confidently used to support critical financial decisions. Systems that can be relied upon to accurately predict performance in humans have not existed, and decisions have been made using tools whose capabilities could not be verified until candidates went to clinical trial, leading to the high failure rates historically observed. However, with the sequencing of the human genome, advances in proteomics, the anticipation of the identification of a vastly greater number of potential targets for drug discovery, and the potential of pharmacogenomics to require individualized evaluation of drug kinetics as well as drug effects, there is an urgent need for rapid and accurately computed pharmacokinetic properties.
Graves, P. R. and T. A. Haystead (2002). "Molecular biologist's guide to proteomics." Microbiol Mol Biol Rev 66(1): 39-63; table of contents. The emergence of proteomics, the large-scale analysis of proteins, has been inspired by the realization that the final product of a gene is inherently more complex and closer to function than the gene itself. Shortfalls in the ability of bioinformatics to predict both the existence and function of genes have also illustrated the need for protein analysis. Moreover, only through the study of proteins can posttranslational modifications be determined, which can profoundly affect protein function. Proteomics has been enabled by the accumulation of both DNA and protein sequence databases, improvements in mass spectrometry, and the development of computer algorithms for database searching. In this review, we describe why proteomics is important, how it is conducted, and how it can be applied to complement other existing technologies. We conclude that currently, the most practical application of proteomics is the analysis of target proteins as opposed to entire proteomes. This type of proteomics, referred to as functional proteomics, is always driven by a specific biological question. In this way, protein identification and characterization has a meaningful outcome. We discuss some of the advantages of a functional proteomics approach and provide examples of how different methodologies can be utilized to address a wide variety of biological problems.
Gupta, R., L. J. Jensen, et al. (2002). "Orphan protein function and its relation to glycosylation." Ernst Schering Res Found Workshop(38): 276-94.
Guzey, C. and O. Spigset (2002). "Genotyping of drug targets: a method to predict adverse drug reactions?" Drug Saf 25(8): 553-60. In the last decades, advances in molecular biology have led to modern pharmacogenetics, which started as a science that focused on investigating drug metabolising enzymes and genetic determinants of pharmacokinetic variability. As more evidence has become available on the structure of drug targets and the genes coding for them, increasing attention has been directed towards pharmacodynamic explanations of variability in therapeutic response as well as in the risk for adverse drug reactions. Traditionally, genetic drug safety research has focused on variations in single genes whose functions are known to be related to given adverse drug reactions. A few such examples, malignant hyperthermia, the long QT syndrome, venous thromboembolic disease, tardive dyskinesia, and drug addiction, are presented in this article. In the future, results from the Human Genome Project together with tools such as DNA microarray technology, high-output screening systems and advanced bioinformatics, will permit a more thorough elucidation than is currently possible of the genetic components of adverse drug reactions. By screening for a large number of single nucleotide polymorphisms (SNPs), SNP patterns associated with adverse drug reactions can be discovered even though the functions of the SNPs as such are completely unknown. On the basis of these findings, it can be expected that pharmacogenetic research will identify situations where a drug should be avoided in certain individuals in order to reduce the risk for adverse drug reactions. If so, it will be feasible to use molecular diagnostics to select drugs that are safe for the individual patient.
Haberkorn, U., A. Altmann, et al. (2002). "Functional genomics and proteomics--the role of nuclear medicine." Eur J Nucl Med Mol Imaging 29(1): 115-32. Now that the sequencing of the human genome has been completed, the basic challenges are finding the genes, locating their coding regions and predicting their functions. This will result in a new understanding of human biology as well as in the design of new molecular structures as potential novel diagnostic or drug discovery targets. The assessment of gene function may be performed using the tools of the genome program. These tools represent high-throughput methods used to evaluate changes in the expression of many or all genes of an organism at the same time in order to investigate genetic pathways for normal development and disease. This will lead to a shift in the scientific paradigm: In the pre-proteomics era, functional assignments were derived from hypothesis-driven experiments designed to understand specific cellular processes. The new tools describe proteins on a proteome-wide scale, thereby creating a new way of doing cell research which results in the determination of three-dimensional protein structures and the description of protein networks. These descriptions may then be used for the design of new hypotheses and experiments in the traditional physiological, biochemical and pharmacological sense. The evaluation of genetically manipulated animals or newly designed biomolecules will require a thorough understanding of physiology, biochemistry and pharmacology and the experimental approaches will involve many new technologies, including in vivo imaging with single-photon emission tomography and positron emission tomography. Nuclear medicine procedures may be applied for the determination of gene function and regulation using established and new tracers or using in vivo reporter genes such as enzymes, receptors, antigens or transporters. Pharmacogenomics will identify new surrogate markers for therapy monitoring which may represent potential new tracers for imaging. Also, drug distribution studies for new therapeutic biomolecules are needed, at least during preclinical stages of drug development. Finally, new biomolecules will be developed by bioengineering methods which may be used for isotope-based diagnosis and treatment of disease.
Halfon, M. S. and A. M. Michelson (2002). "Exploring genetic regulatory networks in metazoan development: methods and models." Physiol Genomics 10(3): 131-43. One of the foremost challenges of 21st century biological research will be to decipher the complex genetic regulatory networks responsible for embryonic development. The recent explosion of whole genome sequence data and of genome-wide transcriptional profiling methods, such as microarrays, coupled with the development of sophisticated computational tools for exploiting and analyzing genomic data, provide a significant starting point for regulatory network analysis. In this article we review some of the main methodological issues surrounding genome annotation, transcriptional profiling, and computational prediction of cis-regulatory elements and discuss how the power of model genetic organisms can be used to experimentally verify and extend the results of genomic research.
Halperin, I., B. Ma, et al. (2002). "Principles of docking: An overview of search algorithms and a guide to scoring functions." Proteins 47(4): 409-43. The docking field has come of age. The time is ripe to present the principles of docking, reviewing the current state of the field. Two reasons are largely responsible for the maturity of the computational docking area. First, the early optimism that the very presence of the "correct" native conformation within the list of predicted docked conformations signals a near solution to the docking problem, has been replaced by the stark realization of the extreme difficulty of the next scoring/ranking step. Second, in the last couple of years more realistic approaches to handling molecular flexibility in docking schemes have emerged. As in folding, these derive from concepts abstracted from statistical mechanics, namely, populations. Docking and folding are interrelated. From the purely physical standpoint, binding and folding are analogous processes, with similar underlying principles. Computationally, the tools developed for docking will be tremendously useful for folding. For large, multidomain proteins, domain docking is probably the only rational way, mimicking the hierarchical nature of protein folding. The complexity of the problem is huge. Here we divide the computational docking problem into its two separate components. As in folding, solving the docking problem involves efficient search (and matching) algorithms, which cover the relevant conformational space, and selective scoring functions, which are both efficient and effectively discriminate between native and non-native solutions. It is universally recognized that docking of drugs is immensely important. However, protein-protein docking is equally so, relating to recognition, cellular pathways, and macromolecular assemblies. Proteins function when they are bound to other molecules. Consequently, we present the review from both the computational and the biological points of view. Although large, it covers only partially the extensive body of literature, relating to small (drug) and to large protein-protein molecule docking, to rigid and to flexible. Unfortunately, when reviewing these, a major difficulty in assessing the results is the non-uniformity in the formats in which they are presented in the literature. Consequently, we further propose a way to rectify it here.
Hansch, C., D. Hoekman, et al. (2002). "Chem-bioinformatics: comparative QSAR at the interface between chemistry and biology." Chem Rev 102(3): 783-812.
Helfrich, J. P. (2002). "Raw data to knowledge warehouse in proteomic-based drug discovery: a scientific data management issue." Biotechniques Suppl: 48-50, 52-3.
Hirano, H. (2002). "[Analysis of expressed proteome]." Tanpakushitsu Kakusan Koso 47(8 Suppl): 889-97.
Hocker, B., S. Schmidt, et al. (2002). "A common evolutionary origin of two elementary enzyme folds." FEBS Lett 510(3): 133-5. The (beta alpha)(8)-barrel is the most frequent and most versatile fold among enzymes [Hocker et al., Curr. Opin. Biotechnol. 12 (2001) 376-381; Wierenga, FEBS Lett. 492 (2001) 193-198]. Structural and functional evidence suggests that (beta alpha)(8)-barrels evolved from an ancestral half-barrel, which consisted of four (beta alpha) units stabilized by dimerization [Lang et al., Science 289 (2000) 1546-550; Hocker et al., Nat. Struct. Biol. 8 (2001) 32-36; Gerlt and Babbitt, Nat. Struct. Biol. 8 (2001) 5-7]. Here, by performing a comprehensive database search, we detect a striking and unexpected structural and amino acid sequence similarity between (beta alpha)(4) half-barrels and members of the (beta alpha)(5) flavodoxin-like fold. These findings provoke the hypothesis that a large fraction of the modern-day enzymes evolved from a basic structural building block, which can be identified by a combination of sequence and structural analyses.
Holland, K. T. and R. A. Bojar (2002). "Cosmetics: what is their influence on the skin microflora?" Am J Clin Dermatol 3(7): 445-9. Human skin has a resident, transient and temporary resident microflora. This article considers the possibilities of topical products influencing the balance of the microflora. The resident micro-organisms are in a dynamic equilibrium with the host tissue and the microflora may be considered an integral component of the normal human skin. The great majority of these micro-organisms are gram-positive and reside on the skin surface and in the follicles. The host has a variety of structures, molecules and mechanisms which restrict the transient and temporary residents, as well as controlling the population and dominance of the resident group. These include local skin anatomy, hydration, nutrients and inhibitors of various types. The resident microflora is beneficial in occupying a niche and denying its access to transients, which may be harmful and infectious. Also, the residents are important in modifying the immune system. In the healthy host the microflora causes few and temporary problems. Therefore, it is of interest that topical products have little or no effect on the ecology of the microflora. A range of mechanisms by which long-term use of cosmetics may influence the microflora are considered. Although the risks associated are low, it is argued that it is necessary to monitor these changes in ecology and use technologies of modeling and bioinformatics to predict outcomes, whether good, neutral or of concern.
Holmes, I. (2002). "Transcendent elements: whole-genome transposon screens and open evolutionary questions." Genome Res 12(8): 1152-5.
Homma, K. and K. Nishikawa (2002). "[Protein structure information provided by the GTOP database and its applications]." Tanpakushitsu Kakusan Koso 47(8 Suppl): 1076-82.
Hurley, J. H., D. E. Anderson, et al. (2002). "Structural genomics and signaling domains." Trends Biochem Sci 27(1): 48-53. Many novel signal transduction domains are being identified in the wake of genome sequencing projects and improved sensitivity in homology-detection techniques. The functions of these domains are being discovered by hypothesis-driven experiments and structural genomics approaches. This article reviews the recent highlights of research on modular signaling domains, and the relative contributions and limitations of the various approaches being used.
Ichihara, H. and H. Toh (2002). "[Extraction of information about protein interaction using evolutionary trace]." Tanpakushitsu Kakusan Koso 47(13): 1863-9.
Iglesias, P. A. and A. Levchenko (2002). "Modeling the cell's guidance system." Sci STKE 2002(148): RE12. Cell locomotion can be directed by external gradients of diffusible substances leading to chemotaxis. Recently, the mechanisms of gradient sensing, the cell guidance system, came under scrutiny both in experimental analysis and computational modeling. Here, we review several recent computational models of gradient sensing in eukaryotic cells, demonstrating why some of them predict little sensitivity to changes in the gradient and response "locking," whereas others predict high gradient sensitivity at the expense of signal gain. We also propose a way to view chemotaxis regulation as a highly coupled combination of semi-independent control modules, which simplifies the modeling of this complex cellular behavior.
Ito, T. (2002). "[Exploring interaction networks of the yeast proteins]." Tanpakushitsu Kakusan Koso 47(8 Suppl): 898-905.
Iwakawa, M., T. Imai, et al. (2002). "[RadGenomics project]." Nippon Igaku Hoshasen Gakkai Zasshi 62(9): 484-9. Human health conditions are largely determined by a complex interplay among genetic susceptibility, environmental factors, and aging. The RadGenomics project, which began in April 2001, promotes analysis of genes in response to irradiation, identification of their allelic variants in the human population, development of an effective procedure for quantitating individual radio-sensitivity, and analysis of the interrelationship between genetic heterogeneity and susceptibility to irradiation. Major groups of genes with which the project will concern itself include DNA repair genes, cell cycle genes, oncogenes, tumor suppressor genes, genes for programmed cell death, genes for signal transduction, and genes for oxidative processes. The outcome of the RadGenomics project should lead to improved protocols for personalized radiotherapy and reduce the possible side effects of treatment. The project will contribute to future research on the molecular mechanisms of radiation sensitivity in humans and stimulate the development of new high-throughput technology for a broader application of the biological and medical sciences. Identification of functionally important polymorphisms in the radiation response genes may determine individual differences in sensitivity to radiation exposure. The staff members, who are specialists in a variety of fields including genome science, radiation biology, medical science, molecular biology, and bioinformatics, have come to the RadGenomics project from various universities, companies, and research institutes.
Jain, K. K. (2002). "Recent advances in oncoproteomics." Curr Opin Mol Ther 4(3): 203-9. Advances in proteomics are contributing to the understanding of pathophysiology of cancer, cancer diagnosis and anticancer drug discovery. Laser capture microdissection (LCM) provides an ideal method for extraction of cells from specimens in which the exact morphologies of both the captured cells and the surrounding tissue are preserved. Differentially expressed proteins in tumor tissue are found by comparing the protein expression patterns generated using SELDI (surface-enhanced laser desorption/ionization)-based protein chip technology. Proteomic technologies have been used for the study of cancer of various organs. Continued refinement of techniques and methods to determine the abundance and status of proteins in vivo holds great promise for future study of cancer and development of personalized cancer therapies.
Ji, Y. (2002). "The role of genomics in the discovery of novel targets for antibiotic therapy." Pharmacogenomics 3(3): 315-23. The emergence of antibiotic resistance and multi-drug resistance in bacterial pathogens underscores the need for the development of novel classes of antibiotics. The availability of complete genome sequence data from many important human pathogens provides a wealth of fundamental information. This allows us to define each gene and thus to better understand molecular pathogenesis. New techniques have enabled the identification and characterization of genes that are critical for bacterial growth and survival during infection. The combination of genome sequence data and new technologies makes it possible to systematically explore the function of each open reading frame in a genome and identify any potential molecular targets for drug discovery. With particular emphasis on antibacterial therapy, this review discusses genome-based technologies and their important applications to anti-infective drug discovery.
Ji, Z. L., J. Z. Liu, et al. (2002). "[Strategies of functional analysis of new genes]." Sheng Wu Gong Cheng Xue Bao 18(1): 117-20. Functional analysis of new genes is playing a central role in the postgenomic era. Here we review several main strategies, including bioinformatics, gene transduction, antisense technology, gene silencing induced by RNA interference (RNAi), transgenes and gene knockout, and artificial chromosome transduction.
Jiang, B., H. Bussey, et al. (2002). "Novel strategies in antifungal lead discovery." Curr Opin Microbiol 5(5): 466-71. There have been significant developments in fungal genomics over the past year. The recently released genome sequences of Aspergillus fumigatus and Cryptococcus neoformans have provided unprecedented opportunities for comparative genomics studies of many clinically relevant fungal pathogens. Emerging experimental analysis tools, such as fitness profiling and protein microarrays, have greatly enhanced our ability to conduct genome-wide functional studies.
Jorgensen, W. L. and E. M. Duffy (2002). "Prediction of drug solubility from structure." Adv Drug Deliv Rev 54(3): 355-66. The aqueous solubility of a drug is an important factor affecting its bioavailability. Numerous computational methods have been developed for the prediction of aqueous solubility from a compound's structure. A review is provided of the methodology and quality of results for the most useful procedures including the model implemented in the QikProp program. Viable methods now exist for predictions with less than 1 log unit uncertainty, which is adequate for prescreening synthetic candidates or design of combinatorial libraries. Further progress with predictive methods would require an experimental database of highly accurate solubilities for a large, diverse collection of drug-like molecules.
Kaban, L. B. (2002). "Biomedical technology revolution: opportunities and challenges for oral and maxillofacial surgeons." Int J Oral Maxillofac Surg 31(1): 1-12. During this 45-minute presentation, I have tried to describe my vision of the exciting future that awaits us. I have tried to impart my enthusiasm for the opportunities provided to us as surgeons by the advances in molecular biology and genetics, imaging, surgical technology and bioinformatics. Most of all, I hope I have transmitted my optimism for the future to our younger members. I think the following statement or observation by the great educator Margaret Mead accurately summarizes our current situation regarding the application of all this new knowledge that will become available to us as surgeons: 'We are now at the point where we must educate people (surgeons) in what nobody knew yesterday, and prepare in our schools (training programs) for what no one knows yet but what some people must know tomorrow.'
Katoh, M. (2002). "Strabismus (STB)/Vang-like (VANGL) gene family (Review)." Int J Mol Med 10(1): 11-5. Strabismus 1 (STB1/VANGL2) and Strabismus 2 (STB2/VANGL1), which have been cloned and characterized using bioinformatics and cDNA-PCR, are human homologues of Drosophila tissue polarity gene strabismus (stbm)/Van Gogh (Vang). STB1 and STB2 are tetra-membrane-spanning proteins with 73.1% total-amino-acid identity. Serine-rich domain and Strabismus-homology (STH1 and STH2) domains are conserved among human STB1, STB2, Xenopus Stbm, and Drosophila Stbm. STH2 domain with the C-terminal Ser/Thr-X-Val motif is implicated in binding with Dishevelled (DVL) proteins. STB1 gene is clustered with CASQ1 gene on human chromosome 1q21-q23, while STB2 gene is clustered with CASQ2 gene on human chromosome 1p13. STB1 and STB2 genes are located around cancer susceptibility loci or recombination hot spots in the human genome. STB1 is moderately expressed in K-562 (leukemia), G-361 (melanoma), and MKN7 (gastric cancer) cells. STB2 is highly expressed in MKN28, MKN74 (gastric cancer), BxPC-3, PSN-1, and Hs766T (pancreatic cancer) cells. On the other hand, STB1 and STB2 are significantly down-regulated in several cancer cell lines and primary tumors. The Xenopus homologue of human STB1 and STB2 negatively regulates the WNT - beta-catenin signaling pathway. Loss-of-function mutations of genes encoding negative regulators of WNT - beta-catenin signaling pathway lead to carcinogenesis. Based on functional aspects and human chromosomal loci, STB1 gene and STB2 gene are predicted to be potent tumor suppressor gene candidates. STB1 and STB2 might be suitable targets for tissue engineering in the field of regenerative medicine and for chemoprevention and treatment in the field of clinical oncology.
Katoh, M. (2002). "GIPC gene family (Review)." Int J Mol Med 9(6): 585-9. GIPC1/GIPC/RGS19IP1, GIPC2, and GIPC3 genes constitute the human GIPC gene family. GIPC1 and GIPC2 show 62.0% total-amino-acid identity. GIPC1 and GIPC3 show 59.9% total-amino-acid identity. GIPC2 and GIPC3 show 55.3% total-amino-acid identity. GIPCs are proteins with central PDZ domain and GIPC homology (GH1 and GH2) domains. PDZ, GH1, and GH2 domains are conserved among human GIPCs, Xenopus GIPC/Kermit, and Drosophila GIPC/LP09416. Bioinformatics revealed that GIPC genes are linked to prostanoid receptor genes and DNAJB genes in the human genome as follows: GIPC1 gene is linked to prostaglandin E receptor 1 (PTGER1) gene and DNAJB1 gene in human chromosome 19p13.2-p13.1 region; GIPC2 gene to prostaglandin F receptor (PTGFR) gene and DNAJB4 gene in human chromosome 1p31.1-p22.3 region; GIPC3 gene to thromboxane A2 receptor (TBXA2R) gene in human chromosome 19p13.3 region. GIPC1 and GIPC2 mRNAs are expressed together in OKAJIMA, TMK1, MKN45 and KATO-III cells derived from diffuse-type of gastric cancer, and are up-regulated in several cases of primary gastric cancer. The PDZ domain of GIPC family proteins interacts with Frizzled-3 (FZD3) class of WNT receptor, insulin-like growth factor-I (IGF1) receptor, receptor tyrosine kinase TrkA, TGF-beta type III receptor (TGF-beta RIII), integrin alpha6A subunit, transmembrane glycoprotein 5T4, and RGS19/RGS-GAIP. Because RGS19 is a member of the RGS family that regulates heterotrimeric G-protein signaling, GIPCs might be scaffold proteins linking heterotrimeric G-proteins to seven-transmembrane-type WNT receptor or to receptor tyrosine kinases. Therefore, GIPC1, GIPC2 and GIPC3 might play key roles in carcinogenesis and embryogenesis through modulation of growth factor signaling and cell adhesion.
Katze, M. G., Y. He, et al. (2002). "Viruses and interferon: a fight for supremacy." Nat Rev Immunol 2(9): 675-87. The action of interferons (IFNs) on virus-infected cells and surrounding tissues elicits an antiviral state that is characterized by the expression and antiviral activity of IFN-stimulated genes. In turn, viruses encode mechanisms to counteract the host response and support efficient viral replication, thereby minimizing the therapeutic antiviral power of IFNs. In this review, we discuss the interplay between the IFN system and four medically important and challenging viruses -- influenza, hepatitis C, herpes simplex and vaccinia -- to highlight the diversity of viral strategies. Understanding the complex network of cellular antiviral processes and virus-host interactions should aid in identifying new and common targets for the therapeutic intervention of virus infection. This effort must take advantage of the recent developments in functional genomics, bioinformatics and other emerging technologies.
Kennedy, S. (2002). "The role of proteomics in toxicology: identification of biomarkers of toxicity by protein expression analysis." Biomarkers 7(4): 269-90. Proteomics, i.e. the high throughput separation, display and identification of proteins, has the potential to be a powerful tool in drug development. It could increase the predictability of early drug development and identify non-invasive biomarkers of toxicity or efficacy. This review provides an introduction to modern proteomics, with particular reference to applications in toxicology. A literature search was carried out to identify studies in two broad classes: screening/predictive toxicology, and mechanistic toxicology. The strengths and limitations of current methods and the likely impact of techniques in drug development are also considered. Proteomics can increase the speed and sensitivity of toxicological screening by identifying protein markers of toxicity. Proteomics studies have already provided insights into the mechanisms of action of a wide range of substances, from metals to peroxisome proliferators. Current limitations involving speed of throughput are being overcome by increasing automation and the development of new techniques. The isotope-coded affinity tag (ICAT) method appears particularly promising. The application of proteomics to drug development has given rise to the new field of pharmacoproteomics. New associations between proteins and toxicopathological effects are constantly being identified, and major progress is on the horizon as we move into the post-genomic era.
Kidera, A. and M. Ikeguchi (2002). "[Protein structural dynamics]." Tanpakushitsu Kakusan Koso 47(8 Suppl): 1052-7.
Kinoshita, K. (2002). "[Insight into the relation between protein structure and function]." Tanpakushitsu Kakusan Koso 47(8 Suppl): 1064-70.
Kirkwood, T. B. (2002). "New science for an old problem." Trends Genet 18(9): 441-2.
Kitano, H. (2002). "Computational systems biology." Nature 420(6912): 206-10. To understand complex biological systems requires the integration of experimental and computational research -- in other words a systems biology approach. Computational biology, through pragmatic modelling and theoretical exploration, provides a powerful foundation from which to address critical scientific questions head-on. The reviews in this Insight cover many different aspects of this energetic field, although all, in one way or another, illuminate the functioning of modular circuits, including their robustness, design and manipulation. Computational systems biology addresses questions fundamental to our understanding of life, yet progress here will lead to practical innovations in medicine, drug discovery and engineering.
Kitano, H. (2002). "Systems biology: a brief overview." Science 295(5560): 1662-4. To understand biology at the system level, we must examine the structure and dynamics of cellular and organismal function, rather than the characteristics of isolated parts of a cell or organism. Properties of systems, such as robustness, emerge as central issues, and understanding these properties may have an impact on the future of medicine. However, many breakthroughs in experimental devices, advanced software, and analytical methods are required before the achievements of systems biology can live up to their much-touted potential.
Kloos, D. U., C. Choi, et al. (2002). "The TGF-beta--Smad network: introducing bioinformatic tools." Trends Genet 18(2): 96-103. The TGF-beta superfamily is an important class of intercellular signalling molecules, including TGF-beta and bone morphogenetic proteins. Intracellular signalling cascades triggered by these molecules eventually activate transcription factors of the Smad family, which then regulate expression of their respective target genes. This article will discuss the TGF-beta--Smad signalling networks and how these processes are represented in databases of signal transduction and transcription control mechanisms. These databases can provide a well-structured overview of the subject and a basis for advanced bioinformatics analyses to interpret the function of genomic sequences or to analyse signalling networks.
Krajewski, P. and J. Bocianowski (2002). "Statistical methods for microarray assays." J Appl Genet 43(3): 269-78. The paper briefly reviews statistical methods used in the area of DNA microarray studies. All stages of the experiment are taken into account: planning, data collection, data preprocessing, analysis and validation. Among the methods of data analysis, the algorithms for estimating differential expression, multivariate approaches, clustering methods, as well as classification and discrimination are reviewed. The need is stressed for routine statistical data processing protocols and for seeking links between microarray data analysis and quantitative genetic models.
Kuo, W. P., M. E. Whipple, et al. (2002). "Gene expression profiling by DNA microarrays and its application to dental research." Oral Oncol 38(7): 650-6. DNA microarray technology has been used for genome-wide gene expression studies that incorporate molecular genetics and computer science skills on massive levels. The technology permits the simultaneous analysis of tens of thousands of genes for the purposes of gene discovery, disease diagnosis, improved drug development, and therapeutics tailored to specific disease processes. OBJECTIVE: In this review, the two most common microarray technologies and their potential application to dental research will be discussed. The authors review current articles pertaining to the technologies and analysis of mRNA expression using DNA microarrays and its application to dental research. Since many genes contribute to normal functioning, research efforts are moving from the search for a disease specific gene to the understanding of the biochemical and molecular functioning of a variety of genes and how complicated networks of interaction can lead to a disease state, such as oral cancer. With the incorporation of DNA microarray based research, we can look forward to more accurate diagnosis and surgical treatment/drug-delivery therapy based on an individual patient's genetic profile.
Kusunoki, M. (2002). "[Deposition with the Protein Data Bank]." Tanpakushitsu Kakusan Koso 47(6): 736-9.
Kwok, W. W., N. A. Ptacek, et al. (2002). "Use of class II tetramers for identification of CD4+ T cells." J Immunol Methods 268(1): 71-81. Multivalent MHC class II molecules containing peptide antigens are useful tools for the detection of antigen specific human CD4+ T cells. Tetramers produced by exogenous peptide loading onto empty class II molecules are comparable to tetramers with peptide tethered to the class II chain covalently, but have many practical advantages. Conditions for optimal peptide loading to generate tetramers are discussed and optimal conditions of using tetramers for staining T cells are examined. As the frequency of antigen specific CD4+ T cells in peripheral blood is low, we demonstrate that an in vitro expansion step is effective in detecting low frequency T cells. Two new applications with tetramers, their uses for mapping T cell epitopes and for the detection of low affinity T cells are described. In a clinical setting, potential applications include using these reagents for monitoring disease progression during clinical intervention.
Ladd, A. N. and T. A. Cooper (2002). "Finding signals that regulate alternative splicing in the post-genomic era." Genome Biol 3(11): reviews0008. Alternative splicing of pre-mRNAs is central to the generation of diversity from the relatively small number of genes in metazoan genomes. Auxiliary cis elements and trans-acting factors are required for the recognition of constitutive and alternatively spliced exons and their inclusion in pre-mRNA. Here, we discuss the regulatory elements that direct alternative splicing and how genome-wide analyses can aid in their identification.
Langowski, J. and A. Long (2002). "Computer systems for the prediction of xenobiotic metabolism." Adv Drug Deliv Rev 54(3): 407-15. The aim of pharmaceutical research and development is to ensure a continuing pipeline of new chemical entities (NCEs) displaying high therapeutic efficacy with few or no side effects. Failure of promising lead candidates late in the drug discovery processes is regarded as commercially unacceptable in today's increasingly competitive business environment. An inappropriate ADME/Toxicity profile in humans is the major cause of failure of lead candidates in late clinical stages of drug development. Combinatorial chemistry techniques coupled with high throughput screening protocols mean that pharmaceutical companies are now dealing with an unprecedented number of NCEs on an annual basis. As a consequence, screening for undesirable ADME/Toxicity properties in the early stages of drug development, preferably pre-synthesis, is now considered the essential paradigm. In silico assessment of NCEs is rapidly emerging as the next wave of technology for early ADME/Toxicity prediction. In this review, we discuss the major commercially available products for assessing the potential metabolic activity of xenobiotic substances in mammalian systems.
Lazaridis, E. N. and G. C. Bloom (2002). "Statistical contributions to molecular biology." Methods Mol Biol 184: 1-14.
Liberles, D. A. and M. L. Wayne (2002). "Tracking adaptive evolutionary events in genomic sequences." Genome Biol 3(6): REVIEWS1018. As more gene and genomic sequences from an increasing assortment of species become available, new pictures of evolution are emerging. Improved methods can pinpoint where positive and negative selection act in individual codons in specific genes on specific branches of phylogenetic trees. Positive selection appears to be important in the interaction between genotype, protein structure, function, and organismal phenotype.
Looney, S. W. (2002). "Statistical methods for assessing biomarkers." Methods Mol Biol 184: 81-109.
Makarov, V. (2002). "Computer programs for eukaryotic gene prediction." Brief Bioinform 3(2): 195-9. Seven popular programs for gene prediction in eukaryotic organisms are described and evaluated on the basis of availability for in-house and on-line use and prediction accuracy. This report outlines generally applicable approaches to computational gene prediction and known limitations in this field.
Manning, G., D. B. Whyte, et al. (2002). "The protein kinase complement of the human genome." Science 298(5600): 1912-34. We have catalogued the protein kinase complement of the human genome (the "kinome") using public and proprietary genomic, complementary DNA, and expressed sequence tag (EST) sequences. This provides a starting point for comprehensive analysis of protein phosphorylation in normal and disease states, as well as a detailed view of the current state of human genome analysis through a focus on one large gene family. We identify 518 putative protein kinase genes, of which 71 have not previously been reported or described as kinases, and we extend or correct the protein sequences of 56 more kinases. New genes include members of well-studied families as well as previously unidentified families, some of which are conserved in model organisms. Classification and comparison with model organism kinomes identified orthologous groups and highlighted expansions specific to human and other lineages. We also identified 106 protein kinase pseudogenes. Chromosomal mapping revealed several small clusters of kinase genes and revealed that 244 kinases map to disease loci or cancer amplicons.
Marshall, T. and K. M. Williams (2002). "Proteomics and its impact upon biomedical science." Br J Biomed Sci 59(1): 47-64. Proteomics is the protein equivalent of genomics and is the study of gene expression at a functional level. The proteome of an organism is the protein complement of its genome. However, unlike the genome, the proteome is dynamic: it varies according to the cell type and the functional state of the cell. In addition, the proteome shows characteristic perturbations in response to disease and external stimuli. Proteomics combines state of the art analytical methods with bioinformatics. Here, we review the concept and technology of proteomics with specific reference to applications in medical microbiology, cellular pathology, clinical chemistry, haematology/immunology, pharmacology and toxicology.
Martz, E. (2002). "Protein Explorer: easy yet powerful macromolecular visualization." Trends Biochem Sci 27(2): 107-9. Protein Explorer (PE, http://www.proteinexplorer.org) enables students, educators and other nonspecialists to visualize macromolecular structures easily. It also offers several advanced capabilities useful to protein structure specialists. Great attention has been given to making PE easy to use. Explanations, color keys and troubleshooting information are displayed automatically. There are also 'Frequently Asked Questions', a one-hour 'Quick-Tour', an alphabetical 'Help/Index/Glossary', and a detailed 'Tutorial'; all making PE much easier to use than either Chime or RasMol. Moreover, it is much more powerful; in addition to basic macromolecular visualization capabilities common to most similar programs, it offers one-click visualization of interfaces between moieties ('contacts'), cation-pi interactions and salt bridges, as well as easy-to-use routines to visualize regions of conservation in three-dimensional protein structures based on multiple sequence alignments.
Mathe, C., M. F. Sagot, et al. (2002). "Current methods of gene prediction, their strengths and weaknesses." Nucleic Acids Res 30(19): 4103-17. While the genomes of many organisms have been sequenced over the last few years, transforming such raw sequence data into knowledge remains a hard task. A great number of prediction programs have been developed that try to address one part of this problem, which consists of locating the genes along a genome. This paper reviews the existing approaches to predicting genes in eukaryotic genomes and underlines their intrinsic advantages and limitations. The main mathematical models and computational algorithms adopted are also briefly described and the resulting software classified according to both the method and the type of evidence used. Finally, the several difficulties and pitfalls encountered by the programs are detailed, showing that improvements are needed and that new directions must be considered.
Matsuo, Y. and K. Tani (2002). "[Selection and prioritization of targets for protein structure determination]." Tanpakushitsu Kakusan Koso 47(8 Suppl): 882-8.
Merlot, C., D. Domine, et al. (2002). "Fragment analysis in small molecule discovery." Curr Opin Drug Discov Devel 5(3): 391-9. Cheminformatics is playing an ever-increasing role in small molecule drug discovery. The widespread use of high-throughput screening (HTS) and combinatorial chemistry techniques has led to the generation of large amounts of pharmacological data which, in turn, has catalyzed the development of computational methods designed to reduce the time and cost in identifying molecules suitable for pharmaceutical development. This review focuses on recent advances in the field of substructure analysis, an increasingly popular data mining technique with applications at many levels of the discovery process, including HTS, compound library design, virtual screening and the prediction of biological activity.
Mitaku, S. (2002). "[Identification of membrane proteins in genome scale]." Tanpakushitsu Kakusan Koso 47(8 Suppl): 1102-8.
Mizuguchi, K. (2002). "[An informatic perspective on structural proteomics]." Tanpakushitsu Kakusan Koso 47(8 Suppl): 1058-63.
Monni, O., S. Hautaniemi, et al. (2002). "[Gene chip technique and associated bioinformatics]." Duodecim 118(11): 1157-66.
Morgan, K. T. (2002). "Gene expression analysis reveals chemical-specific profiles." Toxicol Sci 67(2): 155-6. The articles highlighted in this issue are "Gene Expression Analysis Reveals Chemical-Specific Profiles" by Hisham K. Hamadeh, Pierre R. Bushel, Supriya Jayadev, Karla Martin, Olimpia DiSorbo, Stella Sieber, Lee Bennett, Raymond Tennant, Raymond Stoll, J. Carl Barrett, Kerry Blanchard, Richard S. Paules, and Cynthia A. Afshari (pp. 219-231) and "Prediction of Compound Signature Using High Density Gene Expression Profiling" by Hisham K. Hamadeh, Pierre R. Bushel, Supriya Jayadev, Olimpia DiSorbo, Leping Li, Raymond Tennant, Raymond Stoll, J. Carl Barrett, Richard S. Paules, Kerry Blanchard, and Cynthia A. Afshari (pp. 232-240).
Morreale, A., I. Iriepa, et al. (2002). "The 5-HT(3) and nACh ionotropic receptors: a perspective from the computational chemistry point of view." Curr Med Chem 9(1): 99-125. Recent contributions applying Computational Chemistry to serotonin-3 and nicotinic acetylcholine ionotropic receptors are reviewed. These two receptors constitute a good example for the examination of the computational protocols that have been used to understand how they work. On the one hand, receptor mapping techniques have been mostly employed in the study of 5-HT(3)R, and very few examples of receptor fitting have appeared. On the other hand, nAChR has been studied mainly from the receptor fitting point of view, although many contributions using receptor mapping exist. In the first case, antagonists seem to be more important than agonists, so more work is devoted to them. In the second case, agonist development is the main issue. Although far from being complete, in either case we have working pharmacophores as well as 3D models for their binding sites that are ready to be used as a starting guess to design potential drugs. It is noteworthy that the absence of crystallographic structures for these receptors has motivated the interest in their study, constituting an interesting and challenging field. Mutagenesis experiments have allowed the identification of the main amino acids that are essential for receptor function, and interaction models have then been postulated. Although most of the models are speculative in nature, some of them have proved to be valuable tools for drug design. This scientific field is already open and many areas are still unexplored. Computational tools for treating these issues exist in a wide variety and their rational application would produce the answers to the structure and functioning of these receptors.
Moxon, R. and R. Rappuoli (2002). "Bacterial pathogen genomics and vaccines." Br Med Bull 62: 45-58. Infectious diseases remain a major cause of deaths and disabilities in the world, the majority of which are caused by bacteria. Although immunisation is the most cost effective and efficient means to control microbial diseases, vaccines are not yet available to prevent many major bacterial infections. Examples include dysentery (shigellosis), gonorrhoea, trachoma, gastric ulcers and cancer (Helicobacter pylori). Improved vaccines are needed to combat some diseases for which current vaccines are inadequate. Tuberculosis, for example, remains rampant throughout most countries in the world and represents a global emergency heightened by the pandemic of HIV. The availability of complete genome sequences has dramatically changed the opportunities for developing novel and improved vaccines and increased the efficiency and rapidity of their development. Complete genomic databases provide an inclusive catalogue of all potential candidate vaccines for any bacterial pathogen. In conjunction with adjunct technologies, including bioinformatics, random mutagenesis, microarrays, and proteomics, a systematic and comprehensive approach to vaccine discovery can be undertaken. Genomics must be used in conjunction with population biology to ensure that the vaccine can target all pathogenic strains of a species. A proof in principle of the utility of genomics is provided by the recent exploitation of the complete genome sequence of Neisseria meningitidis group B.
Mylvaganam, S. E., M. Prabhakaran, et al. (2002). "Structural proteomics: methods in deriving protein structural information and issues in data management." Biotechniques Suppl: 42-6. Structural proteomics is an emerging paradigm that is gaining importance in the post-genomic era as a valuable discipline to process the protein target information being deciphered. The field plays a crucial role in assigning function to sequenced proteins, defining pathways in which the targets are involved, and understanding structure-function relationships of the protein targets. A key component of this research sector is accessing the three-dimensional structures of protein targets by both experimental and theoretical methods. This then leads to the question of how to store, retrieve, and manipulate vast amounts of sequence (1-D) and structural (3-D) information in a relational format so that extensive data analysis can be achieved. We at SBI have addressed both of these fundamental requirements of structural proteomics. We have developed an extensive collection of three-dimensional protein structures from sequence data and have implemented a relational architecture for data management. In this article we will discuss our approaches to structural proteomics and the tools that life science researchers can use in their discovery efforts.
Naf, D., D. M. Krupke, et al. (2002). "The Mouse Tumor Biology Database: a public resource for cancer genetics and pathology of the mouse." Cancer Res 62(5): 1235-40. Developing genetic mouse models for cancer research has been recognized as an "exceptional opportunity" by the National Cancer Institute. The establishment of bioinformatics resources to facilitate access to published and unpublished data on the genetics and pathology of cancer in different strains of the laboratory mouse is critical to developing and using mouse models of human disease. In this article, we review the Mouse Tumor Biology Database (MTB), a public resource for information on cancer genetics, epidemiology, and pathology in genetically defined mice. We outline current content, data acquisition strategies, and query mechanisms for MTB. MTB is accessible on-line at http://tumor.informatics.jax.org.
Nakamura, H., N. Ito, et al. (2002). "[Development of PDBj: Advanced database for protein structures]." Tanpakushitsu Kakusan Koso 47(8 Suppl): 1097-101.
Nakayama, Y. (2002). "[Developing computer models of cellular processes]." Tanpakushitsu Kakusan Koso 47(14): 1956-61.
Naruya, S. (2002). "[How to improve gene phylogeny analysis from sequence data]." Tanpakushitsu Kakusan Koso 47(9): 1240-2.
Noble, D. (2002). "The rise of computational biology." Nat Rev Mol Cell Biol 3(6): 459-63. The year 2001 saw a remarkable burst of interest in biological simulation, with several international meetings on the subject, and the inclusion, by journals, of web site references from which published models can be downloaded. So, why has all this happened so suddenly?
Norinder, U. and M. Haeberlein (2002). "Computational approaches to the prediction of the blood-brain distribution." Adv Drug Deliv Rev 54(3): 291-313. This review attempts to summarise present knowledge related to the theoretical modelling of drug transport across the blood-brain barrier. Several computational protocols are described ranging from quantum mechanics-based approaches through molecular mechanics-related techniques to simple and fast procedures based on only the 2-D graph of the investigated structures. Amazingly, few descriptors have been shown to influence the derived relationships in a significant manner and a cornerstone in most of the described models are terms describing hydrogen bonding. A very quick quantitative assessment of the brain partitioning of a compound has also been devised using the following two rules: If N+O (the number of nitrogen and oxygen atoms) in a molecule is less than or equal to five, it has a high chance of entering the brain. The second rule predicts that if log P-(N+O) is positive then log BB is positive.
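The two quick rules quoted at the end of the abstract above can be written as a minimal Python sketch. The function names and the example compound are hypothetical and serve only as illustration; the thresholds are taken directly from the abstract, and the sketch makes no claim about how the authors implemented or validated the rules.

```python
def rule1_likely_enters_brain(n_nitrogen: int, n_oxygen: int) -> bool:
    """Rule 1 from the abstract: N+O <= 5 suggests a high chance of entering the brain."""
    return (n_nitrogen + n_oxygen) <= 5


def rule2_log_bb_positive(log_p: float, n_nitrogen: int, n_oxygen: int) -> bool:
    """Rule 2 from the abstract: if log P - (N+O) is positive, log BB is predicted positive."""
    return (log_p - (n_nitrogen + n_oxygen)) > 0


# Hypothetical compound (illustrative values, not from the source):
# log P = 2.5, one nitrogen atom, one oxygen atom.
example_rule1 = rule1_likely_enters_brain(n_nitrogen=1, n_oxygen=1)
example_rule2 = rule2_log_bb_positive(log_p=2.5, n_nitrogen=1, n_oxygen=1)
```

Both rules are coarse screens; the abstract presents them as a very quick assessment rather than a substitute for the more detailed models it reviews.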
Nyholt, D. R. (2002). "GENEHUNTER: your 'one-stop shop' for statistical genetic analysis?" Hum Hered 53(1): 2-7. The past decade has brought a proliferation of statistical genetic (linkage) analysis techniques, incorporating new methodology and/or improvement of existing methodology in gene mapping, specifically targeted towards the localization of genes underlying complex disorders. Most of these techniques have been implemented in user-friendly programs and made freely available to the genetics community. Although certain packages may be more 'popular' than others, a common question asked by genetic researchers is 'which program is best for me?'. To help researchers answer this question, the following software review aims to summarize the main advantages and disadvantages of the popular GENEHUNTER package.
Ortiz De Solorzano, C., S. Costes, et al. (2002). "Applications of quantitative digital image analysis to breast cancer research." Microsc Res Tech 59(2): 119-27. Our studies of radiogenic carcinogenesis in mouse and human models of breast cancer are based on the view that cell phenotype, microenvironment composition, communication between cells and within the microenvironment are important factors in the development of breast cancer. This is complicated in the mammary gland by its postnatal development, cyclic evolution via pregnancy and involution, and dynamic remodeling of epithelial-stromal interactions, all of which contribute to breast cancer susceptibility. Microscopy is the tool of choice to examine cells in context. Specific features can be defined using probes, antibodies, immunofluorescence, and image analysis to measure protein distribution, cell composition, and genomic instability in human and mouse models of breast cancer. We discuss the integration of image acquisition, analysis, and annotation to efficiently analyze large amounts of image data. In the future, cell and tissue image-based studies will be facilitated by a bioinformatics strategy that generates multidimensional databases of quantitative information derived from molecular, immunological, and morphological probes at multiple resolutions. This approach will facilitate the construction of an in vivo phenotype database necessary for understanding when, where, and how normal cells become cancer.
Ota, M. (2002). "[Modern methods for protein fold recognition]." Tanpakushitsu Kakusan Koso 47(2): 181-6.
Ota, M. (2002). "[Secondary structure prediction me]." Tanpakushitsu Kakusan Koso 47(1): 85-90.
Ouzounis, C. A. and P. D. Karp (2002). "The past, present and future of genome-wide re-annotation." Genome Biol 3(2): COMMENT2001. Annotation, the process by which structural or functional information is inferred for genes or proteins, is crucial for obtaining value from genome sequences. We define the process of annotating a previously annotated genome sequence as 're-annotation', and examine the strengths and weaknesses of current manual and automatic genome-wide re-annotation approaches.
Paine, K. and D. R. Flower (2002). "Bacterial bioinformatics: pathogenesis and the genome." J Mol Microbiol Biotechnol 4(4): 357-65. As the number of completed microbial genome sequences continues to grow, there is a pressing need for the exploitation of this wealth of data through a synergistic interaction between the well-established science of bacteriology and the emergent discipline of bioinformatics. Antibiotic resistance and pathogenicity in virulent bacteria has become an increasing problem, with even the strongest drugs useless against some species, such as multi-drug resistant Enterococcus faecium and Mycobacterium tuberculosis. The global spread of Human Immunodeficiency Virus (HIV) and Acquired Immune Deficiency Syndrome (AIDS) has contributed to the re-emergence of tuberculosis and the threat from new and emergent diseases. To address these problems, bacterial pathogenicity requires redefinition as Koch's postulates become obsolete. This review discusses how the use of bacterial genomic information, and the in silico tools available at present, may aid in determining the definition of a current pathogen. The combination of both fields should provide a rapid and efficient way of assisting in the future development of antimicrobial therapies.
Pardanani, A., E. D. Wieben, et al. (2002). "Primer on medical genomics. Part IV: Expression proteomics." Mayo Clin Proc 77(11): 1185-96. Proteomics, simply defined, is the study of proteomes. More completely, proteomics is defined as the study of all proteins, including their relative abundance, distribution, posttranslational modifications, functions, and interactions with other macromolecules, in a given cell or organism within a given environment and at a specific stage in the cell cycle. Proteins carry out the biological functions encoded by genes; hence, once the initial stage of genome sequencing and gene discovery is completed, a study of the proteome must be undertaken to address fundamental biological questions. The 3 broad areas are expression proteomics, which catalogues the relative abundance of proteins; cell-mapping or cellular proteomics, which delineates functional protein-protein interactions and organelle-specific protein distribution; and structural proteomics, which characterizes the 3-dimensional structure of proteins. With these approaches, proteins are studied on a global scale using a synergistic combination of powerful, high-throughput technologies, including 2-dimensional polyacrylamide gel electrophoresis, mass spectrometry, multidimensional liquid chromatography, and bioinformatics. Mass spectrometry, which provides highly accurate molecular mass measurements, has emerged as the analytical technology of choice for protein identification, characterization, and sequencing. This task has been made considerably easier with the availability of complete, nonredundant, and annotated genome sequence databases for many organisms. This article reviews the area of expression proteomics.
Pattabiraman, N. (2002). "Analysis of ligand-macromolecule contacts: computational methods." Curr Med Chem 9(5): 609-21. Due to the many technological advancements in biology and development of new fields such as biotechnology and bioinformatics, our knowledge of cellular functions has been growing rapidly; and Biology has entered the Information Age. Along with the technological advancements has come a rapid increase in identification of biomolecular targets involved in diseases. Recently, structure-based drug design studies have emphasized integration of the clinical, cellular, biochemical, structural, and biophysical knowledge of the target. Due to advances in sequencing the human genome, in chemical synthesis and structure determination of biological targets using X-ray and NMR techniques, and in high-performance computing, many scientists from both experimental and theoretical fields focus on structure-based drug design. As scientists in such wide-ranging disciplines, we must understand the data from and educate one another about the strengths and weaknesses of our various disciplines. Since 1990, we have been using computers to visually evaluate ligand binding. In this review, the author will focus on computational methods that not only visualize but also quantify the nature and strength of ligand-macromolecule contacts. Such quantification can be very useful both for medicinal chemists to design ligands and for molecular biologists to design rational protein design experiments to study the effect of amino acid changes on ligand binding.
Pertea, M. and S. L. Salzberg (2002). "Computational gene finding in plants." Plant Mol Biol 48(1-2): 39-48. Automated methods for identifying protein coding regions in genomic DNA have progressed significantly in recent years, but there is still a strong need for more accurate computational solutions to the gene finding problem. Large-scale genome sequencing projects depend greatly on gene finding to generate accurate and complete gene annotation. Improvements in gene finding software are being driven by the development of better computational algorithms, a better understanding of the cell's mechanisms for transcription and translation, and the enormous increases in genomic sequence data. This paper reviews some of the most widely used algorithms for gene finding in plants, including technical descriptions of how they work and recent measurements of their success on the genomes of Arabidopsis thaliana and rice.
Petrovsky, N. and V. Brusic (2002). "Computational immunology: The coming of age." Immunol Cell Biol 80(3): 248-54. The explosive growth in biotechnology combined with major advances in information technology has the potential to radically transform immunology in the postgenomics era. Not only do we now have ready access to vast quantities of existing data, but new data with relevance to immunology are being accumulated at an exponential rate. Resources for computational immunology include biological databases and methods for data extraction, comparison, analysis and interpretation. Publicly accessible biological databases of relevance to immunologists number in the hundreds and are growing daily. The ability to efficiently extract and analyse information from these databases is vital for efficient immunology research. Most importantly, a new generation of computational immunology tools enables modelling of peptide transport by the transporter associated with antigen processing (TAP), modelling of antibody binding sites, identification of allergenic motifs and modelling of T-cell receptor serial triggering.
Phelps, T. J., A. V. Palumbo, et al. (2002). "Metabolomics and microarrays for improved understanding of phenotypic characteristics controlled by both genomics and environmental constraints." Curr Opin Biotechnol 13(1): 20-4. Advances in our understanding of functional genomics are best addressed by integrative studies that include measurements of mRNA, proteins, and low molecular weight metabolites over time and varied conditions. Bioinformatics can then be used to relate this data to the genome. Current technology allows for comprehensive and rapid mRNA expression profiling and mass spectrophotometric measurement of low molecular weight intermediates and metabolic products. In prokaryotic organisms, this combination provides a potentially powerful tool for identifying gene function and regulatory networks even in the absence of a combined proteomic approach.
Phoenix, D. A., F. Harris, et al. (2002). "The prediction of amphiphilic alpha-helices." Curr Protein Pept Sci 3(2): 201-21. A number of sequence-based analyses have been developed to identify protein segments that are able to form membrane interactive amphiphilic alpha-helices. Earlier techniques attempted to detect the characteristic periodicity in hydrophobic amino acid residues shown by these structures and included the Molecular Hydrophobic Potential (MHP), which represents the hydrophobicity of amino acid residues as lines of isopotential around the alpha-helix, and analyses based on Fourier transforms. These latter analyses compare the periodicity of hydrophobic residues in a putative alpha-helical sequence with that of a test mathematical function to provide a measure of amphiphilicity using either the Amphipathic Index or the Hydrophobic Moment. More recently, the introduction of computational procedures based on techniques such as hydropathy analysis, homology modelling, multiple sequence alignments and neural networks has led to the prediction of transmembrane alpha-helices with accuracies of the order of 95% and transmembrane protein topology with accuracies greater than 75%. Statistical approaches to transmembrane protein modeling such as hidden Markov models have increased these prediction levels to an even higher level. Here, we review a number of these predictive techniques and consider problems associated with their use in the prediction of structure / function relationships, using alpha-helices from G-coupled protein receptors, penicillin binding proteins, apolipoproteins, peptide hormones, lytic peptides and tilted peptides as examples.
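As an illustration of the Fourier-type measures mentioned in the abstract above, the following Python sketch computes a hydrophobic moment in its commonly cited form: the magnitude of the vector sum of per-residue hydrophobicities, with successive residues rotated by a fixed angle (100 degrees per residue is the usual assumption for an ideal alpha-helix). The function name, the unnormalized form, and the input values are assumptions made here for illustration; the abstract does not specify a hydrophobicity scale or a normalization, so the caller supplies the per-residue values.

```python
import math
from typing import Sequence


def hydrophobic_moment(hydrophobicities: Sequence[float],
                       angle_deg: float = 100.0) -> float:
    """Unnormalized hydrophobic moment of a segment.

    hydrophobicities: per-residue values from whatever scale the caller chooses.
    angle_deg: rotation between successive residues (100 degrees assumed for an
    ideal alpha-helix).
    """
    delta = math.radians(angle_deg)
    # Project each residue's hydrophobicity onto the helical wheel.
    sin_sum = sum(h * math.sin(i * delta) for i, h in enumerate(hydrophobicities))
    cos_sum = sum(h * math.cos(i * delta) for i, h in enumerate(hydrophobicities))
    return math.hypot(sin_sum, cos_sum)


# Illustrative inputs (synthetic values, not a real hydrophobicity scale):
# a uniform 18-residue segment has no preferred face, whereas a segment whose
# hydrophobicity follows the helical period is strongly amphiphilic.
uniform = [1.0] * 18
periodic = [math.cos(i * math.radians(100.0)) for i in range(18)]
moment_uniform = hydrophobic_moment(uniform)
moment_periodic = hydrophobic_moment(periodic)
```

A segment of 18 residues spans exactly five turns at 100 degrees per residue, so the uniform segment's contributions cancel and its moment is close to zero, while the periodic segment yields a large moment. Dividing by the segment length gives a mean moment, which is a common variant.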
Ponomarenko, J. V., G. V. Orlova, et al. (2002). "rSNP_Guide: an integrated database-tools system for studying SNPs and site-directed mutations in transcription factor binding sites." Hum Mutat 20(4): 239-48. Since the human genome was sequenced in draft, single nucleotide polymorphism (SNP) analysis has become one of the keynote fields of bioinformatics. We have developed an integrated database-tools system, rSNP_Guide (http://wwwmgs.bionet.nsc.ru/mgs/systems/rsnp/), devoted to prediction of transcription factor (TF) binding sites, alterations of which could be associated with disease phenotype. By inputting data on alterations in DNA sequence and in DNA binding pattern of an unknown TF, rSNP_Guide searches for a known TF with alterations in the recognition score calculated on the basis of TF site's sequence and consistent with the input alterations in DNA binding to the unknown TF. Our system has been tested on many relationships between known TF sites and diseases, as well as on site-directed mutagenesis data. Experimental verification of rSNP_Guide system was made on functionally important SNPs in human TDO2 and mouse K-ras genes. Additional examples of analysis are reported involving variants in the human gammaA-globin (HBG1), hsp70 (HSPA1A), and Factor IX (F9) gene promoters.
Razvi, E. (2002). "Market opportunity in computational proteomics." Biotechniques Suppl: 54-8, 60-2. The current exuberance on the potential of proteomics as a means to deploy the wealth of the human genome is expected to last into the coming years. Unlike the genome, a finite entity with a fixed number of base pairs of the genetic material, the proteome is "plastic", changing throughout growth and development and environmental stresses, as well as in pathological situations. Our proteomes change over time, and therefore there is no one proteome; the proteome is for practical purposes an infinite entity. It is therefore crucial to build systems that are capable of manipulating the information content that is the proteome, thence the need for computational proteomics as a discipline. In this Market View article, we present the industry landscape that is emerging in the computational proteomics space. This space is still in its infancy and for the most part undefined; therefore we seek to present the market opportunity in informatics in the drug discovery space and then extend that to an examination of industry trends in proteomics. Thus, the gestalt is a set of predictions as to the evolution of the landscape in computational proteomics over the coming years.
Regnier, F. E., L. Riggs, et al. (2002). "Comparative proteomics based on stable isotope labeling and affinity selection." J Mass Spectrom 37(2): 133-45. Disease, external stimuli (such as drugs and toxins), and mutations cause changes in the rate of protein synthesis, post-translational modification, inter-compartmental transport, and degradation of proteins in living systems. Recognizing and identifying the small number of proteins involved is complicated by the complexity of biological extracts and the fact that post-translational alterations of proteins can occur at many sites in multiple ways. It is shown here that a variety of new tools and methods based on internal standard technology are now being developed to code globally all peptides in control and experimental samples for quantification. The great advantage of these stable isotope-labeling strategies is that mass spectrometers can rapidly target those proteins that have changed in concentration for further analysis. When coupled to stable isotope quantification, targeting can be further focused through chromatographic selection of peptide classes on the basis of specific structural features. Targeting structural features is particularly useful when they are unique to types of regulation or disease. Differential displays of targeted peptides show that stimulus-specific markers are relatively easy to identify and will probably be diagnostically valuable tools.
Resing, K. A. (2002). "Analysis of signaling pathways using functional proteomics." Ann N Y Acad Sci 971: 608-14. Advances in analytical methods for protein analysis by mass spectrometry provide new tools for global analysis of the expressed protein profile of cells (referred to as proteomics). Currently, available methodology samples only part of the proteome. This is sufficient for analysis of signal transduction, because signaling pathways contain enzymes, which modify high-abundance proteins other than those of the pathway. Thus, modulation of the signaling through a pathway will produce a "footprint" in the proteome that is characteristic of a specific cell phenotype. Comparison of different samples to identify these differences in posttranslational modification or protein expression is referred to as functional proteomics. This review surveys the methods in widest use in functional proteomics, as well as a few promising new ones. Although proteomic analyses were first conducted 26 years ago, a renewed interest is fueled by several recent advances. Most important are the availability of public genome and protein databases and the development of high-sensitivity, easy-to-use mass spectrometers and database search engines capable of exploiting these databases. Other important advances include improved two-dimensional polyacrylamide gel electrophoresis (2D-PAGE), computer programs for analysis of the 2D-PAGE gel images, protocols for proteolytic digestion of proteins in excised gel pieces, and low-flow chromatography methods. Despite the limitations of these methods, they can distinguish subtle changes in the phenotype of cells, providing the basis for future studies in regulation of the phenotype.
Sakurai, T., M. Satou, et al. (2002). "[Bioinformatics]." Tanpakushitsu Kakusan Koso 47(12 Suppl): 1500-5.
Sanchez, R., D. Nguyen, et al. (2002). "Diversity in the mechanisms of gene regulation by estrogen receptors." Bioessays 24(3): 244-54. The sequencing of the human genome has opened the way for using bioinformatics to identify sets of genes controlled by specific regulatory signals. Here, we review the unexpected diversity of DNA response elements mediating transcriptional regulation by estrogen receptors (ERs), which control the broad physiological effects of estrogens. Consensus palindromic estrogen response elements are found in only a few known estrogen target genes, whereas most responsive genes contain only low-affinity half palindromes, which may also control regulation by other nuclear receptors. ERs can also regulate gene expression in the absence of direct interaction with DNA, via protein-protein interactions with other transcription factors or by modulating the activity of upstream signaling components, thereby significantly expanding the repertoire of estrogen-responsive genes. These diverse mechanisms of action must be taken into account in screening for potential estrogen-responsive sequences in the genome or in regulatory regions of target genes identified by expression profiling.
Sarai, A. (2002). "[Thermodynamic databases for proteins and interactions]." Tanpakushitsu Kakusan Koso 47(8 Suppl): 1071-5.
Sassetti, C. and E. J. Rubin (2002). "Genomic analyses of microbial virulence." Curr Opin Microbiol 5(1): 27-32. Genomic sequencing of bacterial pathogens is providing an increasing wealth of new data. It is, however, still unclear how this information can be used to develop new experimental approaches. Here, we describe recent efforts to complement existing bacterial genetics with genomic methods for the study of pathogenesis.
Sato, N. (2002). "[UNIX environment for bioinformatics in wet laboratory]." Tanpakushitsu Kakusan Koso 47(5): 633-4.
Satoru, M. (2002). "[Deposition with DNA Data Bank of Japan (DDBJ); its data format and tools for submissions]." Tanpakushitsu Kakusan Koso 47(6): 733-6.
Schachter, V. (2002). "Bioinformatics of large-scale protein interaction networks." Biotechniques Suppl: 16-8, 20-4, 26-7. We survey recent techniques for construction and prediction of large-scale protein interaction networks, focusing on computational processing steps. Special emphasis is placed on critical assessment of data completeness and reliability of the various approaches. Once built, protein interaction networks can be used for functional annotation or to generate higher-level biological hypotheses on pathways.
Schmidt, B., H. Walter, et al. (2002). "Genotypic drug resistance interpretation systems--the cutting edge of antiretroviral therapy." AIDS Rev 4(3): 148-56. The technical quality of genotypic and phenotypic drug resistance testing has considerably improved, and therefore the major challenge now lies in the interpretation of drug resistance. This is due to several facts: (i) in times of combination therapy, the effect of drug resistance-associated mutations cannot be considered independently, (ii) many additive and subtractive interactions between mutations exist, and resistant strains may exhibit varying degrees of cross-resistance, (iii) the phenotype cannot adequately determine slight, but clinically relevant, differences for those drugs with a narrow range of resistance, and (iv) pharmacokinetic interactions may shift relevant levels of drug resistance. Genotypic drug resistance interpretation systems are designed to solve these problems. Rule-based systems incorporate current knowledge about correlations between genotype, phenotype and clinical response. Database-driven systems use the information provided by paired geno- and phenotypic data, applying database matching search or bioinformatic approaches. For detailed comparison, 11 interpretation systems were selected which present a comprehensive system for most of the available drugs, can easily be accessed via the Internet and are regularly updated. The systems were characterized for the source data, access, input, output, and availability of clinical studies. For further comparison, existing clinical databases should be merged into one large database to allow competition between the systems. This may also solve the burning problem of clinically relevant cut-offs. Head-to-head comparisons of interpretation systems require large prospective randomized trials in which only the interpretation system is different between groups, before a consensus can be achieved for the best antiretroviral therapy of the individual patient.
Searls, D. B. (2002). "The language of genes." Nature 420(6912): 211-7. Linguistic metaphors have been woven into the fabric of molecular biology since its inception. The determination of the human genome sequence has brought these metaphors to the forefront of the popular imagination, with the natural extension of the notion of DNA as language to that of the genome as the 'book of life'. But do these analogies go deeper and, if so, can the methods developed for analysing languages be applied to molecular biology? In fact, many techniques used in bioinformatics, even if developed independently, may be seen to be grounded in linguistics. Further interweaving of these fields will be instrumental in extending our understanding of the language of life.
Sharan, R., R. Elkon, et al. (2002). "Cluster analysis and its applications to gene expression data." Ernst Schering Res Found Workshop(38): 83-108.
Simon, R., M. D. Radmacher, et al. (2002). "Design of studies using DNA microarrays." Genet Epidemiol 23(1): 21-36. DNA microarrays are assays that simultaneously provide information about expression levels of thousands of genes and are consequently finding wide use in biomedical research. In order to control the many sources of variation and the many opportunities for misanalysis, DNA microarray studies require careful planning. Different studies have different objectives, and important aspects of design and analysis strategy differ for different types of studies. We review several types of objectives of studies using DNA microarrays and address issues such as selection of samples, levels of replication needed, allocation of samples to dyes and arrays, sample size considerations, and analysis strategies.
Sinchaikul, S., B. Sookkheo, et al. (2002). "Bioinformatics, functional genomics, and proteomics study of Bacillus sp." J Chromatogr B Analyt Technol Biomed Life Sci 771(1-2): 261-87. The ability of bioinformatics to characterize genomic and proteomic sequences from the bacteria Bacillus sp. for prediction of genes and proteins has been evaluated. Genomics coupled with proteomics, which relies on the integration of significant advances recently achieved in two-dimensional (2-D) electrophoretic separation of proteins and mass spectrometry (MS), now provides important high-throughput techniques for qualifying and analyzing gene and protein expression, discovering new gene or protein products, and understanding gene and protein functions, including post-genomic study. In addition, bioinformatic data on Bacillus sp. have been incorporated into many databases, which facilitates rapid searching of Bacillus sp. information in both genomics and proteomics. It is also possible to highlight sites for post-translational modifications based on the specific protein sequence motifs that play important roles in the structure, activity and compartmentalization of proteins. Moreover, the secreted proteins from Bacillus sp. are of interest and are widely used in many applications, especially biomedical applications, where their potential therapeutic value is a major advantage.
Slonim, D. K. (2002). "From patterns to pathways: gene expression data analysis comes of age." Nat Genet 32 Suppl: 502-8. Many different biological questions are routinely studied using transcriptional profiling on microarrays. A wide range of approaches are available for gleaning insights from the data obtained from such experiments. The appropriate choice of data-analysis technique depends both on the data and on the goals of the experiment. This review summarizes some of the common themes in microarray data analysis, including detection of differential expression, clustering, and predicting sample characteristics. Several approaches to each problem, and their relative merits, are discussed and key areas for additional research highlighted.
Soanes, D. M., W. Skinner, et al. (2002). "Genomics of phytopathogenic fungi and the development of bioinformatic resources." Mol Plant Microbe Interact 15(5): 421-7. Genomic resources available to researchers studying phytopathogenic fungi are limited. Here, we briefly review the genomic and bioinformatic resources available and the current status of fungal genomics. We also describe a relational database containing sequences of expressed sequence tags (ESTs) from three phytopathogenic fungi, Blumeria graminis, Magnaporthe grisea, and Mycosphaerella graminicola, and the methods and underlying principles required for its construction. The database contains significant annotation for each EST sequence and is accessible at http://cogeme.ex.ac.uk. An easy-to-use interface allows the user to identify gene sequences by using simple text queries or homology searches. New querying functions and large sequence sets from a variety of phytopathogenic species will be incorporated in due course.
Srinivas, P. R., M. Verma, et al. (2002). "Proteomics for cancer biomarker discovery." Clin Chem 48(8): 1160-9. The emergence of novel technologies allows researchers to facilitate the comprehensive analyses of genomes, transcriptomes, and proteomes in health and disease. The information that is expected from such technologies may soon exert a dramatic change in the pace of cancer research and impact dramatically on the care of cancer patients. These approaches have already demonstrated the power of molecular medicine in discriminating among disease subtypes that are not recognizable by traditional pathologic criteria and in identifying specific genetic events involved in cancer progression. This review covers a selection of advances in the realm of proteomics and its promise for cancer biomarker discovery. It also addresses issues regarding sample preparation and specificity and discusses current challenges that need to be overcome. Finally, the review touches on the efforts of the Early Detection Research Network at the National Cancer Institute in promoting biomarker discovery for translation at the clinical level.
Srinivasan, K. N., P. Gopalakrishnakone, et al. (2002). "SCORPION, a molecular database of scorpion toxins." Toxicon 40(1): 23-31. Increasing interest in the studies of toxins and the requirements for better structural and functional annotations have created a need for improved data management in the field of toxins. The molecular database, SCORPION, contains more than 200 entries of fully referenced scorpion toxin data including primary sequences, three-dimensional structures, structural and functional annotations of scorpion toxins along with relevant literature references. SCORPION has a set of search tools that allow users to extract data and perform specific queries. These entries have been compiled from public databases and literature, cleaned of errors and enriched with additional structural and functional information. The grouping of scorpion toxins provides a basis for extending and clarifying the existing structural and functional classifications. The bioinformatics modules in SCORPION facilitate analyses aimed at classification of scorpion toxins and identification of sequence patterns associated with specific structural or functional properties of scorpion toxins. The SCORPION database is accessible via the Internet at sdmc.krdl.org.sg:8080/scorpion.
Stenger, D. A., J. D. Andreadis, et al. (2002). "Potential applications of DNA microarrays in biodefense-related diagnostics." Curr Opin Biotechnol 13(3): 208-12. Recent years have witnessed a logarithmic growth in the number of applications involving DNA microarrays. Extrapolation of their use for infectious diagnostics and biodefense-related diagnostics seems obvious. Nevertheless, the application of DNA microarrays to biodefense-related diagnostics will depend on solving a set of substantial, yet approachable, technical and logistical problems that encompass diverse topics from amplification efficiency to bioinformatics.
Stormo, G. D. and K. Tan (2002). "Mining genome databases to identify and understand new gene regulatory systems." Curr Opin Microbiol 5(2): 149-53. The availability of a large number of sequenced microbial genomes allows us to conduct systematic studies on microbial gene regulatory systems. Computational methods, using comparative genomics approaches, are powerful tools to understand their mechanisms and evolutionary history. Recent advances in computational methodology for uncovering transcriptional regulatory components and their interactions are discussed.
Stupka, E. (2002). "Large-scale open bioinformatics data resources." Curr Opin Mol Ther 4(3): 265-74. The data explosion in bioinformatics is relentless. More and more genomes are being sequenced and many new types of datasets are being generated in large-scale projects. Integration and true open access to the data are still difficult issues, although they are gradually being addressed. Notably, certain fields have good standardization and interoperability, while others lag behind. This review summarizes the latest developments in genome and sequences databases, transcriptomics data (ESTs, ORESTES, full-length cDNAs), proteomics data (protein databases, protein structures, family and domain classification) as well as loosely integrated fields, such as microarray experiments, mutation databases and databases of regulatory regions and elements. The review attempts to resist simply summarizing what data are available, and aims to provide a critical look at some of the integration and access issues associated with several of these resources.
Swaroop, A. and D. J. Zack (2002). "Transcriptome analysis of the retina." Genome Biol 3(8): REVIEWS1022. The retina offers unique opportunities to define the molecular and cellular pathways mediating neuronal function and disease because of its morphological complexity, well-defined role in visual transduction and the availability of mutants. These investigations are being greatly facilitated by the ongoing identification of genes expressed in the retina using high-throughput methods.
Tada, M. (2002). "[Yeast-based genetic diagnosis and post-genome sequencing]." Hokkaido Igaku Zasshi 77(2): 145-6.
Takahashi, Y., T. Nagata, et al. (2002). "[GeneChip system from a bioinformatical point of view]." Nippon Yakurigaku Zasshi 120(2): 73-84. GeneChip (Affymetrix, Inc., USA) employs a specific method for spotting DNA probes on chips, which is different from any other DNA chips, and can complete the whole process from sample preparation to data construction and analysis. The GeneChip system can be applied to both gene expression analysis and genomic mutation analysis, which would play an important role in human genome analysis in the future. Techniques for data construction ("wet" experimental techniques), which are the major components in the GeneChip system, are generally established as routine work in the first screening process in most laboratories worldwide. The most important point would be how we exchange experimental data produced by researchers and gene/genome information available on both a public and a commercial basis so that we can deduce useful information on gene expression. Recently, the center of the research has been shifting to computing technology for data processing ("bioinformatics"). This article separately deals with gene expression analysis and genomic analysis, with emphasis on bioinformatics. We describe the data on gene expression screening, the gene targeting process, the analysis of genomic DNA mutations using the P53 probe array, and the HuSNP mapping assays, by presenting our experimental examples.
Tamames, J., D. Clark, et al. (2002). "Bioinformatics methods for the analysis of expression arrays: data clustering and information extraction." J Biotechnol 98(2-3): 269-83. Expression arrays facilitate the monitoring of changes in the expression patterns of large collections of genes. The analysis of expression array data has become a computationally-intensive task that requires the development of bioinformatics technology for a number of key stages in the process, such as image analysis, database storage, gene clustering and information extraction. Here, we review the current trends in each of these areas, with particular emphasis on the development of the related technology being carried out within our groups.
Taylor, W. R. (2002). "Comparing secondary structure 'stick' models of proteins using graph matching with double dynamic programming." Ernst Schering Res Found Workshop(38): 133-48.
Todd, R. and D. T. Wong (2002). "DNA hybridization arrays for gene expression analysis of human oral cancer." J Dent Res 81(2): 89-97. DNA hybridization arrays permit global gene expression profiling to be done in a single experiment. The evolution and challenges of DNA hybridization arrays are reflected in the variety of experimental platforms, probe composition, hybridization/signal detection methods, and bioinformatic interpretation. In tumor biology, DNA hybridization arrays are being used for gene/gene pathway discovery, diagnosis, and therapeutic design. Similar applications are advancing our understanding of oral cancer cell biology.
Turner, M. J. and R. A. Colbert (2002). "HLA-B27 and pathogenesis of spondyloarthropathies." Curr Opin Rheumatol 14(4): 367-72. Although the influence of HLA-B27 on the development of spondyloarthropathies is undisputed, its role in pathogenesis remains unclear. New ideas have focused on abnormal characteristics of HLA-B27 resulting from aberrant folding, disulfide bond formation, or both, rather than a predilection for selecting arthritogenic peptides. This reflects, in part, unanswered questions about whether immunologic recognition of HLA-B27 is required for disease. Recent studies suggest that CD4+ T cells, immunomodulatory killer cell Ig receptors, and Ig-like transcript receptors may recognize aberrant forms of HLA-B27. Other reports suggest that HLA-B27 expression can alter cytokine production from monocytes and T cells, effects that appear unrelated to antigen presentation. Novel bioinformatics approaches have led to the identification of HLA-B27-restricted pathogen-derived peptides and may prove useful in determining whether HLA-B27 presents arthritogenic peptides. Elucidating the role of HLA-B27 in the pathogenesis of these conditions will require an integration of information from animal models, genome-wide screens for susceptibility alleles, and translational studies using human samples.
Valaskovic, G. A. and N. L. Kelleher (2002). "Miniaturized formats for efficient mass spectrometry-based proteomics and therapeutic development." Curr Top Med Chem 2(1): 1-12. Off-line miniaturized "nano-spray" formats for electrospray ionization mass spectrometry (ESI-MS) enable the routine identification of femtomole quantities of protein or peptide. Even greater strides have been achieved using on-line miniaturized ESI-MS methods, such as nanobore LC-MS and CE-MS. On-line methods enable greater sensitivity (sub-attomole limit of detection), dynamic range, and throughput. In either off- or on-line methods for protein analysis, samples are typically isolated and digested enzymatically, with MS analysis of the peptide fragments, yielding 5-50% sequence coverage, in a "bottom-up" approach. Obtaining biologically relevant (structure/function) information (such as the localization of regions of error or post-translational modifications) often demands 100% sequence coverage and this may be obtained by analyzing intact proteins by MS with a "top-down" methodology. Proteome-wide success with top-down methods will require the development of novel miniaturized approaches for sample preparation along with new tools for bioinformatics. As these miniaturized formats continue to power proteomics applications, they will undoubtedly pollinate "cross-over" applications in LC-MS ranging from drug discovery to development. An example of metabolite identification using an order of magnitude less sample than usually required, with a concurrent order of magnitude increase in signal, illustrates the potential of miniaturized formats in lead characterization activities.
Valdar, W. S. (2002). "Scoring residue conservation." Proteins 48(2): 227-41. The importance of a residue for maintaining the structure and function of a protein can usually be inferred from how conserved it appears in a multiple sequence alignment of that protein and its homologues. A reliable metric for quantifying residue conservation is desirable. Over the last two decades many such scores have been proposed, but none has emerged as a generally accepted standard. This work surveys the range of scores that biologists, biochemists, and, more recently, bioinformatics workers have developed, and reviews the intrinsic problems associated with developing and evaluating such a score. A general formula is proposed that may be used to compare the properties of different particular conservation scores or as a measure of conservation in its own right.
van Helden, J., L. Wernisch, et al. (2002). "Graph-based analysis of metabolic networks." Ernst Schering Res Found Workshop(38): 245-74.
Villoutreix, B. O. (2002). "Structural bioinformatics: methods, concepts and applications to blood coagulation proteins." Curr Protein Pept Sci 3(3): 341-64. Structural and theoretical analyses of proteins are central to the understanding of complex molecular mechanisms and are fundamental to the drug discovery process. Computational techniques yield useful insights into an ever-wider range of biomolecular systems. Protein three-dimensional structures and molecular functions can be predicted in some circumstances, while experimental structures can be analyzed in depth via such computational approaches. Non-covalent binding of biomolecules can be understood by considering structural, thermodynamic and kinetic issues, and theoretical simulations of such events can be attempted. The central role of electrostatic interactions with regard to protein function, structure and stability has been investigated and some electrostatic properties can be modeled theoretically. Computer methods thus help to prioritize, design, analyze and rationalize biochemical experiments. Cardiovascular diseases and associated blood coagulation disorders are leading causes of death worldwide. Blood coagulation involves more than 30 proteins that interact specifically with various degrees of affinity. Many of these molecules can also bind transiently to phospholipid surfaces. Numerous point mutations in the genes of coagulation proteins and regulators have been identified. Understanding the coagulation cascade, its regulation and the impact of mutations is required for the development of new therapies and diagnostic tools. In this review, we describe concepts and methods pertaining to the field of structural bioinformatics. We provide examples of applications of these approaches to blood coagulation proteins and show that such studies can give insights about molecular mechanisms contributing to cardiovascular disease susceptibility.
Vizirianakis, I. S. (2002). "Pharmaceutical education in the wake of genomic technologies for drug development and personalized medicine." Eur J Pharm Sci 15(3): 243-50. The development of safe and effective new therapeutics is a long, difficult, and expensive process. Over the last 20-30 years, recombinant DNA (rDNA) technology has provided a multitude of new methods, molecular targets and DNA-based diagnostics to pharmaceutical research that can be utilized in assays for screening and developing potential biopharmaceutical drugs. In parallel, innovative approaches to drug delivery systems were discovered and reached the market. Pharmaceutical biotechnology, pharmacogenomics, combinatorial chemistry, in close relation to high-throughput screening technologies, and bioinformatics are major advances that give a new direction to pharmaceutical sciences. To meet the needs of this new dynamic era of pharmaceutical research and health care environment, pharmaceutical education has to set new priorities to keep pace with the challenges related to genomic technologies. The development of new education programs, for both undergraduate and graduate curricula, in pharmacy has to be focused on preparing pharmacists oriented for both pharmacy practice and drug research and development. This can be achieved by providing future pharmacists with knowledge, skills and attitudes to be more competitive in the health care system, pharmacy practice-related fields, pharmaceutical industry and drug research and development areas, or finally in academia. Educators and pharmacy school members have the responsibility of deciding how, to what extent, and by which methods these changes and new directions in the education programs should be developed.
von Heijne, G. (2002). "Bioinformatics of membrane proteins." Ernst Schering Res Found Workshop(38): 17-27.
Vuilleumier, S. and M. Pagni (2002). "The elusive roles of bacterial glutathione S-transferases: new lessons from genomes." Appl Microbiol Biotechnol 58(2): 138-46. Glutathione S-transferases constitute a large family of enzymes which catalyze the addition of glutathione to endogenous or xenobiotic, often toxic electrophilic chemicals. Eukaryotic glutathione S-transferases usually promote the inactivation, degradation or excretion of a wide range of compounds by formation of the corresponding glutathione conjugates. In bacteria, by contrast, the few glutathione S-transferases for which substrates are known, such as dichloromethane dehalogenase, 1,2-dichloroepoxyethane epoxidase and tetrachlorohydroquinone reductase, are catabolic enzymes with an essential role for growth on recalcitrant chemicals. Glutathione S-transferase genes have also been found in bacterial operons and gene clusters involved in the degradation of aromatic compounds. Information from bacterial genome sequencing projects now suggests that glutathione S-transferases are present in large numbers in proteobacteria. In particular, the genomes of three Pseudomonas species each include at least ten different glutathione S-transferase genes. Several of the corresponding proteins define new classes of the glutathione S-transferase family and may also have novel functions that remain to be elucidated.
Walker, J., D. Flower, et al. (2002). "Microarrays in hematology." Curr Opin Hematol 9(1): 23-9. Microarrays are fast becoming routine tools for the high-throughput analysis of gene expression in a wide range of biologic systems, including hematology. Although a number of approaches can be taken when implementing microarray-based studies, all are capable of providing important insights into biologic function. Although some technical issues have not been resolved, microarrays will continue to make a significant impact on hematologically important research.
Weinstein, J. N., U. Scherf, et al. (2002). "The bioinformatics of microarray gene expression profiling." Cytometry 47(1): 46-9.
Westerhoff, H. V., W. M. Getz, et al. (2002). "Bioinformatics, cellular flows, and calculation." Ernst Schering Res Found Workshop(38): 221-43.
Whipple, M. E. and W. P. Kuo (2002). "DNA microarrays in otolaryngology-head and neck surgery." Otolaryngol Head Neck Surg 127(3): 196-204. OBJECTIVES: Our goal was to review the technologies underlying DNA microarrays and to explore their use in otolaryngology-head and neck surgery. STUDY DESIGN: The current literature relating to microarray technology and methodology is reviewed, specifically the use of DNA microarrays to characterize gene expression. Bioinformatics involves computational and statistical methods to extract, organize, and analyze the huge amounts of data produced by microarray experiments. The means by which these techniques are being applied to otolaryngology-head and neck surgery are outlined. RESULTS: Microarray technologies are having a substantial impact on biomedical research, including many areas relevant to otolaryngology-head and neck surgery. CONCLUSIONS: DNA microarrays allow for the simultaneous investigation of thousands of individual genes in a single experiment. In the coming years, the application of these technologies to clinical medicine should allow for unprecedented methods of diagnosis and treatment. SIGNIFICANCE: These highly parallel experimental techniques promise to revolutionize gene discovery, disease characterization, and drug development.
Wilson, J. W., M. J. Schurr, et al. (2002). "Mechanisms of bacterial pathogenicity." Postgrad Med J 78(918): 216-24. Pathogenic bacteria utilise a number of mechanisms to cause disease in human hosts. Bacterial pathogens express a wide range of molecules that bind host cell targets to facilitate a variety of different host responses. The molecular strategies used by bacteria to interact with the host can be unique to specific pathogens or conserved across several different species. A key to fighting bacterial disease is the identification and characterisation of all these different strategies. The availability of complete genome sequences for several bacterial pathogens coupled with bioinformatics will lead to significant advances toward this goal.
Wilson, V. and F. L. Conlon (2002). "The T-box family." Genome Biol 3(6): REVIEWS3008. SUMMARY: Transcription factors of the T-box family are required both for early cell-fate decisions, such as those necessary for formation of the basic vertebrate body plan, and for differentiation and organogenesis. When mutated, T-box genes give dramatic phenotypes in mouse and zebrafish, and they have been implicated both in fundamentals of limb patterning and in a number of human congenital malformations such as Holt-Oram, ulnar-mammary and DiGeorge syndromes, as well as being amplified in a subset of cancers. Genes encoding members of the T-box family have recently been shown to comprise approximately 0.1% of genomes as diverse as those of nematodes and humans and have been identified in a wide variety of animals from ctenophores (comb jellies) to mammals; they are, however, completely absent from genomes from other organisms (such as the model plant Arabidopsis thaliana).
Wolf, Y. I., I. B. Rogozin, et al. (2002). "Genome trees and the tree of life." Trends Genet 18(9): 472-9. Genome comparisons indicate that horizontal gene transfer and differential gene loss are major evolutionary phenomena that, at least in prokaryotes, involve a large fraction, if not the majority, of genes. The extent of these events casts doubt on the feasibility of constructing a 'Tree of Life', because the trees for different genes often tell different stories. However, alternative approaches to tree construction that attempt to determine tree topology on the basis of comparisons of complete gene sets seem to reveal a phylogenetic signal that supports the three-domain evolutionary scenario and suggests the possibility of delineation of previously undetected major clades of prokaryotes. If the validity of these whole-genome approaches to tree building is confirmed by analyses of numerous new genomes, which are currently being sequenced at an increasing rate, it would seem that the concept of a universal 'species' tree is still appropriate. However, this tree should be reinterpreted as a prevailing trend in the evolution of genome-scale gene sets rather than as a complete picture of evolution.
Wright, J. T. and T. C. Hart (2002). "The genome projects: implications for dental practice and education." J Dent Educ 66(5): 659-71. Information from the Human Genome Project (HGP) and the integration of information from related areas of study and technology will dramatically change health care for the craniofacial complex. Approaches to risk assessment and diagnosis, prevention, early intervention, and management of craniofacial conditions are evolving and will continue to evolve through the application of this new knowledge. While this information will advance our health care abilities, it is clear that the dental profession will face challenges regarding the acquisition, application, transfer, and effective and efficient use of this knowledge with regard to dental research, dental education, and clinical practice. Unraveling the human genomic sequence now allows accurate diagnosis of numerous craniofacial conditions. However, the greatest oral disease burden results from dental caries and periodontal disease, which are complex disorders having both hereditary and environmental factors determining disease risk, progression, and course. Disease risk assessment, prevention, and therapy, based on knowledge from the HGP, will likely vary markedly for the different complex conditions affecting the head and neck. Integration of information from the human genome, comparative and microbial genomics, proteomics, bioinformatics, and related technologies will provide the basis for proactive prevention and intervention and novel and more efficient treatment approaches. Oral health care practitioners will increasingly require knowledge of human genetics and the application of new molecular-based diagnostic and therapeutic technologies.
Yada, T. (2002). "[Finding genes from genome sequences]." Tanpakushitsu Kakusan Koso 47(3): 276-80.
Yao, T. (2002). "Bioinformatics for the genomic sciences and towards systems biology. Japanese activities in the post-genome era." Prog Biophys Mol Biol 80(1-2): 23-42. The knowledge gleaned from genome sequencing and post-genome analyses is having a very significant impact on a whole range of life sciences and their applications. 'Genome-wide analysis' is a good keyword to represent this tendency. Thanks to innovations in high-throughput measurement technologies and information technologies, genome-wide analysis is becoming available in a broad range of research fields from DNA sequences, gene and protein expressions, protein structures and interactions, to pathways or networks analysis. In fact, the number of research targets has increased by more than two orders in recent years and we should change drastically the attitude to research activities. The scope and speed of research activities are expanding and the field of bioinformatics is playing an important role. In parallel with the data-driven research approach that focuses on speedy handling and analyzing of the huge amount of data, a new approach is gradually gaining power. This is a 'model-driven research' approach, that incorporates biological modeling in its research framework. Computational simulations of biological processes play a pivotal role. By modeling and simulating, this approach aims at predicting and even designing the dynamic behaviors of complex biological systems, which is expected to make rapid progress in life science researches and lead to meaningful applications to various fields such as health care, food supply and improvement of environment. Genomic sciences are now advancing as great frontiers of research and applications in the 21st century. This article starts with surveying the general progress of bioinformatics (Section 1), and describes Japanese activities in bioinformatics (Section 2). In Section 3, I will introduce recent developments in Systems Biology which I think will become more important in the future.
Yao, T. (2002). "[Movements on the systems biology in the post-genome era: from genome, gene and protein analysis to biosystems analysis]." Tanpakushitsu Kakusan Koso 47(9): 1229-35.
Yasugi, E. and K. Watanabe (2002). "[LIPIDBANK for Web, the newly developed lipid database]." Tanpakushitsu Kakusan Koso 47(7): 837-41.
Zharkov, D. O. and A. P. Grollman (2002). "Combining structural and bioinformatics methods for the analysis of functionally important residues in DNA glycosylases." Free Radic Biol Med 32(12): 1254-63. An essential function of DNA glycosylases is the recognition and excision of damaged bases in DNA, thereby preserving genomic integrity. Lesion recognition is a multistep process, which is only partially revealed by structural analysis of the catalytically competent complex. The functional role of additional residues can be predicted by combining structural data with analysis of amino acid conservation. The following postulate underlies this approach: if a family or superfamily can be broken into subgroups with different substrate specificities, residues highly conserved between these subgroups represent those important for enzyme catalysis and structure maintenance while residues highly conserved within a subgroup but not between subgroups represent residues important for substrate specificity. We review the bioinformatics approach used for this quantitative analysis and describe its application to the Nth superfamily and Fpg family of DNA glycosylases. These results serve as a starting point in planning site-directed mutagenesis experiments to elucidate the functional role of similar and dissimilar residues in DNA repair and other proteins.
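The postulate in the Zharkov and Grollman abstract can be illustrated with a minimal sketch. This is not the authors' quantitative method: the function names, the strict identity threshold, and the toy aligned sequences are all hypothetical and chosen only to show the between-subgroup versus within-subgroup distinction.

```python
from collections import Counter

def column_consensus(seqs, col):
    """Return (most common residue, its frequency) for one alignment column."""
    counts = Counter(s[col] for s in seqs)
    residue, n = counts.most_common(1)[0]
    return residue, n / len(seqs)

def classify_columns(subgroups, threshold=1.0):
    """Label each alignment column according to the postulate.

    subgroups: dict mapping a subgroup name to a list of aligned,
    equal-length sequences (hypothetical input format).
    threshold: minimum consensus frequency for a column to count as
    conserved inside a subgroup (assumed here; 1.0 means identical).
    """
    length = len(next(iter(subgroups.values()))[0])
    labels = []
    for col in range(length):
        consensus = [column_consensus(seqs, col) for seqs in subgroups.values()]
        # Conserved inside every subgroup, ignoring gap-dominated columns.
        conserved_within = all(
            freq >= threshold and res != "-" for res, freq in consensus
        )
        residues = {res for res, _ in consensus}
        if conserved_within and len(residues) == 1:
            # Same residue across all subgroups.
            labels.append("catalysis/structure")
        elif conserved_within:
            # Conserved within each subgroup, but differing between them.
            labels.append("specificity")
        else:
            labels.append("variable")
    return labels

# Toy alignment with two invented subgroups.
toy = {"A": ["DKGA", "DKGS"], "B": ["DRGT", "DRGC"]}
print(classify_columns(toy))
```

A real analysis would work from a curated multiple sequence alignment and a graded conservation score rather than strict identity; the sketch only mirrors the two-way split the abstract describes.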