Neuroinformation Bioinformatics Reviews: 2000 (111 References)
(2000). "Data mining." Nat Biotechnol 18 Suppl: IT35-6.
(2000). "Bioinformatics." Nat Biotechnol 18 Suppl: IT31-4.
Apweiler, R. (2000). "Protein sequence databases." Adv Protein Chem 54: 31-71.
Aravind, L. (2000). "Guilt by association: contextual information in genome analysis." Genome Res 10(8): 1074-7.
Archakov, A. I. (2000). "[What lies beyond genomics?--Proteomics]." Vopr Med Khim 46(4): 335-43.
Ashburner, M. (2000). "A biologist's view of the Drosophila genome annotation assessment project." Genome Res 10(4): 391-3.
Attwood, T. K. (2000). "The quest to deduce protein function from sequence: the role of pattern databases." Int J Biochem Cell Biol 32(2): 139-55. In the wake of the numerous now-fruitful genome projects, we have witnessed a 'tsunami' of sequence data and with it the birth of the field of bioinformatics. Bioinformatics involves the application of information technology to the management and analysis of biological data. For many of us, this means that databases and their search tools have become an essential part of the research environment. However, the rate of sequence generation and the haphazard proliferation of databases have made it difficult to keep pace with developments, even for the cognoscenti. Moreover, increasing amounts of sequence information do not necessarily equate with an increase in knowledge, and in the panic to automate the route from raw data to biological insight, we may be generating and propagating innumerable errors in our precious databases. In the genome era upon us, researchers want rapid, easy-to-use, reliable tools for functional characterisation of newly determined sequences. For the pharmaceutical industry in particular, the Pandora's box of bioinformatics harbours an information-rich nugget, ripe with potential drug targets and possible new avenues for the development of therapeutic agents. This review outlines the current status of the major pattern databases now used routinely in the analysis of protein sequences. The review is divided into three main sections. In the first, commonly used terms are defined and the methods behind the databases are briefly described; in the second, the structure and content of the principal pattern databases are discussed; and in the final part, several alignment databases, which are frequently confused with pattern databases, are mentioned. For the new-comer, the array of resources, the range of methods behind them and the different tools required to search them can be confusing. 
The review therefore also briefly mentions a current international endeavour to integrate the diverse databases, which effort should facilitate sequence analysis in the future. This is particularly important for target-discovery programmes, where the challenge is to rationalise the enormous numbers of potential targets generated by sequence database searches. This problem may be addressed, at least in part, by reducing search outputs to the more focused and manageable subsets suggested by searches of integrated groups of family-specific pattern databases.
Bajic, V. B. (2000). "Comparing the success of different prediction software in sequence analysis: a review." Brief Bioinform 1(3): 214-28. The abundance of computer software for different types of prediction in DNA and protein sequence analyses raises the problem of adequate ranking of prediction program quality. A single measure of success of predictor software, which adequately ranks the predictors, does not exist. A typical example of such an incomplete measure is the so-called correlation coefficient. This paper provides an overview and short analysis of several different measures of prediction quality. Frequently, some of these measures give results contradictory to each other even when they relate to the same prediction scores. This may lead to confusion. In order to overcome some of the problems, a few new measures are proposed including some variants of a 'generalised distance from the ideal predictor score'; these are based on topological properties, rather than on statistics. In order to provide a sort of a balanced ranking, the averaged score measure (ASM) is introduced. The ASM provides a possibility for the selection of the predictor that probably has the best overall performance. The method presented in the paper applies to the ranking problem of any prediction software whose results can be properly represented in a true positive-false positive framework, thus providing a natural set-up for linear biological sequence analysis.
Baldi, P., S. Brunak, et al. (2000). "Assessing the accuracy of prediction algorithms for classification: an overview." Bioinformatics 16(5): 412-24. We provide a unified overview of methods that currently are widely used to assess the accuracy of prediction algorithms, from raw percentages, quadratic error measures and other distances, and correlation coefficients, and to information theoretic measures such as relative entropy and mutual information. We briefly discuss the advantages and disadvantages of each approach. For classification tasks, we derive new learning algorithms for the design of prediction systems by directly optimising the correlation coefficient. We observe and prove several results relating sensitivity and specificity of optimal systems. While the principles are general, we illustrate the applicability on specific problems such as protein secondary structure and signal peptide prediction.
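Editorial illustration for the two entries above (not taken from either paper): a minimal Python sketch of how raw accuracy and the correlation coefficient can rank the same two predictors in opposite order. The confusion counts are invented for illustration, the names `Confusion`, `always_negative` and `balanced` are hypothetical, and specificity is assumed here to mean TN / (TN + FP); some of the literature uses the term for TP / (TP + FP) instead.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts of a binary predictor's outcomes (hypothetical container)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def sensitivity(c: Confusion) -> float:
    """Fraction of real positives that were predicted positive."""
    total = c.tp + c.fn
    return c.tp / total if total else 0.0


def specificity(c: Confusion) -> float:
    """Fraction of real negatives that were predicted negative (assumed convention)."""
    total = c.tn + c.fp
    return c.tn / total if total else 0.0


def accuracy(c: Confusion) -> float:
    """Raw percentage of correct predictions."""
    total = c.tp + c.fp + c.tn + c.fn
    return (c.tp + c.tn) / total if total else 0.0


def correlation_coefficient(c: Confusion) -> float:
    """Matthews correlation coefficient; defined as 0.0 when the denominator vanishes."""
    denom = math.sqrt(
        (c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn)
    )
    return (c.tp * c.tn - c.fp * c.fn) / denom if denom else 0.0


# Invented data: 10 positives and 90 negatives.
always_negative = Confusion(tp=0, fp=0, tn=90, fn=10)  # never predicts a positive
balanced = Confusion(tp=8, fp=15, tn=75, fn=2)         # finds most positives

for name, c in [("always_negative", always_negative), ("balanced", balanced)]:
    print(
        f"{name}: accuracy={accuracy(c):.3f} "
        f"sensitivity={sensitivity(c):.3f} "
        f"specificity={specificity(c):.3f} "
        f"cc={correlation_coefficient(c):.3f}"
    )
```

On this invented data the trivial predictor has the higher accuracy while the other has the higher correlation coefficient, which is the kind of disagreement between measures that both abstracts describe.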
Bassingthwaighte, J. B. (2000). "Strategies for the physiome project." Ann Biomed Eng 28(8): 1043-58. The physiome is the quantitative description of the functioning organism in normal and pathophysiological states. The human physiome can be regarded as the virtual human. It is built upon the morphome, the quantitative description of anatomical structure, chemical and biochemical composition, and material properties of an intact organism, including its genome, proteome, cell, tissue, and organ structures up to those of the whole intact being. The Physiome Project is a multicentric integrated program to design, develop, implement, test and document, archive and disseminate quantitative information, and integrative models of the functional behavior of molecules, organelles, cells, tissues, organs, and intact organisms from bacteria to man. A fundamental and major feature of the project is the databasing of experimental observations for retrieval and evaluation. Technologies allowing many groups to work together are being rapidly developed. Internet II will facilitate this immensely. When problems are huge and complex, a particular working group can be expert in only a small part of the overall project. The strategies to be worked out must therefore include how to pull models composed of many submodules together even when the expertise in each is scattered amongst diverse institutions. The technologies of bioinformatics will contribute greatly to this effort. Developing and implementing code for large-scale systems has many problems. Most of the submodules are complex, requiring consideration of spatial and temporal events and processes. Submodules have to be linked to one another in a way that preserves mass balance and gives an accurate representation of variables in nonlinear complex biochemical networks with many signaling and controlling pathways. Microcompartmentalization vitiates the use of simplified model structures. 
The stiffness of the systems of equations is computationally costly. Faster computation is needed when using models as thinking tools and for iterative data analysis. Perhaps the most serious problem is the current lack of definitive information on kinetics and dynamics of systems, due in part to the almost total lack of databased observations, but also because, though we are nearly drowning in new information being published each day, either the information required for the modeling cannot be found or has never been obtained. "Simple" things like tissue composition, material properties, and mechanical behavior of cells and tissues are not generally available. The development of comprehensive models of biological systems is a key to pharmaceutics and drug design, for the models will become gradually better predictors of the results of interventions, both genomic and pharmaceutic. Good models will be useful in predicting the side effects and long term effects of drugs and toxins, and when the models are really good, to predict where genomic intervention will be effective and where the multiple redundancies in our biological systems will render a proposed intervention useless. The Physiome Project will provide the integrating scientific basis for the Genes to Health initiative, and make physiological genomics a reality applicable to whole organisms, from bacteria to man.
Baumeister, W. and A. C. Steven (2000). "Macromolecular electron microscopy in the era of structural genomics." Trends Biochem Sci 25(12): 624-31. Macromolecular machines carry out many cellular functions. Cryo-electron microscopy (cryo-EM) is emerging as a powerful method for studying the structure, assembly and dynamics of such macromolecules, and their interactions with substrates. With resolutions still improving, 'single-particle' analyses are already depicting secondary structure. Moreover, cryo-EM can be combined in several ways with X-ray diffraction to enhance the resolution of cryo-EM and the applicability of crystallography. Electron tomography holds promise for visualizing machines at work inside cells.
Becich, M. J. (2000). "The role of the pathologist as tissue refiner and data miner: the impact of functional genomics on the modern pathology laboratory and the critical roles of pathology informatics and bioinformatics." Mol Diagn 5(4): 287-99. This article provides an overview of how functional genomics is likely to impact on the pathology laboratory and highlights how informatics and tissue banking will greatly facilitate the molecular age of medicine. Important aspects of functional genomics in the post-genome era, including the roles of laser capture microdissection, DNA- and complementary DNA-based microarrays, proteomic methods, collaborative human tissue banking, tissue microarrays, and pathobioinformatics in the modern pathology laboratory are discussed. The role of mass spectroscopy in the analysis of RNA, DNA, and protein and its impact on the clinical laboratory, particularly in cost-effectiveness and time savings, are evaluated. This article explores how laboratory information systems (LISs) and the devices that feed them information may need to be modified to adapt to greater volumes of data for the new testing modalities that require understanding sophisticated fluorescence detection methods and image processing. Emerging genomic testing methods and their impact on pathology laboratory testing, especially in the area of molecular classification of neoplasms, are examined. The role of the tissue bank in the modern pathology laboratory as an archive of control normal tissues, as well as subsamples of the spectrum of progressive neoplastic states, is discussed in light of its critical importance to the molecular classification of cancer. Establishing a database that combines structured reports in pathology LISs and construction of tissue banking information systems will provide a rich resource for pathology departments. 
The article discusses a hypothetical resource, such as the Shared Tumor Expression Profiler, that would provide access to well-characterized tissue-based research resources for clinicians and researchers. Last, the article emphasizes how LISs can prepare for these changes, and how training pathologists in pathology informatics and bioinformatics (pathobioinformatics) is critical to ensure pathology's overall leadership role in the post-genome era.
Benner, S. A., S. G. Chamberlin, et al. (2000). "Functional inferences from reconstructed evolutionary biology involving rectified databases--an evolutionarily grounded approach to functional genomics." Res Microbiol 151(2): 97-106. If bioinformatics tools are constructed to reproduce the natural, evolutionary history of the biosphere, they offer powerful approaches to some of the most difficult tasks in genomics, including the organization and retrieval of sequence data, the updating of massive genomic databases, the detection of database error, the assignment of introns, the prediction of protein conformation from protein sequences, the detection of distant homologs, the assignment of function to open reading frames, the identification of biochemical pathways from genomic data, and the construction of a comprehensive model correlating the history of biomolecules with the history of planet Earth.
Berendsen, H. J. and S. Hayward (2000). "Collective protein dynamics in relation to function." Curr Opin Struct Biol 10(2): 165-9. Several techniques for the analysis of the internal motions of proteins are available - separating large collective motions from small, presumably uninteresting motions. Such descriptions are helpful in the characterization of internal motions and provide insight into the energy landscape of proteins. The real challenge, however, is to relate large collective motions to functional properties, such as binding and regulation, or to folding. These issues have been recently addressed in several papers.
Bhattacharya, A., S. Bhattacharya, et al. (2000). "Identification of parasitic genes by computational methods." Parasitol Today 16(3): 127-31. A number of parasite genome projects are under way, and large amounts of nucleotide sequence data are becoming available for analysis. There is an urgent need for development of theoretical tools to analyze the genome data, including identification of protein-coding sequences. The majority of the methods developed to date require prior information about the genome before accurate predictions can be made. Because such information is not available for many parasites, these methods cannot be directly applied. In this article, Alok Bhattacharya and colleagues describe some of the gene-prediction methods commonly in use, and a new method, GeneScan, that they have developed for the analysis of parasite genomes.
Black, D. L. (2000). "Protein diversity from alternative splicing: a challenge for bioinformatics and post-genome biology." Cell 103(3): 367-70.
Blundell, T. L. and K. Mizuguchi (2000). "Structural genomics: an overview." Prog Biophys Mol Biol 73(5): 289-95.
Brazma, A. and J. Vilo (2000). "Gene expression data analysis." FEBS Lett 480(1): 17-24. Microarrays are one of the latest breakthroughs in experimental molecular biology, which allow monitoring of gene expression for tens of thousands of genes in parallel and are already producing huge amounts of valuable data. Analysis and handling of such data is becoming one of the major bottlenecks in the utilization of the technology. The raw microarray data are images, which have to be transformed into gene expression matrices--tables where rows represent genes, columns represent various samples such as tissues or experimental conditions, and numbers in each cell characterize the expression level of the particular gene in the particular sample. These matrices have to be analyzed further, if any knowledge about the underlying biological processes is to be extracted. In this paper we concentrate on discussing bioinformatics methods used for such analysis. We briefly discuss supervised and unsupervised data analysis and its applications, such as predicting gene function classes and cancer classification. Then we discuss how the gene expression matrix can be used to predict putative regulatory signals in the genome sequences. In conclusion we discuss some possible future directions.
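Editorial illustration for the entry above (not taken from the paper): a minimal Python sketch of a gene expression matrix as the abstract describes it, with rows as genes and columns as samples. The gene names, sample names and values are invented, and Pearson correlation between rows is used only as one simple, assumed example of an unsupervised comparison of expression profiles.

```python
import math

# Hypothetical expression matrix: one row per gene, one column per sample.
genes = ["geneA", "geneB", "geneC"]
samples = ["tissue1", "tissue2", "tissue3", "tissue4"]
matrix = [
    [1.0, 2.0, 3.0, 4.0],  # geneA
    [2.1, 3.9, 6.2, 8.0],  # geneB rises with geneA
    [4.0, 3.0, 2.0, 1.0],  # geneC falls as geneA rises
]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length profiles; 0.0 if either is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)


def most_correlated_partner(gene_index: int) -> str:
    """Name of the other gene whose profile correlates most strongly with this one."""
    best_name, best_r = "", -2.0
    for j, row in enumerate(matrix):
        if j == gene_index:
            continue
        r = pearson(matrix[gene_index], row)
        if r > best_r:
            best_name, best_r = genes[j], r
    return best_name


print(most_correlated_partner(0))
```

Grouping genes by profile similarity in this way is only a toy stand-in for the clustering methods the review covers.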
Brenner, S. E. (2000). "Target selection for structural genomics." Nat Struct Biol 7 Suppl: 967-9. Structural genomics aims to use high-throughput structure determination and computational analysis to provide three-dimensional models of every tractable protein. The process of choosing proteins for experimental structure characterization is known as target selection. In this nomenclature, the targets are regions of proteins to be studied by crystallography or NMR. Selection of the targets is principally a computational process of restricting candidate proteins to those that are tractable and of unknown structure, and prioritizing according to expected interest and accessibility.
Broder, S. and J. C. Venter (2000). "Sequencing the entire genomes of free-living organisms: the foundation of pharmacology in the new millennium." Annu Rev Pharmacol Toxicol 40: 97-132. The power and effectiveness of clinical pharmacology are about to be transformed with a speed that earlier in this decade could not have been foreseen even by the most astute visionaries. In the very near future, we will have at our disposal the reference DNA sequence for the entire human genome, estimated to contain approximately 3.5 billion bp. At the same time, the science of whole genome sequencing is fostering the computational science of bioinformatics needed to develop practical applications for pharmacology and toxicology. Indeed, it is likely that pharmacology, toxicology, bioinformatics, and genomics will merge into a new branch of medical science for studying and developing pharmaceuticals from molecule to bedside.
Bull, A. T., A. C. Ward, et al. (2000). "Search and discovery strategies for biotechnology: the paradigm shift." Microbiol Mol Biol Rev 64(3): 573-606. Profound changes are occurring in the strategies that biotechnology-based industries are deploying in the search for exploitable biology and to discover new products and develop new or improved processes. The advances that have been made in the past decade in areas such as combinatorial chemistry, combinatorial biosynthesis, metabolic pathway engineering, gene shuffling, and directed evolution of proteins have caused some companies to consider withdrawing from natural product screening. In this review we examine the paradigm shift from traditional biology to bioinformatics that is revolutionizing exploitable biology. We conclude that the reinvigorated means of detecting novel organisms, novel chemical structures, and novel biocatalytic activities will ensure that natural products will continue to be a primary resource for biotechnology. The paradigm shift has been driven by a convergence of complementary technologies, exemplified by DNA sequencing and amplification, genome sequencing and annotation, proteome analysis, and phenotypic inventorying, resulting in the establishment of huge databases that can be mined in order to generate useful knowledge such as the identity and characterization of organisms and the identity of biotechnology targets. Concurrently there have been major advances in understanding the extent of microbial diversity, how uncultured organisms might be grown, and how expression of the metabolic potential of microorganisms can be maximized. The integration of information from complementary databases presents a significant challenge. Such integration should facilitate answers to complex questions involving sequence, biochemical, physiological, taxonomic, and ecological information of the sort posed in exploitable biology. 
The paradigm shift which we discuss is not absolute in the sense that it will replace established microbiology; rather, it reinforces our view that innovative microbiology is essential for releasing the potential of microbial diversity for biotechnology penetration throughout industry. Various of these issues are considered with reference to deep-sea microbiology and biotechnology.
Case, D. A. (2000). "Interpretation of chemical shifts and coupling constants in macromolecules." Curr Opin Struct Biol 10(2): 197-203. Recent developments in NMR spectroscopy, along with advances in computational techniques, have produced new approaches to the interpretation of chemical shifts and spin-spin coupling constants in biomolecules. Quantum chemical studies of useful accuracy are now becoming more routine and are increasingly being used in conjunction with experimental studies to map out expected structural patterns for peptides and oligonucleotides. Topics of recent special interest include spin couplings across hydrogen bonds and patterns of chemical shift anisotropies, in both diamagnetic and paramagnetic proteins.
Celis, J. E., M. Kruhoffer, et al. (2000). "Gene expression profiling: monitoring transcription and translation products using DNA microarrays and proteomics." FEBS Lett 480(1): 2-16. Novel and powerful technologies such as DNA microarrays and proteomics have made possible the analysis of the expression levels of multiple genes simultaneously both in health and disease. In combination, these technologies promise to revolutionize biology, in particular in the area of molecular medicine as they are expected to reveal gene regulation events involved in disease progression as well as to pinpoint potential targets for drug discovery and diagnostics. Here, we review the current status of these technologies and highlight some studies in which they have been applied in concert to the analysis of biopsy specimens.
Chakravarti, D. N., M. J. Fiske, et al. (2000). "Mining genomes and mapping proteomes: identification and characterization of protein subunit vaccines." Dev Biol (Basel) 103: 81-90. Currently, there is an extensive and unprecedented effort to obtain the complete nucleotide sequence of the complex genomes of many micro-organisms. In this post-genomic era, based on the availability of the entire genome sequence of an organism, three new disciplines of molecular biology have emerged: genomics, transcriptional profiling and proteomics. All these technologies have the potential to accelerate the process of identifying protective protein antigens as subunit vaccine targets as well as validating and extending the range of available candidate antigens. The progress of these technologies has led to the origination of the science of bioinformatics for management and critical evaluation of the large amount of information generated. Although genomics, transcriptional profiling and proteomics are each based on different principles, there is considerable synergy between them. Appropriate application of any one, or a combination of two or more of these approaches, coupled with bioinformatics, would allow identification of a short-list of vaccine candidates from the entire list of several hundreds to thousands of proteins encoded by the genome. These candidates would then require usual channelling through the subsequent process involving recombinant expression, purification and testing for immunogenicity and protective efficacy.
Chambers, G., L. Lawrie, et al. (2000). "Proteomics: a new approach to the study of disease." J Pathol 192(3): 280-8. The global analysis of cellular proteins has recently been termed proteomics and is a key area of research that is developing in the post-genome era. Proteomics uses a combination of sophisticated techniques including two-dimensional (2D) gel electrophoresis, image analysis, mass spectrometry, amino acid sequencing, and bio-informatics to resolve comprehensively, to quantify, and to characterize proteins. The application of proteomics provides major opportunities to elucidate disease mechanisms and to identify new diagnostic markers and therapeutic targets. This review aims to explain briefly the background to proteomics and then to outline proteomic techniques. Applications to the study of human disease conditions ranging from cancer to infectious diseases are reviewed. Finally, possible future advances are briefly considered, especially those which may lead to faster sample throughput and increased sensitivity for the detection of individual proteins.
Cordon-Cardo, C., R. J. Cote, et al. (2000). "Genetic and molecular markers of urothelial premalignancy and malignancy." Scand J Urol Nephrol Suppl(205): 82-93. The molecular genetic changes reported in bladder tumors can be classified as primary and secondary aberrations. Primary molecular alterations may be defined as those directly related to the genesis of cancer. These are frequently found as the sole abnormality and are often associated with particular tumors. There are characteristic primary abnormalities involved in the production of low-grade/well-differentiated neoplasms, which destabilize cellular proliferation but have little effect on cellular "social" interactions or differentiation, as well as the rate of cell death or apoptosis. Other molecular events lead to high-grade neoplasms which disrupt growth control, including the cell cycle and apoptosis, and which have a major impact on biological behavior. A primary target leading to low-grade papillary superficial bladder tumors resides on chromosome 9, while p53 gene alterations are commonly seen in flat carcinoma in situ. Other molecular alterations must be elucidated, as many non-invasive neoplasms have neither chromosome 9 nor p53 alterations. Novel approaches utilizing tissue microdissection techniques and molecular genetic assays are needed to shed further light on this subject. Secondary genetic or epigenetic abnormalities may be fortuitous, or may determine the biological behavior of the tumor. Multiple molecular abnormalities are identified in most human cancers studied, including bladder neoplasms. The accumulation, rather than the order, of these genetic alterations may be the critical factor that grants synergistic activity. In this regard, it is noteworthy that many of the genes that are altered act upon the two recognized critical growth and senescence pathways, TP53 and RB.
These particular molecular aberrations may be especially important to evaluate for their use in the management of bladder cancer because of their commonality in progressive forms of the disease. Thus, clinical trials are underway to explore their use in specific situations, particularly in the surgical management of locally advanced disease, and to determine whether adjuvant chemotherapy in such patients may be of benefit. The use of molecular alterations in the management of non-invasive bladder neoplasms remains to be firmly established. Our knowledge of molecular alterations important in bladder cancer progression is far from complete, and further study is necessary to further elucidate crucial pathways involved in progression and therapeutic response. As per preneoplastic conditions, difficulties in identifying and interpreting the significance of phenotypic changes have imposed certain limitations, as has an evolving nomenclature and issues of reproducibility in interpreting morphological criteria. Nevertheless, molecular alterations involving chromosome 9q and the INK4A locus in papillary superficial tumors vs changes in chromosomes 14q and 8q, p53 and RB in flat carcinoma in situ lesions may indicate a molecular basis for early events that lead to varying pathways in urothelial tumorigenesis. Studies aimed at revealing the clinical relevance of genetic instability, as well as molecular or epigenetic alterations, in urothelium and preneoplastic lesions of otherwise morphologically normal appearance are needed to further advance knowledge in the field. Clinical advances in bladder cancer will be facilitated by novel animal models paralleling the human disease. Molecular diagnostics, particularly specific antigen expression, fluorescence in situ hybridization and microsatellite analyses, have shown great promise as screening and follow-up methodologies, and may supplement urine cytology in the diagnosis and characterization of new and recurrent disease.
In addition, the use of high-throughput genomic/proteomic assays, linked to comprehensive databases, and coupled with robust bioinformatics will be key elements in elucidating the components of regulatory and signaling pathways involved in bladder tumorigenesis and cancer progression.
Cuticchia, A. J. (2000). "Recent advances in bioinformatics in the medical research environment and applications to the study of skin diseases." J Cutan Med Surg 4(3): 169-73. BACKGROUND: The computer has become increasingly intertwined in society for the past 30 years. Within the academic health science centre, there is an increasing need for researchers to become skilled at using the Internet as a mechanism for the retrieval of scientific results and the underlying data. The discipline of bioinformatics, which uses computer technology to provide answers to biological questions, has been expanding in scope and utility for the past decade. Increasing numbers of research groups have been investing in bioinformatics infrastructure to aid in the research process. These continuing investments have led to the establishment for the first time of a supercomputing facility within a hospital. Such computational power is being used for the mapping of genes and the study of human disease. OBJECTIVE: A discussion of the increasing role of computational biology in the research environment of the clinician scientist is presented here. CONCLUSIONS: Though the investment in a supercomputer may not be possible in most research settings, several less expensive alternatives relying on existing desktop computers can provide supercomputer-like performance within nearly any environment.
de Wolf, F. A. and G. M. Brett (2000). "Ligand-binding proteins: their potential for application in systems for controlled delivery and uptake of ligands." Pharmacol Rev 52(2): 207-36. Unstable or harmful agents, such as drugs, vitamins, flavors, pheromones, and catalysts, for use in pharmaceutics, personal care, functional foods, crop protection, laboratories, offices, and industrial processes, require stabilization against oxidation and degradation or shielding from sensitive environments. Therefore, binding them to carriers with high affinity and selectivity for targeting to the right environment and subsequent controlled release is beneficial, especially if this allows improved control of (stimulus-induced) release. Proteins often possess one or more of these properties, whereas modern biotechnology and bioinformatics provide an increasing number of tools to engineer and adapt these properties. Carrier systems are now developed that incorporate proteins as the central ligand-binding component, e.g., lectins for glucose-triggered release of glycosylated insulin and bispecific antibodies for brain targeting of drugs, but ligand-binding proteins can potentially be used in many other applications. Collectively, the proteins available in nature bind an impressive variety of ligands and non-natural analogs. In this light, various ligand-binding protein classes are surveyed, including biotin-, lipid-, immunosuppressant-, insect pheromone-, phosphate-, and sulfate-binding proteins, as well as bacterial periplasmic proteins, lectins, serum albumins, immunoglobulins, and inactivated enzymes. Disadvantages, such as enzymatic degradation or immunogenicity, associated with the pharmaceutical use of certain proteins can be avoided by incorporating these proteins in more complex carrier and targeting systems. In other applications, this may not be necessary. 
The enclosure of high-affinity (potentially stimulus-sensitive) binding proteins within an envelope that acts as a diffusion barrier for the ligand may provide excellent slow release. Many possibilities seem to be as yet unexplored.
Degtyarenko, K. (2000). "Bioinorganic motifs: towards functional classification of metalloproteins." Bioinformatics 16(10): 851-64. The habitat of bioinorganic motifs (BIMs) is at the interface of biological inorganic chemistry and bioinformatics. BIM is defined as a common structural feature shared by functionally related, but not necessarily homologous, proteins, and consisting of the metal atom(s) and first coordination shell ligands. BIMs appear to be suitable for classification of metal centres at any level, from groups of unrelated proteins with similar function to different functional states of the same protein, and for description of possible evolutionary relationships of metalloproteins. However, they have not attracted wide attention from the bioinformatics community. Although their presence is appreciated, they are difficult to predict - therefore the current 'high-throughput' initiatives are likely to miss or ignore them altogether. The protein sequence databases do not distinguish between proteins containing different prosthetic groups (unless they have different sequences) or between apo- and holoprotein. On the other hand, the protein structure databases include data on 'hetero compounds' of various origin but these data are often inconsistent. A number of specialized databases dealing with BIMs and attempts to classify them are reviewed. SUPPLEMENTARY INFORMATION: The additional bibliography and list of Internet resources on bioinorganic chemistry are available at http://www.ebi.ac.uk/~kirill/biometal/
Durand, P., S. Fabrega, et al. (2000). "Structural features of normal and mutant human lysosomal glycoside hydrolases deduced from bioinformatics analysis." Hum Mol Genet 9(6): 967-77. Lysosomal storage diseases are due to inherited deficiencies in various enzymes involved in basic metabolic processes. As with other genetic diseases, accurate structure data for these enzymatic proteins should help in better understanding the molecular effects of mutations identified in patients with the corresponding lysosomal diseases; however, no such three-dimensional (3D) structure data are available for many lysosomal enzymes. Thus, we herein intend to illustrate for an audience of molecular geneticists how structure information can nonetheless be obtained via a bioinformatics approach in the case of five human lysosomal glycoside hydrolases. Indeed, using the two-dimensional hydrophobic cluster analysis method to decipher the sequence information available in data banks for the large group of glycoside hydrolases (clan GH-A) to which these human lysosomal enzymes belong, we could deduce structure predictions for their catalytic domains and propose explanations for the molecular effects of mutations described in patients. In addition, in the case of human beta-glucuronidase for which experimental 3D data have been reported, we also show here that bioinformatics methods relying on the available 3D structure information can be used to obtain further insights into the effects of various mutations described in patients with Sly disease. In a broader perspective, our work stresses that, in the context of a rapid increase in protein sequence information through genome sequencing, bioinformatics approaches might be highly useful for generating structure-function predictions based on sequence-structure interrelationships.
Dutt, M. J. and K. H. Lee (2000). "Proteomic analysis." Curr Opin Biotechnol 11(2): 176-9. The field of proteomics is becoming increasingly important as genome sequences are being completed and annotated. Recent advances in proteomics include experimental and mathematical proofs of the need to complement microarray analysis with protein analysis, improved sensitivity for mass spectrometric analysis of separated proteins, better informatic tools for gel analysis and protein spot annotation, first steps towards automated experimental procedures, and new technology for quantitation of protein changes.
Ebstein, R. P., J. Benjamin, et al. (2000). "Personality and polymorphisms of genes involved in aminergic neurotransmission." Eur J Pharmacol 410(2-3): 205-214. Genetic factors significantly contribute to the determination of human personality traits assessed by self-report questionnaires. However, only in the past few years have common genetic polymorphisms especially the dopamine D4 receptor and the serotonin transporter promoter region been associated with specific personality traits such as novelty seeking and harm avoidance, respectively. The effects of these genes are modest and several genes are likely accounting for individual differences in personality dimensions that can be attributed to genetic factors. Molecular genetic studies of adult personality have also been extended to investigations of early human temperament and some of the genes associated with adult personality traits are also contributing to the earliest developmental expressions of human behavior. Additionally, some of these same genes have also been implicated in various types of abnormal behavior including addiction, obsessive-compulsive disorder, attention deficit, depression, aggression and psychosis. Future research directions will no doubt take advantage of the bioinformatics revolution coinciding with the completion of the first phase of the human genome project. It should soon be possible to identify many of the genes contributing to specific personality traits and to better define their role in determining normal and abnormal behavior from early development through adulthood.
Einarson, M. B. and E. A. Golemis (2000). "Encroaching genomics: adapting large-scale science to small academic laboratories." Physiol Genomics 2(3): 85-92. The process of conducting biological research is undergoing a profound metamorphosis due to the technological innovations and torrent of information resulting from the execution of multiple species genome projects. The further tasks of mapping polymorphisms and characterizing genome-wide protein-protein interaction (the characterization of the proteome) will continue to garner resources, talent, and public attention. Although some elements of these whole genome size projects can only be addressed by large research groups, consortia, or industry, the impact of these projects has already begun to transform the process of research in many small laboratories. Although the impact of this transformation is generally positive, laboratories engaged in types of research destined to be dominated by the efforts of a genomic consortium may be negatively impacted if they cannot rapidly adjust strategies in the face of new large-scale competition. The focus of this report is to outline a series of strategies that have been productively utilized by a number of small academic laboratories that have attempted to integrate such genomic resources into research plans with the goal of developing novel physiological insights.
Eisenberg, D., E. M. Marcotte, et al. (2000). "Protein function in the post-genomic era." Nature 405(6788): 823-6. Faced with the avalanche of genomic sequences and data on messenger RNA expression, biological scientists are confronting a frightening prospect: piles of information but only flakes of knowledge. How can the thousands of sequences being determined and deposited, and the thousands of expression profiles being generated by the new array methods, be synthesized into useful knowledge? What form will this knowledge take? These are questions being addressed by scientists in the field known as 'functional genomics'.
Eng, F. J. and S. L. Friedman (2000). "Fibrogenesis I. New insights into hepatic stellate cell activation: the simple becomes complex." Am J Physiol Gastrointest Liver Physiol 279(1): G7-G11. Hepatic stellate cell activation is a complex process. Paradoxes and controversies include the origin(s) of hepatic stellate cells, the regulation of membrane receptor signaling and transcription, and the fate of the cells once liver injury resolves. Major themes have emerged, including the dominance of autocrine signaling and the identification of counterregulatory stimuli that oppose key features of activated cells. Advances in analytical methods including proteomics and gene array, coupled with powerful bioinformatics, promise to revolutionize how we view cellular responses. Our understanding of stellate cell activation is likely to benefit from these advances, unearthing modes of regulating cellular behavior that are not even conceivable on the basis of current paradigms.
Fagan, R. and M. Swindells (2000). "Bioinformatics, target discovery and the pharmaceutical/biotechnology industry." Curr Opin Mol Ther 2(6): 655-61. With the first draft of the human genome now available a directed genome-wide mining strategy is being implemented by many pharmaceutical and biotechnology companies in order to identify novel members of the most therapeutically relevant target families. At the same time there is an increasing amount of annotation relevant to the human genome sequence entering into the public domain. The ability to identify protein families on a genome-wide scale can only be done at speed by using high-throughput computational approaches. This review describes many of the latest algorithmic developments in this field and shows how they can be best put to use for target identification and prioritization.
Fickett, J. W. and W. W. Wasserman (2000). "Discovery and modeling of transcriptional regulatory regions." Curr Opin Biotechnol 11(1): 19-24. A complex network of regulatory controls governs the patterns of gene expression. Enabled by the tools of molecular cloning, initial experimental queries into the gene regulatory network elucidated a wide array of transcription factors and their cognate binding sites from hundreds of genes. The recent fusion of genome-scale experimental tools, a more comprehensive gene catalog, and concomitant advances in computational methodology, has extended the range of questions being posed. The potential to further our understanding of the biochemical mechanisms of transcriptional regulation and to accelerate the delineation of regulatory control regions in the human genome is enormous.
Fiechter, A. (2000). "Biotechnology in Switzerland and a glance at Germany." Adv Biochem Eng Biotechnol 69: 175-208. The roots of biotechnology go back to classic fermentation processes, which starting from spontaneous reactions were developed by simple means. The discovery of antibiotics made contamination-free bioprocess engineering indispensable, which led to a further step in technology development. On-line analytics and the use of computers were the basis of automation and the increase in quality. On both sides of the Atlantic, molecular biology emerged at the same time, which gave genetic engineering in medicine, agriculture, industry and environment new opportunities. The story of this new advanced technology in Switzerland, with a quick glance at Germany, is followed back to the post-war years. The growth of research and teaching and the foundation of the European Federation of Biotechnology (EFB) are dealt with. The promising phase of the 1960s and 1970s soon had to give way to a restrictive policy of insecurity and anxiousness, which, today, manifests itself in the rather insignificant contributions of many European countries to the new sciences of genomics, proteomics and bioinformatics, as well as in the resistance to the use of transgenic agricultural crops and their products in foods.
Foster, C. B. and S. J. Chanock (2000). "Mining variations in genes of innate and phagocytic immunity: current status and future prospects." Curr Opin Hematol 7(1): 9-15. The large number of sequence variations in human genes reflects the diversity of human populations and the response to prior environmental and pathogen challenges. Currently, major efforts are under way to identify and catalog single-nucleotide polymorphisms for use in genetic studies designed to explore the contribution of common variants to both disease susceptibility and interindividual differences in outcomes. So far, the most productive approach has been to search with candidate genes for which there is a scientific rationale (eg, prior data on the biologic implications of one or more variant alleles). Recently, there has been an explosion in the number of genetic association studies seeking to correlate one or more well-defined outcomes with variant alleles. These studies provide a foundation for identifying and applying genetic risk factors to clinical medicine. However, a number of challenges must be met before widespread use in clinical medicine can be undertaken. These include more efficient bioinformatics, basic insights into the significance of the variant alleles, ethical issues, and the availability of cost-effective, high-throughput platforms for genotype analysis.
Fujita, Y. (2000). "[A new approach to pharmacogenomics]." Nippon Yakurigaku Zasshi 116(3): 149-57. The medicine in the 21st century will be so called "evidence based medicine" or "personalized medicine," based on the principle of "right drug to right patient." Pharmacogenomics covers the entire spectrum of genes that determines drug behavior and sensitivity, and we anticipate it will bring major impact on the healthcare system as well as the drug discovery process in the near future. Three waves of genomic impact are predicted to arise as follows: The first wave will hit on existing drugs and late-phase development candidates within the next 2-3 years, aiming to minimize the risks in clinical trials (adverse events, resistance, etc.). The wave will then affect the candidate selection process in the early pre-development stage, and finally the disease gene finding to target discovery process. The driving force will be technologies such as SNPs database, differential gene expression (DGE) analysis, proteomics, serial analysis of gene expression (SAGE) and bioinformatics. This new approach of genomic discovery (so called "integrated approach") requires knowledge on how to implement and integrate new valuable technologies from an early stage of the discovery process. The implication of SNPs, high throughput proteomics and application of structural genomics will be the key issues in the pharmacogenomics era.
Gelfand, M. S., P. S. Novichkov, et al. (2000). "Comparative analysis of regulatory patterns in bacterial genomes." Brief Bioinform 1(4): 357-71. Recognition of transcription regulatory sites in bacterial genomes is a notoriously difficult problem. There are no algorithms capable of making reliable predictions even for well-studied sites such as the CRP (cyclic AMP receptor protein) box. However, availability of complete bacterial genomes makes it possible to make reliable predictions with bad rules. This comparative approach is based on the assumption that sets of co-regulated genes are conserved in related bacteria. Thus true sites occur upstream of orthologous genes, whereas false candidates are scattered at random. This means not only that knowledge about regulation in well-studied genomes can be transferred to newly sequenced ones, but also that new members of regulons can be found. This paper reviews several recent studies. In particular, a detailed analysis of catabolite repression in gamma-purple bacteria is presented.
Gerstein, M. and R. Jansen (2000). "The current excitement in bioinformatics--analysis of whole-genome expression data: how does it relate to protein structure and function?" Curr Opin Struct Biol 10(5): 574-84. Whole-genome expression profiles provide a rich new data-trove for bioinformatics. Initial analyses of the profiles have included clustering and cross-referencing to 'external' information on protein structure and function. Expression profile clusters do relate to protein function, but the correlation is not perfect, with the discrepancies partially resulting from the difficulty in consistently defining function. Other attributes of proteins can also be related to expression--in particular, structure and localization--and sometimes show a clearer relationship than function.
Geuna, S. (2000). "Appreciating the difference between design-based and model-based sampling strategies in quantitative morphology of the nervous system." J Comp Neurol 427(3): 333-9. Quantitative morphology of the nervous system has undergone great developments over recent years, and several new technical procedures have been devised and applied successfully to neuromorphological research. However, a lively debate has arisen on some issues, and a great deal of confusion appears to exist that is definitely responsible for the slow spread of the new techniques among scientists. One such element of confusion is related to uncertainty about the meaning, implications, and advantages of the design-based sampling strategy that characterize the new techniques. In this article, to help remove this uncertainty, morphoquantitative methods are described and contrasted on the basis of the inferential paradigm of the sampling strategy: design-based vs model-based. Moreover, some recommendations are made to help scientists judge the appropriateness of a method used for a given study in relation to its specific goals. Finally, the use of the term stereology to label, more or less expressly, only some methods is critically discussed.
Giegerich, R. (2000). "A systematic approach to dynamic programming in bioinformatics." Bioinformatics 16(8): 665-77. MOTIVATION: Dynamic programming is probably the most popular programming method in bioinformatics. Sequence comparison, gene recognition, RNA structure prediction and hundreds of other problems are solved by ever new variants of dynamic programming. Currently, the development of a successful dynamic programming algorithm is a matter of experience, talent and luck. The typical matrix recurrence relations that make up a dynamic programming algorithm are intricate to construct, and difficult to implement reliably. No general problem independent guidance is available. RESULTS: This article introduces a systematic method for constructing dynamic programming solutions to problems in biosequence analysis. By a conceptual splitting of the algorithm into a recognition and an evaluation phase, algorithm development is simplified considerably, and correct recurrences can be derived systematically. Without additional effort, the method produces an early, executable prototype expressed in a functional programming language. The method is quite generally applicable, and, while programming effort decreases, no overhead in terms of ultimate program efficiency is incurred.
Graf, W. D. and O. E. Oleinik (2000). "The study of neural tube defects after the Human Genome Project and folic acid fortification of foods." Eur J Pediatr Surg 10 Suppl 1: 9-12. The implementation of folic acid fortification will eliminate a proportion of neural tube defects (NTD). As a result, the etiologic and clinical profiles of the developmental disorder may both change. In the assessment of NTD as it evolves, the bioinformatics structure and content of the Human Genome Project will find vital application. One important development will be an enhanced understanding of the role of folic acid in global regulation of gene expression through epigenetic processes. In addition, bioinformatics will facilitate coordination of research in the basic sciences with clinical investigations to better define remaining etiologic factors.
Graf, W. D. (2000). "Can bioinformatics help trace the steps from gene mutation to disease?" Neurology 55(3): 331-3.
Gutierrez, J. A. (2000). "Genomics: from novel genes to new therapeutics in parasitology." Int J Parasitol 30(3): 247-52. The advent of rapid DNA sequencing technologies is generating vast quantities of raw genomic information ranging from in-depth analysis of the expressed genes to complete sequencing of genomes at an increasing rate (bioinformatics). However, it is the functional characterisation of a specific gene product that is the key limiting factor for validation as targets for high throughput assay development. The challenge is to obtain the raw genomic information from parasites of economic importance and to effectively integrate broad technologies such as gene disruption and over-expression, DNA arrays, proteomics, antisense RNAs, with bioinformatics in a timely fashion to identify relevant biological targets. Screening of validated targets in a strategy that includes large numbers of chemistries with high diversity and predictive in vitro and in vivo assays should permit the successful identification of novel chemical entities with high specificity to the target parasite. It is proposed that this rational approach will permit the identification of new antiparasitic therapies able to surpass the current toxicological, environmental, and economic challenges of the marketplace.
Harris, N. L. (2000). "Annotating sequence data using Genotator." Mol Biotechnol 16(3): 221-32. In this postgenomic era, it is no longer necessary to argue the need for automated methods for sequence annotation. Many researchers have designed tools for analyzing DNA sequences, but running multiple tools and interpreting the results can be tedious and confusing. In the last few years, many analysis workbenches have been developed to help streamline the process of sequence annotation. Genotator, developed in 1996, is still a popular choice owing to its ease of use and its configurability. This article will review annotating sequence data using the Genotator.
Hassan, A. and H. S. Markus (2000). "Genetics and ischaemic stroke." Brain 123 (Pt 9): 1784-812. Ischaemic stroke can be caused by a number of monogenic disorders, and in such cases stroke is frequently part of a multisystem disorder. Cerebral autosomal dominant arteriopathy with subcortical infarcts and leucoencephalopathy (CADASIL), due to mutations in the NOTCH3 gene, is increasingly appreciated as a cause of familial subcortical stroke. The genetics and phenotypes of monogenic stroke are covered in this review. However, the majority of cases of ischaemic stroke are multifactorial in aetiology. Strong evidence from epidemiological and animal studies has implicated genetic influences in the pathogenesis of multifactorial ischaemic stroke, but the identification of individual causative mutations remains problematic; this is in part limited by the number of approaches currently available. In addition, genetic influences are likely to be polygenic, and ischaemic stroke itself consists of a number of different phenotypes which may each have different genetic profiles. Almost all human studies to date have employed a candidate gene approach. Associations with polymorphisms in a variety of candidate genes have been investigated, including haemostatic genes, genes controlling homocysteine metabolism, the angiotensin-converting enzyme gene, and the endothelial nitric oxide synthase gene. The results of these studies, and the advantages and limitations of the candidate gene approach, are presented. The recent biological revolution, spurred by the human genome project, promises the advent of novel technologies supported by bioinformatics resources that will transform the study of polygenic disorders such as stroke. Their potential application to polygenic ischaemic stroke is discussed.
Hirt, H. (2000). "MAP kinases in plant signal transduction." Results Probl Cell Differ 27: 1-9. Mitogen-activated protein kinase (MAPK) pathways are modules involved in the transduction of extracellular signals to intracellular targets in all eukaryotes. Distinct MAPK pathways are regulated by different extracellular stimuli and are implicated in a wide variety of biological processes. In plants there is evidence for MAPKs playing a role in the signaling of abiotic stresses, pathogens, plant hormones, and cell cycle cues. The large number and divergence of plant MAPKs indicates that this ancient mechanism of bioinformatics is extensively used in plants and their study promises to give molecular answers to old questions.
Horrocks, P., S. Bowman, et al. (2000). "Entering the post-genomic era of malaria research." Bull World Health Organ 78(12): 1424-37. The sequencing of the genome of Plasmodium falciparum promises to revolutionize the way in which malaria research will be carried out. Beyond simple gene discovery, the genome sequence will facilitate the comprehensive determination of the parasite's gene expression during its developmental phases, pathology, and in response to environmental variables, such as drug treatment and host genetic background. This article reviews the current status of the P. falciparum genome sequencing project and the unique insights it has generated. We also summarize the application of bioinformatics and analytical tools that have been developed for functional genomics. The aim of these activities is the rational, information-based identification of new therapeutic strategies and targets, based on a thorough insight into the biology of Plasmodium spp.
Hsiao, L. L., R. L. Stears, et al. (2000). "Prospective use of DNA microarrays for evaluating renal function and disease." Curr Opin Nephrol Hypertens 9(3): 253-8. At the forefront of the revolution in human genomics is DNA microarray technology, which evaluates expression levels or genotypes of thousands of genes simultaneously, by means of miniaturization and parallel processing. Furthermore, advances in bioinformatics will result in the creation of large databases, which will require complex software programming for structural analysis. Over the next decade, DNA microarrays, combined with sophisticated informatics and genomic databases, will provide molecular fingerprints of disease processes and prognoses. This review provides an update on DNA microarray technology and its application to renal diseases.
Jain, K. K. (2000). "Applications of biochip and microarray systems in pharmacogenomics." Pharmacogenomics 1(3): 289-307. A DNA microarray system is usually comprised of DNA probes formatted on a microscale on a glass surface (chip), plus the instruments needed to handle samples (automated robotics), to read the reporter molecules (scanners) and analyse the data (bioinformatic tools). Biochips are formed by in situ (on chip) synthesis of oligonucleotides or peptide nucleic acids (PNAs) or spotting of DNA fragments. Hybridisation of RNA- or DNA-derived samples on chips allows the monitoring of expression of mRNAs or the occurrence of polymorphisms in genomic DNA. Basic types of DNA chips are the sequencing chip, the expression chip and chips for comparative genomic hybridisation. Advanced technologies used in automated microarray production are photolithography, mechanical microspotting and ink jets. Bioelectronic microchips contain numerous electronically active microelectrodes with specific DNA capture probes linked to the electrodes through molecular wires. Several biosensors have been used in combination with biochips. PNA biosensors commonly rely on the immobilisation of a single-stranded DNA sequence (the 'probe') onto a transducer surface for hybridisation with the complementary ('target') strand to give a suitable electrical signal. Other sensors are cell-based immunobiosensors with engineered molecular recognition, integrated biosensors based on phototransistor integrated circuits and sensors based on surface plasmon resonance. Microarray technologies offer enormous savings in time and labour as compared to standard gel-based microsatellite methods. Reading of the information and its management by bioinformatics is necessary because of the enormous amount of data generated by the various technologies using microarrays. Standardised procedures are essential for compatible data production, quality control and analysis. Expression monitoring is the most biologically informative application of this technology at present. Microarray technology has important applications in pharmacogenomics: drug discovery and development, drug safety and molecular diagnostics. DNA chips will facilitate the integration of diagnosis and therapeutics, as well as the introduction of personalised medicines.
Jan van Wijk, K. (2000). "Proteomics of the chloroplast: experimentation and prediction." Trends Plant Sci 5(10): 420-5. New technologies, in combination with increasing amounts of plant genome sequence data, have opened up incredible experimental possibilities to identify the total set of chloroplast proteins (the chloroplast proteome) as well as their expression levels and post-translational modifications in a global manner. This is summarized under the term 'proteomics' and typically involves two-dimensional electrophoresis or chromatography, mass spectrometry and bioinformatics. Complemented with nucleotide-based global techniques, proteomics is expected to provide many new insights into chloroplast biogenesis, adaptation and function.
Johnson, J. E. and W. Chiu (2000). "Structures of virus and virus-like particles." Curr Opin Struct Biol 10(2): 229-35. Virus structures continue to be the basis for mechanistic virology and serve as a paradigm for solutions to problems concerning macromolecular assembly and function in general. The use of X-ray crystallography, electron cryomicroscopy and computational and biochemical methods has provided not only details of the structural folds of individual viral components, but also insights into the structural basis of assembly, nucleic acid packaging, particle dynamics and interactions with cellular molecules.
Kaminski, N. (2000). "Bioinformatics. A user's perspective." Am J Respir Cell Mol Biol 23(6): 705-11. This review provides an overview of bioinformatics from the user's point of view. Bioinformatics, defined as the application of computers, databases, and computational methods to the management of biologic information, is essential for almost every aspect of data management in modern biology. The rapid accumulation of genomic sequence information together with the wide availability of new technologies that analyze global gene expression patterns have created an information overload. Molecular biology labs are increasingly dependent on computers, large-capacity databases, search and analysis tools, and high-quality Internet connections. Currently available bioinformatics tools are discussed and a general approach is outlined. Using the resources and approaches in this review, readers should be able to form their own view of bioinformatics and tailor the solutions to the information overload according to their needs.
Kato, R. (2000). "[Actual situation and perspective of novel drug discovery]." Tanpakushitsu Kakusan Koso 45(6 Suppl): 763-75.
Kellner, R. (2000). "Proteomics. Concepts and perspectives." Fresenius J Anal Chem 366(6-7): 517-24. Within the last five years the field of proteomics has changed the understanding of molecular biology. Proteins manifest physiological as well as pathophysiological processes in a cell or an organism, and proteomics describes the complete protein inventory in dependence on in vivo parameters. Disease mechanism or drug effects both affect a protein profile and, vice versa, characterising protein profiles reveals information for the understanding of disease and therapy. Analytical methods for proteomics are based on conventional tools for protein characterisation. The technical challenge is the complete coverage of physico-chemical properties for thousands of proteins. Nucleic acids display a relative chemical homogeneity and therefore genomics was considered more promising in the past than proteomics. Further improvements in proteomics technologies will likely change this course with proteomics complementing genomics as a tool to study life sciences.
Kennedy, G. C. (2000). "The impact of genomics on therapeutic drug development." Exs 89: 1-10. Genomics can be defined as a set of related technologies that are focused on the discovery of genes implicated in human disease. Although many of the estimated 100,000 genes in the human genome have been at least partially identified by nucleotide sequence, elucidation of biological function has been achieved for only a small percentage of these. An even smaller percentage of genes discovered by these methodologies have become valid drug targets. This review discusses the various genomics technologies and their likelihood of yielding therapeutic drugs. Emerging advances in microarray "chip" technology have allowed the parallel analysis of gene expression patterns for thousands of genes simultaneously. Sequence information derived from the genomes of many individuals is leading to the rapid discovery of single nucleotide polymorphisms or SNPs. Detection of these human polymorphisms will fuel the discipline of pharmacogenomics, resulting in an increase in the success of clinical trials, the rescue of drugs that have previously failed in clinical trials because of adverse reactions from patient subpopulations, and ultimately, in the development of more personalized drug therapies. The impending identification of all human genes will signal the end of the structural genomics phase and usher in the function genomics phase. Technologies have already begun to move toward high-throughput elucidation of gene relationships, interactions and, it is hoped, toward their functions.
Ladunga, I. (2000). "Large-scale predictions of secretory proteins from mammalian genomic and EST sequences." Curr Opin Biotechnol 11(1): 13-8. Machine learning techniques have improved predictions of secretory proteins from protein, genomic and expressed sequence tag (EST) sequences. Artificial neural networks, physical sequence analysis using high-performance optimization, and hidden Markov models identify extremely variable signal peptides (the vehicles of protein transport across the endoplasmic reticulum membrane), transmembrane segments, and specific extracellular and intracellular domains as indicators of possible roles in the intercellular and intracellular chemical signaling pathways. The major role of peptide hormones, blood coagulation factors, carcinogenesis agents, and other secretory proteins in orchestrating multicellular life indicates pharmacological potential in the cure of major diseases and numerous biotechnological applications.
Landro, J. A., I. C. Taylor, et al. (2000). "HTS in the new millennium: the role of pharmacology and flexibility." J Pharmacol Toxicol Methods 44(1): 273-89. Over the past decade, high throughput screening (HTS) has become the focal point for discovery programs within the pharmaceutical industry. The role of this discipline has been and remains the rapid and efficient identification of lead chemical matter within chemical libraries for therapeutics development. Recent advances in molecular and computational biology, i.e., genomic sequencing and bioinformatics, have resulted in the announcement of publication of the first draft of the human genome. While much work remains before a complete and accurate genomic map will be available, there can be no doubt that the number of potential therapeutic intervention points will increase dramatically, thereby increasing the workload of early discovery groups. One current drug discovery paradigm integrates genomics, protein biosciences and HTS in establishing what the authors refer to as the "gene-to-screen" process. Adoption of the "gene-to-screen" paradigm results in a dramatic increase in the efficiency of the process of converting a novel gene coding for a putative enzymatic or receptor function into a robust and pharmacologically relevant high throughput screen. This article details aspects of the identification of lead chemical matter from HTS. Topics discussed include portfolio composition (molecular targets amenable to small molecule drug discovery), screening file content, assay formats and plating densities, and the impact of instrumentation on the ability of HTS to identify lead chemical matter.
Larsen, M. R. and P. Roepstorff (2000). "Mass spectrometric identification of proteins and characterization of their post-translational modifications in proteome analysis." Fresenius J Anal Chem 366(6-7): 677-90. High-throughput DNA sequencing has resulted in increasing input in protein sequence databases. Today more than 20 genomes have been sequenced and many more will be completed in the near future, including the largest of them all, the human genome. Presently, sequence databases contain entries for more than 425,000 protein sequences. However, the cellular functions are determined by the set of proteins expressed in the cell--the proteome. Two-dimensional gel electrophoresis, mass spectrometry and bioinformatics have become important tools in correlating the proteome with the genome. The current dominant strategies for identification of proteins from gels based on peptide mass spectrometric fingerprinting and partial sequencing by mass spectrometry are described. After identification of the proteins the next challenge in proteome analysis is characterization of their post-translational modifications. The general problems associated with characterization of these directly from gel-separated proteins are described and the current state of the art for the determination of phosphorylation, glycosylation and proteolytic processing is illustrated.
Lee, P. S. and K. H. Lee (2000). "Genomic analysis." Curr Opin Biotechnol 11(2): 171-5. Advances in genomic analysis include improved technology for DNA sequencing, routine use of DNA microarray technology for the analysis of gene expression profiles at the mRNA level and improved informatic tools to organize and analyze such data. At the same time, new developments in chip-based analysis of samples and the emergence of models of gene networks hold promise for the future of the 'Genomic Era'.
Lengauer, T. and R. Zimmer (2000). "Protein structure prediction methods for drug design." Brief Bioinform 1(3): 275-88. Along the long path from genomic data to a new drug, the knowledge of three-dimensional protein structure can be of significant help in several places. This paper points out such places, discusses the virtues of protein structure knowledge and reviews bioinformatics methods for gaining such knowledge on the protein structure.
Loferer, H. (2000). "Mining bacterial genomes for antimicrobial targets." Mol Med Today 6(12): 470-4. The elucidation of whole-genome sequences is expected to have a revolutionary impact on the discovery of novel medicines. With the availability of complete genome sequences of more than 30 different species, the field of antimicrobial drug discovery has the opportunity to access a remarkable diversity of genomic information. In this review, I summarize how microbial genomics has changed strategies of drug discovery by applying bioinformatics, novel genetic approaches and genomics-based technologies, including analysis of gene expression using DNA microarrays.
Mayer, K. F., K. Lemcke, et al. (2000). "Arabidopsis genome analysis as exemplified by analysis of chromosome 4." Brief Bioinform 1(4): 389-97. During the last decade the small cruciferous plant Arabidopsis thaliana has become a model organism for flowering plants. Sequencing and analysis of the Arabidopsis genome is nearing completion. Besides an overview of methods and strategies for Arabidopsis genome analysis, a summary of the results from the first analysis is presented. This includes an overview of chromosomal organisation and topological features as well as a first comparison with other genomes.
Michelson, S. and K. Joho (2000). "Drug discovery, drug development and the emerging world of pharmacogenomics: prospecting for information in a data-rich landscape." Curr Opin Mol Ther 2(6): 651-4. Drug development is a very expensive and inefficient process. Currently, it takes on average 15 years and costs approximately US $500 million to bring a new drug to market, with the pharmaceutical industry spending more than US $20 billion in identifying and developing drugs in 1998. Twenty-two percent of this total was spent on screening assays and toxicity testing. Yet the rapidly accelerating advances in high-throughput technologies, including screening and robotics, combinatorial chemistry, and genomics, make this an extremely data-rich environment. Add to that the new paradigms of pharmacogenomics and 'customized medicine', and the question is, are we helping or hurting our cause? Clearly, interpreting this flood of data and turning it into useful information is our next great hurdle. By extending the pharmacogenomic paradigm to the drug discovery process, this paper intends to put the scope of the problem into context.
Montgomery, D. L. (2000). "Tuberculosis vaccine design: influence of the completed genome sequence." Brief Bioinform 1(3): 289-96. Tuberculosis continues to be a major health problem, with more adults dying from Mycobacterium tuberculosis than any other pathogen world-wide. With the onset of the HIV epidemic and an increase in drug-resistant M. tuberculosis strains, the need for an improved vaccine has become an international priority. The recent completion of the genome sequences for two M. tuberculosis strains provides a wealth of information that can be used to design new strategies for vaccine development. The challenge comes in making rational choices from among the 4,000 genes of the most probable candidate immunogens or virulence genes. Thus, a well-designed screen is needed to reduce the number of candidates that must be tested. Presently, the most valuable role that bioinformatics can play is to provide such a screen.
Mori, H., K. Isono, et al. (2000). "Functional genomics of Escherichia coli in Japan." Res Microbiol 151(2): 121-8. Completion of the genome sequence of the model bacterium Escherichia coli has produced nearly 2000 open reading frames (ORFs) that remain to be functionally characterized. To accomplish this goal, we have organized a working project team in Japan and have begun construction of clones containing each of the putative ORFs. The procedure has been conceived such that we shall be able to perform systematic analysis of the shut-off as well as forced expression in vivo of each ORF and purification of its protein product for further biochemical studies. In addition, we have started a collection of various genetic and biochemical data of E. coli published in the past, and analyses of the data from a bio-informatics point of view. Thus, we aim at reaching complete understanding of this model organism in the near future.
Muller, G. (2000). "Towards 3D structures of G protein-coupled receptors: a multidisciplinary approach." Curr Med Chem 7(9): 861-88. Current strategies in pharmaceutical research comprise two methodologically different but complementary approaches for lead finding purposes, namely the random screening of compound libraries and the structure-based effort, commonly termed rational drug design. The structure-based approach aims to exploit 3D structure data of the molecular components involved in the molecular recognition event that underlies the attempt to therapeutically modulate the biological function of a macromolecular target with proven pathophysiological relevance for a disease state. In this context, G protein-coupled receptors (GPCRs) constitute the most prominent family of validated drug targets within biomedical research, since approximately 60% of approved drugs elicit their therapeutic effects by selectively addressing members of that target family. From a 3D structure point of view, these transmembrane signal transduction systems represent the most challenging task for structure determination, which is due to the heterogeneous and fine-balanced environment conditions that are necessary for structural and functional integrity of the receptor protein. This contribution will address the different concepts to derive structurally relevant information on the transmembrane seven-helix protein (7TM) domain of GPCRs with special emphasis laid on the multidisciplinarity of the applied methodologies. The current status of electron-cryo-microscopy on 2D crystals and even high-resolution x-ray crystallography on 7TM proteins will be introduced highlighting the transferability of the emerging structural principles onto the GPCR superfamily.
Special techniques from bioinformatics and homology-related molecular modeling in combination with tailor-made protein simulation methodologies complement the experimentally derived data, in that they facilitate the 3D structure generation and structure validation process. This contribution summarises the most recent results of GPCR structure studies with the aim to underline the impact of structure data not only for the purpose of rationalising structure-activity data on low-molecular weight antagonists within the context of a protein binding pocket, but also for a better understanding of e.g. mutagenesis experiments, thus qualifying GPCR structure models as valid communication platforms establishing a functional link between molecular biology, biophysics, bioinformatics and organic chemistry in a highly efficient manner.
Nakatsuji, H., J. Hasegawa, et al. (2000). "[Excited states and electron transfer in the photosynthetic reaction center of Rhodopseudomonas viridis: SAC-CI study]." Tanpakushitsu Kakusan Koso 45(4): 587-94.
Nelson, R. W., D. Nedelkov, et al. (2000). "Biosensor chip mass spectrometry: a chip-based proteomics approach." Electrophoresis 21(6): 1155-63. Rapid advances in genomic sequencing, bioinformatics, and analytical instrumentation have created the field of proteomics, which at present is based largely on two-dimensional electrophoresis (2-DE) separation of complex protein mixtures and identification of individual proteins using mass spectrometry. These analyses provide a wealth of data, which upon further evaluation leads to many questions regarding the structure and function of the proteins. The challenge of answering these questions creates a need for high-specificity approaches that may be used in the analysis of biomolecular recognition events and interacting partners, and thereby places great demands on general protein characterization instrumentation and the types of analyses they need to perform. Over the past five years we have been actively involved in interfacing two general, instrumental techniques, surface plasmon resonance-biomolecular interaction analysis (SPR-BIA) and matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry, into a single concerted approach for use in the functional and structural characterization of proteins. Reviewed here is the recent progress made using biomolecular interaction analysis - mass spectrometry (BIA-MS) in the detailed characterization of proteins and protein-protein interactions and the development of biosensor chip mass spectrometry (BCMS) as a new chip-based proteomics approach.
Nilsson, C. L. and P. Davidsson (2000). "New separation tools for comprehensive studies of protein expression by mass spectrometry." Mass Spectrom Rev 19(6): 390-7. Mass spectrometry has emerged as a core technique for protein identification and characterization because of its high sensitivity, accuracy, and speed of analysis. The most widespread strategy for studying global protein expression in biological systems employs analytical two-dimensional polyacrylamide gel electrophoresis (2D PAGE) followed by enzymatic degradation of isolated protein spots, peptide mapping, and bioinformatics searches. Using this method, thousands of proteins can be resolved in a gel and their expression quantified. However, certain types of proteins possessing important cellular functions are not easily analyzed using this strategy. These proteins include membrane, low copy number, highly basic, and very large (> 150 kDa) and small (< 10 kDa) proteins. To meet the growing need to simultaneously monitor all types of proteins in a biological system, new separation strategies have emerged that are amenable to hyphenation to mass spectrometric techniques. This article will review these new techniques and examine their usefulness in studies of protein expression.
Nylund, S. and M. Sibakov (2000). "[Will genes become an engine to the industry?]." Duodecim 116(16): 1763-8.
Ohlstein, E. H., R. R. Ruffolo, Jr., et al. (2000). "Drug discovery in the next millennium." Annu Rev Pharmacol Toxicol 40: 177-91. Selection and validation of novel molecular targets have become of paramount importance in light of the plethora of new potential therapeutic drug targets that have emerged from human gene sequencing. In response to this revolution within the pharmaceutical industry, the development of high-throughput methods in both biology and chemistry has been necessitated. This review addresses these technological advances as well as several new areas that have been created by necessity to deal with this new paradigm, such as bioinformatics, cheminformatics, and functional genomics. With many of these key components of future drug discovery now in place, it is possible to map out a critical path for this process that will be used into the new millennium.
Palotie, L. (2000). "[Where is the genome project leading to?]." Duodecim 116(16): 1731-3.
Pang, C. P., L. Baum, et al. (2000). "Hunting for disease genes in multi-functional diseases." Clin Chem Lab Med 38(9): 819-25. Disease genes may be identified through functional, positional, and candidate gene approaches. Although extensive and often labor-intensive studies such as family linkage analysis, functional investigation of gene products and genome database searches are usually involved, thousands of human disease genes, especially for monogenic diseases with Mendelian transmission, have been identified. However, in diseases caused by more than one gene, or by a combination of genetic and environmental factors, identification of the genes is even more difficult. Common examples include atherosclerosis, cancer, Alzheimer's disease, asthma, diabetes, glaucoma, and age-related macular degeneration. There have been conflicting reports on the roles of associated genes. Even with population-based case-control studies and new statistical methods such as the sib-ship disequilibrium test and the discordant alleles test, there is no agreement on whether alpha2-macroglobulin (A2M) is a gene for Alzheimer's disease. Another example is the inconsistent association between age-related macular degeneration and ATP-binding cassette transporter (ABCR). Ethnic variation causes further complications. In our investigation of LDL-receptor variants in familial hypercholesterolemia, and the trabecular meshwork inducible glucocorticoid response protein, or myocillin (TIGR-MYOC) mutation pattern in primary open angle glaucoma, we did find dissimilar results in Chinese compared to Caucasians. New information from the Human Genome Project and advancements in technologies will aid the search for and confirm identification of disease genes despite such challenges.
Persson, B. (2000). "Bioinformatics in protein analysis." Exs 88: 215-31. The chapter gives an overview of bioinformatic techniques of importance in protein analysis. These include database searches, sequence comparisons and structural predictions. Links to useful World Wide Web (WWW) pages are given in relation to each topic. Databases with biological information are reviewed with emphasis on databases for nucleotide sequences (EMBL, GenBank, DDBJ), genomes, amino acid sequences (Swissprot, PIR, TrEMBL, GenePept), and three-dimensional structures (PDB). Integrated user interfaces for databases (SRS and Entrez) are described. An introduction to databases of sequence patterns and protein families is also given (Prosite, Pfam, Blocks). Furthermore, the chapter describes the widespread methods for sequence comparisons, FASTA and BLAST, and the corresponding WWW services. The techniques involving multiple sequence alignments are also reviewed: alignment creation with the Clustal programs, phylogenetic tree calculation with the Clustal or Phylip packages and tree display using Drawtree, njplot or phylo_win. Finally, the chapter also treats the issue of structural prediction. Different methods for secondary structure predictions are described (Chou-Fasman, Garnier-Osguthorpe-Robson, Predator, PHD). Techniques for predicting membrane proteins, antigenic sites and post-translational modifications are also reviewed.
Pesole, G., G. Grillo, et al. (2000). "The untranslated regions of eukaryotic mRNAs: structure, function, evolution and bioinformatic tools for their analysis." Brief Bioinform 1(3): 236-49. The crucial role of the non-coding portion of genomes is now widely acknowledged. In particular, mRNA untranslated regions are involved in many post-transcriptional regulatory pathways that control mRNA localisation, stability and translation efficiency. A review is given of the most recent research works on the functional characterisation of eukaryotic mRNA untranslated regions. In order to make possible a systematic and detailed sequence analysis of mRNA untranslated regions (UTRs), a non-redundant database of metazoan mRNA untranslated sequences annotated for the occurrence of specific functional elements, UTRdb, was devised. These elements, whose consensus structure has been devised on the basis of experimental assays and of comparative analyses, have been collected in the UTRsite database. Suitable pattern-matching software has been devised to search UTRsite patterns in user-submitted sequences, also assessing their statistical significance. Structural, compositional and evolutionary features of untranslated sequences of metazoan mRNAs have been investigated showing peculiar intra- and interspecific patterns.
Pitcher, D. G. and N. K. Fry (2000). "Molecular techniques for the detection and identification of new bacterial pathogens." J Infect 40(2): 116-20.
Pollock, D. D., J. A. Eisen, et al. (2000). "A case for evolutionary genomics and the comprehensive examination of sequence biodiversity." Mol Biol Evol 17(12): 1776-88. Comparative analysis is one of the most powerful methods available for understanding the diverse and complex systems found in biology, but it is often limited by a lack of comprehensive taxonomic sampling. Despite the recent development of powerful genome technologies capable of producing sequence data in large quantities (witness the recently completed first draft of the human genome), there has been relatively little change in how evolutionary studies are conducted. The application of genomic methods to evolutionary biology is a challenge, in part because gene segments from different organisms are manipulated separately, requiring individual purification, cloning, and sequencing. We suggest that a feasible approach to collecting genome-scale data sets for evolutionary biology (i.e., evolutionary genomics) may consist of combination of DNA samples prior to cloning and sequencing, followed by computational reconstruction of the original sequences. This approach will allow the full benefit of automated protocols developed by genome projects to be realized; taxon sampling levels can easily increase to thousands for targeted genomes and genomic regions. Sequence diversity at this level will dramatically improve the quality and accuracy of phylogenetic inference, as well as the accuracy and resolution of comparative evolutionary studies. In particular, it will be possible to make accurate estimates of normal evolution in the context of constant structural and functional constraints (i.e., site-specific substitution probabilities), along with accurate estimates of changes in evolutionary patterns, including pairwise coevolution between sites, adaptive bursts, and changes in selective constraints. 
These estimates can then be used to understand and predict the effects of protein structure and function on sequence evolution and to predict unknown details of protein structure, function, and functional divergence. In order to demonstrate the practicality of these ideas and the potential benefit for functional genomic analysis, we describe a pilot project we are conducting to simultaneously sequence large numbers of vertebrate mitochondrial genomes.
Reed, M. A. and J. M. Tour (2000). "Computing with molecules." Sci Am 282(6): 86-93.
Rew, D. A. (2000). "Modelling in surgical oncology--part III: massive data sets and complex systems." Eur J Surg Oncol 26(8): 805-9. Human tumours are complex and unstable biological systems. New intellectual and mathematical approaches together with massive computing power are transforming our capacity to model and investigate such complexity. Computers also allow massive data sets to be collated and analysed. Such sets include the medical and epidemiological records of entire populations; the entire genetic code of the human being and of other species, including parasites and disease vectors; and the genotype of each and every individual. Massive data sets take us into new dimensions of complexity for which simple linear mathematics are insufficient. The analysis of the grades of complexity which determine protein and cell construction, cell to cell interactions within tissues and organs, the morphogenesis of entire organisms and population interactions with disease vectors requires the sophisticated mathematical tools of non-linear analysis, neural networks, chaos and complexity theory. The capacity for closer representations of reality through powerful computational models also allows us to look afresh at the generalizations of conventional statistics. Within this computational cauldron, we may also find help in the better understanding of oncogenesis and cancer therapy. This paper, the third in our series on modelling in tumour biology, considers the breadth of opportunity and challenge at the interface between cell biology and biomathematics.
Rieger, P. T. (2000). "The gene genies." Am J Nurs 100(10): 87-90.
Rigoutsos, I., A. Floratos, et al. (2000). "The emergence of pattern discovery techniques in computational biology." Metab Eng 2(3): 159-77. In the past few years, pattern discovery has been emerging as a generic tool of choice for tackling problems from the computational biology domain. In this presentation, and after defining the problem in its generality, we review some of the algorithms that have appeared in the literature and describe several applications of pattern discovery to problems from computational biology.
Risch, N. J. (2000). "Searching for genetic determinants in the new millennium." Nature 405(6788): 847-56. Human genetics is now at a critical juncture. The molecular methods used successfully to identify the genes underlying rare mendelian syndromes are failing to find the numerous genes causing more common, familial, non-mendelian diseases. With the human genome sequence nearing completion, new opportunities are being presented for unravelling the complex genetic basis of non-mendelian disorders based on large-scale genome-wide studies. Considerable debate has arisen regarding the best approach to take. In this review I discuss these issues, together with suggestions for optimal post-genome strategies.
Roberts, R. J. (2000). "The early days of bioinformatics publishing." Bioinformatics 16(1): 2-4. A brief history of the early days of publishing in the bioinformatics field is presented.
Rogozin, I. B., V. I. Mayorov, et al. (2000). "Prediction and phylogenetic analysis of mammalian short interspersed elements (SINEs)." Brief Bioinform 1(3): 260-74. The presence of repetitive elements can create serious problems for sequence analysis, especially in the case of homology searches in nucleotide sequence databases. Repetitive elements should be treated carefully by using special programs and databases. In this paper, various aspects of SINE (short interspersed repetitive element) identification, analysis and evolution are discussed.
Rosamond, J. and A. Allsop (2000). "Harnessing the power of the genome in the search for new antibiotics." Science 287(5460): 1973-6. Over the past 40 years, the search for new antibiotics has been largely restricted to well-known compound classes active against a standard set of drug targets. Although many effective compounds have been discovered, insufficient chemical variability has been generated to prevent a serious escalation in clinical resistance. Recent advances in genomics have provided an opportunity to expand the range of potential drug targets and have facilitated a fundamental shift from direct antimicrobial screening programs toward rational target-based strategies. The application of genome-based technologies such as expression profiling and proteomics will lead to further changes in the drug discovery paradigm by combining the strengths and advantages of both screening strategies in a single program.
Rossi, D. and A. Zlotnik (2000). "The biology of chemokines and their receptors." Annu Rev Immunol 18: 217-42. During the last five years, the development of bioinformatics and EST databases has been primarily responsible for the identification of many new chemokines and chemokine receptors. The chemokine field has also received considerable attention since chemokine receptors were found to act as co-receptors for HIV infection (1). In addition, chemokines, along with adhesion molecules, are crucial during inflammatory responses for a timely recruitment of specific leukocyte subpopulations to sites of tissue damage. However, chemokines and their receptors are also important in dendritic cell maturation (2), B (3), and T (4) cell development, Th1 and Th2 responses, infections, angiogenesis, and tumor growth as well as metastasis (5). Furthermore, an increase in the number of chemokine/receptor transgenic and knock-out mice has helped to define the functions of chemokines in vivo. In this review we discuss some of the chemokines' biological effects in vivo and in vitro, described in the last few years, and the implications of these findings when considering chemokine receptors as therapeutic targets.
Rutka, J. T., M. Taylor, et al. (2000). "Molecular biology and neurosurgery in the third millennium." Neurosurgery 46(5): 1034-51. The application of techniques in molecular biology to human neurosurgical conditions has led to an increased understanding of disease processes that affect the brain and to novel forms of therapy that favorably modify the natural history of many of these conditions. Molecular strategies are currently being either used or sought for brain tumors, stroke, neurodegenerative diseases, vascular malformations, spinal degenerative diseases, and congenital malformations of the central nervous system. Considering that the structure of deoxyribonucleic acid was ascertained by Watson and Crick as recently as 1953, the progress that has been made to implement molecular medicine in clinical practice has been meteoric. More than 2000 patients have been treated in approved gene therapy trials throughout the world. Many of these patients have been treated for neurological diseases for which conventional medical therapies have been of limited utility. As part of this continuing series on advances in neurosurgery in the third millennium, we first reflect on the history of the nascent field of molecular biology. We then describe the powerful techniques that have evolved from knowledge in this field and have been used in many publications in Neurosurgery, particularly within the past decade. These methods include commonly used techniques such as advanced cytogenetics, differential display, microarray technology, molecular cell imaging, yeast two-hybrid assays, gene therapy, and stem cell utilization. We conclude with a description of the rapidly growing field of bioinformatics. 
Because the Human Genome Project will be completed within 5 years, providing a virtual blueprint of the human race, the next frontier (and perhaps our greatest challenge) will involve the development of the field of "proteomics," in which protein structure and function are determined from the deoxyribonucleic acid blueprint. It is our conviction that neurosurgeons will continue to be at the forefront of the treatment of patients with neurological diseases using molecular strategies, by performing essential research leading to increased understanding of diseases, by conducting carefully controlled studies to test the effects of treatments on disease processes, and by directly administering (by neurosurgical, endovascular, endoscopic, or stereotactic means) the treatments to patients.
Ryu, D. D. and D. H. Nam (2000). "Recent progress in biomolecular engineering." Biotechnol Prog 16(1): 2-16. During the next decade or so, there will be significant and impressive advances in biomolecular engineering, especially in our understanding of the biological roles of various biomolecules inside the cell. The advances in high throughput screening technology for discovery of target molecules and the accumulation of functional genomics and proteomics data at accelerating rates will enable us to design and discover novel biomolecules and proteins on a rational basis in diverse areas of pharmaceutical, agricultural, industrial, and environmental applications. As an applied molecular evolution technology, DNA shuffling will play a key role in biomolecular engineering. In contrast to the point mutation techniques, DNA shuffling exchanges large functional domains of sequences to search for the best candidate molecule, thus mimicking and accelerating the process of sexual recombination in the evolution of life. The phage-display system of combinatorial peptide libraries will be extensively exploited to design and create many novel proteins, as a result of the relative ease of screening and identifying desirable proteins. Even though this system has so far been employed mainly in screening the combinatorial antibody libraries, its application will be extended further into the science of protein-receptor or protein-ligand interactions. The bioinformatics for genome and proteome analyses will contribute substantially toward ever more accelerated advances in the pharmaceutical industry. Biomolecular engineering will no doubt become one of the most important scientific disciplines, because it will enable systematic and comprehensive analyses of gene expression patterns in both normal and diseased cells, as well as the discovery of many new high-value molecules. 
When the functional genomics database, EST and SAGE techniques, microarray technique, and proteome analysis by 2-dimensional gel electrophoresis or capillary electrophoresis in combination with mass spectrometer are all put to good use, biomolecular engineering research will yield new drug discoveries, improved therapies, and significantly improved or new bioprocess technology. With the advances in biomolecular engineering, the rate of finding new high-value peptides or proteins, including antibodies, vaccines, enzymes, and therapeutic peptides, will continue to accelerate. The targets for the rational design of biomolecules will be broad, diverse, and complex, but many application goals can be achieved through the expansion of knowledge based on biomolecules and their roles and functions in cells and tissues. Some engineered biomolecules, including humanized Mab's, have already entered the clinical trials for therapeutic uses. Early results of the trials and their efficacy are positive and encouraging. Among them, Herceptin, a humanized Mab for breast cancer treatment, became the first drug designed by a biomolecular engineering approach and was approved by the FDA. Soon, new therapeutic drugs and high-value biomolecules will be designed and produced by biomolecular engineering for the treatment or prevention of not-so-easily cured diseases such as cancers, genetic diseases, age-related diseases, and other metabolic diseases. Many more industrial enzymes, which will be engineered to confer desirable properties for the process improvement and manufacturing of high-value biomolecular products at a lower production cost, are also anticipated. New metabolites, including novel antibiotics that are active against resistant strains, will also be produced soon by recombinant organisms having de novo engineered biosynthetic pathway enzyme systems. 
The biomolecular engineering era is here, and many benefits will be derived from this field of scientific research for years to come if we are willing to put it to good use.
Schulze, A. and J. Downward (2000). "Analysis of gene expression by microarrays: cell biologist's gold mine or minefield?" J Cell Sci 113 Pt 23: 4151-6. The development of DNA microarrays to study simultaneously the level of mRNA expressed from thousands of genes offers great promise to cell biologists. Microarrays can be used to gain detailed information about transcriptional changes involved in a specific pathway, potentially leading to the identification of novel components of the signalling system. They can also be used to obtain a fingerprint of the transcriptional status of the cell under a given condition, which may be useful for characterising the pathways used in response to novel stimuli. The use of microarrays will generate huge amounts of expression data, contributing to the transformation of biology from a data-poor to a data-rich science. Whether this leads to real advances in the understanding of cell biological problems will depend on the development of methodologies, both in experimental biology and in bioinformatics, that allow meaningful knowledge to be extracted from all this information.
Schwede, T., A. Diemand, et al. (2000). "Protein structure computing in the genomic era." Res Microbiol 151(2): 107-12. Functional analysis of the proteins discovered in fully sequenced genomes represents the next major challenge of life science research. Computational methods play a crucial role in this activity and, among them, comparative protein modelling is of great assistance during the rational design of mutagenesis experiments. Our aim over the last several years has been to further the use of 3-D model structures in this field. Therefore, we have developed a comparative protein modelling environment composed of the Swiss-PdbViewer (sequence to structure workbench and viewing program), SWISS-MODEL (internet-based server for model generation) and a database of models generated with 3DCrunch.
Searls, D. B. (2000). "Bioinformatics tools for whole genomes." Annu Rev Genomics Hum Genet 1: 251-79. The advent of whole-genome data resources--not only sequence but also other genome-scale data collections such as gene expression, protein interaction, and genetic variation--is having two marked, complementary effects on the relatively new discipline of bioinformatics. First, the veritable flood of data is creating a need and demand for new tools for dealing adequately with the deluge, and, second, the unprecedented extent, diversity, and impending completeness of the data sets are creating opportunities for new approaches to discovery based on computational methods.
Selzer, P. M., S. Brutsche, et al. (2000). "Target-based drug discovery for the development of novel antiinfectives." Int J Med Microbiol 290(2): 191-201. In the 20th century and especially during the last 50 years, antiinfectives have been increasingly used to control and prevent infectious diseases. Unfortunately the resistance of microorganisms to these pharmaceuticals has increased as well. At the same time the discovery process for novel antiinfectives, the so-called "conventional" screening approach, involves testing natural products or derivatives of known compounds in in vitro cultures. By now it is obvious that this screening approach did not meet the expectations to generate a sufficient number of novel drug candidates. Consequently, studies for selective antiinfectives with new modes of action, which are able to break resistance, are highly desirable for human and animal health. The enormous advance in sequencing technologies--leading to a constantly growing number of known microbial genomes--together with the rapid development of computer power and bioinformatic software tools, now makes it possible to identify genes and gene products that are essential to the pathogenic organisms and are therefore considered to be novel targets for the development of new antiinfectives. When these potential targets have been validated by sophisticated laboratory methods, large diverse compound libraries can be tested in in vitro assays using high-throughput screening. This approach will most likely generate an increasing number of novel lead structures that will be specifically optimized by modern combinatorial chemistry and subsequently lead to new antiinfective candidates strengthening the armoury of weapons available to fight infectious diseases in humans and animals.
Shapiro, S. D. (2000). "Evolving concepts in the pathogenesis of chronic obstructive pulmonary disease." Clin Chest Med 21(4): 621-32. It is arguable that more biologic insight has been gained from the study of COPD than from any other pulmonary disorder. A vast knowledge of the biology of extracellular matrix proteins, proteinases, and proteinase inhibitors has largely stemmed from the elastase:anti-elastase hypothesis for the pathogenesis of emphysema. An equally compelling case could be made that interest in, and funding for, COPD research has been woeful, and investigators have made no significant medical breakthroughs in the treatment of this disorder, which, unfortunately, is becoming epidemic worldwide. Indeed, it cannot be argued that physicians have very little treatment to offer to the many patients with COPD. Humankind is rapidly approaching a time when all human genes will be sequenced, and genetic engineering will allow determination of the function of these proteins in vivo. Expression profiling and bioinformatics will allow clinicians to assess the spectrum of genes and proteins regulated in biologic processes, no longer limiting study to naive candidate genes. These advances will allow investigators to decipher precise pathways of complex diseases, identify genetic and environmental interactions, and ultimately lead to specific (pre)diagnoses and rational treatment. Answers to the question as to why only a subset of smokers develop COPD will enhance the understanding of the disease process. Fortunately, there has been a resurgence of interest in COPD, led largely by the pharmaceutical industry, which has discovered the potential of this unmet need. Consequently, these state-of-the-art scientific techniques are being directly applied to COPD, lending hope for the future. Of even greater importance, the tide seems to be turning on the cigarette industry. 
Although difficult to imagine, perhaps cigarettes will disappear in this lifetime or at least the next generation won't be fooled by this deadly habit! Well ... the rational therapy thing could work.
Shapiro, L. and T. Harris (2000). "Finding function through structural genomics." Curr Opin Biotechnol 11(1): 31-5. The recent availability of whole-genome sequences and large numbers of protein-coding regions from high-throughput cDNA analysis has fundamentally changed experimental biology. These efforts have provided huge databases of protein sequences, many of which are of unknown function. Deciphering the functions of these myriad proteins presents a major intellectual challenge.
Shaw, A. D., M. K. Winson, et al. (2000). "Rapid analysis of high-dimensional bioprocesses using multivariate spectroscopies and advanced chemometrics." Adv Biochem Eng Biotechnol 66: 83-113. There are an increasing number of instrumental methods for obtaining data from biochemical processes, many of which now provide information on many (indeed many hundreds) of variables simultaneously. The wealth of data that these methods provide, however, is useless without the means to extract the required information. As instruments advance, and the quantity of data produced increases, the fields of bioinformatics and chemometrics have consequently grown greatly in importance. The chemometric methods nowadays available are both powerful and dangerous, and there are many issues to be considered when using statistical analyses on data for which there are numerous measurements (which often exceed the number of samples). It is not difficult to carry out statistical analysis on multivariate data in such a way that the results appear much more impressive than they really are. The authors present some of the methods that we have developed and exploited in Aberystwyth for gathering highly multivariate data from bioprocesses, and some techniques of sound multivariate statistical analyses (and of related methods based on neural and evolutionary computing) which can ensure that the results will stand up to the most rigorous scrutiny.
Sheinerman, F. B., R. Norel, et al. (2000). "Electrostatic aspects of protein-protein interactions." Curr Opin Struct Biol 10(2): 153-9. Structural and mutational analyses reveal a central role for electrostatic interactions in protein-protein association. Experiment and theory both demonstrate that clusters of charged and polar residues that are located on protein-protein interfaces may enhance complex stability, although the total effect of electrostatics is generally net destabilizing. The past year also witnessed significant progress in our understanding of the effect of electrostatics on protein association kinetics, specifically in the characterization of a partially desolvated encounter complex.
Stec, I., S. B. Nagl, et al. (2000). "The PWWP domain: a potential protein-protein interaction domain in nuclear proteins influencing differentiation?" FEBS Lett 473(1): 1-5. Upon characterization of WHSC1, a gene mapping to the Wolf-Hirschhorn syndrome critical region and at its C-terminus similar to the Drosophila ASH1/trithorax group proteins, we identified a novel protein domain designated PWWP domain. To gain insight into its structure, evolutionary conservation and its potential functional role, we performed database searches to identify other PWWP domain-containing proteins. We retrieved 39 proteins, and a multiple alignment shows that the domain spans some 70 amino acids. It is present in proteins of nuclear origin and plays a role in cell growth and differentiation. Due to its position, the composition of amino acids close to the PWWP motif and the pattern of other domains present, we hypothesize that the domain is involved in protein-protein interactions.
Stevens, R., C. A. Goble, et al. (2000). "Ontology-based knowledge representation for bioinformatics." Brief Bioinform 1(4): 398-414. Much of biology works by applying prior knowledge ('what is known') to an unknown entity, rather than the application of a set of axioms that will elicit knowledge. In addition, the complex biological data stored in bioinformatics databases often require the addition of knowledge to specify and constrain the values held in that database. One way of capturing knowledge within bioinformatics applications and databases is the use of ontologies. An ontology is the concrete form of a conceptualisation of a community's knowledge of a domain. This paper aims to introduce the reader to the use of ontologies within bioinformatics. A description of the type of knowledge held in an ontology will be given. The paper will be illustrated throughout with examples taken from bioinformatics and molecular biology, and a survey of current biological ontologies will be presented. From this it will be seen that the use to which the ontology is put largely determines the content of the ontology. Finally, the paper will describe the process of building an ontology, introducing the reader to the techniques and methods currently in use and the open research questions in ontology development.
Tanaka, T., Y. Nishimura, et al. (2000). "[Genomic drug discovery and pharmainformatics]." Tanpakushitsu Kakusan Koso 45(6 Suppl): 805-10.
Thanaraj, T. A., A. Robinson, et al. (2000). "Paradigm shifts in the approaches for gene annotation." Brief Bioinform 1(4): 324-9.
Trifonov, E. N. (2000). "Earliest pages of bioinformatics." Bioinformatics 16(1): 5-9. This review is a brief outline of the chronology and essence of early events in bioinformatics, covering the period from 1869 (discovery of DNA by Miescher) to 1980-1981 (beginning of massive sequencing). For the purpose of this review, bioinformatics is understood as a chapter of molecular biology dealing with the amino acid and nucleotide sequences and with the information they carry.
Tsoka, S. and C. A. Ouzounis (2000). "Recent developments and future directions in computational genomics." FEBS Lett 480(1): 42-8. Computational genomics is a subfield of computational biology that deals with the analysis of entire genome sequences. Transcending the boundaries of classical sequence analysis, computational genomics exploits the inherent properties of entire genomes by modelling them as systems. We review recent developments in the field, discuss in some detail a number of novel approaches that take into account the genomic context and argue that progress will be made by novel knowledge representation and simulation technologies.
Valiaho, J., P. Riikonen, et al. (2000). "Novel immunodeficiency data servers." Immunol Rev 178: 177-85. The Internet contains scientific information in increasing amounts. It is possible to obtain the latest information, and Web services can easily be maintained and updated. We have set up three Internet services on immunodeficiencies. Immunodeficiency-related mutation information is available in immunodeficiency mutation databases (IDbases). Currently 14 registries are distributed, including information about Bloom syndrome (BLMbase), X-linked agammaglobulinemia (BTKbase), X-linked and autosomal recessive chronic granulomatous diseases (CYBBbase for X-linked CGD, CYBAbase for p22(phox) deficiency, NCF1base for p47(phox) deficiency, NCF2base for p67(phox) deficiency), CD3gamma and CD3epsilon deficiencies (CD3Gbase, CD3Ebase), X-linked hyper-IgM syndrome (CD40Lbase), T-B+ severe combined immunodeficiency (JAK3base), V(D)J recombination defects (RAG1base, RAG2base), X-linked lymphoproliferative syndrome (SH2D1Abase), and ZAP-70 deficiency (ZAP70base). Information on laboratories analysing the genetic defects is collected to IDdiagnostics registry. Due to the rareness of immunodeficiencies there are very few laboratories performing genetic diagnostics. Such laboratories are listed in IDdiagnostics and physicians can use the registry to find a suitable laboratory for their diagnostic needs. Immunodeficiency Resource (IDR) is a comprehensive integrated knowledge base for all the information on immunodeficiencies, including clinical, biochemical, genetic, structural and computational data and analyses. All three services are available at http://www.uta.fi/imt/bioinfo/.
Via, A., F. Ferre, et al. (2000). "Protein surface similarities: a survey of methods to describe and compare protein surfaces." Cell Mol Life Sci 57(13-14): 1970-7. Many methods have been developed to analyse protein sequences and structures, although less work has been undertaken describing and comparing protein surfaces. Evolution can lead sequences to diverge or structures to change topology; nevertheless, surface determinants that are essential to protein function itself may be maintained. Moreover, different molecules could converge to similar functions by gaining specific surface determinants. In such cases, sequence or structure comparisons are likely to be inadequate in describing or identifying protein functions and evolutionary relationships among proteins. Surface analysis can identify function determinants that are independent of sequence or secondary structure and can therefore be a powerful tool to highlight cases of possible convergent or divergent evolution. This kind of approach can be useful for a better understanding of protein molecular and biochemical mechanisms of catalysis or interaction with a ligand, which are usually surface dependent. Protein surface comparison, when compared to sequence or structure comparison methods, is a hard computational challenge and evaluated methods allowing the comparison of protein surfaces are difficult to find. In this review, we will survey the current knowledge about protein surface similarity and the techniques to detect it.
Vihinen, M. and H. Lehvaslaiho (2000). "[Genetic databases and their use]." Duodecim 116(16): 1759-62.
Winson, M. K. and H. M. Davey (2000). "Flow cytometric analysis of microorganisms." Methods 21(3): 231-40. The application of flow cytometry to microorganisms is as old as the technique itself, but it has historically been underexploited for microbial applications. This is now being reversed and microbiologists are ideally placed to benefit from recent technological advances. While earlier papers demonstrated the use of flow cytometry for studies of viability and taxonomy, recent developments in bioinformatics and reporter gene technologies are leading to novel applications in microbiology. Variants of green fluorescent protein have been used for the study of conditional microbial gene regulation in medically important host-pathogen interactions and fluorescence-activated cell sorting is being applied to the isolation of novel mutants in directed evolution studies. This paper reviews the reasons for the delay in the application of flow cytometry to microbial problems, the range of applications, and their limitations and considers the progress made in developing new strategies for use in microbiological investigations.
Yao, T. (2000). "[Bioinformatics in USA and Europe in the post-genome-era]." Tanpakushitsu Kakusan Koso 45(12): 1969-77.