Bioinformatics Reviews: 2000

(111 References)

(2000). "Data mining." Nat Biotechnol 18 Suppl: IT35-6.

           

(2000). "Bioinformatics." Nat Biotechnol 18 Suppl: IT31-4.

           

Apweiler, R. (2000). "Protein sequence databases." Adv Protein Chem 54: 31-71.

           

Aravind, L. (2000). "Guilt by association: contextual information in genome analysis." Genome Res 10(8): 1074-7.

           

Archakov, A. I. (2000). "[What lies beyond genomics?--Proteomics]." Vopr Med Khim 46(4): 335-43.

           

Ashburner, M. (2000). "A biologist's view of the Drosophila genome annotation assessment project." Genome Res 10(4): 391-3.

           

Attwood, T. K. (2000). "The quest to deduce protein function from sequence: the role of pattern databases." Int J Biochem Cell Biol 32(2): 139-55.

            In the wake of the numerous now-fruitful genome projects, we have witnessed a 'tsunami' of sequence data and with it the birth of the field of bioinformatics. Bioinformatics involves the application of information technology to the management and analysis of biological data. For many of us, this means that databases and their search tools have become an essential part of the research environment. However, the rate of sequence generation and the haphazard proliferation of databases have made it difficult to keep pace with developments, even for the cognoscenti. Moreover, increasing amounts of sequence information do not necessarily equate with an increase in knowledge, and in the panic to automate the route from raw data to biological insight, we may be generating and propagating innumerable errors in our precious databases. In the genome era upon us, researchers want rapid, easy-to-use, reliable tools for functional characterisation of newly determined sequences. For the pharmaceutical industry in particular, the Pandora's box of bioinformatics harbours an information-rich nugget, ripe with potential drug targets and possible new avenues for the development of therapeutic agents. This review outlines the current status of the major pattern databases now used routinely in the analysis of protein sequences. The review is divided into three main sections. In the first, commonly used terms are defined and the methods behind the databases are briefly described; in the second, the structure and content of the principal pattern databases are discussed; and in the final part, several alignment databases, which are frequently confused with pattern databases, are mentioned. For the new-comer, the array of resources, the range of methods behind them and the different tools required to search them can be confusing. 
The review therefore also briefly mentions a current international endeavour to integrate the diverse databases, which effort should facilitate sequence analysis in the future. This is particularly important for target-discovery programmes, where the challenge is to rationalise the enormous numbers of potential targets generated by sequence database searches. This problem may be addressed, at least in part, by reducing search outputs to the more focused and manageable subsets suggested by searches of integrated groups of family-specific pattern databases.

 

Bajic, V. B. (2000). "Comparing the success of different prediction software in sequence analysis: a review." Brief Bioinform 1(3): 214-28.

            The abundance of computer software for different types of prediction in DNA and protein sequence analyses raises the problem of adequate ranking of prediction program quality. A single measure of success of predictor software, which adequately ranks the predictors, does not exist. A typical example of such an incomplete measure is the so-called correlation coefficient. This paper provides an overview and short analysis of several different measures of prediction quality. Frequently, some of these measures give results contradictory to each other even when they relate to the same prediction scores. This may lead to confusion. In order to overcome some of the problems, a few new measures are proposed including some variants of a 'generalised distance from the ideal predictor score'; these are based on topological properties, rather than on statistics. In order to provide a sort of a balanced ranking, the averaged score measure (ASM) is introduced. The ASM provides a possibility for the selection of the predictor that probably has the best overall performance. The method presented in the paper applies to the ranking problem of any prediction software whose results can be properly represented in a true positive-false positive framework, thus providing a natural set-up for linear biological sequence analysis.

 

Baldi, P., S. Brunak, et al. (2000). "Assessing the accuracy of prediction algorithms for classification: an overview." Bioinformatics 16(5): 412-24.

            We provide a unified overview of methods that currently are widely used to assess the accuracy of prediction algorithms, from raw percentages, quadratic error measures and other distances, and correlation coefficients, and to information theoretic measures such as relative entropy and mutual information. We briefly discuss the advantages and disadvantages of each approach. For classification tasks, we derive new learning algorithms for the design of prediction systems by directly optimising the correlation coefficient. We observe and prove several results relating sensitivity and specificity of optimal systems. While the principles are general, we illustrate the applicability on specific problems such as protein secondary structure and signal peptide prediction.
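            As a rough illustration of the measures named in this abstract, the sketch below computes sensitivity, specificity and the correlation coefficient for a binary classifier from its confusion counts. It is not taken from the paper: the function names and the toy labels are hypothetical, and specificity is assumed here to mean TN/(TN+FP), although some sequence-analysis literature uses TP/(TP+FP) under the same name.

```python
import math

def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 or 0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return tp, tn, fp, fn

def sensitivity(tp, fn):
    """Fraction of real positives that were predicted positive."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def specificity(tn, fp):
    """Fraction of real negatives that were predicted negative (assumed definition)."""
    return tn / (tn + fp) if (tn + fp) else 0.0

def correlation_coefficient(tp, tn, fp, fn):
    """Correlation between predicted and true labels (Matthews form)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical toy data: 3 real positives, 5 real negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(sensitivity(tp, fn), specificity(tn, fp),
      correlation_coefficient(tp, tn, fp, fn))
```

            The three numbers can disagree about which of two predictors is better, which is one reason a single summary measure is hard to settle on.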

 

Bassingthwaighte, J. B. (2000). "Strategies for the physiome project." Ann Biomed Eng 28(8): 1043-58.

            The physiome is the quantitative description of the functioning organism in normal and pathophysiological states. The human physiome can be regarded as the virtual human. It is built upon the morphome, the quantitative description of anatomical structure, chemical and biochemical composition, and material properties of an intact organism, including its genome, proteome, cell, tissue, and organ structures up to those of the whole intact being. The Physiome Project is a multicentric integrated program to design, develop, implement, test and document, archive and disseminate quantitative information, and integrative models of the functional behavior of molecules, organelles, cells, tissues, organs, and intact organisms from bacteria to man. A fundamental and major feature of the project is the databasing of experimental observations for retrieval and evaluation. Technologies allowing many groups to work together are being rapidly developed. Internet II will facilitate this immensely. When problems are huge and complex, a particular working group can be expert in only a small part of the overall project. The strategies to be worked out must therefore include how to pull models composed of many submodules together even when the expertise in each is scattered amongst diverse institutions. The technologies of bioinformatics will contribute greatly to this effort. Developing and implementing code for large-scale systems has many problems. Most of the submodules are complex, requiring consideration of spatial and temporal events and processes. Submodules have to be linked to one another in a way that preserves mass balance and gives an accurate representation of variables in nonlinear complex biochemical networks with many signaling and controlling pathways. Microcompartmentalization vitiates the use of simplified model structures. The stiffness of the systems of equations is computationally costly. 
Faster computation is needed when using models as thinking tools and for iterative data analysis. Perhaps the most serious problem is the current lack of definitive information on kinetics and dynamics of systems, due in part to the almost total lack of databased observations, but also because, though we are nearly drowning in new information being published each day, either the information required for the modeling cannot be found or has never been obtained. "Simple" things like tissue composition, material properties, and mechanical behavior of cells and tissues are not generally available. The development of comprehensive models of biological systems is a key to pharmaceutics and drug design, for the models will become gradually better predictors of the results of interventions, both genomic and pharmaceutic. Good models will be useful in predicting the side effects and long term effects of drugs and toxins, and when the models are really good, to predict where genomic intervention will be effective and where the multiple redundancies in our biological systems will render a proposed intervention useless. The Physiome Project will provide the integrating scientific basis for the Genes to Health initiative, and make physiological genomics a reality applicable to whole organisms, from bacteria to man.

 

Baumeister, W. and A. C. Steven (2000). "Macromolecular electron microscopy in the era of structural genomics." Trends Biochem Sci 25(12): 624-31.

            Macromolecular machines carry out many cellular functions. Cryo-electron microscopy (cryo-EM) is emerging as a powerful method for studying the structure, assembly and dynamics of such macromolecules, and their interactions with substrates. With resolutions still improving, 'single-particle' analyses are already depicting secondary structure. Moreover, cryo-EM can be combined in several ways with X-ray diffraction to enhance the resolution of cryo-EM and the applicability of crystallography. Electron tomography holds promise for visualizing machines at work inside cells.

 

Becich, M. J. (2000). "The role of the pathologist as tissue refiner and data miner: the impact of functional genomics on the modern pathology laboratory and the critical roles of pathology informatics and bioinformatics." Mol Diagn 5(4): 287-99.

            This article provides an overview of how functional genomics is likely to impact on the pathology laboratory and highlights how informatics and tissue banking will greatly facilitate the molecular age of medicine. Important aspects of functional genomics in the post-genome era, including the roles of laser capture microdissection, DNA- and complementary DNA-based microarrays, proteomic methods, collaborative human tissue banking, tissue microarrays, and pathobioinformatics in the modern pathology laboratory are discussed. The role of mass spectroscopy in the analysis of RNA, DNA, and protein and its impact on the clinical laboratory, particularly in cost-effectiveness and time savings, are evaluated. This article explores how laboratory information systems (LISs) and the devices that feed them information may need to be modified to adapt to greater volumes of data for the new testing modalities that require understanding sophisticated fluorescence detection methods and image processing. Emerging genomic testing methods and their impact on pathology laboratory testing, especially in the area of molecular classification of neoplasms, are examined. The role of the tissue bank in the modern pathology laboratory as an archive of control normal tissues, as well as subsamples of the spectrum of progressive neoplastic states, is discussed in light of its critical importance to the molecular classification of cancer. Establishing a database that combines structured reports in pathology LISs and construction of tissue banking information systems will provide a rich resource for pathology departments. The article discusses a hypothetical resource, such as the Shared Tumor Expression Profiler, that would provide access to well-characterized tissue-based research resources for clinicians and researchers. 
Last, the article emphasizes how LISs can prepare for these changes, and how training pathologists in pathology informatics and bioinformatics (pathobioinformatics) is critical to ensure pathology's overall leadership role in the post-genome era.

 

Benner, S. A., S. G. Chamberlin, et al. (2000). "Functional inferences from reconstructed evolutionary biology involving rectified databases--an evolutionarily grounded approach to functional genomics." Res Microbiol 151(2): 97-106.

            If bioinformatics tools are constructed to reproduce the natural, evolutionary history of the biosphere, they offer powerful approaches to some of the most difficult tasks in genomics, including the organization and retrieval of sequence data, the updating of massive genomic databases, the detection of database error, the assignment of introns, the prediction of protein conformation from protein sequences, the detection of distant homologs, the assignment of function to open reading frames, the identification of biochemical pathways from genomic data, and the construction of a comprehensive model correlating the history of biomolecules with the history of planet Earth.

 

Berendsen, H. J. and S. Hayward (2000). "Collective protein dynamics in relation to function." Curr Opin Struct Biol 10(2): 165-9.

            Several techniques for the analysis of the internal motions of proteins are available - separating large collective motions from small, presumably uninteresting motions. Such descriptions are helpful in the characterization of internal motions and provide insight into the energy landscape of proteins. The real challenge, however, is to relate large collective motions to functional properties, such as binding and regulation, or to folding. These issues have been recently addressed in several papers.

 

Bhattacharya, A., S. Bhattacharya, et al. (2000). "Identification of parasitic genes by computational methods." Parasitol Today 16(3): 127-31.

            A number of parasite genome projects are under way, and large amounts of nucleotide sequence data are becoming available for analysis. There is an urgent need for development of theoretical tools to analyze the genome data, including identification of protein-coding sequences. The majority of the methods developed to date require prior information about the genome before accurate predictions can be made. Because such information is not available for many parasites, these methods cannot be directly applied. In this article, Alok Bhattacharya and colleagues describe some of the gene-prediction methods commonly in use, and a new method, GeneScan, that they have developed for the analysis of parasite genomes.

 

Black, D. L. (2000). "Protein diversity from alternative splicing: a challenge for bioinformatics and post-genome biology." Cell 103(3): 367-70.

           

Blundell, T. L. and K. Mizuguchi (2000). "Structural genomics: an overview." Prog Biophys Mol Biol 73(5): 289-95.

           

Brazma, A. and J. Vilo (2000). "Gene expression data analysis." FEBS Lett 480(1): 17-24.

            Microarrays are one of the latest breakthroughs in experimental molecular biology, which allow monitoring of gene expression for tens of thousands of genes in parallel and are already producing huge amounts of valuable data. Analysis and handling of such data is becoming one of the major bottlenecks in the utilization of the technology. The raw microarray data are images, which have to be transformed into gene expression matrices--tables where rows represent genes, columns represent various samples such as tissues or experimental conditions, and numbers in each cell characterize the expression level of the particular gene in the particular sample. These matrices have to be analyzed further, if any knowledge about the underlying biological processes is to be extracted. In this paper we concentrate on discussing bioinformatics methods used for such analysis. We briefly discuss supervised and unsupervised data analysis and its applications, such as predicting gene function classes and cancer classification. Then we discuss how the gene expression matrix can be used to predict putative regulatory signals in the genome sequences. In conclusion we discuss some possible future directions.
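            The gene expression matrix described in this abstract can be sketched as a small table with genes as rows and samples as columns. The example below is an assumption-laden illustration, not the authors' method: the gene names, sample names and values are invented, and Pearson correlation is used as one simple similarity measure of the kind an unsupervised analysis might start from.

```python
import math

# Hypothetical samples (columns) and genes (rows); all values are invented.
samples = ["tissue_A", "tissue_B", "tissue_C", "tissue_D"]
expression = {
    "gene1": [1.0, 2.0, 3.0, 4.0],
    "gene2": [2.0, 4.0, 6.0, 8.0],
    "gene3": [4.0, 3.0, 2.0, 1.0],
}

def pearson(x, y):
    """Pearson correlation between two expression profiles of equal length."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a flat profile has no defined correlation; treat as 0
    return cov / (norm_x * norm_y)

def most_similar(gene, matrix):
    """Return the other gene whose profile correlates best with `gene`."""
    others = (g for g in matrix if g != gene)
    return max(others, key=lambda g: pearson(matrix[gene], matrix[g]))

print(most_similar("gene1", expression))
```

            Genes whose profiles rise and fall together across samples are candidates for shared function or regulation, which is the intuition behind grouping rows of the matrix.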

 

Brenner, S. E. (2000). "Target selection for structural genomics." Nat Struct Biol 7 Suppl: 967-9.

            Structural genomics aims to use high-throughput structure determination and computational analysis to provide three-dimensional models of every tractable protein. The process of choosing proteins for experimental structure characterization is known as target selection. In this nomenclature, the targets are regions of proteins to be studied by crystallography or NMR. Selection of the targets is principally a computational process of restricting candidate proteins to those that are tractable and of unknown structure, and prioritizing according to expected interest and accessibility.

 

Broder, S. and J. C. Venter (2000). "Sequencing the entire genomes of free-living organisms: the foundation of pharmacology in the new millennium." Annu Rev Pharmacol Toxicol 40: 97-132.

            The power and effectiveness of clinical pharmacology are about to be transformed with a speed that earlier in this decade could not have been foreseen even by the most astute visionaries. In the very near future, we will have at our disposal the reference DNA sequence for the entire human genome, estimated to contain approximately 3.5 billion bp. At the same time, the science of whole genome sequencing is fostering the computational science of bioinformatics needed to develop practical applications for pharmacology and toxicology. Indeed, it is likely that pharmacology, toxicology, bioinformatics, and genomics will merge into a new branch of medical science for studying and developing pharmaceuticals from molecule to bedside.

 

Bull, A. T., A. C. Ward, et al. (2000). "Search and discovery strategies for biotechnology: the paradigm shift." Microbiol Mol Biol Rev 64(3): 573-606.

            Profound changes are occurring in the strategies that biotechnology-based industries are deploying in the search for exploitable biology and to discover new products and develop new or improved processes. The advances that have been made in the past decade in areas such as combinatorial chemistry, combinatorial biosynthesis, metabolic pathway engineering, gene shuffling, and directed evolution of proteins have caused some companies to consider withdrawing from natural product screening. In this review we examine the paradigm shift from traditional biology to bioinformatics that is revolutionizing exploitable biology. We conclude that the reinvigorated means of detecting novel organisms, novel chemical structures, and novel biocatalytic activities will ensure that natural products will continue to be a primary resource for biotechnology. The paradigm shift has been driven by a convergence of complementary technologies, exemplified by DNA sequencing and amplification, genome sequencing and annotation, proteome analysis, and phenotypic inventorying, resulting in the establishment of huge databases that can be mined in order to generate useful knowledge such as the identity and characterization of organisms and the identity of biotechnology targets. Concurrently there have been major advances in understanding the extent of microbial diversity, how uncultured organisms might be grown, and how expression of the metabolic potential of microorganisms can be maximized. The integration of information from complementary databases presents a significant challenge. Such integration should facilitate answers to complex questions involving sequence, biochemical, physiological, taxonomic, and ecological information of the sort posed in exploitable biology. 
The paradigm shift which we discuss is not absolute in the sense that it will replace established microbiology; rather, it reinforces our view that innovative microbiology is essential for releasing the potential of microbial diversity for biotechnology penetration throughout industry. Various of these issues are considered with reference to deep-sea microbiology and biotechnology.

 

Case, D. A. (2000). "Interpretation of chemical shifts and coupling constants in macromolecules." Curr Opin Struct Biol 10(2): 197-203.

            Recent developments in NMR spectroscopy, along with advances in computational techniques, have produced new approaches to the interpretation of chemical shifts and spin-spin coupling constants in biomolecules. Quantum chemical studies of useful accuracy are now becoming more routine and are increasingly being used in conjunction with experimental studies to map out expected structural patterns for peptides and oligonucleotides. Topics of recent special interest include spin couplings across hydrogen bonds and patterns of chemical shift anisotropies, in both diamagnetic and paramagnetic proteins.

 

Celis, J. E., M. Kruhoffer, et al. (2000). "Gene expression profiling: monitoring transcription and translation products using DNA microarrays and proteomics." FEBS Lett 480(1): 2-16.

            Novel and powerful technologies such as DNA microarrays and proteomics have made possible the analysis of the expression levels of multiple genes simultaneously both in health and disease. In combination, these technologies promise to revolutionize biology, in particular in the area of molecular medicine as they are expected to reveal gene regulation events involved in disease progression as well as to pinpoint potential targets for drug discovery and diagnostics. Here, we review the current status of these technologies and highlight some studies in which they have been applied in concert to the analysis of biopsy specimens.

 

Chakravarti, D. N., M. J. Fiske, et al. (2000). "Mining genomes and mapping proteomes: identification and characterization of protein subunit vaccines." Dev Biol (Basel) 103: 81-90.

            Currently, there is an extensive and unprecedented effort to obtain the complete nucleotide sequence of the complex genomes of many micro-organisms. In this post-genomic era, based on the availability of the entire genome sequence of an organism, three new disciplines of molecular biology have emerged: genomics, transcriptional profiling and proteomics. All these technologies have the potential to accelerate the process of identifying protective protein antigens as subunit vaccine targets as well as validating and extending the range of available candidate antigens. The progress of these technologies has led to the origination of the science of bioinformatics for management and critical evaluation of the large amount of information generated. Although genomics, transcriptional profiling and proteomics are each based on different principles, there is considerable synergy between them. Appropriate application of any one, or a combination of two or more of these approaches, coupled with bioinformatics, would allow identification of a short-list of vaccine candidates from the entire list of several hundreds to thousands of proteins encoded by the genome. These candidates would then require usual channelling through the subsequent process involving recombinant expression, purification and testing for immunogenicity and protective efficacy.

 

Chambers, G., L. Lawrie, et al. (2000). "Proteomics: a new approach to the study of disease." J Pathol 192(3): 280-8.

            The global analysis of cellular proteins has recently been termed proteomics and is a key area of research that is developing in the post-genome era. Proteomics uses a combination of sophisticated techniques including two-dimensional (2D) gel electrophoresis, image analysis, mass spectrometry, amino acid sequencing, and bio-informatics to resolve comprehensively, to quantify, and to characterize proteins. The application of proteomics provides major opportunities to elucidate disease mechanisms and to identify new diagnostic markers and therapeutic targets. This review aims to explain briefly the background to proteomics and then to outline proteomic techniques. Applications to the study of human disease conditions ranging from cancer to infectious diseases are reviewed. Finally, possible future advances are briefly considered, especially those which may lead to faster sample throughput and increased sensitivity for the detection of individual proteins.

 

Cordon-Cardo, C., R. J. Cote, et al. (2000). "Genetic and molecular markers of urothelial premalignancy and malignancy." Scand J Urol Nephrol Suppl(205): 82-93.

            The molecular genetic changes reported in bladder tumors can be classified as primary and secondary aberrations. Primary molecular alterations may be defined as those directly related to the genesis of cancer. These are frequently found as the sole abnormality and are often associated with particular tumors. There are characteristic primary abnormalities involved in the production of low-grade/well-differentiated neoplasms, which destabilize cellular proliferation but have little effect on cellular "social" interactions or differentiation, as well as the rate of cell death or apoptosis. Other molecular events lead to high-grade neoplasms which disrupt growth control, including the cell cycle and apoptosis, and which have a major impact on biological behavior. A primary target leading to low-grade papillary superficial bladder tumors resides on chromosome 9, while p53 gene alterations are commonly seen in flat carcinoma in situ. Other molecular alterations must be elucidated, as many non-invasive neoplasms have neither chromosome 9 nor p53 alterations. Novel approaches utilizing tissue microdissection techniques and molecular genetic assays are needed to shed further light on this subject. Secondary genetic or epigenetic abnormalities may be fortuitous, or may determine the biological behavior of the tumor. Multiple molecular abnormalities are identified in most human cancers studied, including bladder neoplasms. The accumulation, rather than the order, of these genetic alterations may be the critical factor that grants synergistic activity. In this regard, it is noteworthy that many of the genes that are altered act upon the two recognized critical growth and senescence pathways, TP53 and RB. These particular molecular aberrations may be especially important to evaluate for their use in the management of bladder cancer because of their commonality in progressive forms of the disease. 
Thus, clinical trials are underway to explore their use in specific situations, particularly in the surgical management of locally advanced disease, and to determine whether adjuvant chemotherapy in such patients may be of benefit. The use of molecular alterations in the management of non-invasive bladder neoplasms remains to be firmly established. Our knowledge of molecular alterations important in bladder cancer progression is far from complete, and further study is necessary to further elucidate crucial pathways involved in progression and therapeutic response. As per preneoplastic conditions, difficulties in identifying and interpreting the significance of phenotypic changes have imposed certain limitations, as has an evolving nomenclature and issues of reproducibility in interpreting morphological criteria. Nevertheless, molecular alterations involving chromosome 9q and the INK4A locus in papillary superficial tumors vs changes in chromosomes 14q and 8q, p53 and RB in flat carcinoma in situ lesions may indicate a molecular basis for early events that lead to varying pathways in urothelial tumorigenesis. Studies aimed at revealing the clinical relevance of genetic instability, as well as molecular or epigenetic alterations, in urothelium and preneoplastic lesions of otherwise morphologically normal appearance are needed to further advance knowledge in the field. Clinical advances in bladder cancer will be facilitated by novel animal models paralleling the human disease. Molecular diagnostics, particularly specific antigen expression, fluorescence in situ hybridization and microsatellite analyses, have shown great promise as screening and follow-up methodologies, and may supplement urine cytology in the diagnosis and characterization of new and recurrent disease. 
In addition, the use of high-throughput genomic/proteomic assays, linked to comprehensive databases, and coupled with robust bioinformatics will be key elements in elucidating the components of regulatory and signaling pathways involved in bladder tumorigenesis and cancer progression.

 

Cuticchia, A. J. (2000). "Recent advances in bioinformatics in the medical research environment and applications to the study of skin diseases." J Cutan Med Surg 4(3): 169-73.

            BACKGROUND: The computer has become increasingly intertwined in society for the past 30 years. Within the academic health science centre, there is an increasing need for researchers to become skilled at using the Internet as a mechanism for the retrieval of scientific results and the underlying data. The discipline of bioinformatics, which uses computer technology to provide answers to biological questions, has been expanding in scope and utility for the past decade. Increasing numbers of research groups have been investing in bioinformatics infrastructure to aid in the research process. These continuing investments have led to the establishment for the first time of a supercomputing facility within a hospital. Such computational power is being used for the mapping of genes and the study of human disease. OBJECTIVE: A discussion of the increasing role of computational biology in the research environment of the clinician scientist is presented here. CONCLUSIONS: Though the investment in a supercomputer may not be possible in most research settings, several less expensive alternatives relying on existing desktop computers can provide supercomputer-like performance within nearly any environment.

 

de Wolf, F. A. and G. M. Brett (2000). "Ligand-binding proteins: their potential for application in systems for controlled delivery and uptake of ligands." Pharmacol Rev 52(2): 207-36.

            Unstable or harmful agents, such as drugs, vitamins, flavors, pheromones, and catalysts, for use in pharmaceutics, personal care, functional foods, crop protection, laboratories, offices, and industrial processes, require stabilization against oxidation and degradation or shielding from sensitive environments. Therefore, binding them to carriers with high affinity and selectivity for targeting to the right environment and subsequent controlled release is beneficial, especially if this allows improved control of (stimulus-induced) release. Proteins often possess one or more of these properties, whereas modern biotechnology and bioinformatics provide an increasing number of tools to engineer and adapt these properties. Carrier systems are now developed that incorporate proteins as the central ligand-binding component, e.g., lectins for glucose-triggered release of glycosylated insulin and bispecific antibodies for brain targeting of drugs, but ligand-binding proteins can potentially be used in many other applications. Collectively, the proteins available in nature bind an impressive variety of ligands and non-natural analogs. In this light, various ligand-binding protein classes are surveyed, including biotin-, lipid-, immunosuppressant-, insect pheromone-, phosphate-, and sulfate-binding proteins, as well as bacterial periplasmic proteins, lectins, serum albumins, immunoglobulins, and inactivated enzymes. Disadvantages, such as enzymatic degradation or immunogenicity, associated with the pharmaceutical use of certain proteins can be avoided by incorporating these proteins in more complex carrier and targeting systems. In other applications, this may not be necessary. The enclosure of high-affinity (potentially stimulus-sensitive) binding proteins within an envelope that acts as a diffusion barrier for the ligand may provide excellent slow release. Many possibilities seem to be as yet unexplored.

 

Degtyarenko, K. (2000). "Bioinorganic motifs: towards functional classification of metalloproteins." Bioinformatics 16(10): 851-64.

            The habitat of bioinorganic motifs (BIMs) is at the interface of biological inorganic chemistry and bioinformatics. BIM is defined as a common structural feature shared by functionally related, but not necessarily homologous, proteins, and consisting of the metal atom(s) and first coordination shell ligands. BIMs appear to be suitable for classification of metal centres at any level, from groups of unrelated proteins with similar function to different functional states of the same protein, and for description of possible evolutionary relationships of metalloproteins. However, they have not attracted wide attention from the bioinformatics community. Although their presence is appreciated, they are difficult to predict; therefore the current 'high-throughput' initiatives are likely to miss or ignore them altogether. The protein sequence databases do not distinguish between proteins containing different prosthetic groups (unless they have different sequences) or between apo- and holoprotein. On the other hand, the protein structure databases include data on 'hetero compounds' of various origin but these data are often inconsistent. A number of specialized databases dealing with BIMs and attempts to classify them are reviewed. SUPPLEMENTARY INFORMATION: The additional bibliography and list of Internet resources on bioinorganic chemistry are available at http://www.ebi.ac.uk/~kirill/biometal/

 

Durand, P., S. Fabrega, et al. (2000). "Structural features of normal and mutant human lysosomal glycoside hydrolases deduced from bioinformatics analysis." Hum Mol Genet 9(6): 967-77.

            Lysosomal storage diseases are due to inherited deficiencies in various enzymes involved in basic metabolic processes. As with other genetic diseases, accurate structure data for these enzymatic proteins should help in better understanding the molecular effects of mutations identified in patients with the corresponding lysosomal diseases; however, no such three-dimensional (3D) structure data are available for many lysosomal enzymes. Thus, we herein intend to illustrate for an audience of molecular geneticists how structure information can nonetheless be obtained via a bioinformatics approach in the case of five human lysosomal glycoside hydrolases. Indeed, using the two-dimensional hydrophobic cluster analysis method to decipher the sequence information available in data banks for the large group of glycoside hydrolases (clan GH-A) to which these human lysosomal enzymes belong, we could deduce structure predictions for their catalytic domains and propose explanations for the molecular effects of mutations described in patients. In addition, in the case of human beta-glucuronidase for which experimental 3D data have been reported, we also show here that bioinformatics methods relying on the available 3D structure information can be used to obtain further insights into the effects of various mutations described in patients with Sly disease. In a broader perspective, our work stresses that, in the context of a rapid increase in protein sequence information through genome sequencing, bioinformatics approaches might be highly useful for generating structure-function predictions based on sequence-structure interrelationships.

 

Dutt, M. J. and K. H. Lee (2000). "Proteomic analysis." Curr Opin Biotechnol 11(2): 176-9.

            The field of proteomics is becoming increasingly important as genome sequences are being completed and annotated. Recent advances in proteomics include experimental and mathematical proofs of the need to complement microarray analysis with protein analysis, improved sensitivity for mass spectrometric analysis of separated proteins, better informatic tools for gel analysis and protein spot annotation, first steps towards automated experimental procedures, and new technology for quantitation of protein changes.

 

Ebstein, R. P., J. Benjamin, et al. (2000). "Personality and polymorphisms of genes involved in aminergic neurotransmission." Eur J Pharmacol 410(2-3): 205-214.

            Genetic factors significantly contribute to the determination of human personality traits assessed by self-report questionnaires. However, only in the past few years have common genetic polymorphisms, especially the dopamine D4 receptor and the serotonin transporter promoter region, been associated with specific personality traits such as novelty seeking and harm avoidance, respectively. The effects of these genes are modest and several genes are likely accounting for individual differences in personality dimensions that can be attributed to genetic factors. Molecular genetic studies of adult personality have also been extended to investigations of early human temperament and some of the genes associated with adult personality traits are also contributing to the earliest developmental expressions of human behavior. Additionally, some of these same genes have also been implicated in various types of abnormal behavior including addiction, obsessive-compulsive disorder, attention deficit, depression, aggression and psychosis. Future research directions will no doubt take advantage of the bioinformatics revolution coinciding with the completion of the first phase of the human genome project. It should soon be possible to identify many of the genes contributing to specific personality traits and to better define their role in determining normal and abnormal behavior from early development through adulthood.

 

Einarson, M. B. and E. A. Golemis (2000). "Encroaching genomics: adapting large-scale science to small academic laboratories." Physiol Genomics 2(3): 85-92.

            The process of conducting biological research is undergoing a profound metamorphosis due to the technological innovations and torrent of information resulting from the execution of multiple species genome projects. The further tasks of mapping polymorphisms and characterizing genome-wide protein-protein interaction (the characterization of the proteome) will continue to garner resources, talent, and public attention. Although some elements of these whole genome size projects can only be addressed by large research groups, consortia, or industry, the impact of these projects has already begun to transform the process of research in many small laboratories. Although the impact of this transformation is generally positive, laboratories engaged in types of research destined to be dominated by the efforts of a genomic consortium may be negatively impacted if they cannot rapidly adjust strategies in the face of new large-scale competition. The focus of this report is to outline a series of strategies that have been productively utilized by a number of small academic laboratories that have attempted to integrate such genomic resources into research plans with the goal of developing novel physiological insights.

 

Eisenberg, D., E. M. Marcotte, et al. (2000). "Protein function in the post-genomic era." Nature 405(6788): 823-6.

            Faced with the avalanche of genomic sequences and data on messenger RNA expression, biological scientists are confronting a frightening prospect: piles of information but only flakes of knowledge. How can the thousands of sequences being determined and deposited, and the thousands of expression profiles being generated by the new array methods, be synthesized into useful knowledge? What form will this knowledge take? These are questions being addressed by scientists in the field known as 'functional genomics'.

 

Eng, F. J. and S. L. Friedman (2000). "Fibrogenesis I. New insights into hepatic stellate cell activation: the simple becomes complex." Am J Physiol Gastrointest Liver Physiol 279(1): G7-G11.

            Hepatic stellate cell activation is a complex process. Paradoxes and controversies include the origin(s) of hepatic stellate cells, the regulation of membrane receptor signaling and transcription, and the fate of the cells once liver injury resolves. Major themes have emerged, including the dominance of autocrine signaling and the identification of counterregulatory stimuli that oppose key features of activated cells. Advances in analytical methods including proteomics and gene array, coupled with powerful bioinformatics, promise to revolutionize how we view cellular responses. Our understanding of stellate cell activation is likely to benefit from these advances, unearthing modes of regulating cellular behavior that are not even conceivable on the basis of current paradigms.

 

Fagan, R. and M. Swindells (2000). "Bioinformatics, target discovery and the pharmaceutical/biotechnology industry." Curr Opin Mol Ther 2(6): 655-61.

            With the first draft of the human genome now available a directed genome-wide mining strategy is being implemented by many pharmaceutical and biotechnology companies in order to identify novel members of the most therapeutically relevant target families. At the same time there is an increasing amount of annotation relevant to the human genome sequence entering into the public domain. The ability to identify protein families on a genome-wide scale can only be done at speed by using high-throughput computational approaches. This review describes many of the latest algorithmic developments in this field and shows how they can be best put to use for target identification and prioritization.

 

Fickett, J. W. and W. W. Wasserman (2000). "Discovery and modeling of transcriptional regulatory regions." Curr Opin Biotechnol 11(1): 19-24.

            A complex network of regulatory controls governs the patterns of gene expression. Enabled by the tools of molecular cloning, initial experimental queries into the gene regulatory network elucidated a wide array of transcription factors and their cognate binding sites from hundreds of genes. The recent fusion of genome-scale experimental tools, a more comprehensive gene catalog, and concomitant advances in computational methodology, has extended the range of questions being posed. The potential to further our understanding of the biochemical mechanisms of transcriptional regulation and to accelerate the delineation of regulatory control regions in the human genome is enormous.

 

Fiechter, A. (2000). "Biotechnology in Switzerland and a glance at Germany." Adv Biochem Eng Biotechnol 69: 175-208.

            The roots of biotechnology go back to classic fermentation processes, which starting from spontaneous reactions were developed by simple means. The discovery of antibiotics made contamination-free bioprocess engineering indispensable, which led to a further step in technology development. On-line analytics and the use of computers were the basis of automation and the increase in quality. On both sides of the Atlantic, molecular biology emerged at the same time, which gave genetic engineering in medicine, agriculture, industry and environment new opportunities. The story of this new advanced technology in Switzerland, with a quick glance at Germany, is followed back to the post-war years. The growth of research and teaching and the foundation of the European Federation of Biotechnology (EFB) are dealt with. The promising phase of the 1960s and 1970s soon had to give way to a restrictive policy of insecurity and anxiousness, which, today, manifests itself in the rather insignificant contributions of many European countries to the new sciences of genomics, proteomics and bioinformatics, as well as in the resistance to the use of transgenic agricultural crops and their products in foods.

 

Foster, C. B. and S. J. Chanock (2000). "Mining variations in genes of innate and phagocytic immunity: current status and future prospects." Curr Opin Hematol 7(1): 9-15.

            The large number of sequence variations in human genes reflects the diversity of human populations and the response to prior environmental and pathogen challenges. Currently, major efforts are under way to identify and catalog single-nucleotide polymorphisms for use in genetic studies designed to explore the contribution of common variants to both disease susceptibility and interindividual differences in outcomes. So far, the most productive approach has been to search with candidate genes for which there is a scientific rationale (eg, prior data on the biologic implications of one or more variant alleles). Recently, there has been an explosion in the number of genetic association studies seeking to correlate one or more well-defined outcomes with variant alleles. These studies provide a foundation for identifying and applying genetic risk factors to clinical medicine. However, a number of challenges must be met before widespread use in clinical medicine can be undertaken. These include more efficient bioinformatics, basic insights into the significance of the variant alleles, ethical issues, and the availability of cost-effective, high-throughput platforms for genotype analysis.

 

Fujita, Y. (2000). "[A new approach to pharmacogenomics]." Nippon Yakurigaku Zasshi 116(3): 149-57.

            The medicine in the 21st century will be so called "evidence based medicine" or "personalized medicine," based on the principle of "right drug to right patient." Pharmacogenomics covers the entire spectrum of genes that determines drug behavior and sensitivity, and we anticipate it will bring major impact on the healthcare system as well as the drug discovery process in the near future. Three waves of genomic impact are predicted to arise as follows: The first wave will hit on existing drugs and late-phase development candidates within the next 2-3 years, aiming to minimize the risks in clinical trials (adverse events, resistance, etc.). The wave will then affect the candidate selection process in the early pre-development stage, and finally the disease gene finding to target discovery process. The driving force will be technologies such as SNPs database, differential gene expression (DGE) analysis, proteomics, serial analysis of gene expression (SAGE) and bioinformatics. This new approach of genomic discovery (so called "integrated approach") requires knowledge on how to implement and integrate new valuable technologies from an early stage of the discovery process. The implication of SNPs, high throughput proteomics and application of structural genomics will be the key issues in the pharmacogenomics era.

 

Gelfand, M. S., P. S. Novichkov, et al. (2000). "Comparative analysis of regulatory patterns in bacterial genomes." Brief Bioinform 1(4): 357-71.

            Recognition of transcription regulatory sites in bacterial genomes is a notoriously difficult problem. There are no algorithms capable of making reliable predictions even for well-studied sites such as the CRP (cyclic AMP receptor protein) box. However, availability of complete bacterial genomes makes it possible to make reliable predictions with bad rules. This comparative approach is based on the assumption that sets of co-regulated genes are conserved in related bacteria. Thus true sites occur upstream of orthologous genes, whereas false candidates are scattered at random. This means not only that knowledge about regulation in well-studied genomes can be transferred to newly sequenced ones, but also that new members of regulons can be found. This paper reviews several recent studies. In particular, a detailed analysis of catabolite repression in gamma-purple bacteria is presented.

 

Gerstein, M. and R. Jansen (2000). "The current excitement in bioinformatics-analysis of whole-genome expression data: how does it relate to protein structure and function?" Curr Opin Struct Biol 10(5): 574-84.

            Whole-genome expression profiles provide a rich new data-trove for bioinformatics. Initial analyses of the profiles have included clustering and cross-referencing to 'external' information on protein structure and function. Expression profile clusters do relate to protein function, but the correlation is not perfect, with the discrepancies partially resulting from the difficulty in consistently defining function. Other attributes of proteins can also be related to expression (in particular, structure and localization) and sometimes show a clearer relationship than function.

 

Geuna, S. (2000). "Appreciating the difference between design-based and model-based sampling strategies in quantitative morphology of the nervous system." J Comp Neurol 427(3): 333-9.

            Quantitative morphology of the nervous system has undergone great developments over recent years, and several new technical procedures have been devised and applied successfully to neuromorphological research. However, a lively debate has arisen on some issues, and a great deal of confusion appears to exist that is definitely responsible for the slow spread of the new techniques among scientists. One such element of confusion is related to uncertainty about the meaning, implications, and advantages of the design-based sampling strategy that characterize the new techniques. In this article, to help remove this uncertainty, morphoquantitative methods are described and contrasted on the basis of the inferential paradigm of the sampling strategy: design-based vs model-based. Moreover, some recommendations are made to help scientists judge the appropriateness of a method used for a given study in relation to its specific goals. Finally, the use of the term stereology to label, more or less expressly, only some methods is critically discussed.

 

Giegerich, R. (2000). "A systematic approach to dynamic programming in bioinformatics." Bioinformatics 16(8): 665-77.

            MOTIVATION: Dynamic programming is probably the most popular programming method in bioinformatics. Sequence comparison, gene recognition, RNA structure prediction and hundreds of other problems are solved by ever new variants of dynamic programming. Currently, the development of a successful dynamic programming algorithm is a matter of experience, talent and luck. The typical matrix recurrence relations that make up a dynamic programming algorithm are intricate to construct, and difficult to implement reliably. No general problem independent guidance is available. RESULTS: This article introduces a systematic method for constructing dynamic programming solutions to problems in biosequence analysis. By a conceptual splitting of the algorithm into a recognition and an evaluation phase, algorithm development is simplified considerably, and correct recurrences can be derived systematically. Without additional effort, the method produces an early, executable prototype expressed in a functional programming language. The method is quite generally applicable, and, while programming effort decreases, no overhead in terms of ultimate program efficiency is incurred.

 

Graf, W. D. and O. E. Oleinik (2000). "The study of neural tube defects after the Human Genome Project and folic acid fortification of foods." Eur J Pediatr Surg 10 Suppl 1: 9-12.

            The implementation of folic acid fortification will eliminate a proportion of neural tube defects (NTD). As a result, the etiologic and clinical profiles of the developmental disorder may both change. In the assessment of NTD as it evolves, the bioinformatics structure and content of the Human Genome Project will find vital application. One important development will be an enhanced understanding of the role of folic acid in global regulation of gene expression through epigenetic processes. In addition, bioinformatics will facilitate coordination of research in the basic sciences with clinical investigations to better define remaining etiologic factors.

 

Graf, W. D. (2000). "Can bioinformatics help trace the steps from gene mutation to disease?" Neurology 55(3): 331-3.

           

Gutierrez, J. A. (2000). "Genomics: from novel genes to new therapeutics in parasitology." Int J Parasitol 30(3): 247-52.

            The advent of rapid DNA sequencing technologies is generating vast quantities of raw genomic information ranging from in-depth analysis of the expressed genes to complete sequencing of genomes at an increasing rate (bioinformatics). However, it is the functional characterisation of a specific gene product that is the key limiting factor for validation as targets for high throughput assay development. The challenge is to obtain the raw genomic information from parasites of economic importance and to effectively integrate broad technologies such as gene disruption and over-expression, DNA arrays, proteomics, antisense RNAs, with bioinformatics in a timely fashion to identify relevant biological targets. Screening of validated targets in a strategy that includes large numbers of chemistries with high diversity and predictive in vitro and in vivo assays should permit the successful identification of novel chemical entities with high specificity to the target parasite. It is proposed that this rational approach will permit the identification of new antiparasitic therapies able to surpass the current toxicological, environmental, and economic challenges of the marketplace.

 

Harris, N. L. (2000). "Annotating sequence data using Genotator." Mol Biotechnol 16(3): 221-32.

            In this postgenomic era, it is no longer necessary to argue the need for automated methods for sequence annotation. Many researchers have designed tools for analyzing DNA sequences, but running multiple tools and interpreting the results can be tedious and confusing. In the last few years, many analysis workbenches have been developed to help streamline the process of sequence annotation. Genotator, developed in 1996, is still a popular choice owing to its ease of use and its configurability. This article will review annotating sequence data using the Genotator.

 

Hassan, A. and H. S. Markus (2000). "Genetics and ischaemic stroke." Brain 123 (Pt 9): 1784-812.

            Ischaemic stroke can be caused by a number of monogenic disorders, and in such cases stroke is frequently part of a multisystem disorder. Cerebral autosomal dominant arteriopathy with subcortical infarcts and leucoencephalopathy (CADASIL), due to mutations in the NOTCH3 gene, is increasingly appreciated as a cause of familial subcortical stroke. The genetics and phenotypes of monogenic stroke are covered in this review. However, the majority of cases of ischaemic stroke are multifactorial in aetiology. Strong evidence from epidemiological and animal studies has implicated genetic influences in the pathogenesis of multifactorial ischaemic stroke, but the identification of individual causative mutations remains problematic; this is in part limited by the number of approaches currently available. In addition, genetic influences are likely to be polygenic, and ischaemic stroke itself consists of a number of different phenotypes which may each have different genetic profiles. Almost all human studies to date have employed a candidate gene approach. Associations with polymorphisms in a variety of candidate genes have been investigated, including haemostatic genes, genes controlling homocysteine metabolism, the angiotensin-converting enzyme gene, and the endothelial nitric oxide synthase gene. The results of these studies, and the advantages and limitations of the candidate gene approach, are presented. The recent biological revolution, spurred by the human genome project, promises the advent of novel technologies supported by bioinformatics resources that will transform the study of polygenic disorders such as stroke. Their potential application to polygenic ischaemic stroke is discussed.

 

Hirt, H. (2000). "MAP kinases in plant signal transduction." Results Probl Cell Differ 27: 1-9.

            Mitogen-activated protein kinase (MAPK) pathways are modules involved in the transduction of extracellular signals to intracellular targets in all eukaryotes. Distinct MAPK pathways are regulated by different extracellular stimuli and are implicated in a wide variety of biological processes. In plants there is evidence for MAPKs playing a role in the signaling of abiotic stresses, pathogens, plant hormones, and cell cycle cues. The large number and divergence of plant MAPKs indicates that this ancient mechanism of bioinformatics is extensively used in plants and their study promises to give molecular answers to old questions.

 

Horrocks, P., S. Bowman, et al. (2000). "Entering the post-genomic era of malaria research." Bull World Health Organ 78(12): 1424-37.

            The sequencing of the genome of Plasmodium falciparum promises to revolutionize the way in which malaria research will be carried out. Beyond simple gene discovery, the genome sequence will facilitate the comprehensive determination of the parasite's gene expression during its developmental phases, pathology, and in response to environmental variables, such as drug treatment and host genetic background. This article reviews the current status of the P. falciparum genome sequencing project and the unique insights it has generated. We also summarize the application of bioinformatics and analytical tools that have been developed for functional genomics. The aim of these activities is the rational, information-based identification of new therapeutic strategies and targets, based on a thorough insight into the biology of Plasmodium spp.

 

Hsiao, L. L., R. L. Stears, et al. (2000). "Prospective use of DNA microarrays for evaluating renal function and disease." Curr Opin Nephrol Hypertens 9(3): 253-8.

            At the forefront of the revolution in human genomics is DNA microarray technology, which evaluates expression levels or genotypes of thousands of genes simultaneously, by means of miniaturization and parallel processing. Furthermore, advances in bioinformatics will result in the creation of large databases, which will require complex software programming for structural analysis. Over the next decade, DNA microarrays, combined with sophisticated informatics and genomic databases, will provide molecular fingerprints of disease processes and prognoses. This review provides an update on DNA microarray technology and its application to renal diseases.

 

Jain, K. K. (2000). "Applications of biochip and microarray systems in pharmacogenomics." Pharmacogenomics 1(3): 289-307.

            A DNA microarray system is usually comprised of DNA probes formatted on a microscale on a glass surface (chip), plus the instruments needed to handle samples (automated robotics), to read the reporter molecules (scanners) and analyse the data (bioinformatic tools). Biochips are formed by in situ (on chip) synthesis of oligonucleotides or peptide nucleic acids (PNAs) or spotting of DNA fragments. Hybridisation of RNA- or DNA-derived samples on chips allows the monitoring of expression of mRNAs or the occurrence of polymorphisms in genomic DNA. Basic types of DNA chips are the sequencing chip, the expression chip and chips for comparative genomic hybridisation. Advanced technologies used in automated microarray production are photolithography, mechanical microspotting and ink jets. Bioelectronic microchips contain numerous electronically active microelectrodes with specific DNA capture probes linked to the electrodes through molecular wires. Several biosensors have been used in combination with biochips. PNA biosensors commonly rely on the immobilisation of a single-stranded DNA sequence (the 'probe') onto a transducer surface for hybridisation with the complementary ('target') strand to give a suitable electrical signal. Other sensors are cell-based immunobiosensors with engineered molecular recognition, integrated biosensors based on phototransistor integrated circuits and sensors based on surface plasmon resonance. Microarray technologies offer enormous savings in time and labour as compared to standard gel-based microsatellite methods. Reading of the information and its management by bioinformatics is necessary because of the enormous amount of data generated by the various technologies using microarrays. Standardised procedures are essential for compatible data production, quality control and analysis. Expression monitoring is the most biologically informative application of this technology at present. Microarray technology has important applications in pharmacogenomics: drug discovery and development, drug safety and molecular diagnostics. DNA chips will facilitate the integration of diagnosis and therapeutics, as well as the introduction of personalised medicines.

 

Jan van Wijk, K. (2000). "Proteomics of the chloroplast: experimentation and prediction." Trends Plant Sci 5(10): 420-5.

            New technologies, in combination with increasing amounts of plant genome sequence data, have opened up incredible experimental possibilities to identify the total set of chloroplast proteins (the chloroplast proteome) as well as their expression levels and post-translational modifications in a global manner. This is summarized under the term 'proteomics' and typically involves two-dimensional electrophoresis or chromatography, mass spectrometry and bioinformatics. Complemented with nucleotide-based global techniques, proteomics is expected to provide many new insights into chloroplast biogenesis, adaptation and function.

 

Johnson, J. E. and W. Chiu (2000). "Structures of virus and virus-like particles." Curr Opin Struct Biol 10(2): 229-35.

            Virus structures continue to be the basis for mechanistic virology and serve as a paradigm for solutions to problems concerning macromolecular assembly and function in general. The use of X-ray crystallography, electron cryomicroscopy and computational and biochemical methods has provided not only details of the structural folds of individual viral components, but also insights into the structural basis of assembly, nucleic acid packaging, particle dynamics and interactions with cellular molecules.

 

Kaminski, N. (2000). "Bioinformatics. A user's perspective." Am J Respir Cell Mol Biol 23(6): 705-11.

            This review provides an overview of bioinformatics from the user's point of view. Bioinformatics, defined as the application of computers, databases, and computational methods to the management of biologic information, is essential for almost every aspect of data management in modern biology. The rapid accumulation of genomic sequence information together with the wide availability of new technologies that analyze global gene expression patterns have created an information overload. Molecular biology labs are increasingly dependent on computers, large-capacity databases, search and analysis tools, and high-quality Internet connections. Currently available bioinformatics tools are discussed and a general approach is outlined. Using the resources and approaches in this review, readers should be able to form their own view of bioinformatics and tailor the solutions to the information overload according to their needs.

 

Kato, R. (2000). "[Actual situation and perspective of novel drug discovery]." Tanpakushitsu Kakusan Koso 45(6 Suppl): 763-75.

           

Kellner, R. (2000). "Proteomics. Concepts and perspectives." Fresenius J Anal Chem 366(6-7): 517-24.

            Within the last five years the field of proteomics has changed the understanding of molecular biology. Proteins manifest physiological as well as pathophysiological processes in a cell or an organism, and proteomics describes the complete protein inventory in dependence on in vivo parameters. Disease mechanism or drug effects both affect a protein profile and, vice versa, characterising protein profiles reveals information for the understanding of disease and therapy. Analytical methods for proteomics are based on conventional tools for protein characterisation. The technical challenge is the complete coverage of physico-chemical properties for thousands of proteins. Nucleic acids display a relative chemical homogeneity and therefore genomics was considered more promising in the past than proteomics. Further improvements in proteomics technologies will likely change this course with proteomics complementing genomics as a tool to study life sciences.

 

Kennedy, G. C. (2000). "The impact of genomics on therapeutic drug development." Exs 89: 1-10.

            Genomics can be defined as a set of related technologies that are focused on the discovery of genes implicated in human disease. Although many of the estimated 100,000 genes in the human genome have been at least partially identified by nucleotide sequence, elucidation of biological function has been achieved for only a small percentage of these. An even smaller percentage of genes discovered by these methodologies have become valid drug targets. This review discusses the various genomics technologies and their likelihood of yielding therapeutic drugs. Emerging advances in microarray "chip" technology have allowed the parallel analysis of gene expression patterns for thousands of genes simultaneously. Sequence information derived from the genomes of many individuals is leading to the rapid discovery of single nucleotide polymorphisms or SNPs. Detection of these human polymorphisms will fuel the discipline of pharmacogenomics, resulting in an increase in the success of clinical trials, the rescue of drugs that have previously failed in clinical trials because of adverse reactions from patient subpopulations, and ultimately, in the development of more personalized drug therapies. The impending identification of all human genes will signal the end of the structural genomics phase and usher in the functional genomics phase. Technologies have already begun to move toward high-throughput elucidation of gene relationships, interactions and, it is hoped, toward their functions.

 

Ladunga, I. (2000). "Large-scale predictions of secretory proteins from mammalian genomic and EST sequences." Curr Opin Biotechnol 11(1): 13-8.

            Machine learning techniques have improved predictions of secretory proteins from protein, genomic and expressed sequence tag (EST) sequences. Artificial neural networks, physical sequence analysis using high-performance optimization, and hidden Markov models identify extremely variable signal peptides (the vehicles of protein transport across the endoplasmic reticulum membrane), transmembrane segments, and specific extracellular and intracellular domains as indicators of possible roles in the intercellular and intracellular chemical signaling pathways. The major role of peptide hormones, blood coagulation factors, carcinogenesis agents, and other secretory proteins in orchestrating multicellular life indicates pharmacological potential in the cure of major diseases and numerous biotechnological applications.

 

Landro, J. A., I. C. Taylor, et al. (2000). "HTS in the new millennium: the role of pharmacology and flexibility." J Pharmacol Toxicol Methods 44(1): 273-89.

            Over the past decade, high throughput screening (HTS) has become the focal point for discovery programs within the pharmaceutical industry. The role of this discipline has been and remains the rapid and efficient identification of lead chemical matter within chemical libraries for therapeutics development. Recent advances in molecular and computational biology, i.e., genomic sequencing and bioinformatics, have resulted in the announcement of publication of the first draft of the human genome. While much work remains before a complete and accurate genomic map will be available, there can be no doubt that the number of potential therapeutic intervention points will increase dramatically, thereby increasing the workload of early discovery groups. One current drug discovery paradigm integrates genomics, protein biosciences and HTS in establishing what the authors refer to as the "gene-to-screen" process. Adoption of the "gene-to-screen" paradigm results in a dramatic increase in the efficiency of the process of converting a novel gene coding for a putative enzymatic or receptor function into a robust and pharmacologically relevant high throughput screen. This article details aspects of the identification of lead chemical matter from HTS. Topics discussed include portfolio composition (molecular targets amenable to small molecule drug discovery), screening file content, assay formats and plating densities, and the impact of instrumentation on the ability of HTS to identify lead chemical matter.

 

Larsen, M. R. and P. Roepstorff (2000). "Mass spectrometric identification of proteins and characterization of their post-translational modifications in proteome analysis." Fresenius J Anal Chem 366(6-7): 677-90.

            High-throughput DNA sequencing has resulted in increasing input in protein sequence databases. Today more than 20 genomes have been sequenced and many more will be completed in the near future, including the largest of them all, the human genome. Presently, sequence databases contain entries for more than 425,000 protein sequences. However, the cellular functions are determined by the set of proteins expressed in the cell--the proteome. Two-dimensional gel electrophoresis, mass spectrometry and bioinformatics have become important tools in correlating the proteome with the genome. The current dominant strategies for identification of proteins from gels based on peptide mass spectrometric fingerprinting and partial sequencing by mass spectrometry are described. After identification of the proteins the next challenge in proteome analysis is characterization of their post-translational modifications. The general problems associated with characterization of these directly from gel separated proteins are described and the current state of the art for the determination of phosphorylation, glycosylation and proteolytic processing is illustrated.

 

Lee, P. S. and K. H. Lee (2000). "Genomic analysis." Curr Opin Biotechnol 11(2): 171-5.

            Advances in genomic analysis include improved technology for DNA sequencing, routine use of DNA microarray technology for the analysis of gene expression profiles at the mRNA level and improved informatic tools to organize and analyze such data. At the same time, new developments in chip-based analysis of samples and the emergence of models of gene networks hold promise for the future of the 'Genomic Era'.

 

Lengauer, T. and R. Zimmer (2000). "Protein structure prediction methods for drug design." Brief Bioinform 1(3): 275-88.

            Along the long path from genomic data to a new drug, the knowledge of three-dimensional protein structure can be of significant help in several places. This paper points out such places, discusses the virtues of protein structure knowledge and reviews bioinformatics methods for gaining such knowledge on the protein structure.

 

Loferer, H. (2000). "Mining bacterial genomes for antimicrobial targets." Mol Med Today 6(12): 470-4.

            The elucidation of whole-genome sequences is expected to have a revolutionary impact on the discovery of novel medicines. With the availability of complete genome sequences of more than 30 different species, the field of antimicrobial drug discovery has the opportunity to access a remarkable diversity of genomic information. In this review, I summarize how microbial genomics has changed strategies of drug discovery by applying bioinformatics, novel genetic approaches and genomics-based technologies, including analysis of gene expression using DNA microarrays.

 

Mayer, K. F., K. Lemcke, et al. (2000). "Arabidopsis genome analysis as exemplified by analysis of chromosome 4." Brief Bioinform 1(4): 389-97.

            During the last decade the small cruciferous plant Arabidopsis thaliana has become a model organism for flowering plants. Sequencing and analysis of the Arabidopsis genome is nearing completion. Besides an overview of methods and strategies for Arabidopsis genome analysis, a summary of the results from the first analysis is presented. This includes an overview of chromosomal organisation and topological features as well as a first comparison with other genomes.

 

Michelson, S. and K. Joho (2000). "Drug discovery, drug development and the emerging world of pharmacogenomics: prospecting for information in a data-rich landscape." Curr Opin Mol Ther 2(6): 651-4.

            Drug development is a very expensive and inefficient process. Currently, it takes on average 15 years and costs approximately US $500 million to bring a new drug to market, with the pharmaceutical industry spending more than US $20 billion in identifying and developing drugs in 1998. Twenty-two percent of this total was spent on screening assays and toxicity testing. Yet the rapidly accelerating advances in high-throughput technologies, including screening and robotics, combinatorial chemistry, and genomics, make this an extremely data-rich environment. Add to that the new paradigms of pharmacogenomics and 'customized medicine', and the question is, are we helping or hurting our cause? Clearly, interpreting this flood of data and turning it into useful information is our next great hurdle. By extending the pharmacogenomic paradigm to the drug discovery process, this paper intends to put the scope of the problem into context.

 

Montgomery, D. L. (2000). "Tuberculosis vaccine design: influence of the completed genome sequence." Brief Bioinform 1(3): 289-96.

            Tuberculosis continues to be a major health problem, with more adults dying from Mycobacterium tuberculosis than any other pathogen world-wide. With the onset of the HIV epidemic and an increase in drug-resistant M. tuberculosis strains, the need for an improved vaccine has become an international priority. The recent completion of the genome sequences for two M. tuberculosis strains provides a wealth of information that can be used to design new strategies for vaccine development. The challenge comes in making rational choices from among the 4,000 genes of the most probable candidate immunogens or virulence genes. Thus, a well-designed screen is needed to reduce the number of candidates that must be tested. Presently, the most valuable role that bioinformatics can play is to provide such a screen.

 

Mori, H., K. Isono, et al. (2000). "Functional genomics of Escherichia coli in Japan." Res Microbiol 151(2): 121-8.

            Completion of the genome sequence of the model bacterium Escherichia coli has produced nearly 2000 open reading frames (ORFs) that remain to be functionally characterized. To accomplish this goal, we have organized a working project team in Japan and have begun construction of clones containing each of the putative ORFs. The procedure has been conceived such that we shall be able to perform systematic analysis of the shut-off as well as forced expression in vivo of each ORF and purification of its protein product for further biochemical studies. In addition, we have started a collection of various genetic and biochemical data of E. coli published in the past, and analyses of the data from a bio-informatics point of view. Thus, we aim at reaching complete understanding of this model organism in the near future.

 

Muller, G. (2000). "Towards 3D structures of G protein-coupled receptors: a multidisciplinary approach." Curr Med Chem 7(9): 861-88.

            Current strategies in pharmaceutical research comprise two methodologically different but complementary approaches for lead finding purposes, namely the random screening of compound libraries and the structure-based effort, commonly termed rational drug design. The structure-based approach is aimed to exploit 3D structure data of the molecular components involved in the molecular recognition event that underlies the attempt to therapeutically modulate the biological function of a macromolecular target with proven pathophysiological relevance for a disease state. In this context, G protein-coupled receptors (GPCRs) constitute the most prominent family of validated drug targets within biomedical research, since approximately 60% of approved drugs elicit their therapeutic effects by selectively addressing members of that target family. From a 3D structure point of view, these transmembrane signal transduction systems represent the most challenging task for structure determination, which is due to the heterogeneous and fine-balanced environment conditions that are necessary for structural and functional integrity of the receptor protein. This contribution will address the different concepts to derive structurally relevant information on the transmembrane seven-helix protein (7TM) domain of GPCRs with special emphasis laid on the multidisciplinarity of the applied methodologies. The current status of electron-cryo-microscopy on 2D crystals and even high-resolution x-ray crystallography on 7TM proteins will be introduced highlighting the transferability of the emerging structural principles onto the GPCR superfamily. Special techniques from bioinformatics and homology-related molecular modeling in combination with tailor-made protein simulation methodologies complement the experimentally derived data, in that they facilitate the 3D structure generation and structure validation process. This contribution summarises the most recent results of GPCR structure studies with the aim to underline the impact of structure data not only for the purpose of rationalising structure-activity data on low-molecular weight antagonists within the context of a protein binding pocket, but also for a better understanding of e.g. mutagenesis experiments, thus qualifying GPCR structure models as valid communication platforms establishing a functional link between molecular biology, biophysics, bioinformatics and organic chemistry in a highly efficient manner.

 

Nakatsuji, H., J. Hasegawa, et al. (2000). "[Excited states and electron transfer in the photosynthetic reaction center of Rhodopseudomonas viridis: SAC-CI study]." Tanpakushitsu Kakusan Koso 45(4): 587-94.

           

Nelson, R. W., D. Nedelkov, et al. (2000). "Biosensor chip mass spectrometry: a chip-based proteomics approach." Electrophoresis 21(6): 1155-63.

            Rapid advances in genomic sequencing, bioinformatics, and analytical instrumentation have created the field of proteomics, which at present is based largely on two-dimensional electrophoresis (2-DE) separation of complex protein mixtures and identification of individual proteins using mass spectrometry. These analyses provide a wealth of data, which upon further evaluation leads to many questions regarding the structure and function of the proteins. The challenge of answering these questions creates a need for high-specificity approaches that may be used in the analysis of biomolecular recognition events and interacting partners, and thereby places great demands on general protein characterization instrumentation and the types of analyses they need to perform. Over the past five years we have been actively involved in interfacing two general, instrumental techniques, surface plasmon resonance-biomolecular interaction analysis (SPR-BIA) and matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry, into a single concerted approach for use in the functional and structural characterization of proteins. Reviewed here is the recent progress made using biomolecular interaction analysis - mass spectrometry (BIA-MS) in the detailed characterization of proteins and protein-protein interactions and the development of biosensor chip mass spectrometry (BCMS) as a new chip-based proteomics approach.

 

Nilsson, C. L. and P. Davidsson (2000). "New separation tools for comprehensive studies of protein expression by mass spectrometry." Mass Spectrom Rev 19(6): 390-7.

            Mass spectrometry has emerged as a core technique for protein identification and characterization because of its high sensitivity, accuracy, and speed of analysis. The most widespread strategy for studying global protein expression in biological systems employs analytical two-dimensional polyacrylamide gel electrophoresis (2D PAGE) followed by enzymatic degradation of isolated protein spots, peptide mapping, and bioinformatics searches. Using this method, thousands of proteins can be resolved in a gel and their expression quantified. However, certain types of proteins possessing important cellular functions are not easily analyzed using this strategy. These proteins include membrane, low copy number, highly basic, and very large (> 150 kDa) and small (< 10 kDa) proteins. To meet the growing need to simultaneously monitor all types of proteins in a biological system, new separation strategies have emerged that are amenable to hyphenation to mass spectrometric techniques. This article will review these new techniques and examine their usefulness in studies of protein expression.

 

Nylund, S. and M. Sibakov (2000). "[Will genes become an engine to the industry?]." Duodecim 116(16): 1763-8.

           

Ohlstein, E. H., R. R. Ruffolo, Jr., et al. (2000). "Drug discovery in the next millennium." Annu Rev Pharmacol Toxicol 40: 177-91.

            Selection and validation of novel molecular targets have become of paramount importance in light of the plethora of new potential therapeutic drug targets that have emerged from human gene sequencing. In response to this revolution within the pharmaceutical industry, the development of high-throughput methods in both biology and chemistry has been necessitated. This review addresses these technological advances as well as several new areas that have been created by necessity to deal with this new paradigm, such as bioinformatics, cheminformatics, and functional genomics. With many of these key components of future drug discovery now in place, it is possible to map out a critical path for this process that will be used into the new millennium.

 

Palotie, L. (2000). "[Where is the genome project leading to?]." Duodecim 116(16): 1731-3.

           

Pang, C. P., L. Baum, et al. (2000). "Hunting for disease genes in multi-functional diseases." Clin Chem Lab Med 38(9): 819-25.

            Disease genes may be identified through functional, positional, and candidate gene approaches. Although extensive and often labor-intensive studies such as family linkage analysis, functional investigation of gene products and genome database searches are usually involved, thousands of human disease genes, especially for monogenic diseases with Mendelian transmission, have been identified. However, in diseases caused by more than one gene, or by a combination of genetic and environmental factors, identification of the genes is even more difficult. Common examples include atherosclerosis, cancer, Alzheimer's disease, asthma, diabetes, glaucoma, and age-related macular degeneration. There have been conflicting reports on the roles of associated genes. Even with population-based case-control studies and new statistical methods such as the sib-ship disequilibrium test and the discordant alleles test, there is no agreement on whether alpha2-macroglobulin (A2M) is a gene for Alzheimer's disease. Another example is the inconsistent association between age-related macular degeneration and ATP-binding cassette transporter (ABCR). Ethnic variation causes further complications. In our investigation of LDL-receptor variants in familial hypercholesterolemia, and the trabecular meshwork inducible glucocorticoid response protein, or myocilin (TIGR-MYOC) mutation pattern in primary open angle glaucoma, we did find dissimilar results in Chinese compared to Caucasians. New information from the Human Genome Project and advancements in technologies will aid the search for and confirm identification of disease genes despite such challenges.

 

Persson, B. (2000). "Bioinformatics in protein analysis." Exs 88: 215-31.

            The chapter gives an overview of bioinformatic techniques of importance in protein analysis. These include database searches, sequence comparisons and structural predictions. Links to useful World Wide Web (WWW) pages are given in relation to each topic. Databases with biological information are reviewed with emphasis on databases for nucleotide sequences (EMBL, GenBank, DDBJ), genomes, amino acid sequences (Swissprot, PIR, TrEMBL, GenePept), and three-dimensional structures (PDB). Integrated user interfaces for databases (SRS and Entrez) are described. An introduction to databases of sequence patterns and protein families is also given (Prosite, Pfam, Blocks). Furthermore, the chapter describes the widespread methods for sequence comparisons, FASTA and BLAST, and the corresponding WWW services. The techniques involving multiple sequence alignments are also reviewed: alignment creation with the Clustal programs, phylogenetic tree calculation with the Clustal or Phylip packages and tree display using Drawtree, njplot or phylo_win. Finally, the chapter also treats the issue of structural prediction. Different methods for secondary structure predictions are described (Chou-Fasman, Garnier-Osguthorpe-Robson, Predator, PHD). Techniques for predicting membrane proteins, antigenic sites and post-translational modifications are also reviewed.

 

Pesole, G., G. Grillo, et al. (2000). "The untranslated regions of eukaryotic mRNAs: structure, function, evolution and bioinformatic tools for their analysis." Brief Bioinform 1(3): 236-49.

            The crucial role of the non-coding portion of genomes is now widely acknowledged. In particular, mRNA untranslated regions are involved in many post-transcriptional regulatory pathways that control mRNA localisation, stability and translation efficiency. A review is given of the most recent research works on the functional characterisation of eukaryotic mRNA untranslated regions. In order to make possible a systematic and detailed sequence analysis of mRNA untranslated regions (UTRs), a non-redundant database of metazoan mRNA untranslated sequences annotated for the occurrence of specific functional elements, UTRdb, was devised. These elements, whose consensus structure has been devised on the basis of experimental assays and of comparative analyses, have been collected in the UTRsite database. A suitable pattern-matching software has been devised to search UTRsite patterns in user-submitted sequences, also assessing their statistical significance. Structural, compositional and evolutionary features of untranslated sequences of metazoan mRNAs have been investigated showing peculiar intra- and interspecific patterns.

 

Pitcher, D. G. and N. K. Fry (2000). "Molecular techniques for the detection and identification of new bacterial pathogens." J Infect 40(2): 116-20.

           

Pollock, D. D., J. A. Eisen, et al. (2000). "A case for evolutionary genomics and the comprehensive examination of sequence biodiversity." Mol Biol Evol 17(12): 1776-88.

            Comparative analysis is one of the most powerful methods available for understanding the diverse and complex systems found in biology, but it is often limited by a lack of comprehensive taxonomic sampling. Despite the recent development of powerful genome technologies capable of producing sequence data in large quantities (witness the recently completed first draft of the human genome), there has been relatively little change in how evolutionary studies are conducted. The application of genomic methods to evolutionary biology is a challenge, in part because gene segments from different organisms are manipulated separately, requiring individual purification, cloning, and sequencing. We suggest that a feasible approach to collecting genome-scale data sets for evolutionary biology (i.e., evolutionary genomics) may consist of combination of DNA samples prior to cloning and sequencing, followed by computational reconstruction of the original sequences. This approach will allow the full benefit of automated protocols developed by genome projects to be realized; taxon sampling levels can easily increase to thousands for targeted genomes and genomic regions. Sequence diversity at this level will dramatically improve the quality and accuracy of phylogenetic inference, as well as the accuracy and resolution of comparative evolutionary studies. In particular, it will be possible to make accurate estimates of normal evolution in the context of constant structural and functional constraints (i.e., site-specific substitution probabilities), along with accurate estimates of changes in evolutionary patterns, including pairwise coevolution between sites, adaptive bursts, and changes in selective constraints. These estimates can then be used to understand and predict the effects of protein structure and function on sequence evolution and to predict unknown details of protein structure, function, and functional divergence. In order to demonstrate the practicality of these ideas and the potential benefit for functional genomic analysis, we describe a pilot project we are conducting to simultaneously sequence large numbers of vertebrate mitochondrial genomes.

 

Reed, M. A. and J. M. Tour (2000). "Computing with molecules." Sci Am 282(6): 86-93.

           

Rew, D. A. (2000). "Modelling in surgical oncology--part III: massive data sets and complex systems." Eur J Surg Oncol 26(8): 805-9.

            Human tumours are complex and unstable biological systems. New intellectual and mathematical approaches together with massive computing power are transforming our capacity to model and investigate such complexity. Computers also allow massive data sets to be collated and analysed. Such sets include the medical and epidemiological records of entire populations; the entire genetic code of the human being and of other species, including parasites and disease vectors; and the genotype of each and every individual. Massive data sets take us into new dimensions of complexity for which simple linear mathematics are insufficient. The analysis of the grades of complexity which determine protein and cell construction, cell to cell interactions within tissues and organs, the morphogenesis of entire organisms and population interactions with disease vectors require the sophisticated mathematical tools of non-linear analysis, neural networks, chaos and complexity theory. The capacity for closer representations of reality through powerful computational models also allows us to look afresh at the generalizations of conventional statistics. Within this computational cauldron, we may also find help in the better understanding of oncogenesis and cancer therapy. This paper, the third in our series on modelling in tumour biology, considers the breadth of opportunity and challenge at the interface between cell biology and biomathematics.

 

Rieger, P. T. (2000). "The gene genies." Am J Nurs 100(10): 87-90.

           

Rigoutsos, I., A. Floratos, et al. (2000). "The emergence of pattern discovery techniques in computational biology." Metab Eng 2(3): 159-77.

            In the past few years, pattern discovery has been emerging as a generic tool of choice for tackling problems from the computational biology domain. In this presentation, and after defining the problem in its generality, we review some of the algorithms that have appeared in the literature and describe several applications of pattern discovery to problems from computational biology.