Bioinformatics References: 2003

(229 References)

Ackerman, C. J., M. M. Harnett, et al. (2003). "19 Å Solution Structure of the Filarial Nematode Immunomodulatory Protein, ES-62." Biophys J 84(1): 489-500.

            ES-62, a protein secreted by filarial nematodes, parasites of vertebrates including humans, has an unusual posttranslational covalent addition of phosphorylcholine to an N-type glycan. Studies on ES-62 from the rodent parasite Acanthocheilonema viteae ascribe it a dominant role in ensuring parasite survival by modulating the host immune system. Understanding this immunomodulation at the molecular level awaits full elucidation but distinct components of ES-62 may participate: the protein contributes aminopeptidase-like activity whereas the phosphorylcholine is thought to act as a signal transducer. We have used biophysical and bioinformatics-based structure prediction methods to define a low-resolution model of ES-62. Sedimentation equilibrium showed that ES-62 is a tightly bound tetramer. The sedimentation coefficient is consistent with this oligomer and the overall molecular shape revealed by small angle x-ray scattering. A 19 Å model for ES-62 was restored from the small-angle x-ray scattering data using the program DAMMIN which uses simulated annealing to find a configuration of densely packed scattering elements consistent with the experimental scattering curve. Analysis of the primary sequence with the position-specific iterated basic local alignment search tool, PSI-BLAST, identified six closely homologous proteins, five of which are peptidases, consistent with observed aminopeptidase activity in ES-62. Differences between the secondary structure content of ES-62 predicted using the consensus output from the secondary structure prediction server JPRED and measured using circular dichroism are discussed in relation to multimeric glycosylated proteins. This study represents the first attempt to understand the multifunctional properties of this important parasite-derived molecule by studying its structure.

 

Adams, M. W., H. A. Dailey, et al. (2003). "The southeast collaboratory for structural genomics: a high-throughput gene to structure factory." Acc Chem Res 36(3): 191-8.

            The Southeast Collaboratory for Structural Genomics consists of four working groups. The protein production group supplies/develops high-output production of Pyrococcus furiosus, Caenorhabditis elegans, and selected human proteins. The X-ray crystallography group conducts high-throughput structure production in parallel with production-related research/development in nanocrystallization robotics, capillary crystallization cassette, synchrotron/home X-ray instrumentation, sample mounting robotics, data processing and pipelined structure analysis, combined refinement/validation protocols, and direct use of unlabeled native crystals (Direct Crystallography). The NMR group emphasizes/develops sample screening and backbone structure determination from residual dipolar coupling data. The bioinformatics group implements/develops local database interfaces, pipelined sequence/structure information search/updates, and database/bioinformatics toolkits.

 

Adeli, H., Z. Zhou, et al. (2003). "Analysis of EEG records in an epileptic patient using wavelet transform." J Neurosci Methods 123(1): 69-87.

            About 1% of the people in the world suffer from epilepsy and 30% of epileptics are not helped by medication. Careful analyses of the electroencephalograph (EEG) records can provide valuable insight and improved understanding of the mechanisms causing epileptic disorders. Wavelet transform is particularly effective for representing various aspects of non-stationary signals such as trends, discontinuities, and repeated patterns where other signal processing approaches fail or are not as effective. In this research, discrete Daubechies and harmonic wavelets are investigated for analysis of epileptic EEG records. Wavelet transform is used to analyze and characterize epileptiform discharges in the form of 3-Hz spike and wave complex in patients with absence seizure. Through wavelet decomposition of the EEG records, transient features are accurately captured and localized in both time and frequency context. The capability of this mathematical microscope to analyze different scales of neural rhythms is shown to be a powerful tool for investigating small-scale oscillations of the brain signals. Wavelet analyses of EEGs obtained from a population of patients can potentially suggest the physiological processes undergoing in the brain in epilepsy onset. A better understanding of the dynamics of the human brain through EEG analysis can be obtained through further analysis of such EEG records.
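
            Editorial illustration (not from the paper): the abstract describes wavelet decomposition as a way to localize transients in both time and scale. A minimal sketch of that idea follows, assuming the Haar wavelet (the simplest member of the Daubechies family) and a made-up toy signal; the function names are hypothetical and this is not the authors' implementation, which uses higher-order Daubechies and harmonic wavelets.

```python
import math


def haar_dwt(signal):
    """One level of the Haar wavelet transform.

    Returns (approximation, detail) coefficient lists, each half the
    length of the input. The detail coefficients respond to abrupt
    changes between neighbouring samples.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    return approx, detail


def haar_decompose(signal, levels):
    """Multi-level decomposition.

    Returns [detail_level_1, ..., detail_level_n, final_approximation].
    """
    bands = []
    current = list(signal)
    for _ in range(levels):
        current, detail = haar_dwt(current)
        bands.append(detail)
    bands.append(current)
    return bands


# Toy signal (invented for illustration): flat baseline with one sharp
# transient at samples 4 and 5.
toy = [1.0, 1.0, 1.0, 1.0, 5.0, -3.0, 1.0, 1.0]
bands = haar_decompose(toy, levels=2)

# The only non-zero fine-scale detail coefficient sits at index 2,
# i.e. the sample pair (4, 5), which is where the transient occurs.
for level, band in enumerate(bands[:-1], start=1):
    print(f"detail level {level}: {[round(c, 3) for c in band]}")
```

            The transform is orthonormal, so the total energy (sum of squares) of the coefficients equals that of the input signal.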

 

Aerts, S., G. Thijs, et al. (2003). "Toucan: deciphering the cis-regulatory logic of coregulated genes." Nucleic Acids Res 31(6): 1753-64.

            TOUCAN is a Java application for the rapid discovery of significant cis-regulatory elements from sets of coexpressed or coregulated genes. Biologists can automatically (i) retrieve genes and intergenic regions, (ii) identify putative regulatory regions, (iii) score sequences for known transcription factor binding sites, (iv) identify candidate motifs for unknown binding sites, and (v) detect those statistically over-represented sites that are characteristic for a gene set. Genes or intergenic regions are retrieved from Ensembl or EMBL, together with orthologs and supporting information. Orthologs are aligned and syntenic regions are selected as candidate regulatory regions. Putative sites for known transcription factors are detected using our MotifScanner, which scores position weight matrices using a probabilistic model. New motifs are detected using our MotifSampler based on Gibbs sampling. Binding sites characteristic for a gene set--and thus statistically over-represented with respect to a reference sequence set--are found using a binomial test. We have validated Toucan by analyzing muscle-specific genes, liver-specific genes and E2F target genes; we have easily detected many known binding sites within intergenic DNA and identified new biologically plausible sites for known and unknown transcription factors. Software available at http://www.esat.kuleuven.ac.be/~dna/BioI/Software.html.
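
            Editorial illustration (not from the paper): the abstract states that over-represented binding sites are found with a binomial test against a reference sequence set. A minimal sketch of a one-sided binomial test is given below. It assumes, purely for illustration, that the unit counted is "sequences containing at least one site"; the abstract does not specify what Toucan counts, and the function names and numbers are hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if not 0 <= k <= n:
        raise ValueError("k must lie in [0, n]")
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def overrepresentation_pvalue(hits_in_set, set_size, hits_in_reference, reference_size):
    """One-sided p-value that a site is over-represented in the gene set.

    The background probability is estimated from the reference set
    (an assumption of this sketch).
    """
    background = hits_in_reference / reference_size
    return binomial_upper_tail(hits_in_set, set_size, background)


# Invented example: a site occurs in 8 of 10 coregulated sequences but in
# only 100 of 1000 reference sequences (background frequency 0.1).
p_value = overrepresentation_pvalue(8, 10, 100, 1000)
print(f"p-value: {p_value:.3g}")
```

            A small p-value indicates that the site occurs more often in the gene set than the reference frequency would predict by chance.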

 

Alexandersson, M., S. Cawley, et al. (2003). "SLAM: cross-species gene finding and alignment with a generalized pair hidden Markov model." Genome Res 13(3): 496-502.

            Comparative-based gene recognition is driven by the principle that conserved regions between related organisms are more likely than divergent regions to be coding. We describe a probabilistic framework for gene structure and alignment that can be used to simultaneously find both the gene structure and alignment of two syntenic genomic regions. A key feature of the method is the ability to enhance gene predictions by finding the best alignment between two syntenic sequences, while at the same time finding biologically meaningful alignments that preserve the correspondence between coding exons. Our probabilistic framework is the generalized pair hidden Markov model, a hybrid of (1) generalized hidden Markov models, which have been used previously for gene finding, and (2) pair hidden Markov models, which have applications to sequence alignment. We have built a gene finding and alignment program called SLAM, which aligns and identifies complete exon/intron structures of genes in two related but unannotated sequences of DNA. SLAM is able to reliably predict gene structures for any suitably related pair of organisms, most notably with fewer false-positive predictions compared to previous methods (examples are provided for Homo sapiens/Mus musculus and Plasmodium falciparum/Plasmodium vivax comparisons). Accuracy is obtained by distinguishing conserved noncoding sequence (CNS) from conserved coding sequence. CNS annotation is a novel feature of SLAM and may be useful for the annotation of UTRs, regulatory elements, and other noncoding features.

 

Alexov, E. (2003). "Role of the protein side-chain fluctuations on the strength of pair-wise electrostatic interactions: comparing experimental with computed pK(a)s." Proteins 50(1): 94-103.

            The effect of the protein side-chain fluctuations on the strength of electrostatic interactions was studied. The effect was modeled on 7 different crystal structures on the same enzyme as well as on 20 molecular dynamics snapshot structures. It was shown that the side-chain flexibility affects predominantly the magnitude of the strong pair-wise interactions, that is, the pair-wise interaction among ion pairs, and practically does not affect the interactions with the rest of the protein. This was used to suggest a correction function that should be applied to the original pair-wise electrostatic interaction to mimic the effects of the fluctuations. The procedure is applied on three ion pairs identified in lysozyme. It was shown that sampling different side-chain rotamers and modifying the strength of the pair-wise interaction energies makes calculated pK(a)s less sensitive to the fluctuations of the structure and improves the prediction accuracy.

 

Altman, R. B. and J. M. Dugan (2003). "Defining bioinformatics and structural bioinformatics." Methods Biochem Anal 44: 3-14.

           

Arakawa, K., K. Mori, et al. (2003). "G-language Genome Analysis Environment: a workbench for nucleotide sequence data mining." Bioinformatics 19(2): 305-6.

            Summary: G-language Genome Analysis Environment (G-language GAE) is an open source generic software package aimed for higher efficiency in bioinformatics analysis. G-language GAE has an interface as a set of Perl libraries for software development, and a graphical user interface for easy manipulation. Both Windows and Linux versions are available. Availability: From http://www.g-language.org/ under GNU General Public License. CD-ROMs are distributed freely in major conferences. Contact: info@g-language.org

 

Archakov, A. I., V. M. Govorun, et al. (2003). "Protein-protein interactions as a target for drugs in proteomics." Proteomics 3(4): 380-91.

            Protein-protein interactions play a central role in numerous processes in the cell and are one of the main fields of functional proteomics. This review highlights the methods of bioinformatics and functional proteomics of protein-protein interaction investigation. The structures and properties of contact surfaces, forces involved in protein-protein interactions, kinetic and thermodynamic parameters of these reactions were considered. The properties of protein contact surfaces depend on their functions. The contact surfaces of permanent complexes resemble domain contacts or the protein core and it is reasonable to consider such complex formation as a continuation of protein folding. Characteristics of contact surfaces of temporary protein complexes share some similarities with active sites of enzymes. The contact surfaces of the temporary protein complexes have unique structure and properties and they are more conservative in comparison with active site of enzymes. So they represent prospective targets for a new generation of drugs. During the last decade, numerous investigations were undertaken to find or design small molecules that block protein dimerization or protein(peptide)-receptor interaction, or, on the contrary, to induce protein dimerization.

 

Armitage, J. P., C. J. Dorman, et al. (2003). "Thinking and decision making, bacterial style: Bacterial Neural Networks, Obernai, France, 7th-12th June 2002." Mol Microbiol 47(2): 583-93.

            Bacteria exhibit a bewildering range of behavioural responses and permutations of metabolic pathways for maximum exploitation of their environment. These are based on sensory perception of external and internal signals through batteries of surface and cytoplasmic receptors, evaluation of complex information flows and rapid decision making. Appreciation of the diversity of bacterial behaviour and adaptation capacities requires the study of a broad range of organisms and at this meeting we sampled more than 30 species with new findings which included the nature of gaseous receptors, advances in chemotaxis, subversion of host defences by pathogens, adaptation to high salt, community life and its obvious benefits, cell to cell communications and even the nature of bacterial circadian rhythms. With around 80 bacterial genomes now completed, and many more almost there, it was appropriate to complete the meeting with an introduction to Systems Biology and prospects for simulating the virtual cell. The versatility and seemingly 'intelligent' behaviour of bacteria will continue to fascinate, and this meeting on Bacterial Neural Networks fully reflected the excitement of this field.

 

Arnosti, D. N. (2003). "Analysis and function of transcriptional regulatory elements: insights from Drosophila." Annu Rev Entomol 48: 579-602.

            Analysis of gene expression is assuming an increasingly important role in elucidating the molecular basis of insect biology. Transcriptional regulation of gene expression is directed by a variety of cis-acting DNA elements that control spatial and temporal patterns of expression. This review summarizes current knowledge about properties of transcriptional regulatory elements, based largely on research in Drosophila melanogaster, and outlines ways that new technologies are providing tools to facilitate the study of transcriptional regulatory elements in other insects.

 

Azuaje, F. (2003). "Genomic data sampling and its effect on classification performance assessment." BMC Bioinformatics 4(1): 5.

            BACKGROUND: Supervised classification is fundamental in bioinformatics. Machine learning models, such as neural networks, have been applied to discover genes and expression patterns. This process is achieved by implementing training and test phases. In the training phase, a set of cases and their respective labels are used to build a classifier. During testing, the classifier is used to predict new cases. One approach to assessing its predictive quality is to estimate its accuracy during the test phase. Key limitations appear when dealing with small-data samples. This paper investigates the effect of data sampling techniques on the assessment of neural network classifiers. RESULTS: Three data sampling techniques were studied: Cross-validation, leave-one-out, and bootstrap. These methods are designed to reduce the bias and variance of small-sample estimations. Two prediction problems based on small-sample sets were considered: Classification of microarray data originating from a leukemia study and from small, round blue-cell tumours. A third problem, the prediction of splice-junctions, was analysed to perform comparisons. Different accuracy estimations were produced for each problem. The variations are accentuated in the small-data samples. The quality of the estimates depends on the number of train-test experiments and the amount of data used for training the networks. CONCLUSION: The predictive quality assessment of biomolecular data classifiers depends on the data size, sampling techniques and the number of train-test experiments. Conservative and optimistic accuracy estimations can be obtained by applying different methods. Guidelines are suggested to select a sampling technique according to the complexity of the prediction problem under consideration.
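
            Editorial illustration (not from the paper): the three data sampling techniques named in the abstract (cross-validation, leave-one-out, and bootstrap) differ only in how case indices are divided into training and test sets. A minimal sketch of the index splitting follows; the function names are hypothetical, the sample size of 10 is invented, and the out-of-bag convention used for the bootstrap test set is an assumption, since the abstract does not describe the exact protocol.

```python
import random


def k_fold_splits(n, k, rng):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n))
    rng.shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def leave_one_out_splits(n):
    """Yield (train, test) index lists; each case is the test set once."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]


def bootstrap_split(n, rng):
    """Draw n training indices with replacement; test on the cases never drawn."""
    train = [rng.randrange(n) for _ in range(n)]
    drawn = set(train)
    test = [i for i in range(n) if i not in drawn]
    return train, test


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n_cases = 10            # invented small-sample size

for train, test in k_fold_splits(n_cases, 5, rng):
    print("5-fold     train:", sorted(train), "test:", sorted(test))

loo = list(leave_one_out_splits(n_cases))
print("leave-one-out produces", len(loo), "train-test experiments")

boot_train, boot_test = bootstrap_split(n_cases, rng)
print("bootstrap  train:", sorted(boot_train), "test:", boot_test)
```

            Repeating any of these procedures and averaging the resulting accuracies gives the kind of train-test experiment counts the abstract refers to.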

 

Bahl, A., B. Brunk, et al. (2003). "PlasmoDB: the Plasmodium genome resource. A database integrating experimental and computational data." Nucleic Acids Res 31(1): 212-5.

            PlasmoDB (http://PlasmoDB.org) is the official database of the Plasmodium falciparum genome sequencing consortium. This resource incorporates the recently completed P. falciparum genome sequence and annotation, as well as draft sequence and annotation emerging from other Plasmodium sequencing projects. PlasmoDB currently houses information from five parasite species and provides tools for intra- and inter-species comparisons. Sequence information is integrated with other genomic-scale data emerging from the Plasmodium research community, including gene expression analysis from EST, SAGE and microarray projects and proteomics studies. The relational schema used to build PlasmoDB, GUS (Genomics Unified Schema) employs a highly structured format to accommodate the diverse data types generated by sequence and expression projects. A variety of tools allow researchers to formulate complex, biologically-based, queries of the database. A stand-alone version of the database is also available on CD-ROM (P. falciparum GenePlot), facilitating access to the data in situations where internet access is difficult (e.g. by malaria researchers working in the field). The goal of PlasmoDB is to facilitate utilization of the vast quantities of genomic-scale data produced by the global malaria research community. The software used to develop PlasmoDB has been used to create a second Apicomplexan parasite genome database, ToxoDB (http://ToxoDB.org).

 

Balsera, M., J. B. Arellano, et al. (2003). "Structural analysis of the PsbQ protein of photosystem II by Fourier transform infrared and circular dichroic spectroscopy and by bioinformatic methods." Biochemistry 42(4): 1000-7.

            The structure of PsbQ, one of the three main extrinsic proteins associated with the oxygen-evolving complex (OEC) of higher plants and green algae, is examined by Fourier transform infrared (FTIR) and circular dichroic (CD) spectroscopy and by computational structural prediction methods. This protein, together with two other lumenally bound extrinsic proteins, PsbO and PsbP, is essential for the stability and full activity of the OEC in plants. The FTIR spectra obtained in both H(2)O and D(2)O suggest a mainly alpha-helix structure on the basis of the relative areas of the constituents of the amide I and I' bands. The FTIR quantitative analyses indicate that PsbQ contains about 53% alpha-helix, 7% turns, 14% nonordered structure, and 24% beta-strand plus other beta-type extended structures. CD analyses indicate that PsbQ is a mainly alpha-helix protein (about 64%), presenting a small percentage assigned to beta-strand (~7%) and a larger amount assigned to turns and nonregular structures (~29%). Independent of the spectroscopic analyses, computational methods for protein structure prediction of PsbQ were utilized. First, a multiple alignment of 12 sequences of PsbQ was obtained after an extensive search in the public databases for protein and EST sequences. Based on this alignment, computational prediction of the secondary structure and the solvent accessibility suggest the presence of two different structural domains in PsbQ: a major C-terminal domain containing four alpha-helices and a minor N-terminal domain with a poorly defined secondary structure enriched in proline and glycine residues. The search for PsbQ analogues by fold recognition methods, not based on the secondary structure, also indicates that PsbQ is a four alpha-helix protein, most probably folding as an up-down bundle. The results obtained by both the spectroscopic and computational methods are in agreement, all indicating that PsbQ is mainly an alpha protein, and show the value of using both methodologies for protein structure investigation.

 

Batista, A. P. (2003). "A computational basis to object?" Neuron 37(2): 189-90.

            To use an object, we must be able to perceive the spatial relationship between the object's parts. The accepted view of how the brain coherently encodes an object is that some neurons in the frontal cortex employ an object-centered coordinate frame. A new computational model challenges this view, using the rich conceptual framework of neural basis functions.

 

Bekaert, M., L. Bidou, et al. (2003). "Towards a computational model for -1 eukaryotic frameshifting sites." Bioinformatics 19(3): 327-35.

            Motivation: Unconventional decoding events are now well acknowledged, but not yet well formalized. In this study, we present a bioinformatics analysis of eukaryotic -1 frameshifting, in order to model this event. Results: A consensus model has already been established for -1 frameshifting sites. Our purpose here is to provide new constraints which make the model more precise. We show how a machine learning approach can be used to refine the current model. We identify new properties that may be involved in frameshifting. Each of the properties found was experimentally validated. Initially, we identify features of the overall model that are to be simultaneously satisfied. We then focus on the following two components: the spacer and the slippery sequence. As a main result, we point out that the identity of the primary structure of the so-called spacer is of great importance. Availability: Sequences of the oligonucleotides in the functional tests are available at http://www.igmors.u-psud.fr/rousset/bioinformatics/ Contact: bekaert@igmors.u-psud.fr jpforest@lri.fr chris@lri.fr

 

Ben Miled, Z., Y. Liu, et al. (2003). "An efficient implementation of a drug candidate database." J Chem Inf Comput Sci 43(1): 25-35.

            The recent advances in laboratory technologies have resulted in a wealth of chemical and biological data. The rapid proliferation of a vast amount of data has led to a set of cheminformatics and bioinformatics applications that manipulate dynamic, heterogeneous, and massive data. An example of such application in the pharmaceutical industry is the computational process involved in the early discovery of lead drug candidates for a given target disease. In this paper, an efficient implementation of a drug candidate database is presented and evaluated. This study shows that high performance data access can be achieved through proper choices of data representation, database schema design, and parallel processing techniques.

 

Berendsen, H. J. (2003). "Inter-union bioinformatics group report." Acta Crystallogr D Biol Crystallogr 59(Pt 4): 777-82.

           

Bertini, I. and A. Rosato (2003). "Bioinorganic Chemistry Special Feature: Bioinorganic chemistry in the postgenomic era." Proc Natl Acad Sci U S A 100(7): 3601-4.

            Genome sequencing has revolutionized all fields of life sciences. Bioinorganic chemistry is certainly not immune to this influence, which is presenting unprecedented challenges. A new goal for bioinorganic chemistry is the investigation of the linkages between inorganic elements and genomic information. This requires new advancements and/or the development of new expertise in fields such as bioinformatics and genetics but also provides a driving force to push forward the exploitation of traditional analytical techniques and spectroscopic tools. The "case study" of metal homeostasis in cells is discussed to provide a flavor of the current evolution of the field.

 

Bissantz, C., P. Bernard, et al. (2003). "Protein-based virtual screening of chemical databases. II. Are homology models of G-Protein Coupled Receptors suitable targets?" Proteins 50(1): 5-25.

            The aim of the current study is to investigate whether homology models of G-Protein-Coupled Receptors (GPCRs) that are based on bovine rhodopsin are reliable enough to be used for virtual screening of chemical databases. Starting from the recently described 2.8 Å-resolution X-ray structure of bovine rhodopsin, homology models of an "antagonist-bound" form of three human GPCRs (dopamine D3 receptor, muscarinic M1 receptor, vasopressin V1a receptor) were constructed. The homology models were used to screen three-dimensional databases using three different docking programs (Dock, FlexX, Gold) in combination with seven scoring functions (ChemScore, Dock, FlexX, Fresno, Gold, Pmf, Score). Rhodopsin-based homology models turned out to be suitable, indeed, for virtual screening since known antagonists seeded in the test databases could be distinguished from randomly chosen molecules. However, such models are not accurate enough for retrieving known agonists. To generate receptor models better suited for agonist screening, we developed a new knowledge- and pharmacophore-based modeling procedure that might partly simulate the conformational changes occurring in the active site during receptor activation. Receptor coordinates generated by this new procedure are now suitable for agonist screening. We thus propose two alternative strategies for the virtual screening of GPCR ligands, relying on a different set of receptor coordinates (antagonist-bound and agonist-bound states).

 

Black, C. G., L. Wang, et al. (2003). "Apical location of a novel EGF-like domain-containing protein of Plasmodium falciparum." Mol Biochem Parasitol 127(1): 59-68.

            Using bioinformatics analyses of the unfinished malaria genome sequence, we have identified a novel protein of Plasmodium falciparum that contains two epidermal growth factor (EGF)-like domains near the C-terminus of the protein. The sequence contains a single open reading frame of 1572 bp with the potential to encode a protein of 524 residues containing hydrophobic regions at the extreme N- and C-termini which appear to represent signal peptide and glycosylphosphatidylinositol (GPI)-attachment sites, respectively. RT-PCR analysis has confirmed that the novel gene is transcribed in asexual stages of P. falciparum. Antibodies to the EGF-like domains of the novel protein are highly specific and do not cross-react with the EGF-like domains of MSP1, MSP4, MSP5 or MSP8 expressed as GST fusion proteins. Antisera to the C-terminal fragments react with two bands of 80 and 36 kDa in P. falciparum parasite lysates whereas antisera to the most N-terminal fusion protein only recognises the 80 kDa band, suggesting that the novel protein may undergo processing in a similar way to MSP1 and MSP8, but with fewer cleavage events. Immunoblot analysis of stage-specific parasite samples reveals that the protein is present in trophozoites, schizonts and in isolated merozoites. The protein partitions in the detergent-enriched phase after Triton X-114 fractionation and is localised to the surfaces of trophozoites, schizonts and free merozoites in an apical distribution. Based on the accepted nomenclature in the field we now designate this protein MSP10. We have shown that the MSP10 fusion proteins are in a conformation that can be recognised by human immune sera and that there is very limited sequence diversity in an approximately 1 kb region of MSP10, encompassing the two EGF-like domains. A sequence similar to MSP10 can be identified in the available P. yoelii genomic sequence, offering the possibility of ascertaining whether this novel protein can induce host protective responses in an in vivo model.

 

Blake, J. A., J. E. Richardson, et al. (2003). "MGD: the Mouse Genome Database." Nucleic Acids Res 31(1): 193-5.

            The Mouse Genome Database (MGD) (http://www.informatics.jax.org) is one component of a community database resource for the laboratory mouse, a key model organism for interpreting the human genome and for understanding human biology. MGD strives to provide an extensively integrated information resource with experimental details annotated from both literature and on-line genomic data sources. MGD curates and presents the consensus representation of genotype (sequence) to phenotype information including highly detailed information about genes and gene products. Primary foci of integration are through representations of relationships between genes, sequences and phenotypes. MGD collaborates with other bioinformatics groups to curate a definitive set of information about the laboratory mouse. Recent developments include a general implementation of database structures for controlled vocabularies and the integration of a phenotype classification system.

 

Blanc, G., K. Hokamp, et al. (2003). "A recent polyploidy superimposed on older large-scale duplications in the Arabidopsis genome." Genome Res 13(2): 137-44.

            The Arabidopsis genome contains numerous large duplicated chromosomal segments, but the different approaches used in previous analyses led to different interpretations regarding the number and timing of ancestral large-scale duplication events. Here, using more appropriate methodology and a more recent version of the genome sequence annotation, we investigate the scale and timing of segmental duplications in Arabidopsis. We used protein sequence similarity searches to detect duplicated blocks in the genome, used the level of synonymous substitution between duplicated genes to estimate the relative ages of the blocks containing them, and analyzed the degree of overlap between adjacent duplicated blocks. We conclude that the Arabidopsis lineage underwent at least two distinct episodes of duplication. One was a polyploidy that occurred much more recently than estimated previously, before the Arabidopsis/Brassica rapa split and probably during the early emergence of the crucifer family (24-40 Mya). An older set of duplicated blocks was formed after the monocot/dicot divergence, and the relatively low level of overlap among these blocks indicates that at least some of them are remnants of a larger duplication such as a polyploidy or aneuploidy.

 

Bock, J. R. and D. A. Gough (2003). "Whole-proteome interaction mining." Bioinformatics 19(1): 125-34.

            Motivation: A major post-genomic scientific and technological pursuit is to describe the functions performed by the proteins encoded by the genome. One strategy is to first identify the protein-protein interactions in a proteome, then determine pathways and overall structure relating these interactions, and finally to statistically infer functional roles of individual proteins. Although huge amounts of genomic data are at hand, current experimental protein interaction assays must overcome technical problems to scale-up for high-throughput analysis. In the meantime, bioinformatics approaches may help bridge the information gap required for inference of protein function. In this paper, a previously described data mining approach to prediction of protein-protein interactions (Bock and Gough, 2001, Bioinformatics, 17, 455-460) is extended to interaction mining on a proteome-wide scale. An algorithm (the phylogenetic bootstrap) is introduced, which suggests traversal of a phenogram, interleaving rounds of computation and experiment, to develop a knowledge base of protein interactions in genetically-similar organisms. Results: The interaction mining approach was demonstrated by building a learning system based on 1,039 experimentally validated protein-protein interactions in the human gastric bacterium Helicobacter pylori. An estimate of the generalization performance of the classifier was derived from 10-fold cross-validation, which indicated expected upper bounds on precision of 80% and sensitivity of 69% when applied to related organisms. One such organism is the enteric pathogen Campylobacter jejuni, in which comprehensive machine learning prediction of all possible pairwise protein-protein interactions was performed. The resulting network of interactions shares an average protein connectivity characteristic in common with previous investigations reported in the literature, offering strong evidence supporting the biological feasibility of the hypothesized map. For inferences about complete proteomes in which the number of pairwise non-interactions is expected to be much larger than the number of actual interactions, we anticipate that the sensitivity will remain the same but precision may decrease. We present specific biological examples of two subnetworks of protein-protein interactions in C. jejuni resulting from the application of this approach, including elements of a two-component signal transduction system for thermoregulation, and a ferritin uptake network. Contact: dgough@bioeng.ucsd.edu
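
            Editorial illustration (not from the paper): the abstract anticipates that, when non-interactions greatly outnumber interactions, sensitivity stays the same while precision may decrease. The arithmetic behind that expectation can be sketched as follows. The sensitivity of 0.69 is taken from the abstract; the false-positive rate of 0.1725 is a hypothetical value, chosen only so that precision equals 0.80 when the two classes are the same size (an assumption of this sketch, not a figure from the paper), and the class sizes are invented.

```python
def precision_from_rates(sensitivity, false_positive_rate, n_interacting, n_non_interacting):
    """Precision = TP / (TP + FP), computed from per-class error rates."""
    true_positives = sensitivity * n_interacting
    false_positives = false_positive_rate * n_non_interacting
    return true_positives / (true_positives + false_positives)


SENSITIVITY = 0.69    # reported in the abstract
ASSUMED_FPR = 0.1725  # hypothetical: yields precision 0.80 at a 1:1 class ratio
N_INTERACTING = 1000  # invented number of true interacting pairs

# Sensitivity is a per-class rate, so it is unaffected by the class ratio;
# precision falls as non-interacting pairs become more numerous.
for ratio in (1, 10, 100):
    p = precision_from_rates(SENSITIVITY, ASSUMED_FPR, N_INTERACTING, N_INTERACTING * ratio)
    print(f"non-interacting : interacting = {ratio}:1  precision = {p:.3f}")
```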

 

Boffelli, D., J. McAuliffe, et al. (2003). "Phylogenetic shadowing of primate sequences to find functional regions of the human genome." Science 299(5611): 1391-4.

            Nonhuman primates represent the most relevant model organisms to understand the biology of Homo sapiens. The recent divergence and associated overall sequence conservation between individual members of this taxon have nonetheless largely precluded the use of primates in comparative sequence studies. We used sequence comparisons of an extensive set of Old World and New World monkeys and hominoids to identify functional regions in the human genome. Analysis of these data enabled the discovery of primate-specific gene regulatory elements and the demarcation of the exons of multiple genes. Much of the information content of the comprehensive primate sequence comparisons could be captured with a small subset of phylogenetically close primates. These results demonstrate the utility of intraprimate sequence comparisons to discover common mammalian as well as primate-specific functional elements in the human genome, which are unattainable through the evaluation of more evolutionarily distant species.

 

Bourne, P. E. (2003). "Free access to publicly funded databases is vital." Nature 421(6925): 786.

           

Boutselakis, H., D. Dimitropoulos, et al. (2003). "E-MSD: the European Bioinformatics Institute Macromolecular Structure Database." Nucleic Acids Res 31(1): 458-62.

            The E-MSD macromolecular structure relational database (http://www.ebi.ac.uk/msd) is designed to be a single access point for protein and nucleic acid structures and related information. The database is derived from Protein Data Bank (PDB) entries. Relational database technologies are used in a comprehensive cleaning procedure to ensure data uniformity across the whole archive. The search database contains an extensive set of derived properties, goodness-of-fit indicators, and links to other EBI databases including InterPro, GO, and SWISS-PROT, together with links to SCOP, CATH, PFAM and PROSITE. A generic search interface is available, coupled with a fast secondary structure domain search tool.

 

Brazma, A., H. Parkinson, et al. (2003). "ArrayExpress--a public repository for microarray gene expression data at the EBI." Nucleic Acids Res 31(1): 68-71.

            ArrayExpress is a new public database of microarray gene expression data at the EBI, which is a generic gene expression database designed to hold data from all microarray platforms. ArrayExpress uses the annotation standard Minimum Information About a Microarray Experiment (MIAME) and the associated XML data exchange format Microarray Gene Expression Markup Language (MAGE-ML) and it is designed to store well annotated data in a structured way. The ArrayExpress infrastructure consists of the database itself, data submissions in MAGE-ML format or via an online submission tool MIAMExpress, online database query interface, and the Expression Profiler online analysis tool. ArrayExpress accepts three types of submission, arrays, experiments and protocols, each of these is assigned an accession number. Help on data submission and annotation is provided by the curation team. The database can be queried on parameters such as author, laboratory, organism, experiment or array types. With an increasing number of organisations adopting MAGE-ML standard, the volume of submissions to ArrayExpress is increasing rapidly. The database can be accessed at http://www.ebi.ac.uk/arrayexpress.

 

Brezillon, S., V. Lannoy, et al. (2003). "Identification of natural ligands for the orphan G protein-coupled receptors GPR7 and GPR8." J Biol Chem 278(2): 776-83.

            GPR7 and GPR8 are two structurally related orphan G protein-coupled receptors, presenting high similarities with opioid and somatostatin receptors. Two peptides, L8 and L8C, derived from a larger precursor, were recently described as natural ligands for GPR8 (Mori, M., Shimomura, Y., Harada, M., Kurihara, M., Kitada, C., Asami, T., Matsumoto, Y., Adachi, Y., Watanabe, T., Sugo, T., and Abe, M. (December, 27, 2001) World Patent Cooperation Treaty, Patent Application WO 01/98494A1). L8 is a 23-amino acid peptide, whereas L8C is the same peptide with a C terminus extension of 7 amino acids, running through a dibasic motif of proteolytic processing. Using as a query the amino acid sequence of the L8 peptide, we have identified in DNA databases a human gene predicted to encode related peptides and its mouse ortholog. By analogy with L8 and L8C, two peptides, named L7 and L7C could result from the processing of a 125-amino acid human precursor through the alternative usage of a dibasic amino acid motif. The activity of these four peptides was investigated on GPR7 and GPR8. In binding assays, L7, L7C, L8, and L8C were found to bind with low nanomolar affinities to the GPR7 and GPR8 receptors expressed in Chinese hamster ovary (CHO)-K1 cells. They inhibited forskolin-stimulated cAMP accumulation through a pertussis toxin-sensitive mechanism. The tissue distribution of prepro-L7 (ppL7) and prepro-L8 (ppL8) was investigated by reverse transcription-PCR. Abundant ppL7 transcripts were found throughout the brain as well as in spinal cord, spleen, testis, and placenta; ppL8 transcripts displayed a more restricted distribution in brain, with high levels in substantia nigra, but were more abundant in peripheral tissues. The ppL7 and ppL8 genes therefore encode the precursors of a class of peptide ligands, active on two receptor subtypes, GPR7 and GPR8. 
The distinct tissue distributions of the receptor and peptide precursors suggest that each ligand and receptor has partially overlapping but also specific roles in this signaling system.

 

Brooksbank, C., E. Camon, et al. (2003). "The European Bioinformatics Institute's data resources." Nucleic Acids Res 31(1): 43-50.

            As the amount of biological data grows, so does the need for biologists to store and access this information in central repositories in a free and unambiguous manner. The European Bioinformatics Institute (EBI) hosts six core databases, which store information on DNA sequences (EMBL-Bank), protein sequences (SWISS-PROT and TrEMBL), protein structure (MSD), whole genomes (Ensembl) and gene expression (ArrayExpress). But just as a cell would be useless if it couldn't transcribe DNA or translate RNA, our resources would be compromised if each existed in isolation. We have therefore developed a range of tools that not only facilitate the deposition and retrieval of biological information, but also allow users to carry out searches that reflect the interconnectedness of biological information. The EBI's databases and tools are all available on our website at www.ebi.ac.uk.

 

Bruins, M. R., S. Kapil, et al. (2003). "Characterization of a small plasmid (pMBCP) from bovine Pseudomonas pickettii that confers cadmium resistance." Ecotoxicol Environ Saf 54(3): 241-248.

            This is the first report of isolation of Pseudomonas pickettii from a normal adult bovine duodenum. This organism was one of several bacteria isolated as part of a study to examine cadmium resistance genes (cad(r)) for use in generating transgenic plants to reclaim cadmium-contaminated soils in Kansas. P. pickettii containing a plasmid of 2.2 kb (designated pMBCP) grew in Luria-Bertani broth and agar containing up to 800 µM of cadmium chloride and was resistant to 16 antibiotics. Curing the organism of plasmid revealed that antibiotic resistances were not plasmid-mediated. Low-level cadmium resistance was conferred by the plasmid because uncured organism grew significantly better (P<0.05) at 55 µM compared to cured organism. Both plasmid and chromosomal DNA were probed by DNA-DNA hybridization for the presence of known cadmium resistance genes (cadA, cadC, and cadD from Gram-positive Staphylococcus aureus), but none were detected. The plasmid had one restriction site each for BamHI, PstI, SmaI, and XhoI; two sites each for HincII, SacI, and SphI; and multiple sites for AluI and XcmI. DNA sequence analyses of the cloned and original plasmids showed a GC content of greater than 60% and no homology to any published sequences in the GenBank, European Bioinformatics Institute, or Japanese Genome Net databases. The DNA sequence is contained in GenBank accession number AF144733. Thus, pMBCP offers low-level cadmium resistance to P. pickettii.

 

Bruschweiler, R. (2003). "Efficient RMSD measures for the comparison of two molecular ensembles. Root-mean-square deviation." Proteins 50(1): 26-34.

            Quantitative measures are presented for comparing the conformations of two molecular ensembles. The measures are based on Kabsch's formula for the root-mean-square deviation (RMSD) and the covariance matrix of atomic positions of isotropically distributed ensembles (IDE). By using a Taylor series expansion, it is shown that the RMSD can be expressed solely in terms of the IDE matrices. A fast approximate method is introduced for the pairwise RMSD determination whose computational cost scales linearly with the number of structures. A similarity measure for two structural ensembles that is based on the trace metric of the differences of powers of the IDE matrices is presented. The measures are illustrated for conformational ensembles generated by a molecular dynamics computer simulation of a partially folded A-state analog of ubiquitin.

 

Buchan, D. W., S. C. Rison, et al. (2003). "Gene3D: structural assignments for the biologist and bioinformaticist alike." Nucleic Acids Res 31(1): 469-73.

            The Gene3D database (http://www.biochem.ucl.ac.uk/bsm/cath_new/Gene3D/) provides structural assignments for genes within complete genomes. These are available via the internet from either the World Wide Web or FTP. Assignments are made using PSI-BLAST and subsequently processed using the DRange protocol. The DRange protocol is an empirically benchmarked method for assessing the validity of structural assignments made using sequence searching methods where appropriate assignment statistics are collected and made available. Gene3D links assignments to their appropriate entries in relevant structural and classification resources (PDBsum, CATH database and the Dictionary of Homologous Superfamilies). Release 2.0 of Gene3D includes 62 genomes, 2 eukaryotes, 10 archaea and 40 bacteria. Currently, structural assignments can be made for between 30 and 40 percent of any given genome. In any genome, around half of those genes assigned a structural domain are assigned a single domain and the other half of the genes are assigned multiple structural domains. Gene3D is linked to the CATH database and is updated with each new update of CATH.

 

Bystroff, C. and S. Garde (2003). "Helix propensities of short peptides: molecular dynamics versus bioinformatics." Proteins 50(4): 552-62.

            Knowledge-based potential functions for protein structure prediction assume that the frequency of occurrence of a given structure or a contact in the protein database is a measure of its free energy. Here, we put this assumption to test by comparing the results obtained from sequence-structure cluster analysis with those obtained from long all-atom molecular dynamics simulations. Sixty-four eight-residue peptide sequences with varying degrees of similarity to the canonical sequence pattern for amphipathic helix were drawn from known protein structures, regardless of whether they were helical in the protein. Each was simulated using AMBER6.0 for at least 10 ns using explicit waters. The total simulation time was 1176 ns. The resulting trajectories were tested for reproducibility, and the helical content was measured. Natural peptides whose sequences matched the amphipathic helix motif with greater than 50% confidence were significantly more likely to form helix during the course of the simulation than peptides with lower confidence scores. The sequence pattern derived from the simulation data closely resembles the motif pattern derived from the database cluster analysis. The difficulties encountered in sampling conformational space and sequence space simultaneously are discussed.

 

Calver, A. R., D. Michalovich, et al. (2003). "Molecular cloning and characterisation of a novel GABA(B)-related G-protein coupled receptor." Brain Res Mol Brain Res 110(2): 305-17.

            Using a homology-based bioinformatics approach we have analysed human genomic sequence and identified the human and rodent orthologues of a novel putative seven transmembrane G protein coupled receptor, termed GABA(BL). The amino acid sequence homology of these cDNAs compared to GABA(B1) and GABA(B2) led us to postulate that GABA(BL) was a putative novel GABA(B) receptor subunit. The C-terminal sequence of GABA(BL) contained a putative coiled-coil domain, di-leucine and several RXR(R) ER retention motifs, all of which have been shown to be critical in GABA(B) receptor subunit function. In addition, the distribution of GABA(BL) in the central nervous system was reminiscent of that of the other known GABA(B) subunits. However, we were unable to detect receptor function in response to any GABA(B) ligands when GABA(BL) was expressed in isolation or in the presence of either GABA(B1) or GABA(B2). Therefore, if GABA(BL) is indeed a GABA(B) receptor subunit, its partner is a potentially novel receptor subunit or chaperone protein which has yet to be identified.

 

Camon, E., M. Magrane, et al. (2003). "The Gene Ontology Annotation (GOA) Project: Implementation of GO in SWISS-PROT, TrEMBL, and InterPro." Genome Res 13(4): 662-72.

            Gene Ontology Annotation (GOA) is a project run by the European Bioinformatics Institute (EBI) that aims to provide assignments of terms from the Gene Ontology (GO) resource to gene products in a number of its databases (http://www.ebi.ac.uk/GOA). In the first stage of this project, GO assignments have been applied to a data set representing the complete human proteome by a combination of electronic mappings and manual curation. This vocabulary has also been applied to the nonredundant proteome sets for all other completely sequenced organisms as well as to proteins from a wide range of organisms where the proteome is not yet complete.

 

Cash, H. D., J. W. Hoyle, et al. (2003). "Development under extreme conditions: forensic bioinformatics in the wake of the World Trade Center disaster." Pac Symp Biocomput: 638-53.

            The terrorist attacks of September 11, 2001 resulted in death and devastation in three locations, and extraordinary efforts have been exerted to identify the remains of all victims. As mass fatalities go, this one has been unusual at a policy level because the goal has been not merely to identify remains for every decedent, but to identify every bit of remains found so that even small pieces of tissue can be returned to families for burial. While the human impact at the Pentagon and Shanksville, PA was horrific, the World Trade Center site presented a particularly complex challenge for forensic DNA matching and data handling. A complete and definitive list of all those killed is still elusive, and human remains were crushed and co-mingled by the falling towers. Software tools had never been considered for a problem of this scale and scope. New data handling systems had to be created under extreme software development conditions characterized by incomplete requirements specifications, chaotically changing priorities, truly impossible deadlines and rapidly rolling production releases. Partly because of the company's experience with mtDNA tools built for the Armed Forces DNA Identification Lab starting in 1997, the New York City Office of Chief Medical Examiner [OCME] contacted Gene Codes Corporation in late September as existing data-handling tools began to fail. We began work on the project in mid-October, 2001. Our approach to the problem included: Extreme Programming [XP] methodology for functional software development, On-site time and motion analysis at the OCME for user interface design, Evidentiary references between STR, SNP and mtDNA analysis results, and Separate data Quality Control [QC] and software Quality Assurance [QA] initiatives. A substantial software suite was developed called M-FISys, an acronym for Mass-Fatality Identification System.

 

Cheng, Q., S. Wang, et al. (2003). "New approaches for anti-infective drug discovery: antibiotics, vaccines and beyond." Curr Drug Targets Infect Disord 3(1): 66-75.

            Infectious disease is the leading cause of death worldwide, and billions of dollars are invested every year in developing anti-infective drugs. In the meantime, resistant bacteria are on the steady rise and render many once effective drugs useless. The tremendous funding and the urgent need to treat the resistant bacterial infections lead to the rapid progress on development of new drugs and potential new drug targets. New discoveries are being made that increase our understanding of microbial pathogenesis. Technological advancement is also being made to accelerate the drug discovery process. This review will mainly focus on discussing novel strategies on the development of antibiotics and vaccines for treating bacterial infections. Details of how some of the emerging technologies such as genomics and bioinformatics are accelerating the drug discovery process will be highlighted. Newly emerging concepts in controlling bacterial infections such as the use of probiotics and enzybiotics will also be briefly described.

 

Clamp, M., D. Andrews, et al. (2003). "Ensembl 2002: accommodating comparative genomics." Nucleic Acids Res 31(1): 38-42.

            The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organise biology around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of human, mouse and other genome sequences, available as either an interactive web site or as flat files. Ensembl also integrates manually annotated gene structures from external sources where available. As well as being one of the leading sources of genome annotation, Ensembl is an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements. These range from sequence analysis to data storage and visualisation and installations exist around the world in both companies and at academic sites. With both human and mouse genome sequences available and more vertebrate sequences to follow, many of the recent developments in Ensembl have focused on developing automatic comparative genome analysis and visualisation.

 

Conway, T. and G. K. Schoolnik (2003). "Microarray expression profiling: capturing a genome-wide portrait of the transcriptome." Mol Microbiol 47(4): 879-89.

            The bacterial transcriptome is a dynamic entity that reflects the organism's immediate, ongoing and genome-wide response to its environment. Microarray expression profiling provides a comprehensive portrait of the transcriptional world enabling us to view the organism as a 'system' that is more than the sum of its parts. The vigilance of microorganisms to environmental change, the alacrity of the transcriptional response, the short half-life of bacterial mRNA and the genome-scale nature of the investigation collectively explain the power of this method. These same features pose the most significant experimental design and execution issues which, unless surmounted, predictably generate a distorted image of the transcriptome. Conversely, the expression profile of a properly conceived and conducted microarray experiment can be used for hypothesis testing: disclosure of the metabolic and biosynthetic pathways that underlie adaptation of the organism to changing conditions of growth; the identification of co-ordinately regulated genes; the regulatory circuits and signal transduction systems that mediate the adaptive response; and temporal features of developmental programmes. The study of bacterial pathogenesis by microarray expression profiling poses special challenges and opportunities. Although the technical hurdles are many, obtaining expression profiles of an organism growing in tissue will probably reveal strategies for growth and survival in the host's microenvironment. Identifying these colonization strategies and their cognate expression patterns involves a 'deconstruction' process that combines bioinformatics analysis and in vitro DNA array experimentation.

 

Corvin, A. and M. Gill (2003). "Psychiatric genetics in the post-genome age." Br J Psychiatry 182: 95-6.

           

Couzin, J. (2003). "Functional genomics. How to make sense of sequence." Science 299(5613): 1642.

           

Creighton, C., S. Hanash, et al. (2003). "Gene expression patterns define pathways correlated with loss of differentiation in lung adenocarcinomas." FEBS Lett 540(1-3): 167-70.

            An analysis of microarray data from 86 lung adenocarcinomas reveals hundreds of genes significantly correlated with tumor cell differentiation. A bioinformatics approach of linking these genes to public information from the Locuslink and KEGG databases yields evidence for a loss of tumor cell differentiation being associated with biological processes of cell division, protein degradation, pyrimidine and purine metabolism, oxidative phosphorylation, glyoxylate and dicarboxylate metabolism, folate biosynthesis, and glutamate metabolism. The increased expression of genes involved in these processes is consistent with increased proliferation and metabolism characteristics of poorly differentiated tumors. The complete results of this analysis are available at http://dot.ped.med.umich.edu:2000/pub/diff/index.htm.

 

de Bakker, P. I., M. A. DePristo, et al. (2003). "Ab initio construction of polypeptide fragments: Accuracy of loop decoy discrimination by an all-atom statistical potential and the AMBER force field with the Generalized Born solvation model." Proteins 51(1): 21-40.

            The accuracy of model selection from decoy ensembles of protein loop conformations was explored by comparing the performance of the Samudrala-Moult all-atom statistical potential (RAPDF) and the AMBER molecular mechanics force field, including the Generalized Born/surface area solvation model. Large ensembles of consistent loop conformations, represented at atomic detail with idealized geometry, were generated for a large test set of protein loops of 2 to 12 residues long by a novel ab initio method called RAPPER that relies on fine-grained residue-specific phi/psi propensity tables for conformational sampling. Ranking the conformers on the basis of RAPDF scores resulted in selected conformers that had an average global, non-superimposed RMSD for all heavy mainchain atoms ranging from 1.2 A for 4-mers to 2.9 A for 8-mers to 6.2 A for 12-mers. After filtering on the basis of anchor geometry and RAPDF scores, ranking by energy minimization of the AMBER/GBSA potential energy function selected conformers that had global RMSD values of 0.5 A for 4-mers, 2.3 A for 8-mers, and 5.0 A for 12-mers. Minimized fragments had, on average, consistently lower RMSD values (by 0.1 A) than their initial conformations. The importance of the Generalized Born solvation energy term is reflected by the observation that the average RMSD accuracy for all loop lengths was worse when this term is omitted. There are, however, still many cases where the AMBER gas-phase minimization selected conformers of lower RMSD than the AMBER/GBSA minimization. The AMBER/GBSA energy function had better correlation with RMSD to native than the RAPDF. When the ensembles were supplemented with conformations extracted from experimental structures, a dramatic improvement in selection accuracy was observed at longer lengths (average RMSD of 1.3 A for 8-mers) when scoring with the AMBER/GBSA force field. 
This work provides the basis for a promising hybrid approach of ab initio and knowledge-based methods for loop modeling.

 

De Luca, A., A. Buccino, et al. (2003). "NF1 gene analysis based on DHPLC." Hum Mutat 21(2): 171-2.

            The high mutation rate at the NF1 locus results in a wide range of molecular abnormalities. The majority of these mutations are private and rare, generating elevated allelic diversity with a restricted number of recurrent mutations. In this study, we have assessed the efficacy of denaturing high-performance liquid chromatography (DHPLC) for detecting mutations in the NF1 gene. DHPLC is a fast and highly sensitive technique based on the detection of heteroduplexes in PCR products by ion pair reverse-phase HPLC under partially denaturing conditions. We established theoretical conditions for DHPLC analysis of all coding exons and splice junctions of the NF1 gene using the WAVEmaker software version 4.1.40 and screened a panel of 40 unrelated, genetically uncharacterized NF1 patients (25 sporadic and 15 familial) for mutations. Disruptive mutations were identified in 29 individuals with an overall mutation detection rate of 72.5%. The mutations included eight deletions (exons 4b, 7, 10a, 14, 26, and 31), one insertion (exon 8), nine nonsense mutations (exons 10a, 13, 23.1, 27a, 29, 31, and 36), six missense mutations (exons 15, 16, 17, 24, and 31), four splice errors (exons 11, 14, 36, and 40) and a complex rearrangement within exon 16. Eighteen (62%) of the identified disruptive mutations are novel. Seven unclassified and three previously reported polymorphisms were also detected. None of the missense mutations identified in this study were found after screening of 150 controls. Our results suggest that DHPLC provides an accurate method for the rapid identification of NF1 mutations.

 

De Oliveira, T., R. Miller, et al. (2003). "An integrated genetic data environment (GDE)-based LINUX interface for analysis of HIV-1 and other microbial sequences." Bioinformatics 19(1): 153-4.

            Motivation: Sequence databases encode a wealth of information needed to develop improved vaccination and treatment strategies for the control of HIV and other important pathogens. To facilitate effective utilization of these datasets, we developed a user-friendly GDE-based LINUX interface that reduces input/output file formatting. Design and Results: GDE was adapted to the Linux operating system, bioinformatics tools were integrated with microbe-specific databases, and up-to-date GDE menus were developed for several clinically important viral, bacterial and parasitic genomes. Each microbial interface was designed for local access and contains Genbank, BLAST-formatted and phylogenetic databases. Availability: GDE-Linux is available for research purposes by direct application to the corresponding author. Application-specific menus and support files can be downloaded from (http://www.bioafrica.net) Contact: toliveira@mrc.ac.za

 

Devulder, G., G. Perriere, et al. (2003). "BIBI, a Bioinformatics Bacterial Identification Tool." J Clin Microbiol 41(4): 1785-7.

            BIBI was designed to automate DNA sequence analysis for bacterial identification in the clinical field. BIBI relies on the use of BLAST and CLUSTAL W programs applied to different subsets of sequences extracted from GenBank. These sequences are filtered and stored in a new database, which is adapted to bacterial identification.

 

DeYoung, M. P., M. Tress, et al. (2003). "Identification of Down's syndrome critical locus gene SIM2-s as a drug therapy target for solid tumors." Proc Natl Acad Sci U S A.

            We report here a cancer drug therapy use of a gene involved in Down's syndrome. Using bioinformatics approaches, we recently predicted Single Minded 2 gene (SIM2) from Down's syndrome critical region to be specific to certain solid tumors. Involvement of SIM2 in solid tumors has not previously been reported. Intrigued by a possible association between a Down's syndrome gene and solid tumors, we monitored SIM2 expression in solid tumors. Isoform-specific expression of SIM2 short-form (SIM2-s) was seen selectively in colon, prostate, and pancreatic carcinomas but not in breast, lung, or ovarian carcinomas nor in most normal tissues. In colon tumors, SIM2-s expression was seen in early stages. Antisense inhibition of SIM2-s expression in a colon cancer cell line caused inhibition of gene expression, growth inhibition, and apoptosis. The administration of the antisense, but not the control, oligonucleotides caused a pronounced inhibition of tumor growth in nude mice with no major toxicity. Our findings provide a strong rationale for the genes-to-drugs paradigm, establish SIM2-s as a molecular target for cancer therapeutics, and may further understanding of the cancer risk of Down's syndrome patients.

 

Dilks, K., R. W. Rose, et al. (2003). "Prokaryotic utilization of the twin-arginine translocation pathway: a genomic survey." J Bacteriol 185(4): 1478-83.

            The twin-arginine translocation (Tat) pathway, which has been identified in plant chloroplasts and prokaryotes, allows for the secretion of folded proteins. However, the extent to which this pathway is used among the prokaryotes is not known. By using a genomic approach, a comprehensive list of putative Tat substrates for 84 diverse prokaryotes was established. Strikingly, the results indicate that the Tat pathway is utilized to highly varying extents. Furthermore, while many prokaryotes use this pathway predominantly for the secretion of redox proteins, analyses of the predicted substrates suggest that certain bacteria and archaea secrete mainly nonredox proteins via the Tat pathway. While no correlation was observed between the number of Tat machinery components encoded by an organism and the number of predicted Tat substrates, it was noted that the composition of this machinery was specific to phylogenetic taxa.

 

Dreger, M. (2003). "Proteome analysis at the level of subcellular structures." Eur J Biochem 270(4): 589-99.

            The targeting of proteins to particular subcellular sites is an important principle of the functional organization of cells at the molecular level. In turn, knowledge about the subcellular localization of a protein is a characteristic that may provide a hint as to the function of the protein. The combination of classic biochemical fractionation techniques for the enrichment of particular subcellular structures with the large-scale identification of proteins by mass spectrometry and bioinformatics provides a powerful strategy that interfaces cell biology and proteomics, and thus is termed 'subcellular proteomics'. In addition to its exceptional power for the identification of previously unknown gene products, the analysis of proteins at the subcellular level is the basis for monitoring important aspects of dynamic changes in the proteome such as protein translocation. This review summarizes data from recent subcellular proteomics studies with an emphasis on the type of data that can be retrieved from such studies depending on the design of the analytical strategy.

 

Durand, D. (2003). "Vertebrate evolution: doubling and shuffling with a full deck." Trends Genet 19(1): 2-5.

            The number and role of whole-genome duplications in vertebrate evolution has intrigued evolutionary biologists since Ohno first proposed genome duplication as the force driving the 'big leap' in vertebrate morphological innovation. Attempts to resolve these issues have been thwarted by small and noisy datasets, and by lack of computational accuracy and statistical rigor. Recently, Ken Wolfe and colleagues presented a genome-scale, statistically rigorous analysis of evidence based on the spatial organization of duplicated genes, as well as estimates of duplication times. Their results provide the strongest evidence to date of large-scale duplication throughout the vertebrate genome, consistent with at least one whole-genome duplication.

 

Dzioba, J., C. C. Hase, et al. (2003). "Experimental verification of a sequence-based prediction: F(1)F(0)-type ATPase of Vibrio cholerae transports protons, not Na(+) ions." J Bacteriol 185(2): 674-8.

            The membrane energetics of the intestinal pathogen Vibrio cholerae involves both H(+) and Na(+) as coupling ions. The sequence of the c subunit of V. cholerae F(0)F(1) ATPase suggested that this enzyme is H(+) specific, in contrast to the results of previous studies on the Na(+)-dependent ATP synthesis in closely related Vibrio spp. Measurements of the pH gradient and membrane potential in membrane vesicles isolated from wild-type and DeltaatpE mutant V. cholerae show that the F(1)F(0) ATPase of V. cholerae is an H(+), not Na(+), pump, confirming the bioinformatics assignments that were based on the Na(+)-binding model of S. Rahlfs and V. Muller (FEBS Lett. 404:269-271, 1999). Application of this model to the AtpE sequences from other bacteria and archaea indicates that Na(+)-specific F(1)F(0) ATPases are present in a number of important bacterial pathogens.

 

Eapen, B. R. (2003). "A new insight into the pathogenesis of Reiter's syndrome using bioinformatics tools." Int J Dermatol 42(3): 242-3.

           

Edwards, Y. J. and A. Cottage (2003). "Bioinformatics methods to predict protein structure and function. A practical approach." Mol Biotechnol 23(2): 139-66.

            Protein structure prediction by using bioinformatics can involve sequence similarity searches, multiple sequence alignments, identification and characterization of domains, secondary structure prediction, solvent accessibility prediction, automatic protein fold recognition, constructing three-dimensional models to atomic detail, and model validation. Not all protein structure prediction projects involve the use of all these techniques. A central part of a typical protein structure prediction is the identification of a suitable structural target from which to extrapolate three-dimensional information for a query sequence. The way in which this is done defines three types of projects. The first involves the use of standard and well-understood techniques. If a structural template remains elusive, a second approach using nontrivial methods is required. If a target fold cannot be reliably identified because inconsistent results have been obtained from nontrivial data analyses, the project falls into the third type of project and will be virtually impossible to complete with any degree of reliability. In this article, a set of protocols to predict protein structure from sequence is presented and distinctions among the three types of project are given. These methods, if used appropriately, can provide valuable indicators of protein structure and function.

 

Eichenberger, P., S. T. Jensen, et al. (2003). "The sigma(E) Regulon and the Identification of Additional Sporulation Genes in Bacillus subtilis." J Mol Biol 327(5): 945-72.

            We report the identification and characterization on a genome-wide basis of genes under the control of the developmental transcription factor sigma(E) in Bacillus subtilis. The sigma(E) factor governs gene expression in the larger of the two cellular compartments (the mother cell) created by polar division during the developmental process of sporulation. Using transcriptional profiling and bioinformatics we show that 253 genes (organized in 157 operons) appear to be controlled by sigma(E). Among these, 181 genes (organized in 121 operons) had not been previously described as members of this regulon. Promoters for many of the newly identified genes were located by transcription start site mapping. To assess the role of these genes in sporulation, we created null mutations in 98 of the newly identified genes and operons. Of the resulting mutants, 12 (in prkA, ybaN, yhbH, ykvV, ylbJ, ypjB, yqfC, yqfD, ytrH, ytrI, ytvI and yunB) exhibited defects in spore formation. In addition, subcellular localization studies were carried out using in-frame fusions of several of the genes to the coding sequence for GFP. A majority of the fusion proteins localized either to the membrane surrounding the developing spore or to specific layers of the spore coat, although some fusions showed a uniform distribution in the mother cell cytoplasm. Finally, we used comparative genomics to determine that 46 of the sigma(E)-controlled genes in B. subtilis were present in all of the Gram-positive endospore-forming bacteria whose genome has been sequenced, but absent from the genome of the closely related but not endospore-forming bacterium Listeria monocytogenes, thereby defining a core of conserved sporulation genes of probable common ancestral origin. Our findings set the stage for a comprehensive understanding of the contribution of a cell-specific transcription factor to development and morphogenesis.

 

Elkin, P. L. (2003). "Primer on medical genomics part V: bioinformatics." Mayo Clin Proc 78(1): 57-64.

            Bioinformatics is the discipline that develops and applies informatics to the field of molecular biology. Although a comprehensive review of the entire field of bioinformatics is beyond the scope of this article, I review the basic tenets of the field and provide a topical sampling of the popular technologies available to clinicians and researchers. These technologies include tools and methods for sequence analysis (nucleotide and protein sequences), rendering of secondary and tertiary structures for these molecules, and protein fold prediction that can lead to rational drug design. I then discuss signaling pathways, new standards for data representation of genes and proteins, and finally the promise of merging these molecular data with the clinical world (the new science of phenomics).

 

Elnitski, L., R. C. Hardison, et al. (2003). "Distinguishing regulatory DNA from neutral sites." Genome Res 13(1): 64-72.

            We explore several computational approaches to analyzing interspecies genomic sequence alignments, aiming to distinguish regulatory regions from neutrally evolving DNA. Human-mouse genomic alignments were collected for three sets of human regions: (1) experimentally defined gene regulatory regions, (2) well-characterized exons (coding sequences, as a positive control), and (3) interspersed repeats thought to have inserted before the human-mouse split (a good model for neutrally evolving DNA). Models that potentially could distinguish functional noncoding sequences from neutral DNA were evaluated on these three data sets, as well as bulk genome alignments. Our analyses show that discrimination based on frequencies of individual nucleotide pairs or gaps (i.e., of possible alignment columns) is only partially successful. In contrast, scoring procedures that include the alignment context, based on frequencies of short runs of alignment columns, dramatically improve separation between regulatory and neutral features. Such scoring functions should aid in the identification of putative regulatory regions throughout the human genome.

 

Ernst, P., K. H. Glatting, et al. (2003). "A task framework for the web interface W2H." Bioinformatics 19(2): 278-82.

            Summary: The W3H task framework allows the execution of compound jobs utilizing the description of work and data flows in a heterogeneous bioinformatics environment using meta-data information. By means of these descriptions, the task system can schedule the necessary execution of applications available in the environment, depending on rules specified in the meta-data. By integrating this task framework into the publicly available web interface W2H, similarly based on meta-data, web access and data management are immediately available for each task description. Authors of task descriptions can base their work on the underlying classes and objects to be able to describe dependency rules between previously independent applications. The result of a compound task is given as XML data that is translated according to XSLT data into web pages or plain text to report the result of the task to the user. Availability: Within the HUSAR environment at DKFZ http://genome.dkfz-heidelberg.de/ Contact: P.Ernst@dkfz.de

 

Fauman, E. B., A. L. Hopkins, et al. (2003). "Structural bioinformatics in drug discovery." Methods Biochem Anal 44: 477-97.

           

Fiehn, O. and W. Weckwerth (2003). "Deciphering metabolic networks." Eur J Biochem 270(4): 579-88.

            All higher organisms divide major biochemical steps into different cellular compartments and often use tissue-specific division of metabolism for the same purpose. Such spatial resolution is accompanied with temporal changes of metabolite synthesis in response to environmental stimuli or developmental needs. Although analyses of primary and secondary gene products, i.e. transcripts, proteins, and metabolites, regularly do not cope with this spatial and temporal resolution, these gene products are often observed to be highly coregulated forming complex networks. Methods to study such networks are reviewed with respect to data acquisition, network statistics, and biochemical interpretation.

 

Flicek, P., E. Keibler, et al. (2003). "Leveraging the mouse genome for gene prediction in human: from whole-genome shotgun reads to a global synteny map." Genome Res 13(1): 46-54.

            The availability of draft sequences for both the mouse and human genomes makes it possible, for the first time, to annotate whole mammalian genomes using comparative methods. TWINSCAN is a gene-prediction system that combines the methods of single-genome predictors like GENSCAN with information derived from genome comparison, thereby improving accuracy. Because TWINSCAN uses genomic sequence only, it is less biased toward highly and/or ubiquitously expressed genes than GENEWISE, GENOMESCAN, and other methods based on evidence derived from transcripts. We show that TWINSCAN improves gene prediction in human using intermediate products from various stages of the sequencing and analysis of the mouse genome, from low-redundancy, whole-genome shotgun reads to the draft assembly and the synteny map. TWINSCAN improves on the prior state of the art even when alignments from only 1X coverage of the mouse genome are available. Gene prediction accuracy improves steadily from 1X through 3X, more slowly from 3X to 4X, and relatively little thereafter. The assembly and the synteny map greatly speed the computations, however. Our human annotation using the mouse assembly is conservative, predicting only 25,622 genes, and appears to be one of the best de novo annotations of the human genome to date.

 

Forster, J., I. Famili, et al. (2003). "Genome-scale reconstruction of the Saccharomyces cerevisiae metabolic network." Genome Res 13(2): 244-53.

            The metabolic network in the yeast Saccharomyces cerevisiae was reconstructed using currently available genomic, biochemical, and physiological information. The metabolic reactions were compartmentalized between the cytosol and the mitochondria, and transport steps between the compartments and the environment were included. A total of 708 structural open reading frames (ORFs) were accounted for in the reconstructed network, corresponding to 1035 metabolic reactions. Further, 140 reactions were included on the basis of biochemical evidence resulting in a genome-scale reconstructed metabolic network containing 1175 metabolic reactions and 584 metabolites. The number of gene functions included in the reconstructed network corresponds to approximately 16% of all characterized ORFs in S. cerevisiae. Using the reconstructed network, the metabolic capabilities of S. cerevisiae were calculated and compared with Escherichia coli. The reconstructed metabolic network is the first comprehensive network for a eukaryotic organism, and it may be used as the basis for in silico analysis of phenotypic functions.

 

Foth, B. J., S. A. Ralph, et al. (2003). "Dissecting apicoplast targeting in the malaria parasite Plasmodium falciparum." Science 299(5607): 705-8.

            Transit peptides mediate protein targeting into plastids and are only poorly understood. We extracted amino acid features from transit peptides that target proteins to the relict plastid (apicoplast) of malaria parasites. Based on these amino acid characteristics, we identified 466 putative apicoplast proteins in the Plasmodium falciparum genome. Altering the specific charge characteristics in a model transit peptide by site-directed mutagenesis severely disrupted organellar targeting in vivo. Similarly, putative Hsp70 (DnaK) binding sites present in the transit peptide proved to be important for correct targeting.

 

Frantz, S. (2003). "Analytical jobs for analytical minds." Nat Rev Drug Discov 2(3): 243.

           

Frazer, K. A., L. Elnitski, et al. (2003). "Cross-species sequence comparisons: a review of methods and available resources." Genome Res 13(1): 1-12.

            With the availability of whole-genome sequences for an increasing number of species, we are now faced with the challenge of decoding the information contained within these DNA sequences. Comparative analysis of DNA sequences from multiple species at varying evolutionary distances is a powerful approach for identifying coding and functional noncoding sequences, as well as sequences that are unique for a given organism. In this review, we outline the strategy for choosing DNA sequences from different species for comparative analyses and describe the methods used and the resources publicly available for these studies.

 

Frishman, D., M. Mokrejs, et al. (2003). "The PEDANT genome database." Nucleic Acids Res 31(1): 207-11.

            The PEDANT genome database (http://pedant.gsf.de) provides exhaustive automatic analysis of genomic sequences by a large variety of established bioinformatics tools through a comprehensive Web-based user interface. One hundred and seventy seven completely sequenced and unfinished genomes have been processed so far, including large eukaryotic genomes (mouse, human) published recently. In this contribution, we describe the current status of the PEDANT database and novel analytical features added to the PEDANT server in 2002. Those include: (i) integration with the BioRS data retrieval system which allows fast text queries, (ii) pre-computed sequence clusters in each complete genome, (iii) a comprehensive set of tools for genome comparison, including genome comparison tables and protein function prediction based on genomic context, and (iv) computation and visualization of protein-protein interaction (PPI) networks based on experimental data. The availability of functional and structural predictions for 650 000 genomic proteins in well organized form makes PEDANT a useful resource for both functional and structural genomics.

 

Fuller, S. D. (2003). "Depositing electron microscopy maps." Structure (Camb) 11(1): 11-2.

            A meeting was held at the European Bioinformatics Institute (EBI) in Hinxton, United Kingdom to discuss recent progress in the development of EMD, a database for maps determined by electron microscopy that is now integrated with MSD, the macromolecular structure database at EBI. This meeting of representatives of many of the major image processing groups in electron microscopy also discussed possible software developments that would ease the documentation and deposition of such datasets. The meeting concluded with a strong endorsement of map deposition in electron microscopy and its linkage with the family of archival databases in biomedical research.

 

Gadek, T. R. and J. B. Nicholas (2003). "Small molecule antagonists of proteins." Biochem Pharmacol 65(1): 1-8.

            The identification of small molecule antagonists of protein function is at the core of the pharmaceutical industry. Successful approaches to this problem, including screening and rational design, have been developed over the years to identify antagonists of enzymes and cellular receptors. These methods have been extended to the search for inhibitors of protein-protein interactions. While the very possibility of designing a small molecule inhibitor for such interactions was once doubted, there are examples of such inhibitors that are currently marketed products and many more inhibitors in various stages of research and development. Here we review the progress in identifying and designing small molecule protein inhibitors, with particular attention to those that block protein-protein interactions. We also discuss the physical character of protein-protein interfaces, and the resulting implications for small molecule lead discovery and design.

 

Gander, T. R., E. N. Brody, et al. (2003). "Driving forces in cancer diagnostics." MLO Med Lab Obs 35(1): 10-6, 20; quiz 20-1.

           

Garavelli, J. S. (2003). "The RESID Database of Protein Modifications: 2003 developments." Nucleic Acids Res 31(1): 499-501.

            The RESID Database is a comprehensive collection of annotations and structures for protein pre-, co- and post-translational modifications including amino-terminal, carboxyl-terminal and peptide chain cross-link modifications. The RESID Database includes: systematic and alternate names, atomic formulas and masses, enzyme activities generating the modifications, keywords, literature citations, Gene Ontology cross-references, Protein Information Resource (PIR) and SWISS-PROT protein sequence database feature table annotations, structure diagrams and molecular models. This database is freely accessible on the Internet through the European Bioinformatics Institute at http://srs.ebi.ac.uk/srs6bin/cgi-bin/wgetz?-page+LibInfo+-lib+RESID, through the National Cancer Institute - Frederick Advanced Biomedical Computing Center at http://www.ncifcrf.gov/RESID, or through the Protein Information Resource at http://pir.georgetown.edu/pirwww/dbinfo/resid.html.

 

Gaulton, A. and T. K. Attwood (2003). "Bioinformatics approaches for the classification of G-protein-coupled receptors." Curr Opin Pharmacol 3(2): 114-20.

            G-protein-coupled receptors are found abundantly in the human genome, and are the targets of numerous prescribed drugs. However, many receptors remain orphaned (i.e. with unknown ligand specificity), and others remain poorly characterised, with little structural information available. Consequently, there is often a gulf between sequence data and structural and functional knowledge of a receptor. Bioinformatics approaches may offer one approach to bridging this gap. In particular, protein family databases, which distil information from multiple sequence alignments into characteristic signatures, could be used to identify the families to which orphan receptors belong, and might facilitate discovery of novel motifs associated with ligand binding and G-protein-coupling.

 

Ghosh, A. (2003). "Guest section. Computational bioinorganic chemistry. Part III. The tools of the trade: from high-level ab initio calculations to structural bioinformatics." Curr Opin Chem Biol 7(1): 110-2.

            Having long focused on the electronic-structural aspects of metalloenzymes and their synthetic models, Abhik is currently exploring a number of synthetic problems related to self-assembly processes, dynamic combinatorial libraries, and fluorine chemistry.

 

Gibbs, R. A. and D. L. Nelson (2003). "Human genetics. Primate shadow play." Science 299(5611): 1331-3.

           

Giorgetti, A. and P. Carloni (2003). "Molecular modeling of ion channels: structural predictions." Curr Opin Chem Biol 7(1): 150-6.

            Recent advances in membrane protein crystallography have greatly increased structural information of channels permeating metal ions. Structural bioinformatics techniques and molecular dynamics calculations are providing structural models of ion channels for which the three-dimensional structure is not known. Most of the reported structure prediction studies focus on K(+) channels and are based on the KcsA K(+) channel structure.

 

Giorgianni, F., D. M. Desiderio, et al. (2003). "Proteome analysis using isoelectric focusing in immobilized pH gradient gels followed by mass spectrometry." Electrophoresis 24(1-2): 253-9.

            Over the past several years, a large effort has been focused on improvements of two-dimensional (2-D) gel electrophoresis-based proteomics technology, and on development of novel approaches for proteome analysis. Here, we describe the application of an alternative strategy for the analysis of complex proteomes. The strategy combines isoelectric focusing in immobilized pH gradient strips (in-gel IEF), mass spectrometry (MS), and bioinformatics. A protein mixture is separated by in-gel IEF, and the entire strip is cut into a set of gel sections. Proteins in each gel section are digested with trypsin, and the tryptic peptides are subjected to liquid chromatography-nanoelectrospray-quadrupole ion-trap tandem mass spectrometry (LC-ESI-MS/MS). The LC-ESI-MS/MS data are used to identify the proteins through searches of a protein sequence database. Using this in-gel IEF-LC-MS/MS strategy, we have identified 127 proteins from a human pituitary. This study demonstrates the potential of the in-gel IEF-LC-MS/MS approach for analyses of complex mammalian proteomes.

 

Goetze, S., A. Gluch, et al. (2003). "Computational and in vitro analysis of destabilized DNA regions in the interferon gene cluster: potential of predicting functional gene domains." Biochemistry 42(1): 154-66.

            Recent evidence adds support to a traditional concept according to which the eukaryotic nucleus is organized into functional domains by scaffold or matrix attachment regions (S/MARs). These regions have previously been predicted to have a high potential for stress-induced duplex destabilization (SIDD). Here we report the parallel results of binding (reassociation) and computational SIDD analyses for regions within the human interferon gene cluster on the short arm of chromosome 9 (9p22). To verify and further refine the biomathematical methods, we focus on a 10 kb region in the cluster with the pseudogene IFNWP18 and the interferon alpha genes IFNA10 and IFNA7. In a series of S/MAR binding assays, we investigate the promoter and termination regions and additional attachment sequences that were detected in the SIDD profile. The promoters of the IFNA10 and the IFNA7 genes have a moderate (approximately 20%) binding affinity to the nuclear matrix; the termination sequences show stronger association (70-80%) under our standardized conditions. No comparable destabilized elements were detected flanking the IFNWP18 pseudogene, suggesting that selective pressure acts on the physicochemical properties detected here. In extended, noncoding regions a striking periodicity is found of rather restricted SIDD minima with scaffold binding potential. By various criteria, the underlying sequences represent a new class of S/MARs, thought to be involved in a higher level organization of the genome. Together, these data emphasize the relevance of SIDD calculations as a valid approach for the localization of structural, regulatory, and coding regions in the eukaryotic genome.

 

Goshe, M. B. and R. D. Smith (2003). "Stable isotope-coded proteomic mass spectrometry." Curr Opin Biotechnol 14(1): 101-9.

            Developing the ability to quantify changes in protein abundance between cells subjected to a variety of physiological and environmental conditions is an extremely active area of proteome research. Although advances in chromatography, mass spectrometry instrumentation, and bioinformatics have contributed to producing a viable method for comparative proteome-wide analyses, the highest precision of quantitation is based, in part, upon improved methods for chemical and metabolic stable isotope labeling of proteins and peptides. The ability to quantify differences in protein expression and post-translational modifications using stable isotope labeling has been achieved, but insights into the biochemical mechanisms that will contribute to the development of new biotechnologies have yet to be realized.

 

Greenblatt, M. S., J. G. Beaudet, et al. (2003). "Detailed computational study of p53 and p16: using evolutionary sequence analysis and disease-associated mutations to predict the functional consequences of allelic variants." Oncogene 22(8): 1150-63.

            Deciding whether a missense allelic variant affects protein function is important in many contexts. We previously demonstrated that a detailed analysis of p53 intragenic conservation correlates with somatic mutation hotspots. Here we refine these evolutionary studies and expand them to the p16/Ink4a gene. We calculated that in order for 'absolute conservation' of a codon across multiple species to achieve P<0.05, the evolutionary substitution database must contain at least 3(M) variants, where M equals the number of codons in the gene. Codons in p53 were divided into high (73% of codons), intermediate (29% of codons), and low (0 codons) likelihood of being mutation hotspots. From a database of 263 somatic missense p16 mutations, we identified only four codons that are mutational hotspots at P<0.05 (8 mutations). However, data on function, structure, and disease association support the conclusion that 11 other codons with ≥5 somatic mutations also likely indicate functionally critical residues, even though P>0.05. We calculated p16 evolution using amino acid substitution matrices and nucleotide substitution distances. We looked for evolutionary parameters at each codon that would predict whether missense mutations were disease associated or disrupted function. The current p16 evolutionary substitution database is too small to determine whether observations of 'absolute conservation' are statistically significant. Increasing the number of sequences from three to seven significantly improved the predictive value of evolutionary computations. The sensitivity and specificity for conservation scores in predicting disease association of p16 codons is 70-80%. Despite the small p16 sequence database, our calculations of high conservation correctly predicted loss of cell cycle arrest function in 75% of tested codons, and low conservation correctly predicted wild-type function in 80-90% of codons. These data validate our hypothesis that detailed evolutionary analyses help predict the consequences of missense amino-acid variants.

 

Gurwitz, D., A. Weizman, et al. (2003). "Education: Teaching pharmacogenomics to prepare future physicians and researchers for personalized medicine." Trends Pharmacol Sci 24(3): 122-5.

            The vision of personalized medicine, the practice of medicine where each patient receives the most appropriate medical treatments and the most fitting dosage and combination of drugs based on his or her genetic make-up, seems to become more realistic as our knowledge about the human genome rapidly expands. We already know the reason for many types of adverse drug reactions, which are often related to polymorphic gene alleles of drug metabolizing enzymes. Moreover, insight into reasons for poor drug efficacy, often related to single nucleotide polymorphisms or larger polymorphisms in genes encoding drug target proteins, has been gained. There is a growing need to incorporate this increasingly complex body of knowledge to the standard curriculum of medical schools, so that the forthcoming generation of clinicians and researchers will be familiar with the latest developments in pharmacogenomics and medical bioinformatics, and will be capable of providing patients with the expected benefits of personalized medicine.

 

Haese, A., M. Chaudhari, et al. (2003). "Quantitative biopsy pathology for the prediction of pathologically organ-confined prostate carcinoma: a multiinstitutional validation study." Cancer 97(4): 969-78.

            BACKGROUND: Quantitative biopsy pathology with prostate specific antigen significantly improves the prediction of pathologic stage in patients with clinically localized prostate carcinoma (PCa). The authors recently reported a computational model for predicting patient specific likelihood of organ confinement of PCa using biopsy pathology and clinical data. The current study validates the initial models and presents a new, improved tool for clinical decision making. METHODS: The authors assessed 10 biopsy pathologic parameters and 2 clinical parameters using data from two institutions. Of 1287 patients, 798 men had pathologically organ confined (OC) PCa, 282 men had nonorgan-confined disease with capsular penetration (NOC-CP) only, and 207 men showed seminal vesicle or lymph node invasion (NOC-AD) after undergoing pelvic lymphadenectomy and radical pr