Human chromosome 2 is unique to the human lineage in being the product of a head-to-head fusion of two intermediate-sized ancestral chromosomes. Chromosome 4 has received attention primarily related to the search for the Huntington's disease gene, but also for genes associated with Wolf-Hirschhorn syndrome, polycystic kidney disease and a form of muscular dystrophy. Here we present approximately 237 million base pairs of sequence for chromosome 2, and 186 million base pairs for chromosome 4, representing more than 99.6% of their euchromatic sequences. Our initial analyses have identified 1,346 protein-coding genes and 1,239 pseudogenes on chromosome 2, and 796 protein-coding genes and 778 pseudogenes on chromosome 4. Extensive analyses confirm the underlying construction of the sequence, and expand our understanding of the structure and evolution of mammalian chromosomes, including gene deserts, segmental duplications and highly variant regions.
The laboratory rat (Rattus norvegicus) is an indispensable tool in experimental medicine and drug development, having made inestimable contributions to human health. We report here the genome sequence of the Brown Norway (BN) rat strain. The sequence represents a high-quality 'draft' covering over 90% of the genome. The BN rat sequence is the third complete mammalian genome to be deciphered, and three-way comparisons with the human and mouse genomes resolve details of mammalian evolution. This first comprehensive analysis includes genes and proteins and their relation to human disease, repeated sequences, comparative genome-wide studies of mammalian orthologous chromosomal regions and rearrangement breakpoints, reconstruction of ancestral karyotypes and the events leading to existing species, rates of variation, and lineage-specific and lineage-independent evolutionary events such as expansion of gene families, orthology relations and protein evolution.
We present here a draft genome sequence of the red jungle fowl, Gallus gallus. Because the chicken is a modern descendant of the dinosaurs and the first non-mammalian amniote to have its genome sequenced, the draft sequence of its genome--composed of approximately one billion base pairs of sequence and an estimated 20,000-23,000 genes--provides a new perspective on vertebrate genome evolution, while also improving the annotation of mammalian genomes. For example, the evolutionary distance between chicken and human provides high specificity in detecting functional elements, both non-coding and coding. Notably, many conserved non-coding sequences are far from genes and cannot be assigned to defined functional classes. In coding regions the evolutionary dynamics of protein domains and orthologous groups illustrate processes that distinguish the lineages leading to birds and mammals. The distinctive properties of avian microchromosomes, together with the inferred patterns of conserved synteny, provide additional insights into vertebrate chromosome architecture.
Human chromosome 7 has historically received prominent attention in the human genetics community, primarily related to the search for the cystic fibrosis gene and the frequent cytogenetic changes associated with various forms of cancer. Here we present more than 153 million base pairs representing 99.4% of the euchromatic sequence of chromosome 7, the first metacentric chromosome completed so far. The sequence has excellent concordance with previously established physical and genetic maps, and it exhibits an unusual amount of segmentally duplicated sequence (8.2%), with marked differences between the two arms. Our initial analyses have identified 1,150 protein-coding genes, 605 of which have been confirmed by complementary DNA sequences, and an additional 941 pseudogenes. Of genes confirmed by transcript sequences, some are polymorphic for mutations that disrupt the reading frame.
Comparison of the genomes and proteomes of the two diptera Anopheles gambiae and Drosophila melanogaster, which diverged about 250 million years ago, reveals considerable similarities. However, numerous differences are also observed; some of these must reflect the selection and subsequent adaptation associated with different ecologies and life strategies. Almost half of the genes in both genomes are interpreted as orthologs and show an average sequence identity of about 56%, which is slightly lower than that observed between the orthologs of the pufferfish and human (diverged about 450 million years ago). This indicates that these two insects diverged considerably faster than vertebrates. Aligned sequences reveal that orthologous genes have retained only half of their intron/exon structure, indicating that intron gains or losses have occurred at a rate of about one per gene per 125 million years. Chromosomal arms exhibit significant remnants of homology between the two species, although only 34% of the genes colocalize in small "microsyntenic" clusters, and major interarm transfers as well as intra-arm shuffling of gene order are detected.
Four years after the original sequence submission, we have re-annotated the genome of Mycoplasma pneumoniae to incorporate novel data. The total number of ORFss has been increased from 677 to 688 (10 new proteins were predicted in intergenic regions, two further were newly identified by mass spectrometry and one protein ORF was dismissed) and the number of RNAs from 39 to 42 genes. For 19 of the now 35 tRNAs and for six other functional RNAs the exact genome positions were re-annotated and two new tRNA(Leu) and a small 200 nt RNA were identified. Sixteen protein reading frames were extended and eight shortened. For each ORF a consistent annotation vocabulary has been introduced. Annotation reasoning, annotation categories and comparisons to other published data on M.pneumoniae functional assignments are given. Experimental evidence includes 2-dimensional gel electrophoresis in combination with mass spectrometry as well as gene expression data from this study. Compared to the original annotation, we increased the number of proteins with predicted functional features from 349 to 458. The increase includes 36 new predictions and 73 protein assignments confirmed by the published literature. Furthermore, there are 23 reductions and 30 additions with respect to the previous annotation. mRNA expression data support transcription of 184 of the functionally unassigned reading frames.
As a part of our cDNA project for deducing the coding sequence of unidentified human genes, we newly determined the sequences of 100 cDNA clones from a set of size-fractionated human brain cDNA libraries, and predicted the coding sequences of the corresponding genes, named KIAA0919 to KIAA1018. The sequencing of these clones revealed that the average sizes of the inserts and corresponding open reading frames were 4.9 kb and 2.6 kb (882 amino acid residues), respectively. A computer search of the sequences against the public databases indicated that predicted coding sequences of 87 genes contained sequences similar to known genes, 53% of which (46 genes) were categorized as proteins relating to cell signaling/communication, cell structure/motility and nucleic acid management. The chromosomal locations of the genes were determined by using human-rodent hybrid panels unless their mapping data were already available in the public databases. The expression profiles of all the genes among 10 human tissues, 8 brain regions (amygdala, corpus callosum, cerebellum, caudate nucleus, hippocampus, substania nigra, subthalamic nucleus, and thalamus), spinal cord, fetal brain and fetal liver were also examined by reverse transcription-coupled polymerase chain reaction, products of which were quantified by enzyme-linked immunosorbent assay.
        
Title: Prediction of the coding sequences of unidentified human genes. X. The complete sequences of 100 new cDNA clones from brain which can code for large proteins in vitro Ishikawa K, Nagase T, Suyama M, Miyajima N, Tanaka A, Kotani H, Nomura N, Ohara O Ref: DNA Research, 5:169, 1998 : PubMed
As an extension of our cDNA analysis for deducing the coding sequences of unidentified human genes, we have newly determined the sequences of 100 cDNA clones from a set of size-fractionated human brain cDNA libraries, and predicted the coding sequences of the corresponding genes, named KIAA0611 to KIAA0710. In vitro transcription-coupled translation assay was applied as the first screening to select cDNA clones which produce proteins with apparent molecular mass of 50 kDa and over. One hundred unidentified cDNA clones thus selected were then subjected to sequencing of entire inserts. The average size of the inserts and corresponding open reading frames was 4.9 kb and 2.8 kb (922 amino acid residues), respectively. Computer search of the sequences against the public databases indicated that predicted coding sequences of 87 genes were similar to those of known genes, 62% of which (54 genes) were categorized as proteins related to cell signaling/communication, cell structure/motility and nucleic acid management. The expression profiles in 10 human tissues of all the clones characterized in this study were examined by reverse transcription-coupled polymerase chain reaction and the chromosomal locations of the clones were determined by using human-rodent hybrid panels.