Taxonomy (NCBI lineage): cellular organisms > Eukaryota > Opisthokonta > Metazoa > Eumetazoa > Bilateria > Protostomia > Ecdysozoa > Panarthropoda > Arthropoda > Mandibulata > Pancrustacea > Hexapoda > Insecta > Dicondylia > Pterygota > Neoptera > Holometabola > Diptera > Brachycera > Muscomorpha > Eremoneura > Cyclorrhapha > Schizophora > Acalyptratae > Ephydroidea > Drosophilidae > Drosophilinae > Drosophilini > Drosophila [fruit fly, genus] > Sophophora > melanogaster group > melanogaster subgroup > Drosophila melanogaster
Legend: This sequence has been compared to the family alignment (MSA). Red => minority amino acid; blue => majority amino acid; color intensity => conservation rate; title => sequence position (MSA position), amino acid, rate. Markers: catalytic site; catalytic site in the MSA.
MTIVLNLPVCVLFLLGTLVPSNSQKWIPYANPLDEQSQLQEDILDNTKLT TVDLIEKYGYPSETNYVTSEDGYRLCLHRIPRPGAEPVLLVHGLMASSAS WVELGPKDGLAYILYRKGYDVWMLNTRGNIYSRENLNRRLKPNKYWDFSF HEIGKFDVPAAIDHILIHTHKPKIQYIGHSQGSTVFFVMCSERPNYAHKV NLMQALSPTVYLQENRSPVLKFLGMFKGKYSMLLNLLGGYEISAKTKLIQ QFRQHICSGSELGSSICAIFDFVLCGFDWKSFNTTLTPIVAAHASQGASA KQIYHYAQLQGDLNFQRFDHGAVLNRVRYESSEPPAYNLSQTTSKVVLHH GEGDWLGSTSDVIRLQERLPNLVESRKVNFEGFSHFDFTLSKDVRPLLYS HVLRHLSTSLSG
Drosophila melanogaster males transfer seminal fluid proteins along with sperm during mating. Among these proteins, ACPs (Accessory gland proteins) from the male's accessory gland induce behavioral, physiological, and life span reduction in mated females and mediate sperm storage and utilization. A previous evolutionary EST screen in D. simulans identified partial cDNAs for 57 new candidate ACPs. Here we report the annotation and confirmation of the corresponding Acp genes in D. melanogaster. Of 57 new candidate Acp genes previously reported in D. melanogaster, 34 conform to our more stringent criteria for encoding putative male accessory gland extracellular proteins, thus bringing the total number of ACPs identified to 52 (34 plus 18 previously identified). This comprehensive set of Acp genes allows us to dissect the patterns of evolutionary change in a suite of proteins from a single male-specific reproductive tissue. We used sequence-based analysis to examine codon bias, gene duplications, and levels of divergence (via dN/dS values and ortholog detection) of the 52 D. melanogaster ACPs in D. simulans, D. yakuba, and D. pseudoobscura. We show that 58% of the 52 D. melanogaster Acp genes are detectable in D. pseudoobscura. Sequence comparisons of ACPs shared and not shared between D. melanogaster and D. pseudoobscura show that there are separate classes undergoing distinctly dissimilar evolutionary dynamics.
BACKGROUND: The recent completion of the Drosophila melanogaster genomic sequence to high quality and the availability of a greatly expanded set of Drosophila cDNA sequences, aligning to 78% of the predicted euchromatic genes, afforded FlyBase the opportunity to significantly improve genomic annotations. We made the annotation process more rigorous by inspecting each gene visually, utilizing a comprehensive set of curation rules, requiring traceable evidence for each gene model, and comparing each predicted peptide to SWISS-PROT and TrEMBL sequences. RESULTS: Although the number of predicted protein-coding genes in Drosophila remains essentially unchanged, the revised annotation significantly improves gene models, resulting in structural changes to 85% of the transcripts and 45% of the predicted proteins. We annotated transposable elements and non-protein-coding RNAs as new features, and extended the annotation of untranslated (UTR) sequences and alternative transcripts to include more than 70% and 20% of genes, respectively. Finally, cDNA sequence provided evidence for dicistronic transcripts, neighboring genes with overlapping UTRs on the same DNA sequence strand, alternatively spliced genes that encode distinct, non-overlapping peptides, and numerous nested genes. CONCLUSIONS: Identification of so many unusual gene models not only suggests that some mechanisms for gene regulation are more prevalent than previously believed, but also underscores the complex challenges of eukaryotic gene prediction. At present, experimental data and human curation remain essential to generate high-quality genome annotations.
The fly Drosophila melanogaster is one of the most intensively studied organisms in biology and serves as a model system for the investigation of many developmental and cellular processes common to higher eukaryotes, including humans. We have determined the nucleotide sequence of nearly all of the approximately 120-megabase euchromatic portion of the Drosophila genome using a whole-genome shotgun sequencing strategy supported by extensive clone-based sequence and a high-quality bacterial artificial chromosome physical map. Efforts are under way to close the remaining gaps; however, the sequence is of sufficient accuracy and contiguity to be declared substantially complete and to support an initial analysis of genome structure and preliminary gene annotation and interpretation. The genome encodes approximately 13,600 genes, somewhat fewer than the smaller Caenorhabditis elegans genome, but with comparable functional diversity.
BACKGROUND: The Drosophila melanogaster genome was the first metazoan genome to have been sequenced by the whole-genome shotgun (WGS) method. Two issues relating to this achievement were widely debated in the genomics community: how correct is the sequence with respect to base-pair (bp) accuracy and frequency of assembly errors? And, how difficult is it to bring a WGS sequence to the accepted standard for finished sequence? We are now in a position to answer these questions. RESULTS: Our finishing process was designed to close gaps, improve sequence quality and validate the assembly. Sequence traces derived from the WGS and draft sequencing of individual bacterial artificial chromosomes (BACs) were assembled into BAC-sized segments. These segments were brought to high quality, and then joined to constitute the sequence of each chromosome arm. Overall assembly was verified by comparison to a physical map of fingerprinted BAC clones. In the current version of the 116.9 Mb euchromatic genome, called Release 3, the six euchromatic chromosome arms are represented by 13 scaffolds with a total of 37 sequence gaps. We compared Release 3 to Release 2; in autosomal regions of unique sequence, the error rate of Release 2 was one in 20,000 bp. CONCLUSIONS: The WGS strategy can efficiently produce a high-quality sequence of a metazoan genome while generating the reagents required for sequence finishing. However, the initial method of repeat assembly was flawed. The sequence we report here, Release 3, is a reliable resource for molecular genetic experimentation and computational analysis.
BACKGROUND: Most eukaryotic genomes include a substantial repeat-rich fraction termed heterochromatin, which is concentrated in centric and telomeric regions. The repetitive nature of heterochromatic sequence makes it difficult to assemble and analyze. To better understand the heterochromatic component of the Drosophila melanogaster genome, we characterized and annotated portions of a whole-genome shotgun sequence assembly. RESULTS: WGS3, an improved whole-genome shotgun assembly, includes 20.7 Mb of draft-quality sequence not represented in the Release 3 sequence spanning the euchromatin. We annotated this sequence using the methods employed in the re-annotation of the Release 3 euchromatic sequence. This analysis predicted 297 protein-coding genes and six non-protein-coding genes, including known heterochromatic genes, and regions of similarity to known transposable elements. Bacterial artificial chromosome (BAC)-based fluorescence in situ hybridization analysis was used to correlate the genomic sequence with the cytogenetic map in order to refine the genomic definition of the centric heterochromatin; on the basis of our cytological definition, the annotated Release 3 euchromatic sequence extends into the centric heterochromatin on each chromosome arm. CONCLUSIONS: Whole-genome shotgun assembly produced a reliable draft-quality sequence of a significant part of the Drosophila heterochromatin. Annotation of this sequence defined the intron-exon structures of 30 known protein-coding genes and 267 protein-coding gene models. The cytogenetic mapping suggests that an additional 150 predicted genes are located in heterochromatin at the base of the Release 3 euchromatic sequence. Our analysis suggests strategies for improving the sequence and annotation of the heterochromatic portions of the Drosophila and other complex genomes.
BACKGROUND: Transposable elements are found in the genomes of nearly all eukaryotes. The recent completion of the Release 3 euchromatic genomic sequence of Drosophila melanogaster by the Berkeley Drosophila Genome Project has provided precise sequence for the repetitive elements in the Drosophila euchromatin. We have used this genomic sequence to describe the euchromatic transposable elements in the sequenced strain of this species. RESULTS: We identified 85 known and eight novel families of transposable element varying in copy number from one to 146. A total of 1,572 full and partial transposable elements were identified, comprising 3.86% of the sequence. More than two-thirds of the transposable elements are partial. The density of transposable elements increases an average of 4.7 times in the centromere-proximal regions of each of the major chromosome arms. We found that transposable elements are preferentially found outside genes; only 436 of 1,572 transposable elements are contained within the 61.4 Mb of sequence that is annotated as being transcribed. A large proportion of transposable elements is found nested within other elements of the same or different classes. Lastly, an analysis of structural variation from different families reveals distinct patterns of deletion for elements belonging to different classes. CONCLUSIONS: This analysis represents an initial characterization of the transposable elements in the Release 3 euchromatic genomic sequence of D. melanogaster for which comparison to the transposable elements of other organisms can begin to be made. These data have been made available on the Berkeley Drosophila Genome Project website for future analyses.
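The claim that transposable elements lie preferentially outside genes can be checked with a short density calculation. This is a sketch only: the counts and the 61.4 Mb transcribed span come from the abstract above, while the 116.9 Mb total is an assumption taken from the Release 3 euchromatic genome size quoted in another abstract on this page. The function name `density_per_mb` is hypothetical.

```python
# Figures from the abstract above.
TOTAL_TES = 1572            # full and partial transposable elements
TES_IN_TRANSCRIBED = 436    # elements inside annotated transcribed sequence
TRANSCRIBED_MB = 61.4       # Mb annotated as transcribed

# Assumption: Release 3 euchromatic genome size, taken from a different
# abstract on this page, used here as the denominator for the whole sequence.
EUCHROMATIN_MB = 116.9


def density_per_mb(count, megabases):
    """Elements per megabase."""
    return count / megabases


inside = density_per_mb(TES_IN_TRANSCRIBED, TRANSCRIBED_MB)
outside = density_per_mb(TOTAL_TES - TES_IN_TRANSCRIBED,
                         EUCHROMATIN_MB - TRANSCRIBED_MB)

print(f"inside transcribed regions:  {inside:.1f} elements/Mb")
print(f"outside transcribed regions: {outside:.1f} elements/Mb")
print(f"outside/inside ratio:        {outside / inside:.1f}")
```

Under these assumptions the density outside transcribed sequence is roughly three times the density inside it, which is consistent with the abstract's statement.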