Introduction

All data sets generated by the Ensembl project are freely available to download from the ftp.ensembl.org site. Please see also the disclaimer.

Please note: Ensembl supports downloading of many more correlation tables via the highly customisable BioMart data mining tool. You may find exploring this web-based data mining tool easier than extracting information from our normalised database dumps.

Additionally, Ensembl would like to encourage users to directly extract information from our databases via SQL rather than downloading huge flat files. We offer a public MySQL interface at ensembldb.ensembl.org that accepts SQL queries as user 'anonymous'. Client programs for accessing this interface are available via MySQL.
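As a minimal sketch of the access route described above, the following Python snippet assembles a command line for the standard mysql client, using the host and user name given in this document. The SHOW DATABASES statement is only an illustrative query, and the helper name build_mysql_command is hypothetical. No port is given, so the client's default is assumed; this may not match the server's current configuration.

```python
import shlex

ENSEMBL_HOST = "ensembldb.ensembl.org"  # public server named in this document
ENSEMBL_USER = "anonymous"              # user name given in this document


def build_mysql_command(query, host=ENSEMBL_HOST, user=ENSEMBL_USER):
    """Return the argument list for a one-off query via the mysql client.

    Hypothetical helper: it only builds the command, it does not run it.
    No port option is added, so the client's default port is assumed.
    """
    return ["mysql", f"--host={host}", f"--user={user}", f"--execute={query}"]


if __name__ == "__main__":
    # Illustrative query: list the databases visible to the anonymous user.
    command = build_mysql_command("SHOW DATABASES")
    # Print a shell-quoted version that can be pasted into a terminal.
    print(shlex.join(command))
```

The printed command can be run in a terminal where the mysql client is installed, or the argument list can be passed to subprocess.run.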

It is also possible to install the Ensembl API code locally and configure it to access databases on ensembldb.ensembl.org. Users can develop their own analysis scripts that work with Ensembl's object-oriented representation of biological objects. This is generally much easier than writing SQL queries directly and avoids downloading huge databases.

The Ensembl FTP Site

Ensembl provides sequence databases of gene, transcript and protein predictions. These sequences are suitable for a local installation of a sequence similarity search system. MySQL database table dumps are available for all databases underlying the Ensembl system. These text format dumps can be imported into a relational database management system, which makes it possible to install a complete Ensembl mirror site.

There are five types of data dumps for each species on the FTP site:

  • FASTA - FASTA sequence databases of Ensembl gene, transcript and protein model predictions. Since the FASTA format does not permit sequence annotation, these database files are mainly intended for use with local sequence similarity search algorithms. Please see the FASTA files document for a more detailed description of the header line format and the file naming conventions.

    • DNA - Masked and unmasked genome sequences associated with the assembly (contigs, chromosomes etc.).

    • cDNA - cDNA sequences for Ensembl or ab initio predicted genes.

    • Peptides - Protein sequences for Ensembl or ab initio predicted genes.

    • RNA - Non-coding RNA gene predictions.

  • Flatfile - Flat files allow more extensive sequence annotation by means of feature tables, and thus contain the genome sequence as annotated by the automated Ensembl genome annotation pipeline. Each nucleotide sequence record in a flat file represents a 1Mb slice of the genome sequence. Flat files are broken into chunks of 1000 sequence records for easier downloading.

  • MySQL - Database table dumps in text format, together with SQL table definition files, which should be portable to any SQL database. Generally, the FTP directory tree contains one directory per database. For more information about these databases and their associated Application Programming Interfaces (APIs), see the software section.

  • GTF - Gene sets for each species. These files include annotations of both coding and non-coding genes. This file format is described here.

  • EMF flatfile dumps - Alignments of resequencing data are available for several species as Ensembl Multi Format (EMF) flatfile dumps. The accompanying README file describes the file format.

    The same format is also used to dump whole-genome multiple alignments, as well as the gene-based multiple alignments and phylogenetic trees used to infer Ensembl orthologues and paralogues. These files are available for the ensembl_compara database, which can be found in the multi_species directory.
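To make the flat-file layout in the list above concrete, here is a small arithmetic sketch in Python. It assumes that "1Mb" means 1,000,000 base pairs and that the chunking into 1000 records is applied to one sequence region at a time; neither detail is spelled out here, and the region lengths are hypothetical values chosen only for illustration.

```python
SLICE_LENGTH_BP = 1_000_000   # assumption: "1Mb" read as 1,000,000 base pairs
RECORDS_PER_CHUNK = 1000      # chunk size stated in the flat-file description


def ceil_div(numerator, denominator):
    """Integer division rounded up, without floating point."""
    return -(-numerator // denominator)


def flatfile_layout(sequence_length_bp):
    """Return (records, chunk_files) expected for one sequence region."""
    records = ceil_div(sequence_length_bp, SLICE_LENGTH_BP)
    chunk_files = ceil_div(records, RECORDS_PER_CHUNK)
    return records, chunk_files


if __name__ == "__main__":
    # Hypothetical region lengths, chosen only to illustrate the arithmetic.
    for length in (2_500_000, 1_000_000_000, 1_000_000_001):
        records, chunks = flatfile_layout(length)
        print(f"{length} bp -> {records} records in {chunks} chunk file(s)")
```

Under these assumptions a 2.5 Mb region needs three records (the last one partial), and a second chunk file only appears once a region exceeds 1000 records.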

These additional documents explain the FTP directory structure and the FASTA file naming and header conventions used on this site.
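For readers who want to inspect a downloaded FASTA dump before loading it into a search tool, the following is a minimal reader sketch in Python. It relies only on the generic FASTA convention (a ">" header line followed by sequence lines) and does not interpret the Ensembl-specific header fields, which are described in the FASTA files document. The function name read_fasta and the example records are made up for illustration.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    # Emit the final record once the input is exhausted.
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    # Made-up records; real Ensembl headers follow the conventions in the
    # FASTA files document.
    example = io.StringIO(">seq1 hypothetical record\nACGT\nACGT\n>seq2\nMKV\n")
    for header, sequence in read_fasta(example):
        print(header, len(sequence))
```

Because the function is a generator, it processes one record at a time and does not need to hold a whole genome file in memory. A compressed download could be opened in text mode with gzip.open(path, "rt") and passed in the same way.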

Species DNA cDNA Peptides EMBL GenBank MySQL GTF EMF
Aedes aegypti (yellow fever mosquito) FTP FTP FTP FTP FTP FTP FTP -
Anopheles gambiae (African malaria mosquito) FTP FTP FTP FTP FTP FTP FTP -
Bos taurus (cattle) FTP FTP FTP FTP FTP FTP FTP -
Caenorhabditis elegans (nematode) FTP FTP FTP FTP FTP FTP FTP -
Canis familiaris (dog) FTP FTP FTP FTP FTP FTP FTP -
Cavia porcellus (domestic guinea pig) FTP FTP FTP FTP FTP FTP FTP -
Ciona intestinalis (Sea squirt Ciona intestinalis) FTP FTP FTP FTP FTP FTP FTP -
Ciona savignyi (Sea squirt Ciona savignyi) FTP FTP FTP FTP FTP FTP FTP -
Danio rerio (zebrafish) FTP FTP FTP FTP FTP FTP FTP -
Dasypus novemcinctus (nine-banded armadillo) FTP FTP FTP FTP FTP FTP FTP -
Drosophila melanogaster (fruit fly) FTP FTP FTP FTP FTP FTP FTP -
Echinops telfairi (small Madagascar hedgehog) FTP FTP FTP FTP FTP FTP FTP -
Erinaceus europaeus (western European hedgehog) FTP FTP FTP FTP FTP FTP FTP -
Felis catus (cat) FTP FTP FTP FTP FTP FTP FTP -
Gallus gallus (chicken) FTP FTP FTP FTP FTP FTP FTP -
Gasterosteus aculeatus (three spined stickleback) FTP FTP FTP FTP FTP FTP FTP -
Homo sapiens (human) FTP FTP FTP FTP FTP FTP FTP FTP
Loxodonta africana (African savanna elephant) FTP FTP FTP FTP FTP FTP FTP -
Macaca mulatta (rhesus monkey) FTP FTP FTP FTP FTP FTP FTP -
Monodelphis domestica (gray short-tailed opossum) FTP FTP FTP FTP FTP FTP FTP -
Mus musculus (house mouse) FTP FTP FTP FTP FTP FTP FTP FTP
Myotis lucifugus (little brown bat) FTP FTP FTP FTP FTP FTP FTP -
Ornithorhynchus anatinus (platypus) FTP FTP FTP FTP FTP FTP FTP -
Oryctolagus cuniculus (rabbit) FTP FTP FTP FTP FTP FTP FTP -
Oryzias latipes (Japanese medaka) FTP FTP FTP FTP FTP FTP FTP -
Otolemur garnettii (small-eared galago) FTP FTP FTP FTP FTP FTP FTP -
Pan troglodytes (chimpanzee) FTP FTP FTP FTP FTP FTP FTP -
Rattus norvegicus (Norway rat) FTP FTP FTP FTP FTP FTP FTP FTP
Saccharomyces cerevisiae (baker's yeast) FTP FTP FTP FTP FTP FTP FTP -
Sorex araneus (European shrew) FTP FTP FTP FTP FTP FTP FTP -
Spermophilus tridecemlineatus (thirteen-lined ground squirrel) FTP FTP FTP FTP FTP FTP FTP -
Takifugu rubripes (torafugu) FTP FTP FTP FTP FTP FTP FTP -
Tetraodon nigroviridis (Fresh water pufferfish) FTP FTP FTP FTP FTP FTP FTP -
Tupaia belangeri (northern tree shrew) FTP FTP FTP FTP FTP FTP FTP -
Xenopus tropicalis (western clawed frog) FTP FTP FTP FTP FTP FTP FTP -
Multi-species - - - - - FTP - FTP
Ensembl Mart - - - - - FTP - -

 

© 2008 WTSI / EBI. Ensembl is available to download for public use - please see the code licence for details.

                
Ensembl release 47 - Oct 2007