Installing Ensembl Data
The install step converts the downloaded data into more efficient data structures. These are written to the install_path as specified in the config file.
The install command requires the path to the download directory.
$ eti install -d <dirname>
Note
You can use multiple processes for this installation step with the -np # argument, where # is the number of processes. We recommend specifying the same number of processes as the number of genomes, e.g. -np 10 for ten genomes.
Warning
At present, installation is not interruptible. If you restart an installation, you must overwrite the existing one by adding the --force_overwrite argument.
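As a minimal sketch of how the options above fit together, the following Python helper assembles the install command with one process per genome. The function name `install_command`, the directory name `apes_download`, and the cap at the machine's CPU count are assumptions made for illustration; only the -d, -np and --force_overwrite arguments come from this page.

```python
import os


def install_command(download_dir, num_genomes, force_overwrite=False):
    """Build the argument list for `eti install` with one process per genome."""
    # Assumption: cap at the CPU count so we never request more processes than cores.
    available = os.cpu_count() or 1
    num_procs = max(1, min(num_genomes, available))
    cmd = ["eti", "install", "-d", download_dir, "-np", str(num_procs)]
    if force_overwrite:
        # Needed when restarting an installation that was interrupted.
        cmd.append("--force_overwrite")
    return cmd


# "apes_download" is a hypothetical download directory holding three genomes.
cmd = install_command("apes_download", num_genomes=3, force_overwrite=True)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` if you prefer to drive the installation from a script.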
What Is Installed
$ ls data/apes-115
compara
genomes
installed.cfg
The installed.cfg file specifies the Ensembl release, the software versions used during installation and, under the species_map section, the mapping of each genome name to its alternative names, such as the abbreviation. It is a plain text file that you can edit.
Note
The abbreviations are used in the command line interface, so changing them to something you find easier to type can be useful.
$ cat data/apes-115/installed.cfg
[release]
release = 115
[software versions]
ensembl_tui = 0.4.3
cogent3_h5seqs = 0.7.0
numpy = 2.3.3
polars = 1.33.1
typing_extensions = 4.15.0
trogon = 0.6.0
duckdb = 1.3.1
unsync = 1.4.0
rich = 14.1.0
click = 8.2.1
scitrack = 2024.10.8
pyarrow = 21.0.0
numba = 0.62.1
cogent3 = 2025.9.8a3
[species_map]
header = genome_name abbrev common_name db_prefix
pan_troglodytes = chimp chimpanzee pan_troglodytes
gorilla_gorilla = gorilla gorilla gorilla_gorilla
homo_sapiens = human human homo_sapiens
The software versions section records the versions of the software dependencies that were present at the time of installation. This is intended for debugging purposes.
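Because installed.cfg is a plain text file in INI style, it can also be read from a script. The sketch below uses Python's standard configparser module on a shortened copy of the listing above. It assumes the species_map values are whitespace-separated, as they appear in that listing, and that no field contains a space; the function name `read_installed_cfg` is hypothetical and not part of ensembl_tui.

```python
import configparser

# Shortened copy of the installed.cfg listing; in practice you would call
# parser.read("data/apes-115/installed.cfg") instead of read_string().
SAMPLE_CFG = """
[release]
release = 115

[software versions]
ensembl_tui = 0.4.3
cogent3 = 2025.9.8a3

[species_map]
header = genome_name abbrev common_name db_prefix
pan_troglodytes = chimp chimpanzee pan_troglodytes
gorilla_gorilla = gorilla gorilla gorilla_gorilla
homo_sapiens = human human homo_sapiens
"""


def read_installed_cfg(text):
    """Return (release, versions, species) parsed from installed.cfg text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)

    release = parser["release"]["release"]
    versions = dict(parser["software versions"])

    species = {}
    section = parser["species_map"]
    # The first header field names the key (genome_name); the rest name the values.
    fields = section["header"].split()[1:]
    for genome_name, value in section.items():
        if genome_name == "header":
            continue
        # Assumption: values are whitespace separated and contain no spaces.
        species[genome_name] = dict(zip(fields, value.split()))
    return release, versions, species


release, versions, species = read_installed_cfg(SAMPLE_CFG)
print(release)
print(versions["cogent3"])
print(species["homo_sapiens"]["abbrev"])
```

Common names that contain spaces would break the simple split used here, so check the separator in your own file before relying on this approach.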
Check Your Installation
Once you have finished your installation, you can check its contents using the installed command. The output lists the software versions at the time of installation (useful for troubleshooting) along with the species names, abbreviations, etc.
$ eti installed -i data/apes-115
Ensembl release: 115
Installed genomes:
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ abbrev ┃ genome ┃ common name ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ gorilla │ gorilla_gorilla │ gorilla │
│ human │ homo_sapiens │ human │
│ chimp │ pan_troglodytes │ chimpanzee │
└─────────┴─────────────────┴─────────────┘
Installed homologies: ✅
Installed alignments: ✅
Installation software versions:
┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ package ┃ version ┃
┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ click │ 8.2.1 │
│ cogent3 │ 2025.9.8a3 │
│ cogent3_h5seqs │ 0.7.0 │
│ duckdb │ 1.3.1 │
│ ensembl_tui │ 0.4.3 │
│ numba │ 0.62.1 │
│ numpy │ 2.3.3 │
│ polars │ 1.33.1 │
│ pyarrow │ 21.0.0 │
│ rich │ 14.1.0 │
│ scitrack │ 2024.10.8 │
│ trogon │ 0.6.0 │
│ typing_extensions │ 4.15.0 │
│ unsync │ 1.4.0 │
└───────────────────┴────────────┘
Note
From this point on, the installation directory is specified using the -i option. This is required for all commands that reference an installation.
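When troubleshooting, it can help to compare the versions recorded at installation time with those in your current environment. The following is a minimal sketch using the standard library's importlib.metadata. The function name `compare_versions` is hypothetical, and the sketch assumes the package names recorded in the installation match the names under which the packages are installed.

```python
from importlib import metadata


def compare_versions(recorded):
    """Map each package to (recorded, current) where the two versions differ."""
    differences = {}
    for package, recorded_version in recorded.items():
        try:
            current = metadata.version(package)
        except metadata.PackageNotFoundError:
            current = None  # not installed in the current environment
        if current != recorded_version:
            differences[package] = (recorded_version, current)
    return differences


# A subset of the versions shown in the table above.
recorded = {"numpy": "2.3.3", "polars": "1.33.1", "cogent3": "2025.9.8a3"}
for package, (old, new) in compare_versions(recorded).items():
    print(f"{package}: installed with {old}, environment has {new or 'not installed'}")
```

An empty result means every listed package matches the version recorded at installation time.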