Quickstart
Installing ensembl-tui creates a command-line tool, eti, whose subcommands let you acquire and then sample from Ensembl genomic datasets. The general workflow is:
- Create a demo config file
- Download raw data from Ensembl
- Make the local installation
- Show summaries of the installation
- Export gene meta-data
- Export homology data
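The workflow above can be sketched as a small Python driver that collects the eti commands used on this page and, by default, only prints them. This is a hedged illustration rather than part of ensembl-tui: the function names quickstart_commands and run_all are hypothetical, and the paths passed in are the ones that appear in the examples on this page.

```python
import shlex
import subprocess


def quickstart_commands(config_path, staging_dir, install_dir):
    """Return the eti commands shown on this page, in workflow order."""
    return [
        ["eti", "demo-config", "-o", "demo"],
        ["eti", "download", "-c", config_path],
        ["eti", "install", "-d", staging_dir],
        ["eti", "installed", "-i", install_dir],
        ["eti", "dump-genes", "-i", install_dir,
         "--species", "sac-cere", "--outdir", "yeast"],
        ["eti", "homologs", "-i", install_dir, "--ref", "cae-eleg",
         "--outdir", "worm_yeast", "--homology_type", "ortholog_one2one",
         "--limit", "5"],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print("$", shlex.join(cmd))
        if not dry_run:
            # check=True stops the workflow at the first failing step
            subprocess.run(cmd, check=True)


# Paths are taken from the examples on this page; adjust them to your setup.
commands = quickstart_commands(
    "small.cfg", "data/small-download", "data/small-install"
)
run_all(commands, dry_run=True)
```

Calling run_all(commands, dry_run=False) would execute the steps in order, which requires eti to be installed and a config file at the given path.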
Create A Demo Config
You specify the genomic resources that you want from Ensembl using a config file. ensembl-tui comes with an example file whose comments describe each component of the file.
$ eti demo-config -o demo
Contents written to demo
Note
The config specifies downloading the genomes and annotations of Saccharomyces cerevisiae and Caenorhabditis elegans from Ensembl release 115, along with gene homology data. It also specifies the path where the downloaded files are written and where they are installed.
Warning
Edit this file before using it! It also specifies primate whole-genome alignments, which are large!
Download The Specified Data
We use a custom config file which specifies just "yeast" and "worm". You can do this yourself by downloading small.cfg and executing the following command:
$ eti download -c <path to>/small.cfg
The data will be downloaded to the staging_path specified in small.cfg, which is interpreted relative to the directory in which you executed the command.
Note
If a download is interrupted and restarted, eti resumes downloads from where it stopped.
Make The Local Installation
$ eti install -d data/small-download
Installing features   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:02:17
Installing 🧬🧬       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:01
Installing homologies ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:42
Contents installed to
'/home/runner/work/ensembl_tui/ensembl_tui/docs/data/small-install'
Show Summaries Of The Installation
The Top Level
$ eti installed -i data/small-install
Ensembl release: 115
Installed genomes:
┏━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ abbrev   ┃ genome                   ┃ common name                           ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ cae-eleg │ caenorhabditis_elegans   │ caenorhabditis elegans (nematode, n2) │
│ sac-cere │ saccharomyces_cerevisiae │ saccharomyces cerevisiae              │
└──────────┴──────────────────────────┴───────────────────────────────────────┘
Installed homologies: ✅
Installed alignments: ❌
Installation software versions:
┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ package           ┃ version     ┃
┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ click             │ 8.3.1       │
│ cogent3           │ 2026.1.12a1 │
│ cogent3_h5seqs    │ 0.7.3       │
│ duckdb            │ 1.4.3       │
│ ensembl_tui       │ 0.7.5       │
│ numba             │ 0.63.1      │
│ numpy             │ 2.3.5       │
│ polars            │ 1.37.1      │
│ pyarrow           │ 23.0.0      │
│ rich              │ 14.2.0      │
│ scitrack          │ 2024.10.8   │
│ trogon            │ 0.6.0       │
│ typing_extensions │ 4.15.0      │
│ unsync            │ 1.4.0       │
└───────────────────┴─────────────┘
Summary Of A Species
$ eti species-summary -i data/small-install --species sac-cere
Saccharomyces cerevisiae
features
┏━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ biotype              ┃ count ┃
┡━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
│ ncRNA                │    18 │
│ snoRNA               │    77 │
│ tRNA                 │   299 │
│ protein_coding       │ 6,600 │
│ rRNA                 │    24 │
│ snRNA                │     6 │
│ transposable_element │    91 │
│ pseudogene           │    12 │
└──────────────────────┴───────┘
Saccharomyces cerevisiae repeat
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ repeat_type            ┃ repeat_class       ┃ count ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
│ Dust                   │ dust               │ 8,662 │
│ LTRs                   │ LTR/Gypsy          │    63 │
│ LTRs                   │ LTR/Copia          │   464 │
│ Low complexity regions │ Low_complexity     │     2 │
│ Other repeats          │ Other/subtelomeric │    35 │
│ RNA repeats            │ rRNA               │     7 │
│ Simple repeats         │ Simple_repeat      │   149 │
│ Tandem repeats         │ trf                │ 3,210 │
│ Type II Transposons    │ DNA                │     2 │
└────────────────────────┴────────────────────┴───────┘
Note
We can use the abbrev listed above to identify the species.
Summary Of Compara
$ eti compara-summary -i data/small-install
Homology types
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ homology_type          ┃  count ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ gene_split             │     44 │
│ within_species_paralog │ 11,998 │
│ ortholog_one2one       │  2,878 │
│ other_paralog          │  6,476 │
│ ortholog_one2many      │  1,509 │
│ ortholog_many2many     │  1,344 │
└────────────────────────┴────────┘
This shows the number of homology relationships of each type among the installed species.
Export Gene meta-data
$ eti dump-genes -i data/small-install --species sac-cere --outdir yeast
Finished: wrote 'yeast/saccharomyces_cerevisiae-115-gene_metadata.tsv'!
Show the first five lines of the output file.
$ head -n 5 yeast/saccharomyces_cerevisiae*.tsv
species seqid source biotype transcript_biotypes num_transcripts start stop strand symbol description
saccharomyces_cerevisiae XV protein_coding protein_coding 1 234939 238185 1 GAL11 Subunit of the RNA polymerase II mediator complex; associates with core polymerase subunits to form the RNA polymerase II holoenzyme; affects transcription by acting as target of activators and repressors; forms part of the tail domain of mediator [Source:SGD;Acc:S000005411]
saccharomyces_cerevisiae XV protein_coding protein_coding 1 231569 231755 -1 DDR2 Multi-stress response protein; expression is activated by a variety of xenobiotic agents and environmental or physiological stresses; DDR2 has a paralog, HOR7, that arose from the whole genome duplication [Source:SGD;Acc:S000005413]
saccharomyces_cerevisiae XV protein_coding protein_coding 1 219210 220473 1 ARG1 Arginosuccinate synthetase; catalyzes the formation of L-argininosuccinate from citrulline and L-aspartate in the arginine biosynthesis pathway; potential Cdc28p substrate [Source:SGD;Acc:S000005419]
saccharomyces_cerevisiae VII protein_coding protein_coding 1 903473 904748 -1 PCT1 Cholinephosphate cytidylyltransferase; a rate-determining enzyme of the CDP-choline pathway for phosphatidylcholine synthesis, inhibited by Sec14p, activated upon lipid-binding; contains an element within the regulatory domain involved in both silencing and activation of enzymatic activity [Source:SGD;Acc:S000003434]
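As a hedged illustration of working with this output, the sketch below parses gene metadata with Python's standard csv module. It uses an in-memory sample modelled on the rows above instead of the real file. Assumptions: the fields are tab-separated, the blank field in these rows is source, and the descriptions are replaced by placeholder text. The helper name read_gene_metadata is hypothetical and not part of ensembl-tui.

```python
import csv
import io
from collections import Counter

# Column names copied from the header line shown above.
COLUMNS = [
    "species", "seqid", "source", "biotype", "transcript_biotypes",
    "num_transcripts", "start", "stop", "strand", "symbol", "description",
]

# Sample rows modelled on the output above. Assumption: source is empty.
# The description text is a placeholder, not the real annotation.
SAMPLE_ROWS = [
    ["saccharomyces_cerevisiae", "XV", "", "protein_coding",
     "protein_coding", "1", "234939", "238185", "1", "GAL11", "placeholder"],
    ["saccharomyces_cerevisiae", "XV", "", "protein_coding",
     "protein_coding", "1", "231569", "231755", "-1", "DDR2", "placeholder"],
    ["saccharomyces_cerevisiae", "XV", "", "protein_coding",
     "protein_coding", "1", "219210", "220473", "1", "ARG1", "placeholder"],
    ["saccharomyces_cerevisiae", "VII", "", "protein_coding",
     "protein_coding", "1", "903473", "904748", "-1", "PCT1", "placeholder"],
]
SAMPLE_TSV = "\n".join("\t".join(row) for row in [COLUMNS, *SAMPLE_ROWS]) + "\n"


def read_gene_metadata(handle):
    """Read tab-separated gene metadata into a list of dicts."""
    # QUOTE_NONE keeps quote characters inside descriptions as literal text
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    records = []
    for row in reader:
        for name in ("num_transcripts", "start", "stop", "strand"):
            row[name] = int(row[name])
        records.append(row)
    return records


records = read_gene_metadata(io.StringIO(SAMPLE_TSV))
genes_per_seqid = Counter(r["seqid"] for r in records)
# stop - start is the coordinate span as reported in the file
spans = {r["symbol"]: r["stop"] - r["start"] for r in records}
print(genes_per_seqid)
print(spans)
```

To read the exported file instead of the sample, open it with open(path, newline="") and pass the handle to read_gene_metadata.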
Export Homology Data
$ eti homologs -i data/small-install --ref cae-eleg --outdir worm_yeast --homology_type ortholog_one2one --limit 5
Homolog search ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
Extracting 🧬  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
List the files written to the specified worm_yeast directory.
$ ls worm_yeast
WBGene00016623.fa
WBGene00017749.fa
WBGene00018723.fa
WBGene00020113.fa
WBGene00194707.fa
logs
md5
not_completed
Note
The not_completed directory will contain any errors that occurred during the homologs command.
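A minimal sketch for inspecting the exported files follows. It assumes each .fa file is plain FASTA (a line starting with ">" gives a label, and the following lines give the sequence) and that it holds one record per homologous gene. The page does not show the file contents, so the demo writes a small file with hypothetical labels and sequences. The helper names read_fasta and count_records are illustrative only.

```python
import tempfile
from pathlib import Path


def read_fasta(path):
    """Return {label: sequence} from a plain FASTA file."""
    parts = {}
    label = None
    with open(path) as infile:
        for line in infile:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                label = line[1:]
                parts[label] = []
            elif label is not None:
                # sequence may be wrapped over several lines
                parts[label].append(line)
    return {name: "".join(chunks) for name, chunks in parts.items()}


def count_records(outdir):
    """Map each .fa file name in outdir to its number of sequences."""
    return {p.name: len(read_fasta(p)) for p in sorted(Path(outdir).glob("*.fa"))}


# Demo on a temporary directory. Labels and sequences are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "WBGene00016623.fa"
    demo.write_text(">worm_gene\nACGT\nAC\n>yeast_gene\nACGTTT\n")
    counts = count_records(tmp)
    seqs = read_fasta(demo)

print(counts)
```

Under these assumptions, count_records("worm_yeast") would report how many sequences each exported homolog group contains. Subdirectories such as logs are skipped because only names ending in .fa match the pattern.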