Quickstart

Installing ensembl-tui creates a command-line tool, eti, whose subcommands let you acquire and then sample from Ensembl genomic datasets. The general workflow is:

  1. Create a demo config file
  2. Download raw data from Ensembl
  3. Make the local installation
  4. Show summaries of the installation
  5. Export gene meta-data
  6. Export homology data

Create A Demo Config

You specify the genomic resources that you want from Ensembl using a config file. ensembl-tui comes with an example file whose comments describe each of its components.

$ eti demo-config -o demo
Contents written to demo

Note

The config specifies downloading the genomes and annotations of Saccharomyces cerevisiae and Caenorhabditis elegans from Ensembl release 115, along with gene homology data. It also specifies the path to write the downloaded files to and where to install them.

Warning

Edit this file before using it! It also specifies primate whole-genome alignments, which are large!

Download The Specified Data

We use a custom config file that specifies just "yeast" and "worm". You can do this yourself by downloading small.cfg and executing the following command:

$ eti download -c <path to>/small.cfg

The data will be downloaded to the staging_path specified in small.cfg, which is interpreted relative to the directory in which you executed the command.
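
To make the relative-path rule concrete, here is a minimal Python sketch of how a relative staging_path resolves against the directory a command is run from. The helper name resolve_staging and the example value data/small-download are illustrative assumptions, not part of eti or a quote from small.cfg. Leaving absolute paths unchanged is also an assumption, based on ordinary path semantics rather than on documented eti behaviour.

```python
from pathlib import Path


def resolve_staging(staging_path, cwd):
    """Return the location a staging_path refers to when a command runs in cwd.

    Hypothetical helper for illustration only; eti does not expose it.
    """
    path = Path(staging_path)
    # Assumption: absolute paths are used as-is, relative ones are joined to cwd.
    return path if path.is_absolute() else Path(cwd) / path


if __name__ == "__main__":
    # 'data/small-download' is an assumed example value for staging_path.
    print(resolve_staging("data/small-download", Path.cwd()))
```

In practice this means that running the download from a different directory with the same config would place the data somewhere else, so it is worth running the later commands from the same directory.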

Note

If a download is interrupted and restarted, eti resumes downloads from where it stopped.

Make The Local Installation

$ eti install -d data/small-download
Installing features 📚 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:02:17
Installing  🧬🧬       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:01
Installing homologies  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:42
Contents installed to 
'/home/runner/work/ensembl_tui/ensembl_tui/docs/data/small-install'

Show Summaries Of The Installation

The Top Level

$ eti installed -i data/small-install
Ensembl release: 115
Installed genomes:                                                             
โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”“
โ”ƒ abbrev   โ”ƒ genome                   โ”ƒ common name                           โ”ƒ
โ”กโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ฉ
โ”‚ cae-eleg โ”‚ caenorhabditis_elegans   โ”‚ caenorhabditis elegans (nematode, n2) โ”‚
โ”‚ sac-cere โ”‚ saccharomyces_cerevisiae โ”‚ saccharomyces cerevisiae              โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
Installed homologies: โœ…
Installed alignments: โŒ
Installation software versions:    
โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”“
โ”ƒ package           โ”ƒ version     โ”ƒ
โ”กโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ฉ
โ”‚ click             โ”‚ 8.3.1       โ”‚
โ”‚ cogent3           โ”‚ 2026.1.12a1 โ”‚
โ”‚ cogent3_h5seqs    โ”‚ 0.7.3       โ”‚
โ”‚ duckdb            โ”‚ 1.4.3       โ”‚
โ”‚ ensembl_tui       โ”‚ 0.7.5       โ”‚
โ”‚ numba             โ”‚ 0.63.1      โ”‚
โ”‚ numpy             โ”‚ 2.3.5       โ”‚
โ”‚ polars            โ”‚ 1.37.1      โ”‚
โ”‚ pyarrow           โ”‚ 23.0.0      โ”‚
โ”‚ rich              โ”‚ 14.2.0      โ”‚
โ”‚ scitrack          โ”‚ 2024.10.8   โ”‚
โ”‚ trogon            โ”‚ 0.6.0       โ”‚
โ”‚ typing_extensions โ”‚ 4.15.0      โ”‚
โ”‚ unsync            โ”‚ 1.4.0       โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Summary Of A Species

$ eti species-summary -i data/small-install --species sac-cere
Saccharomyces cerevisiae        
features                        
โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”“
โ”ƒ biotype              โ”ƒ count โ”ƒ
โ”กโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”ฉ
โ”‚ ncRNA                โ”‚    18 โ”‚
โ”‚ snoRNA               โ”‚    77 โ”‚
โ”‚ tRNA                 โ”‚   299 โ”‚
โ”‚ protein_coding       โ”‚ 6,600 โ”‚
โ”‚ rRNA                 โ”‚    24 โ”‚
โ”‚ snRNA                โ”‚     6 โ”‚
โ”‚ transposable_element โ”‚    91 โ”‚
โ”‚ pseudogene           โ”‚    12 โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
Saccharomyces cerevisiae repeat                        
โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”“
โ”ƒ repeat_type            โ”ƒ repeat_class       โ”ƒ count โ”ƒ
โ”กโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”ฉ
โ”‚ Dust                   โ”‚ dust               โ”‚ 8,662 โ”‚
โ”‚ LTRs                   โ”‚ LTR/Gypsy          โ”‚    63 โ”‚
โ”‚ LTRs                   โ”‚ LTR/Copia          โ”‚   464 โ”‚
โ”‚ Low complexity regions โ”‚ Low_complexity     โ”‚     2 โ”‚
โ”‚ Other repeats          โ”‚ Other/subtelomeric โ”‚    35 โ”‚
โ”‚ RNA repeats            โ”‚ rRNA               โ”‚     7 โ”‚
โ”‚ Simple repeats         โ”‚ Simple_repeat      โ”‚   149 โ”‚
โ”‚ Tandem repeats         โ”‚ trf                โ”‚ 3,210 โ”‚
โ”‚ Type II Transposons    โ”‚ DNA                โ”‚     2 โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Note

We can use the abbrev listed by eti installed (here, sac-cere) to identify the species.

Summary Of Compara

$ eti compara-summary -i data/small-install
Homology types                     
โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”“
โ”ƒ homology_type          โ”ƒ  count โ”ƒ
โ”กโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”ฉ
โ”‚ gene_split             โ”‚     44 โ”‚
โ”‚ within_species_paralog โ”‚ 11,998 โ”‚
โ”‚ ortholog_one2one       โ”‚  2,878 โ”‚
โ”‚ other_paralog          โ”‚  6,476 โ”‚
โ”‚ ortholog_one2many      โ”‚  1,509 โ”‚
โ”‚ ortholog_many2many     โ”‚  1,344 โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

This counts the homology relationships of each type among the installed species.

Export Gene Meta-Data

$ eti dump-genes -i data/small-install --species sac-cere --outdir yeast
Finished: wrote 'yeast/saccharomyces_cerevisiae-115-gene_metadata.tsv'!

Show the first five lines of the output file.

$ head -n 5 yeast/saccharomyces_cerevisiae*.tsv
species seqid   source  biotype transcript_biotypes num_transcripts start   stop    strand  symbol  description
saccharomyces_cerevisiae    XV      protein_coding  protein_coding  1   234939  238185  1   GAL11   Subunit of the RNA polymerase II mediator complex; associates with core polymerase subunits to form the RNA polymerase II holoenzyme; affects transcription by acting as target of activators and repressors; forms part of the tail domain of mediator [Source:SGD;Acc:S000005411]
saccharomyces_cerevisiae    XV      protein_coding  protein_coding  1   231569  231755  -1  DDR2    Multi-stress response protein; expression is activated by a variety of xenobiotic agents and environmental or physiological stresses; DDR2 has a paralog, HOR7, that arose from the whole genome duplication [Source:SGD;Acc:S000005413]
saccharomyces_cerevisiae    XV      protein_coding  protein_coding  1   219210  220473  1   ARG1    Arginosuccinate synthetase; catalyzes the formation of L-argininosuccinate from citrulline and L-aspartate in the arginine biosynthesis pathway; potential Cdc28p substrate [Source:SGD;Acc:S000005419]
saccharomyces_cerevisiae    VII     protein_coding  protein_coding  1   903473  904748  -1  PCT1    Cholinephosphate cytidylyltransferase; a rate-determining enzyme of the CDP-choline pathway for phosphatidylcholine synthesis, inhibited by Sec14p, activated upon lipid-binding; contains an element within the regulatory domain involved in both silencing and activation of enzymatic activity [Source:SGD;Acc:S000003434]
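
Because the exported file is plain tab-separated text, it can be processed with standard tools. The following is a minimal Python sketch, using only the standard library, that tallies genes per biotype. The function count_biotypes is a hypothetical helper, not part of eti; the column name biotype and the file path are taken from the output shown above, and the file is assumed to be tab-delimited with a single header line.

```python
import csv
from collections import Counter
from pathlib import Path


def count_biotypes(lines):
    """Tally the 'biotype' column of tab-separated gene metadata.

    `lines` is any iterable of text lines, e.g. an open file handle.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return Counter(row["biotype"] for row in reader)


if __name__ == "__main__":
    # Path reported by the dump-genes command on this page.
    path = Path("yeast/saccharomyces_cerevisiae-115-gene_metadata.tsv")
    if path.exists():
        with path.open(newline="", encoding="utf-8") as handle:
            for biotype, count in count_biotypes(handle).most_common():
                print(f"{biotype}\t{count}")
```

Accepting an iterable of lines rather than a path keeps the helper easy to try on a few hand-written rows before pointing it at the full file.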

Export Homology Data

$ eti homologs -i data/small-install --ref cae-eleg --outdir worm_yeast --homology_type ortholog_one2one --limit 5
Homolog search ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
Extracting 🧬  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00

The following lists the files written to the specified worm_yeast directory.

$ ls worm_yeast
WBGene00016623.fa
WBGene00017749.fa
WBGene00018723.fa
WBGene00020113.fa
WBGene00194707.fa
logs
md5
not_completed
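
As a quick sanity check on the exported files, the sketch below reports how many sequence records each .fa file contains. It assumes the files are standard FASTA text in which every record starts with a ">" header line; the page does not show their contents, so treat that as an assumption. The helper fasta_labels is hypothetical and not part of eti.

```python
from pathlib import Path


def fasta_labels(lines):
    """Return the label (text after '>' up to the first whitespace) of each record."""
    labels = []
    for line in lines:
        # Header lines start with '>'; skip bare '>' lines that carry no label.
        if line.startswith(">") and line[1:].strip():
            labels.append(line[1:].split()[0])
    return labels


if __name__ == "__main__":
    out_dir = Path("worm_yeast")  # the --outdir used on this page
    if out_dir.is_dir():
        for fasta in sorted(out_dir.glob("*.fa")):
            with fasta.open(encoding="utf-8") as handle:
                labels = fasta_labels(handle)
            print(f"{fasta.name}: {len(labels)} sequence(s)")
```

Only header lines are inspected, so the check stays cheap even for long sequences.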

Note

The not_completed directory contains any errors that occurred while running the homologs command.