Advanced Search — ENA Browser Documentation 1 documentation (2024)

Introduction

Customise your own search query and retrieve a set of ENA records tailored to yoursearch criteria.

All searches are performed against a subset of the archive specified bythe Data type you choose to search against. You can then build your searchquery to specify what data you are looking for and select what fields you want toretrieve from your search. There are additional options to include/exclude specificdatasets as well as filter the number of results you wish to return.

If you intend to repeat the same search at a later date, you can save thisas a Rule using Rulespace. If youwant to access the same data programmatically, you can copy the produced curl request and runthis yourself against the ENA Portal API.

Data Types

Each data type is a subset of the data and metadata held within the ENA.

You must specify which datatype you wish to search across before you cannarrow down your search results with any additional criteria.

What is in each data type?

Data Type	Description	API Result type
Studies	All studies in ENA. Studies can be searched by the study metadata, e.g. title or by related taxonomy of data.	study
Studies used for raw reads	All studies that hold raw read datasets. Searching across this data type can give you information on these studies and can be filtered down by sample metadata (information about the collected samples in the study).	read_study
Studies used for nucleotide sequence analyses from reads	All studies that hold any nucleotide analyses. Searching across this data type can give you information on these studies and can be filtered down by sample metadata (information about the collected samples in the study).	analysis_study
Samples	All samples in ENA. Samples can be searched by sample metadata.	sample
Experiments used for raw reads	All metadata associated with sequencing event used to generate raw reads submitted to ENA.	read_experiment
Raw reads	All raw read datasets in ENA. This datatype can be filtered by sample metadata (information about the collected sample) as well as return directly downloadable files.	read_run
Nucleotide sequence analyses from reads	All nucleotide analyses held in ENA. This datatype can be filtered by sample metadata (information about the collected sample) as well as return directly downloadable files.	analysis
Genome assemblies	All genome assembly records (excluding primary or binned metagenomes which are considered analyses). This datatype can only be filtered by assembly metadata e.g. project accession or taxonomy.	assembly
Genome assembly contig sets (WGS)	All genome assembly contig sets in ENA. This datatype can be filtered by sample metadata (information about the collected sample).	wgs_set
Transcriptome assembly contig sets (TSA)	All transcriptome assembly contig sets in ENA. This datatype can be filtered by sample metadata (information about the collected sample).	tsa_set
Nucleotide sequences	All assembled and annotated sequences publicly available in ENA (including high level sequences from assembly submissions e.g. fully assembled chromosomes).	sequence
Protein coding sequences	All protein coding sequences publicly available in ENA	coding
Non-coding sequences	All non coding sequences publicly available in ENA	noncoding
Taxonomic classification	All taxa and their tax IDs. Search for a specific taxa or look for all taxa below a tax node.	taxon

Search Query

To build a query you need to create a list of rules that the resultingdata sets should be restricted to.

This is done by clicking any relevant metadata types you would like torestrict (listed as buttons on the left) then selecting the relevant filtersand specifying the desired restrictions for those:

Advanced Search — ENA Browser Documentation 1 documentation (1)

When specifying taxonomy in your search, you can include all subordinate taxawithin that tax tree and when searching by geographical range you caninteractively drag the box or circle over the desired region to automatically fill outthe location range.

These rules can be grouped and nested within AND or OR logical statements.For example, a query for all metagenomic analyses where the sample wascollected after 01 Jan 2019 AND the environmental material is either dental ORsaliva would look as follows:

Inclusion/Exclusion of datasets

If there are any known public datasets that do or do not fit the criteriayou have specified that you wish to include or exclude from the results,you can list the accessions in a comma separated list here (with no spaces).

Return Fields

By default, you will receive the accession and description/titleof the main datatype you are searching against. If you wish to customise themetadata which your search will return, you can manually select your searchreturn fields from a list of all indexed fields for the specified datatype.

Select and order fields

To select fields you would like returned from your search, drag across anydesired fields from the Available Fields list to the Selected Fieldslist. Alternatively, use the arrow buttons in the middle to move fields acrossfrom one list to the other.

The order of the Selected Fields list will define the order that youreceive those metadata from your search. To specify the return order of thesefields, you can drag and drop these into the desired order.

Field sets

Field sets are a pre-defined set of fields that can be returned together andare available for some data types. For example, for the analysis datatype,you can toggle the ‘Submitted Files’ field set which can be used to returnall relavent fields relating to the original set of submitted files (e.g.this set includes the aspera, ftp and galaxy links for the submitted files,the size of the files (in bytes) and the files’ md5 checksums).

Data Filters

Offset

You can specify an offset for the number of records you would like to skipfrom the beginning of your search. This can be used to view resultsbeyond the maximum number of records that can be viewed in theresults table (100 000) or to break up queries that result in a largenumber of records into smaller batches.

If you do not wish to skip any records, you can leave this field blank orenter an offset of ‘0’.

Limit

You can specify a data limit for the maximum number of records you would liketo retrieve from your search.

If you wish to fetch the full result set enter ‘0’. Leaving the limit field blank appliesthe deafult limit of 100 000. For largeresult sets, to get all records please download the report (JSON/TSV) or copy and runthe curl command outside of the browser.

Download ENA records

Here you can download the ENA records resulting from your search.

This will download the whole ENA record stored for each of the results. If youwish to only download the fields returned that were specified in your search,use one of the Download report options (JSON or TSV).

XML records

XML records are available for all standard metadata objects held within ENA (allresults with the exception of sequence records).

XML records hold all the metadata for each object concatonated into a singlebulk XML file. These XML metadata records are formatted in the standard ENA XMLformat (the same XML format that is used for data submission and for data to bedisplayed in the browser).

FASTA records

FASTA records hold all sequences resulting from your search concatonated into oneFASTA file. FASTA records are only available when searching against sequencedatatypes.

EMBL records

EMBL flatfile records hold all sequences resulting from your search and their annotation (ifavailable) concatenated into a single EMBL flat file. EMBL records are only availablewhen searching against sequence datatypes.

Download results report

This feature allows you to download all the results from your search in theformat of a JSON or TSV file. Any data filters set by you will apply here.

Download associated data files

Pre-Conditions

To see file download columns in your results, you have to search against eitherthe analysis or read_run data types and select the relevant fields that end with ‘_ftp’.

For example:

Data Type = analysis and fields = submitted_ftp

Data Type = read_run and fields = fastq_ftp / sra_ftp / submitted_ftp

Download data files

You can download the data files resulting from your search in one of four ways:

The ENA File Downloader is a new Java based command line application. You have tosubmit one or more comma separated accessions, or a file with accessions that youwant to download data for. This tool allows downloading of read and analysis files,using FTP or Aspera. It has an easy to use interactive interface and can also createa script which can be run programatically or integrated with pipelines.
Download the latest version fromENA Tools.
or from github at ena-ftp-downloader.
You can download a single file by clicking on its link in the FASTQ FTP, SRA FTP, or SUBMITTED FTP column.
You can select one or more files using the check boxes, and either download a downloader script which you can run separatelyor as individual files using the “Get download script” or “Download selected files” links above the table.
You can download a script that contains the download requests for ALL files resulting from your search by clicking the download icon in the column header.

Advanced Search — ENA Browser Documentation 1 documentation (3)

Tips:

If you wish to exclude any records from your search results before you download all the resulting files,you can go back and list these in the “Exclude Accessions” field and then repeat the search.
If you selected multiple files and clicked the “Individually” link but only the first file is downloading, this couldbe because your browser is restricting multiple download pop-ups. Look for a browser warning or confirmation dialogto allow this.
If selesting many files and using the download “Individually” option, you may wish to changethe default download location of your browser. Look in your browser settings for this.
You can also download files using a terminal from ENA using enaBrowserTools.