Assembly Management Tutorial

An assembly represents a group of related genomic sequences and the annotated features on those sequences. The assembly itself has a name and a few configuration options for representing this dataset. Assemblies can also be used to share datasets with a group of users.

Creating Assemblies

To create a new assembly, navigate to the assembly listing (Data -> Assemblies) and click the [Create Assembly] button on the top bar.

[Screenshot: assembly form]

A name and a version are required for every assembly. After you submit the form, your new assembly will be listed in the assemblies table with links to its details page and edit form.

Name
GxSeq stores NCBI taxonomy names for use with assemblies. The name field will autocomplete from this dataset. If you can't find the species you are studying, or this is a multi-organism assembly, you can type any descriptive name in the field.

Version
The same organism may be uploaded to the site multiple times. To help differentiate the uploads, each assembly requires a version along with the name. For example, Arabidopsis thaliana might have TAIR version 9 and TAIR version 10. The version should be concise, but it can be any text you choose. Version numbers are displayed in parentheses after the assembly name: TAIR (10)

Group
If you want to share this assembly and its data with other users, assign it to a group. All of the users in this group will have access to the assembly.

Updates
You can update an assembly at any time by clicking the Edit link in the assembly listing. All assembly attributes can be changed.

Adding Sequence

Sequence can be any string of nucleotides applicable to your experiment. Chromosomes, scaffolds and de novo assembled transcripts are all valid. New sequence must be uploaded to an existing assembly in FASTA format.

[Screenshot: FASTA upload form]

Start by clicking 'Add Sequence' on the assembly details page.

After you select a FASTA file from your local system, a preview of the sequence will be displayed. The 'Check Format' button below the preview shows how the sequence accession and description will be parsed by the database.
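
To get a rough idea of what a format check might report before uploading, the sketch below splits FASTA header lines into an accession and a description. It assumes the common convention that the accession is the first whitespace-delimited token after '>' and the rest of the line is the description; the parser GxSeq actually uses is not documented here and may differ. The function names are hypothetical.

```python
def parse_fasta_header(line):
    """Split a FASTA header line into (accession, description)."""
    if not line.startswith(">"):
        raise ValueError("FASTA header lines must start with '>'")
    # Assumption: the accession is the first whitespace-delimited token
    # and everything after it is the description.
    parts = line[1:].strip().split(None, 1)
    accession = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return accession, description


def preview_headers(fasta_text):
    """Return the parsed header of every record in a FASTA string."""
    return [
        parse_fasta_header(line)
        for line in fasta_text.splitlines()
        if line.startswith(">")
    ]
```

Running preview_headers on the contents of your file lists the accession and description pairs, which can be compared with what 'Check Format' displays.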

De Novo Contigs
If you are loading sequence for de novo contigs from an RNA-Seq experiment, or another source of sequence with no annotation, you may want to add a feature to each contig. Entering a feature name in the Feature Type field will create these features for you automatically. For transcriptome studies, we suggest using mRNA as the feature type. These features will be used to upload expression and functional annotation data for the transcriptome.

Rename Enumeration
If you want to rename the contigs you can enter a prefix in the 'Re-number Prefix' field. Contigs will be enumerated and given unique names with your prefix + enumeration. For example the prefix "Contig" will create names:
Contig000001, Contig000002 ...

The format used to pad zeros can also be changed. By default it is your prefix followed by a six digit padded decimal.
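
The default naming scheme described above can be reproduced with ordinary zero-padded formatting. This is only an illustration of the resulting names, not the site's own code; the function name and the `width` and `start` parameters are hypothetical, and numbering is assumed to begin at 1, as in the Contig000001 example.

```python
def enumerate_names(prefix, count, width=6, start=1):
    """Build names of the form prefix + zero-padded decimal number."""
    # width=6 mirrors the default six-digit padding described above.
    return [f"{prefix}{i:0{width}d}" for i in range(start, start + count)]


print(enumerate_names("Contig", 2))
```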

Adding Features

Features represent interesting regions of genomic sequence. They may have functional annotations, multiple locations, and expression data assigned to them. Generally expression data is assigned to features of type Gene or mRNA. Features must be uploaded onto existing sequence in the GFF format.

Start by clicking 'Add Features' on the assembly details page.

[Screenshot: GFF upload form]

Sequence Concordance
It is important to match the sequence identifiers in the GFF file with the sequence identifiers in GX. To assist with this, GX stores Concordance Sets, or alias files, of sequence IDs. A default concordance set is created when sequence is uploaded. Additional concordance sets can be created with a simple cut and paste interface as described below.

ID Attribute
GFF files have one start and stop position per line, but annotations in GxSeq can have multiple locations. To help convert GFF entries, an ID attribute can be selected. Only the first entry for each unique value will be entered as a feature. Subsequent GFF entries with the same ID value will only have their locations recorded.
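
The grouping rule can be pictured with a short sketch: the first line seen for a given ID value defines the feature, and later lines with the same value contribute only a location. This assumes GFF3-style `key=value;key=value` attributes in the ninth column; the function names and the dictionary layout are hypothetical and are not GxSeq's internal representation.

```python
def parse_attributes(column9):
    """Parse a GFF3 attribute column ('key=value;key=value') into a dict."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def merge_by_id(gff_lines, id_attribute="ID"):
    """Group GFF rows by an ID attribute, collecting one location per row."""
    features = {}
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # this sketch ignores malformed rows
        seqid, _source, ftype, start, end, _score, strand, _phase, attr_col = cols
        key = parse_attributes(attr_col).get(id_attribute)
        if key is None:
            continue  # rows without the chosen attribute are ignored here
        location = (int(start), int(end), strand)
        if key not in features:
            # First entry for this ID: record the feature itself.
            features[key] = {"seqid": seqid, "type": ftype, "locations": [location]}
        else:
            # Later entries with the same ID: record only the location.
            features[key]["locations"].append(location)
    return features
```

Two CDS rows that share ID=cds1, for example, would yield a single feature with two locations.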

Type Selection
If some feature types in the file are not important or would add clutter to visual representations, they can be skipped during the load.

Sequence Lookup
After you select your GFF file, a preview and the results of the database lookup will be displayed. It is important to check these results and address any issues.

Browsing
After upload, the features listing will include your new data. You can view details pages for each feature or visualize them in the genomic context. The features will also be available for further annotation and expression upload.

Concordance Sets

[Screenshot: concordance set form]

You may want to load data that has sequence identifiers different from the database accessions. Concordance sets allow you to do this. You will need to enter a table of IDs with one row per sequence. Each row should contain the current database ID, followed by the new alias used in your file.

Start by clicking 'New Concordance' on the concordance set listing.

Aliases can be comma, tab or whitespace delimited. After creating a new concordance, you can use it to upload feature data or sample files such as aligned reads in BAM format.
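
Before pasting an alias table, it can help to check it locally. The sketch below reads rows of the form described above (database ID first, alias second, separated by a comma, tab or other whitespace) and reports which identifiers from a data file have no alias. The function names are hypothetical, and the strict two-fields-per-row rule is an assumption made for this example rather than a documented GxSeq requirement.

```python
import re


def parse_concordance(text):
    """Map each alias to its database ID from a pasted two-column table."""
    aliases = {}
    for line in text.splitlines():
        # Accept comma, tab or other whitespace as the delimiter.
        fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(f"expected 'database_id alias', got: {line!r}")
        database_id, alias = fields
        aliases[alias] = database_id
    return aliases


def unresolved_ids(file_ids, aliases):
    """Return the sorted identifiers from a data file that have no alias."""
    return sorted({i for i in file_ids if i not in aliases})
```

An empty result from unresolved_ids suggests that every sequence identifier in the file is covered by the table.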