Sample Management Tutorial

Samples group together common datafiles with descriptive metadata and are used throughout GxSeq for display. Supported datafiles include tabular expression data in csv/txt format, read alignments in BAM format, and read density in wig/bigWig format.

Creating Samples

[Figure: Sample form]

Start by clicking "New Sample" on top of an experiment.

A sample must have a name. All other fields are optional.

Name
Choose a concise name for your sample

State
Each sample has a state of private or shared. Private samples are only visible to their creator regardless of the assembly group assignment. Shared samples are accessible to all group members of the sample's assembly.

Traits

Use sample traits to set metadata attributes. These might include genotype, tissue, dpa, strain, etc. Traits represent study variables or factors in your experiment. They can be used for filtering and grouping samples, and they are also used to create sample groups within experiments.

If your trait is not listed, you can click the 'Manage Selection' link in the trait help text to add more items. New traits are global to the site and will be available to all users.

[Figure: Sample traits]

Adding Datafiles

Samples can have several different types of datafiles attached to them. These datatypes are described in more detail below. To add a new datafile, save your new sample or choose a sample from the listing. Then edit the sample by clicking the [edit] button on the sample details page. This form contains an 'Add Data' button. Clicking this button displays a menu of the different datatypes that can be attached to your sample.

[Figure: Sample 'Add Data' menu]

Adding Expression (CountTable)

[Figure: Count table form]

Expression data is added to the site by uploading a table of gene expression counts to your sample. Start by clicking 'Expression' in the sample 'Add Data' menu.

Count tables should include one row for each gene or other feature in the assembly. Each entry in your file is matched to a feature using two attributes: Feature Type and ID Key. The selection for each attribute displays the value name along with a count of features matching that value. The database values for the selected attributes must match your data file in order for expression to be uploaded. By default, if a match is not found, you will get an error message and the file will not be uploaded. If you are unsure about feature types and attributes, you can use the feature listing to explore annotations.

Feature Type
The Feature Type selection denotes the kind of annotation you want to look up. This is often 'Gene', but it may also be 'mRNA', 'exon', or any other feature on the assembly. The count should be greater than or equal to the number of entries in your file.

ID Key
The ID Key selection is used to choose the feature attribute for text matching. This attribute is generally ID or locus_tag. It may also be 'Name' or 'Gene' depending on how your annotations and expression files were generated. Make sure the count of features with this attribute is greater than or equal to the number of entries in your file.
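
Because an unmatched entry causes the upload to be rejected, it can help to check your file against the feature IDs before uploading. The following is a minimal sketch, not part of GxSeq: the function name `unmatched_ids`, the example IDs, and the assumption of a tab-delimited file without a header row are all hypothetical. It assumes you have copied or exported the relevant IDs from the feature listing.

```python
import csv
import io


def unmatched_ids(count_table_text, feature_ids, id_column=1, delimiter="\t"):
    """Return IDs from the count table that are absent from feature_ids.

    id_column is 1-based, mirroring the convention of the upload form.
    Assumes the table has no header row (an assumption of this sketch).
    """
    known = set(feature_ids)
    missing = []
    for row in csv.reader(io.StringIO(count_table_text), delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        feature_id = row[id_column - 1].strip()
        if feature_id not in known:
            missing.append(feature_id)
    return missing


# Hypothetical data: IDs taken from the feature listing and a small count table.
feature_ids = ["gene0001", "gene0002", "gene0003"]
table = "gene0001\t12.5\ngene0002\t0.0\ngene9999\t3.1\n"
missing = unmatched_ids(table, feature_ids)
print(missing)
```

Any ID reported here would need to be corrected in the file, or a different Feature Type or ID Key chosen, before the upload can succeed.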

Columns

After you select a text file from your local system, a preview is displayed along with the column selections. You can use this preview to verify column assignment before upload. Columns are assigned by entering the 1-based column index into the form. Changes in the form are reflected in the preview.

Feature ID
Text column matching the ID attributes in the database. It is used to look up features for expression assignment.

Normalized value
Decimal column containing normalized expression results for the genes. This could be TPM, RPKM, etc.

Total count (optional)
Integer count of gene observations in the expression study. For example, the number of aligned reads. This is useful for downstream analysis such as DESeq.
Enter '0' to ignore this column.

Unique count (optional)
Integer count of unique gene observations. Non-specific read mappings are ignored in this count.
Enter '0' to ignore this column.
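
To illustrate the column convention (1-based indexes, with '0' meaning "ignore"), here is a minimal sketch of how such a file could be read. It is not the GxSeq parser; the function `parse_count_table`, its parameter names, and the four-column tab-delimited layout of the example are assumptions made for illustration.

```python
import csv
import io


def parse_count_table(text, feature_id_col, normalized_col,
                      total_col=0, unique_col=0, delimiter="\t"):
    """Parse rows using 1-based column indexes; 0 means 'ignore this column'."""
    records = []
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        record = {
            "feature_id": row[feature_id_col - 1],
            "normalized": float(row[normalized_col - 1]),
        }
        if total_col:
            record["total"] = int(row[total_col - 1])
        if unique_col:
            record["unique"] = int(row[unique_col - 1])
        records.append(record)
    return records


# Hypothetical file layout: feature ID, total count, unique count, TPM.
sample_text = "gene0001\t120\t110\t14.2\ngene0002\t8\t8\t0.9\n"
records = parse_count_table(sample_text, feature_id_col=1,
                            normalized_col=4, total_col=2)
print(records[0])
```

In this example the unique count column is left at 0, so it is skipped even though the file contains it.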

Adding Read alignments (BAM)

[Figure: BAM form]

Read alignments are added to the site by uploading a BAM formatted file. Start by clicking 'Reads(Bam)' in the sample 'Add Data' menu.

BAM files are binary indexed versions of the SAM sequence alignment format: http://www.htslib.org/

These files store read positions aligned to sequences. Sequences are referenced using a text identifier. It is important to verify that the BAM identifiers match the identifiers in the sample's concordance set. For more details on concordance sets, read the Assembly Management Tutorial.
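
One way to check identifiers before uploading is to compare the sequence names in the BAM header with the identifiers you expect. The sketch below is illustrative only: it parses header text such as that printed by `samtools view -H`, and the function name `sequence_names` and the example `concordance_ids` set are hypothetical.

```python
def sequence_names(sam_header_text):
    """Collect the SN (sequence name) values from @SQ header lines."""
    names = []
    for line in sam_header_text.splitlines():
        fields = line.split("\t")
        if fields[0] != "@SQ":
            continue  # only @SQ lines describe reference sequences
        for field in fields[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


# Hypothetical header text and concordance set identifiers.
header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:248956422\n"
    "@SQ\tSN:chrM\tLN:16569\n"
)
concordance_ids = {"chr1", "chr2"}
unknown = [name for name in sequence_names(header) if name not in concordance_ids]
print(unknown)
```

Any name listed in `unknown` is present in the BAM header but not in the expected identifier set.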

BigWig Creation
By default, GxSeq will create a bigWig from the BAM files you upload. This is done in two steps:

1. a bedgraph is created using bedtools genomeCoverageBed -split -bg -ibam
http://bedtools.readthedocs.io/en/latest/content/tools/genomecov.html

2. the bedgraph is converted to bigWig format using Bio::Ucsc::Util.bed_graph_to_big_wig
https://github.com/throwern/bio-ucsc-util
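
If you want to produce a comparable bigWig outside of GxSeq, the two steps can be approximated on the command line. The sketch below only builds the commands. It makes several assumptions: it uses `bedtools genomecov`, the subcommand form of genomeCoverageBed, and it substitutes the UCSC `bedGraphToBigWig` command-line tool for the Ruby library used by GxSeq. The function name and file paths are hypothetical, and the chromosome sizes file is an extra input required by the UCSC tool.

```python
import shlex


def coverage_commands(bam_path, chrom_sizes_path, bedgraph_path, bigwig_path):
    """Build (but do not run) the two commands as argument lists."""
    # Step 1: read coverage in bedGraph format; bedtools writes to stdout,
    # so the caller redirects the output to bedgraph_path.
    step1 = ["bedtools", "genomecov", "-split", "-bg", "-ibam", bam_path]
    # Step 2 (substitute tool): convert the bedGraph to bigWig.
    step2 = ["bedGraphToBigWig", bedgraph_path, chrom_sizes_path, bigwig_path]
    return step1, step2


step1, step2 = coverage_commands(
    "sample.bam", "chrom.sizes", "sample.bedGraph", "sample.bw"
)
print(shlex.join(step1) + " > sample.bedGraph")
print(shlex.join(step2))
```

To execute the commands, each list can be passed to `subprocess.run`, with the output of the first step redirected to the bedGraph file. The UCSC tool expects sorted input, so a sort step may be needed in between.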

Bulk Sample Upload

Samples and associated datafiles can also be created in bulk. This tool is split into two steps: bulk upload of datafiles, followed by bulk creation of the samples associated with those datafiles. It enables rapid entry of large numbers of samples.

Multiple Upload

[Figure: Bulk upload]

The bulk upload interface is used to stage datafiles for subsequent sample creation. It accepts any file from your system and allows multiple uploads at the same time. Completed uploads are listed with the option to remove them from the staging directory. Interrupted uploads can be resumed. After an upload is complete, a checksum is calculated and compared with the stored data to verify data integrity.
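
The checksum comparison can be illustrated with a short sketch. The algorithm GxSeq uses is not stated here, so SHA-256 is an assumption, as are the function name `file_checksum` and the temporary example file. The file is read in chunks so that large datafiles do not need to fit in memory.

```python
import hashlib
import os
import tempfile


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash a file in fixed-size chunks and return the hex digest."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical example: write a small file, then verify it against a
# checksum computed independently from the same bytes.
contents = b"example datafile contents"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(contents)
    tmp_path = tmp.name

local_checksum = file_checksum(tmp_path)
expected_checksum = hashlib.sha256(contents).hexdigest()
checksums_match = local_checksum == expected_checksum
os.remove(tmp_path)
print(checksums_match)
```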

Sample Builder

[Figure: Bulk assignment]

The bulk assignment tool is used to create samples and assign the staged datafiles. This tool can build samples based on the unique filenames in your staging directory or using a sample metadata file. It can also assign expression data to each sample from a matrix.

You have the option to add samples, add metadata traits, and rearrange datafile assignments before saving the new samples. When everything looks good, submit the form using the 'Submit' button at the bottom of the page. This can take significant time, generally 5-10 seconds per sample, because each sample is validated and saved.

Automatic Samples
One sample will be added for each unique filename found in the staging directory (ignoring extensions). This option works well when you have a BAM file and a count table for each sample and limited metadata.
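
The grouping rule can be sketched as follows. This is an illustration rather than the GxSeq implementation: the function name `group_by_basename` and the example filenames are hypothetical, and the sketch assumes that "ignoring extensions" means dropping only the final suffix of each filename.

```python
from collections import defaultdict
from pathlib import PurePath


def group_by_basename(filenames):
    """Group staged filenames by their name without the final extension."""
    groups = defaultdict(list)
    for name in filenames:
        groups[PurePath(name).stem].append(name)
    return dict(groups)


# Hypothetical staging directory contents.
staged = ["leaf_rep1.bam", "leaf_rep1.txt", "root_rep1.bam", "root_rep1.txt"]
samples = group_by_basename(staged)
for sample_name, files in samples.items():
    print(sample_name, files)
```

Naming the BAM file and count table for a sample identically, apart from the extension, is what lets them land in the same sample.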



Metadata Samples
One sample will be added for each row in the selected metadata file. The file must be comma- or tab-delimited, with one column per metadata attribute and a header on the first line describing each column.

The following columns are parsed for special use. Name is required. All other columns become sample traits.

Name: Sample name
Description: Sample description
CountTable: Expression file name
Bam: Alignment file name
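
As an illustration of this layout, the sketch below reads a small comma-delimited metadata file and separates the special columns from the trait columns. The function name `read_sample_metadata`, the example rows, and the `genotype` and `tissue` trait columns are hypothetical. GxSeq's own parsing may differ in details such as how empty cells are handled.

```python
import csv
import io

# Columns with special meaning; every other column becomes a trait.
SPECIAL_COLUMNS = {"Name", "Description", "CountTable", "Bam"}


def read_sample_metadata(text, delimiter=","):
    """Return one dict per row, with non-special columns under 'traits'."""
    samples = []
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        if not row.get("Name"):
            raise ValueError("every row needs a Name")
        sample = {key: row[key] for key in SPECIAL_COLUMNS if row.get(key)}
        sample["traits"] = {
            key: value for key, value in row.items()
            if key not in SPECIAL_COLUMNS and value
        }
        samples.append(sample)
    return samples


# Hypothetical metadata file contents.
metadata = (
    "Name,Description,CountTable,Bam,genotype,tissue\n"
    "leaf_rep1,Leaf replicate 1,leaf_rep1.txt,leaf_rep1.bam,WT,leaf\n"
    "root_rep1,Root replicate 1,root_rep1.txt,root_rep1.bam,WT,root\n"
)
samples = read_sample_metadata(metadata)
print(samples[0]["Name"], samples[0]["traits"])
```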




Count Table Options
Similar to the Count Table upload form, the count table options field allows column assignment for expression files. Bulk uploads apply these selections to all datafiles. If you have files in different formats, upload them separately.

Matrix Options
Matrix uploads have the same Feature Type and ID Key options as single count tables. However, a matrix cannot be used to upload unique counts or total counts. Only normalized expression values should be entered using a matrix.

[Figure: Bulk count table options]
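
The exact matrix layout is not described here, so the following sketch rests on an assumption: a tab-delimited file with a header row of sample names, feature IDs in the first column, and one column of normalized values per sample. The function name `split_matrix` and the example data are hypothetical.

```python
import csv
import io


def split_matrix(text, delimiter="\t"):
    """Split an expression matrix into {sample: {feature_id: value}}."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    sample_names = header[1:]  # first header cell labels the ID column
    per_sample = {name: {} for name in sample_names}
    for row in reader:
        if not row:
            continue  # skip blank lines
        feature_id = row[0]
        for name, value in zip(sample_names, row[1:]):
            per_sample[name][feature_id] = float(value)
    return per_sample


# Hypothetical matrix with two samples and two features.
matrix = (
    "feature\tleaf_rep1\troot_rep1\n"
    "gene0001\t14.2\t3.5\n"
    "gene0002\t0.9\t7.1\n"
)
expression = split_matrix(matrix)
print(expression["root_rep1"])
```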