Sample Management Tutorial
Samples group related datafiles together with descriptive metadata and are used throughout GxSeq for display. Supported datafiles include tabular expression data in csv/txt format, read alignments in BAM format, and read density in wig/bigWig format.
Start by clicking "New Sample" at the top of an experiment page.
A sample must have a name; all other fields are optional.
Choose a concise name for your sample.
Each sample is either private or shared. Private samples are visible only to their creator, regardless of the assembly group assignment. Shared samples are accessible to all members of the sample's assembly group.
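The visibility rule above can be sketched as a small permission check. This is a minimal sketch, not GxSeq's implementation: the `Sample` fields, the `memberships` mapping of group names to user names, and the choice to always let creators see their own shared samples are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    name: str
    creator: str
    state: str           # "private" or "shared"
    assembly_group: str  # hypothetical: group assigned to the sample's assembly


def can_view(sample: Sample, user: str, memberships: dict) -> bool:
    """Return True if `user` may see `sample` (hypothetical access check)."""
    if sample.state == "private":
        # Private: only the creator, regardless of the assembly group.
        return user == sample.creator
    if sample.state == "shared":
        # Shared: members of the assembly's group (creator assumed always allowed).
        return user == sample.creator or user in memberships.get(sample.assembly_group, set())
    raise ValueError(f"unknown sample state: {sample.state!r}")
```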
Use sample traits to set metadata attributes such as genotype, tissue, dpa, or strain. Traits represent study variables or factors in your experiment. They can be used to filter and group samples, and they are also used to create sample groups within experiments.
If your trait is not listed, you can click the 'Manage Selection' link in the trait help text to add more items. New traits are global to the site and will be available to all users.
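Because traits are used to filter and group samples, a grouping step might look like the sketch below. The sample dictionaries and the `group_by_traits` helper are hypothetical; each sample is keyed by the tuple of its values for the chosen traits.

```python
from collections import defaultdict


def group_by_traits(samples, trait_names):
    """Group sample names by their values for the given traits (hypothetical helper)."""
    groups = defaultdict(list)
    for sample in samples:
        # A sample missing a trait gets None in that position of the key.
        key = tuple(sample["traits"].get(t) for t in trait_names)
        groups[key].append(sample["name"])
    return dict(groups)
```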
Samples can have several types of datafiles attached to them; these datatypes are described in more detail below. To add a datafile, save your new sample or choose an existing sample from the listing. Then edit the sample by clicking the edit button on the sample details page. The edit form contains an 'Add Data' button, which displays a menu of the datatypes that can be attached to your sample.
Adding Expression (CountTable)
Expression data is added to the site by uploading a table of gene expression counts to your sample. Start by clicking 'Expression' in the sample 'Add Data' menu.
Count tables should include one row for each gene or other feature in the assembly. Each entry in your file is matched to a feature using two selections, Feature Type and ID Key. Each selection lists the available values along with a count of features matching that value. The database values for these selections must match your data file for expression to be uploaded. By default, if a match is not found you will get an error message and the file will not be uploaded. If you are unsure about feature types and attributes, use the feature listing to explore annotations.
The Feature Type selection denotes the kind of annotation you want to look up. This is often 'Gene', but it may also be 'mRNA', 'exon', or any other feature type in the assembly. The feature count should be greater than or equal to the number of entries in your file.
The ID Key selection is used to choose the feature attribute for text matching. This attribute is generally ID or locus_tag. It may also be 'Name' or 'Gene' depending on how your annotations and expression files were generated. Make sure the count of features with this attribute is greater than or equal to the number of entries in your file.
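The matching described above can be sketched as a lookup keyed on the chosen ID Key attribute, restricted to the chosen Feature Type. This is a minimal sketch under assumed data shapes: `features` stands in for the assembly's annotations as plain dictionaries, and `match_features` is a hypothetical helper that raises an error when any entry is unmatched, mirroring the default behaviour.

```python
def match_features(features, feature_type, id_key, file_ids):
    """Map each count-table ID to a feature, or fail if any ID is unmatched.

    `features` is a hypothetical list of dicts such as
    {"type": "gene", "attributes": {"ID": "g1", "Name": "ABC1"}}.
    """
    # Keep only features of the chosen Feature Type that carry the ID Key attribute.
    lookup = {
        feat["attributes"][id_key]: feat
        for feat in features
        if feat["type"] == feature_type and id_key in feat["attributes"]
    }
    missing = [fid for fid in file_ids if fid not in lookup]
    if missing:
        # Default behaviour described above: any unmatched entry rejects the file.
        raise ValueError(f"{len(missing)} of {len(file_ids)} IDs not found, e.g. {missing[:3]}")
    return {fid: lookup[fid] for fid in file_ids}
```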
After you select a text file from your local system, a preview is displayed along with the column selections. Use this preview to verify column assignments before uploading. Columns are assigned by entering the 1-based column index into the form; changes in the form are reflected in the preview.
Text column matching the ID Key attribute values in the database. It is used to look up features for expression assignment.
Decimal column containing normalized expression values for each gene, such as TPM or RPKM.
Total count (optional)
Integer count of gene observations in the expression study, for example the number of aligned reads. This is useful for downstream analysis such as DESeq.
Enter '0' to ignore this column.
Unique count (optional)
Integer count of unique gene observations. Non-specific read mappings are excluded from this count.
Enter '0' to ignore this column.
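The column assignment rules (1-based indexes, '0' to skip an optional column) can be sketched as follows. `parse_count_table` is a hypothetical helper; it assumes a tab-delimited file whose first line is a header, which may not match every file the form accepts.

```python
import csv
import io


def parse_count_table(text, id_col, norm_col, total_col=0, unique_col=0, delimiter="\t"):
    """Parse a count table using 1-based column indexes; 0 disables an optional column."""

    def pick(row, col, cast):
        # Convert the 1-based form index to 0-based; 0 means "ignore this column".
        return cast(row[col - 1]) if col > 0 else None

    rows = csv.reader(io.StringIO(text), delimiter=delimiter)
    next(rows, None)  # assumption: the first line is a header
    records = []
    for row in rows:
        if not row:
            continue  # skip blank lines
        records.append({
            "id": row[id_col - 1],
            "normalized": float(row[norm_col - 1]),
            "total": pick(row, total_col, int),
            "unique": pick(row, unique_col, int),
        })
    return records
```

For example, `parse_count_table(text, id_col=1, norm_col=2, total_col=3)` reads IDs from the first column, TPM values from the second, and read counts from the third, while ignoring unique counts.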
Adding Read Alignments (BAM)
Read alignments are added to the site by uploading a BAM formatted file. Start by clicking 'Reads(Bam)' in the sample 'Add Data' menu.
BAM files are the compressed binary form of the SAM sequence alignment format: http://www.htslib.org/
These files store read positions aligned to a reference sequence. Sequences are referenced by a text identifier, so it is important to verify that the BAM identifiers match the identifiers in the sample's concordance set. For more details on concordance sets, read the Assembly Management Tutorial.
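One way to check identifiers before uploading is to compare the `@SQ` sequence names (`SN:` tags) in the BAM header with the concordance set identifiers. The sketch below parses header text such as the output of `samtools view -H sample.bam`; the concordance identifiers are assumed to be available as a plain list, and the helper names are hypothetical.

```python
def reference_names_from_header(header_text):
    """Extract sequence names from @SQ lines (SN: tag) of SAM header text."""
    names = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def unmatched_references(header_text, concordance_ids):
    """Return BAM reference names that are absent from the concordance identifiers."""
    known = set(concordance_ids)
    return [name for name in reference_names_from_header(header_text) if name not in known]
```

An empty result from `unmatched_references` suggests every sequence in the BAM can be resolved against the concordance set.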
By default, GxSeq creates a bigWig read density file from each BAM file you upload. This is done in two steps:
1. A bedGraph is created using bedtools genomeCoverageBed -split -bg -ibam
2. The bedGraph is converted to bigWig format using Bio::Ucsc::Util.bed_graph_to_big_wig
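The two steps can be approximated from the command line as sketched below. GxSeq calls Bio::Ucsc::Util.bed_graph_to_big_wig for step 2; this sketch assumes the UCSC `bedGraphToBigWig` tool (arguments: `in.bedGraph chrom.sizes out.bw`) in its place, plus a chromosome sizes file. Because that tool expects sorted input, the sketch also sorts the bedGraph by chromosome and start, which is an added assumption rather than a documented GxSeq step.

```python
import subprocess


def coverage_commands(bam, bedgraph, chrom_sizes, bigwig):
    """Return the command lines for step 1 (BAM -> bedGraph) and step 2 (bedGraph -> bigWig)."""
    step1 = ["genomeCoverageBed", "-split", "-bg", "-ibam", bam]
    # Assumption: UCSC bedGraphToBigWig stands in for Bio::Ucsc::Util.bed_graph_to_big_wig.
    step2 = ["bedGraphToBigWig", bedgraph, chrom_sizes, bigwig]
    return step1, step2


def sort_bedgraph_lines(lines):
    """Sort bedGraph lines by chromosome, then numeric start (like sort -k1,1 -k2,2n)."""
    rows = [line.rstrip("\n").split("\t") for line in lines if line.strip()]
    rows.sort(key=lambda r: (r[0], int(r[1])))
    return ["\t".join(r) + "\n" for r in rows]


def bam_to_bigwig(bam, chrom_sizes, bigwig, bedgraph="coverage.bedGraph"):
    """Run both steps; requires bedtools and bedGraphToBigWig on PATH."""
    step1, step2 = coverage_commands(bam, bedgraph, chrom_sizes, bigwig)
    # genomeCoverageBed writes the bedGraph to stdout; capture it in memory.
    result = subprocess.run(step1, capture_output=True, text=True, check=True)
    with open(bedgraph, "w") as out:
        out.writelines(sort_bedgraph_lines(result.stdout.splitlines()))
    subprocess.run(step2, check=True)
```

Holding the whole bedGraph in memory keeps the sketch short, but it may not suit very large genomes.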
Bulk Sample Upload
Samples and associated datafiles can also be created in bulk, which enables rapid entry of large numbers of samples. The tool is split into two steps: bulk upload of datafiles, and bulk creation of samples associated with those datafiles.
The bulk upload interface stages datafiles for subsequent sample creation. It accepts any file from your system and allows multiple uploads at the same time. Completed uploads are listed with the option to remove them from the staging directory. Interrupted partial uploads can be resumed. After an upload is complete, a checksum is calculated and compared with the stored data to verify data integrity.
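A checksum comparison of this kind can be sketched with Python's `hashlib`. The algorithm GxSeq uses is not stated here; MD5 is an assumption, and `stream_checksum` and `verify_upload` are hypothetical helpers. Reading in chunks keeps memory use flat for large BAM files.

```python
import hashlib


def stream_checksum(fh, algorithm="md5", chunk_size=1 << 20):
    """Hash a binary file object in chunks so large files need not fit in memory."""
    digest = hashlib.new(algorithm)
    for chunk in iter(lambda: fh.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_upload(path, expected_hex, algorithm="md5"):
    """Return True if the staged file's checksum matches the expected value."""
    with open(path, "rb") as fh:
        return stream_checksum(fh, algorithm) == expected_hex.lower()
```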
The bulk assignment tool is used to create samples and assign the staged datafiles. This tool can build samples based on the unique filenames in your staging directory or using a sample metadata file. It can also assign expression data to each sample from a matrix.
Before saving the new samples, you have the option to add samples and metadata traits and to rearrange datafile assignments. When everything looks good, submit the form using the 'Submit' button at the bottom of the page. Saving takes significant time because each sample is validated and saved, generally 5-10 seconds per sample.
One sample is added for each unique filename found in the staging directory (ignoring extensions). This option works well when you have a BAM file and a count table for each sample and limited metadata.
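Filename-based sample creation can be sketched as grouping staged files by their name without the extension. `samples_from_filenames` is a hypothetical helper, and it assumes only the final extension is dropped, so a name like `leaf_rep1.sorted.bam` would yield the stem `leaf_rep1.sorted`.

```python
import os
from collections import defaultdict


def samples_from_filenames(filenames):
    """Group staged files into one sample per filename with its extension removed."""
    samples = defaultdict(list)
    for name in sorted(filenames):
        # Assumption: only the final extension is dropped (os.path.splitext).
        stem, _ext = os.path.splitext(os.path.basename(name))
        samples[stem].append(name)
    return dict(samples)
```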
One sample is added for each row in the selected metadata file. The file must be comma- or tab-delimited, with columns for metadata attributes and a header on the first line naming each column.
The following columns are parsed for special use. The Name column is required. All other columns become sample traits.
|CountTable:||Expression file name|
|Bam:||Alignment file name|
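A metadata file with the special columns above might be parsed as in this sketch. It assumes the header names are exactly `Name`, `CountTable`, and `Bam`, and it detects the delimiter by looking for a tab in the header line; `parse_sample_sheet` is a hypothetical helper.

```python
import csv
import io

SPECIAL_COLUMNS = {"Name", "CountTable", "Bam"}


def parse_sample_sheet(text):
    """Parse a comma- or tab-delimited sample metadata file with a header line."""
    header_line = text.splitlines()[0]
    # Assumption: a tab in the header means tab-delimited, otherwise comma-delimited.
    delimiter = "\t" if "\t" in header_line else ","
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    if "Name" not in (reader.fieldnames or []):
        raise ValueError("metadata file needs a Name column")
    samples = []
    for row in reader:
        samples.append({
            "name": row["Name"],
            "count_table": row.get("CountTable") or None,
            "bam": row.get("Bam") or None,
            # Every other named column becomes a sample trait.
            "traits": {k: v for k, v in row.items() if k is not None and k not in SPECIAL_COLUMNS},
        })
    return samples
```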
Count Table Options
Similar to the Count Table upload form, the count table options field sets column assignments for expression files. Bulk uploads apply these selections to all datafiles, so if you have files in different formats, upload them separately.
Matrix uploads have the same Feature Type and ID Key options as single count tables. However, a matrix cannot be used to upload unique counts or total counts; enter only normalized expression values using a matrix.
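The matrix layout is not specified above; this sketch assumes one row per feature, a first column of feature IDs, and a header row naming one sample per remaining column. `split_matrix` is a hypothetical helper that turns such a matrix into one set of normalized values per sample, ready for the same Feature Type and ID Key matching as a single count table.

```python
import csv
import io


def split_matrix(text, delimiter="\t"):
    """Split an expression matrix into per-sample {feature_id: normalized_value} dicts."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    sample_names = header[1:]  # assumption: first header cell labels the ID column
    samples = {name: {} for name in sample_names}
    for row in reader:
        if not row:
            continue  # skip blank lines
        feature_id, values = row[0], row[1:]
        for name, value in zip(sample_names, values):
            samples[name][feature_id] = float(value)
    return samples
```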