---USER UPLOADS---

DIGITtally can run bespoke analyses on user-uploaded BULK RNAseq expression data.

To take advantage of this feature, we require two files in tab- or comma-separated format (TSV/CSV).

Examples are provided in this folder, taken from the 2023 update to FlyAtlas2 expression values. 
Full documentation on this update can be found at http://flybase.org/reports/FBrf0258027.htm.
In brief, FlyAtlas2 was re-analysed utilising modern annotation information and analysis software, removing ribosomal RNA reads due to high variance between samples.
As a result, expression values are similar to those found on https://flyatlas.gla.ac.uk/FlyAtlas2/index.html, but not identical.

---Metadata---
-example file = digittally_metadata.csv

This file DESCRIBES the expression data for the user upload, allowing it to be parsed and incorporated into the DIGITtally pipeline.

The metadata is required to include two columns, "Sample" and "Tissue", for sample IDs and tissues respectively.

The sample IDs must be unique and there may be no empty sample ID fields.

Any tissue including the word "whole" (case insensitive) will be treated as a Whole Fly "tissue" (ignored in specificity analysis and included in enrichment analysis).

Only one Whole Fly "tissue" is allowed. Providing any additional Whole Fly "tissues" will overwrite results from the first.

The optional columns "Age" and "Sex" may be used to run the analysis on subsets of the samples.

Age may be stated in days or life stages.

Samples with missing metadata may be excluded from the analysis depending on chosen options.
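
The metadata rules above can be checked before uploading. The following is a minimal sketch using only the Python standard library; it is not part of DIGITtally, and the function name validate_metadata and the example sample IDs are hypothetical. It assumes a comma-separated file (pass delimiter="\t" for TSV).

```python
import csv
import io

def validate_metadata(handle, delimiter=","):
    """Return (rows, problems) for a metadata table read from an open text handle."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    problems = []
    fieldnames = reader.fieldnames or []
    # "Sample" and "Tissue" are the two required columns.
    for required in ("Sample", "Tissue"):
        if required not in fieldnames:
            problems.append(f"missing required column: {required}")
    if problems:
        return [], problems
    rows = list(reader)
    seen = set()
    # Data rows start on line 2 of the file (line 1 is the header).
    for line_no, row in enumerate(rows, start=2):
        sample = (row["Sample"] or "").strip()
        if not sample:
            problems.append(f"line {line_no}: empty sample ID")
        elif sample in seen:
            problems.append(f"line {line_no}: duplicate sample ID {sample}")
        seen.add(sample)
    # Any tissue name containing "whole" (case insensitive) counts as Whole Fly.
    whole = sorted({row["Tissue"] for row in rows
                    if "whole" in (row["Tissue"] or "").lower()})
    if len(whole) > 1:
        problems.append(f"more than one whole-fly tissue: {', '.join(whole)}")
    return rows, problems

# Hypothetical metadata with a repeated sample ID on the last line.
example = io.StringIO(
    "Sample,Tissue,Age,Sex\n"
    "S1,Midgut,7 days,Female\n"
    "S2,Whole Fly,7 days,Female\n"
    "S2,Hindgut,7 days,Male\n"
)
rows, problems = validate_metadata(example)
print(problems)
```

An empty problems list means the file satisfies the rules listed in this section; it does not guarantee that DIGITtally will accept the upload.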

---Expression Matrix---
-example file = digittally_expressionmatrix.csv

This file CONTAINS expression data for the user-defined dataset, giving expression of each gene/transcript by sample.

The first column of the expression matrix must contain gene names and the remaining column headers must be the sample IDs.

The gene expression values may be given in TPM, RPKM, FPKM, or another normalised measure of expression.

The matrix may contain sample IDs that are not present in the metadata; however, all sample IDs in the metadata must be present in the matrix.
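
The relationship between the two files can be checked with a short script. This is a sketch under the assumption that both files are comma-separated and that the matrix header row holds the gene-name column followed by sample IDs; the function name missing_from_matrix and the example IDs are hypothetical.

```python
import csv
import io

def missing_from_matrix(metadata_handle, matrix_handle, delimiter=","):
    """List metadata sample IDs that have no matching column in the expression matrix."""
    metadata_ids = [row["Sample"]
                    for row in csv.DictReader(metadata_handle, delimiter=delimiter)]
    # Only the header row of the matrix is needed; column 0 holds gene names.
    header = next(csv.reader(matrix_handle, delimiter=delimiter))
    matrix_ids = set(header[1:])
    return [sample for sample in metadata_ids if sample not in matrix_ids]

# Hypothetical files: S3 is described in the metadata but absent from the matrix,
# while S9 is an extra matrix column (which is allowed).
metadata = io.StringIO("Sample,Tissue\nS1,Midgut\nS2,Hindgut\nS3,Whole Fly\n")
matrix = io.StringIO("Gene,S1,S2,S9\nVha16-1,10.5,3.2,8.0\n")
missing = missing_from_matrix(metadata, matrix)
print(missing)
```

An empty list means every sample described in the metadata has a column in the matrix.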

---Gene List---
-example file = digittally_vha_genelist.txt

This file is simply a .txt file containing line-separated gene symbols. Lists such as these can be generated using the DIGITtally Gene List Builder (https://www.digittally.org/utils/genelistbuilder/).

This can be used as input during DIGITtally setup for either manually defining a gene of interest list, or for defining reference genes to compare expression patterns with.
Note that such a list is not strictly necessary - the DIGITtally wizard can link directly to the Gene List Builder and allow on-the-fly definition.

When creating a new gene list, it is highly recommended to put genes of interest into the Gene List Builder utility.
This avoids ambiguity in user-specified gene names and prevents format mismatches.


---Example Results---

-EpitheliomeResults.zip

A set of results carrying out the SAME analysis with different parameters, to highlight how DIGITtally can be customised to answer varied research questions. 

These results centre around identifying a Drosophila "epitheliome" - that is, a set of genes which define epithelial cell fate. As such, in all cases we're looking for genes which appear "interesting" in all Drosophila transporting epithelia (i.e. Midgut, Hindgut, Salivary Gland and Malpighian Tubules). To keep this broadly applicable, we have used ALL of DIGITtally's built-in Data Sources, with all Scoring Metrics barring Co-Expression.

Four run results, and associated settings files, are included:

--Epitheliome-Base

This run represents an unbiased approach to epitheliome gene discovery.

All parameters are kept at their DIGITtally Defaults.


--Epitheliome-ConservedEnriched

This run represents a case where a user is particularly interested in genes which are important in epithelia across insects, rather than in Drosophila alone.

To fit this case, the Score Weight for "Orthology" has been increased from 1 to 3. As a result, conserved genes score more highly.


--Epitheliome-NewGeneEnriched

This run represents a case where a user is particularly interested in genes which haven't been studied before, allowing novel studies to be carried out.

To fit this case, a negative weight of -10 has been applied to Published Association - Any Association scores. Thus, if FlyBase has any information linking a gene to an epithelial tissue from published data, that gene will score 10 fewer points than it otherwise would. As a result, genes which haven't been studied in this context will score more highly.
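
The effect of a Score Weight can be illustrated with simple arithmetic. The sketch below ASSUMES that a gene's total is the sum of each metric's score multiplied by its weight, that unlisted weights are 1, and that each metric contributes 0 or 1; none of this is confirmed to match DIGITtally's internal scoring. The function name total_score and the metric values are hypothetical.

```python
def total_score(metric_scores, weights):
    """Sum each metric's score multiplied by its weight (assumed weight 1 if unlisted)."""
    return sum(score * weights.get(metric, 1)
               for metric, score in metric_scores.items())

# Assumed weights for this run: only the published-association weight is changed.
weights = {"Published Association - Any Association": -10}

# Two hypothetical genes that differ only in whether FlyBase links them to an epithelium.
studied = {"Enrichment": 1, "Specificity": 1, "Orthology": 1,
           "Published Association - Any Association": 1}
unstudied = {"Enrichment": 1, "Specificity": 1, "Orthology": 1,
             "Published Association - Any Association": 0}

studied_total = total_score(studied, weights)      # 1 + 1 + 1 - 10 = -7
unstudied_total = total_score(unstudied, weights)  # 1 + 1 + 1 + 0 = 3
print(studied_total, unstudied_total)
```

Under these assumptions the previously studied gene ends up 10 points below the otherwise identical unstudied gene.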


--Epitheliome-HighlySpecific

This run represents a case where a user is interested only in genes which are VERY restricted to the epithelia.

To fit this case, the scoring thresholds have been altered. The Enrichment threshold now requires a gene to be expressed 5-fold more in all epithelial tissues versus the relevant whole-fly sample (up from the 2-fold default). The Specificity threshold requires a gene to be expressed 5-fold more in all epithelial tissues versus the highest non-epithelial sample (up from the default of any higher expression). As a result, the final gene of interest list is VERY restricted to only those genes which are effectively purely epithelial.
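
The two thresholds can be sketched for a single gene as follows. This is an illustration only: it assumes one expression value per tissue, treats "5-fold more" as greater than or equal to 5 times the comparison value, and leaves the whole-fly sample out of the Specificity comparison. The function name passes_thresholds and the expression values are hypothetical.

```python
def passes_thresholds(expression, target_tissues, whole_fly,
                      enrichment_fold=5.0, specificity_fold=5.0):
    """Return (enriched, specific) for one gene, given a tissue -> expression mapping."""
    whole = expression[whole_fly]
    # Non-target tissues, excluding the whole-fly sample.
    others = [value for tissue, value in expression.items()
              if tissue not in target_tissues and tissue != whole_fly]
    highest_other = max(others) if others else 0.0
    # Every target tissue must clear each threshold.
    enriched = all(expression[t] >= enrichment_fold * whole for t in target_tissues)
    specific = all(expression[t] >= specificity_fold * highest_other for t in target_tissues)
    return enriched, specific

epithelia = ["Midgut", "Hindgut", "Salivary Gland", "Malpighian Tubules"]

# Hypothetical expression values for one gene.
gene = {"Midgut": 50, "Hindgut": 60, "Salivary Gland": 55,
        "Malpighian Tubules": 70, "Brain": 8, "Whole Fly": 10}
result = passes_thresholds(gene, epithelia, "Whole Fly")

# Raising Brain expression to 12 lifts the Specificity cut-off to 60, which Midgut fails.
gene_brain = dict(gene, Brain=12)
result_brain = passes_thresholds(gene_brain, epithelia, "Whole Fly")
print(result, result_brain)
```

Because every epithelial tissue must pass, a single tissue falling below either cut-off is enough to exclude the gene under this sketch.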
