SMARTIV Manual

Input
Results
- PWM motif presentation
  - WebLogo graphical presentation
  - Matrix presentation
- K-mers presentation

Input

Genome and database assembly: SMARTIV supports input sequences extracted from binding experiments performed on the following genomes and assemblies:

Human
- December 2013 (GRCh38/hg38) assembly, provided by the Genome Reference Consortium.
- February 2009 (GRCh37/hg19) assembly, provided by the Genome Reference Consortium (defined as the default assembly for human).
- March 2006 (NCBI36/hg18) assembly, provided by the International Human Genome Sequencing Consortium.
Mouse
- December 2011 (GRCm38/mm10) assembly, provided by the Mouse Genome Reference Consortium.
- July 2007 (NCBI37/mm9) assembly, provided by NCBI and the Mouse Genome Sequencing Consortium (defined as the default assembly for mouse).

Input file format: SMARTIV accepts a list of genomic coordinates in BED format (view example) or a list of sequences in FASTA format (view example). The list should be sorted by binding score in descending order. If both formats are available, we recommend providing the BED file.
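
The sorting requirement above can be met with a short script before upload. The following is a minimal sketch, not part of SMARTIV: it assumes a tab-separated BED file with the binding score in the fifth column (the standard BED score field), and the function name `sort_bed_by_score` and the example records are hypothetical.

```python
def sort_bed_by_score(lines, score_col=4):
    """Return BED records sorted by score, highest first.

    Assumption: tab-separated BED with the score in the fifth column
    (index 4). Change score_col if your file stores it elsewhere.
    """
    records = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and optional track/browser/comment headers.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        records.append(line.split("\t"))
    # sorted() is stable, so records with equal scores keep their order.
    return sorted(records, key=lambda f: float(f[score_col]), reverse=True)


# Hypothetical records, for illustration only.
example = [
    "chr1\t100\t130\tsite_a\t12.5\t+",
    "chr2\t500\t530\tsite_b\t48.0\t-",
    "chr1\t900\t930\tsite_c\t30.1\t+",
]
for fields in sort_bed_by_score(example):
    print("\t".join(fields))
```

If the score is stored in a different column, pass the zero-based column index as `score_col`.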

Sample data: Clicking on the 'Load sample data' button loads an example input list in BED format. The calculation parameters are set to their defaults but can be changed by the user. Clicking on the 'Submit' button submits the job, and the results are presented automatically on the server. The provided sample data is PAR-CLIP binding data obtained for the human PUM2 protein¹. The dataset was extracted from the doRiNA² database.

1. M. Hafner, M. Landthaler, L. Burger, M. Khorshid, J. Hausser, P. Berninger, A. Rothballer, M. Ascano, Jr., A.C. Jungkamp, M. Munschauer, A. Ulrich, G.S. Wardle, S. Dewell, M. Zavolan, T. Tuschl, Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP, Cell 141(1) (2010) 129-41.

2. K. Blin, C. Dieterich, R. Wurmus, N. Rajewsky, M. Landthaler, A. Akalin, DoRiNA 2.0--upgrading the doRiNA database of RNA interactions in post-transcriptional regulation, Nucleic Acids Res 43(Database issue) (2015) D160-7.

Calculation parameters

K-mer length range: SMARTIV uses a k-mer-based algorithm to search for enriched motifs (Note: the length of the k-mers does not define the final motif length).
By default, SMARTIV uses a predefined range of 5-7 nucleotides. Alternatively, a custom range can be set.
Custom range: The widest range allowed is 4 to 10 nucleotides. To select a single specific length, enter the same value in both the 'Min. length' and 'Max. length' boxes.
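
To illustrate how a length range translates into the k-mers that are searched, here is a minimal sketch. It is not SMARTIV's implementation: the function names `kmer_lengths` and `kmers` and the example sequence are hypothetical, and only the 5-7 default and the 4-10 limits come from the text above.

```python
def kmer_lengths(min_len=5, max_len=7):
    """Return the k values in a range, enforcing the documented 4-10 limits."""
    if not (4 <= min_len <= max_len <= 10):
        raise ValueError("k-mer lengths must satisfy 4 <= min <= max <= 10")
    return list(range(min_len, max_len + 1))


def kmers(sequence, k):
    """Yield every overlapping substring of length k."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


print(kmer_lengths())        # the default range
print(kmer_lengths(6, 6))    # a single length: same value for min and max
print(list(kmers("UGUAAAUA", 6)))
```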

Motif type: SMARTIV calculates two types of motifs: a combined sequence and structure motif (8-letter alphabet) and a sequence-based motif (4-letter alphabet). SMARTIV provides an option to display only one of the motif types or both.

General parameters

Job name: An optional parameter that lets you give your job an informative name; otherwise, the job is assigned a unique numeric identifier.

Email address: An optional field, required only if you want to receive a link to the results page by e-mail. If you do not receive an e-mail from SMARTIV within a reasonable time, check your spam folder.

Results

By default, SMARTIV presents the best eight-letter motif, based on sequence and structure, for each requested k-mer length. The user can also choose to display the best standard four-letter motifs, based on sequence only.
For each motif (PWM), SMARTIV provides both a graphical presentation using the WebLogo software and the matrix itself as a text file. In addition, SMARTIV presents the exact words (k-mers) that were used to build the PWM (view an example of the results page).

PWM motif presentation

WebLogo graphical presentation: The PWM motif is presented as a logo, using an adjusted version of the WebLogo software. The logo can be displayed in two versions:

- No gap correction (the default): the information content at each position depends only on the frequency of each nucleotide and ignores any gaps.
- Including gap correction: the information content at positions that include gaps is reduced.
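
The difference between the two versions can be illustrated with a small calculation. The sketch below is an assumption-laden illustration, not the adjusted WebLogo code: it uses the standard per-position information content (log2 of the alphabet size minus the Shannon entropy) and assumes the gap correction scales this value by the fraction of non-gap entries. The function name `information_content` and the example column are hypothetical.

```python
import math


def information_content(counts, gaps=0, alphabet_size=4, gap_correction=False):
    """Information content (bits) of one motif position.

    counts: dict mapping each letter to its count at this position.
    gaps:   number of aligned k-mers that have a gap at this position.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total)
                   for c in counts.values() if c > 0)
    ic = math.log2(alphabet_size) - entropy
    if gap_correction:
        # Assumed form of the correction: scale by the non-gap fraction.
        ic *= total / (total + gaps)
    return ic


# Hypothetical column: 8 k-mers carry a letter here, 2 have a gap.
column = {"A": 6, "C": 0, "G": 0, "U": 2}
print(information_content(column, gaps=2))
print(information_content(column, gaps=2, gap_correction=True))
```

For the eight-letter sequence and structure alphabet, `alphabet_size=8` would give a maximum of 3 bits per position under the same assumptions.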

Each version of the logo can be downloaded in JPG or PDF format while it is selected for display.
P-value: The p-value presented above the logo reflects the correspondence between the derived PWM and the original binding scores of the sequences (derived from the CLIP experiment). It is estimated using the mmHG statistic, which evaluates the association between two ranked lists and assigns an FDR-corrected p-value to each PWM (Steinfeld et al., 2013).

Matrix presentation: The PWM (Position Weight Matrix) is available for download as a text file (view example).

K-mers presentation

Clicking on 'View the list of k-mers composing the motif' displays a table listing the significant exact strings of length k (k-mers) that were used to build the PWM, together with the related statistical information. The table is also available for download as a text file.
K-mer: The exact motif string color-coded by the logo color scheme.
P-value: The value presented is the mHG score, corrected for multiple testing, which is a tight upper bound on the P-value (P-value ≤ corrected mHG score).
N: The total number of input sequences.
B: The total number of sequences containing the motif.
n: The cutoff index at which the mHG statistic divides the ranked input list into target (the top n sequences) and background, chosen to give the optimal enrichment of the motif at the top of the list.
b: The number of sequences containing the motif among the n top sequences.
Enrichment: Measures the extent to which the motif is concentrated at the top of the list compared with the whole list. Defined as: (b/n) / (B/N).
For more information about the mHG statistic, please refer to Eden et al. (2007).
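
The quantities N, B, n, b and the enrichment can be illustrated with a small calculation. The sketch below is a simplified illustration, not SMARTIV's code: it computes the uncorrected minimum hypergeometric score by scanning every cutoff of a ranked 0/1 list (1 = the sequence contains the k-mer) and omits the multiple-testing correction that the table reports. The function names and the example list are hypothetical.

```python
from math import comb


def enrichment(N, B, n, b):
    """Enrichment as defined in the table: (b/n) / (B/N)."""
    return (b / n) / (B / N)


def hypergeom_tail(N, B, n, b):
    """P(X >= b) when n of N sequences are drawn and B contain the motif."""
    upper = min(n, B)
    hits = sum(comb(B, i) * comb(N - B, n - i) for i in range(b, upper + 1))
    return hits / comb(N, n)


def mhg(occurrence):
    """Uncorrected minimum hypergeometric score over all cutoffs.

    occurrence: ranked list of 0/1 flags, best-scoring sequence first.
    Returns (score, n, b) for the cutoff n that minimizes the tail.
    """
    N, B = len(occurrence), sum(occurrence)
    best = (1.0, 0, 0)
    b = 0
    for n, hit in enumerate(occurrence, start=1):
        b += hit
        p = hypergeom_tail(N, B, n, b)
        if p < best[0]:
            best = (p, n, b)
    return best


# Hypothetical ranked list: the motif occurs mostly near the top.
ranked = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
score, n, b = mhg(ranked)
print(score, n, b, enrichment(len(ranked), sum(ranked), n, b))
```

Scanning every cutoff this way recomputes the tail from scratch each time, which is adequate for a short illustration but slow for long lists.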