CAGEd-oPOSSUM Detailed Input Options and Analysis Results

Analysis Input

Select Target CAGE Peaks
Select by FANTOM5 sample expression levels
Select specific FANTOM5 CAGE peak IDs
Select custom CAGE peaks

Select Target CAGE Peak Filters
Filter FANTOM5 CAGE peaks by TSS status
Filter FANTOM5 CAGE peaks to those associated with specific genes
Filter CAGE peak regions to those which intersect with a specific set of genomic regions

Select Background CAGE Peaks
Use a matched random background
Select by FANTOM5 sample expression levels
Select specific FANTOM5 CAGE peak IDs
Select custom CAGE peaks

Select Background CAGE Peak Filters
Filter FANTOM5 CAGE peaks by TSS status
Filter FANTOM5 CAGE peaks to those associated with specific genes
Filter CAGE peak regions to those which intersect with a specific set of genomic regions

Select Transcription Factor Binding Site Parameters
Select TFBS Profiles
JASPAR CORE Profiles
Custom TFBS Profiles
Select TFBS Search Criteria
Matrix score threshold
Amount of upstream / downstream sequence
Number of results to return
Sort results by
Output binding site details for each TF
Also run HOMER motif analysis
Email address

Analysis Summary

Notification Email

Analysis Results

Summary
Analysis Results Table

Binding Site Details Page




Analysis Input

Select Target CAGE Peaks

You may specify your set of target CAGE peaks in one of three ways:

Select by FANTOM5 sample expression levels
Select specific FANTOM5 CAGE peak IDs
Select custom CAGE peaks

You may use only one of these methods. Each is described in more detail below.


Select by FANTOM5 sample expression levels

The target set of CAGE peaks may be specified by selecting one or more FANTOM5 samples along with a minimum level of expression. The target set will then consist of all FANTOM5 CAGE peaks which have at least this level of expression in any of the selected FANTOM5 samples.

The FANTOM5 samples are provided as a hierarchical ontology tree. You can expand or collapse branches of the tree by clicking on the small triangles next to the sample names. The lowest level of the tree, the "leaf nodes", contains the individual FANTOM5 sample names. Nodes higher up in the tree describe higher level groupings of those samples. You may select one or more FANTOM5 samples by clicking the checkboxes next to the sample names on the ontology tree. Clicking a checkbox at the lowest level of the hierarchy (one of the leaf nodes) chooses that sample only. Clicking a checkbox on a node higher up in the tree selects all samples that are children of that node. For more information on the samples contained in the tree, right click on the item name and choose Open Link in... to open the FANTOM5 SSTAR page describing that sample (or higher level group) in a new browser tab or window.

A convenient search function has also been provided to find samples in the tree. Type a term in the search box, e.g. "liver", and click the Search ontology button (or just hit Enter on your keyboard); the tree will expand to display all the samples and/or higher level groupings containing the search string and highlight them in blue. PLEASE NOTE that using the search function does not automatically select (check) the samples matching your search term(s). You still need to explicitly check the samples you want to use in your analysis. Hitting the Clear button will clear the search terms, unhighlight the matches and collapse the tree. PLEASE NOTE that any samples you may have checked previously are not automatically unchecked by the Clear button.

The expression level may be specified in one of two ways: either by choosing a "relative expression" level, OR by choosing a combination of a "raw tag count" AND a number of "tags per million". Relative expression refers to log10(relative expression over median). If you select relative expression, CAGE peaks which have at least this level of relative expression in any of the FANTOM5 samples chosen above will comprise the target set of CAGE peaks. If you instead select raw tag count and tags per million, CAGE peaks which have at least this number of raw tags and at least this many tags per million in the selected FANTOM5 sample(s) will comprise the target set of CAGE peaks.
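As a rough illustration of the selection rule above, the following Python sketch keeps a CAGE peak if it meets the minimum level in any of the selected samples. The table layout, the function name select_by_expression and the requirement that the raw tag count and tags per million thresholds be met within the same sample are assumptions for illustration; this is not CAGEd-oPOSSUM's actual implementation.

```python
from typing import Dict, Iterable, Optional, Set

# Hypothetical layout: peak ID -> sample name -> expression values, where
# "raw" is the raw tag count, "tpm" is tags per million and "rel" is
# log10(relative expression over median).
ExprTable = Dict[str, Dict[str, Dict[str, float]]]


def select_by_expression(
    table: ExprTable,
    samples: Iterable[str],
    min_rel: Optional[float] = None,
    min_raw: Optional[float] = None,
    min_tpm: Optional[float] = None,
) -> Set[str]:
    """Keep peaks meeting the minimum level in ANY of the selected samples."""
    chosen = set(samples)
    if min_rel is not None and (min_raw is not None or min_tpm is not None):
        raise ValueError("choose relative expression OR raw count + TPM")
    if min_rel is None and (min_raw is None or min_tpm is None):
        raise ValueError("raw count and TPM thresholds must be given together")

    selected = set()
    for peak_id, per_sample in table.items():
        for sample, values in per_sample.items():
            if sample not in chosen:
                continue
            if min_rel is not None:
                passes = values["rel"] >= min_rel
            else:
                # Assumption: both criteria must hold within the same sample.
                passes = values["raw"] >= min_raw and values["tpm"] >= min_tpm
            if passes:
                selected.add(peak_id)
                break  # one qualifying sample is enough
    return selected


table = {
    "p1": {"liver": {"raw": 10, "tpm": 2.0, "rel": 0.5}},
    "p2": {"liver": {"raw": 1, "tpm": 0.1, "rel": -1.0},
           "brain": {"raw": 50, "tpm": 5.0, "rel": 1.2}},
}
print(select_by_expression(table, ["liver"], min_rel=0.0))  # -> {'p1'}
```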

Select specific FANTOM5 CAGE peak IDs

Alternatively, you may specify the CAGE peaks to use directly by providing a list of FANTOM5 CAGE peak IDs. The FANTOM5 CAGE peak ID format is chr:start..end,strand, e.g. chr1:8938736..8938756,-. This can be done in two ways: either by pasting the IDs directly into the text area, or by uploading a file of IDs. The file must be formatted in plain text (no MS Word documents or other more complex formats) with one ID per line. Click the Browse... button to select a file from your computer to upload. For demonstration purposes, clicking the Use sample FANTOM5 CAGE peak IDs button fills the text box with sample CAGE peak IDs. Use the Clear button to clear the text box.
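The ID format above can be checked before pasting or uploading a list. This Python sketch splits each ID into chromosome, start, end and strand; the regular expression, the helper names and the end-before-start sanity check are illustrative assumptions rather than the tool's own validation logic.

```python
import re
from typing import Iterable, List, NamedTuple

# FANTOM5 CAGE peak ID format: chr:start..end,strand, e.g. chr1:8938736..8938756,-
PEAK_ID_RE = re.compile(
    r"^(?P<chrom>[^:\s]+):(?P<start>\d+)\.\.(?P<end>\d+),(?P<strand>[+-])$"
)


class PeakId(NamedTuple):
    chrom: str
    start: int
    end: int
    strand: str


def parse_peak_id(text: str) -> PeakId:
    """Split one FANTOM5 CAGE peak ID into its components."""
    match = PEAK_ID_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a FANTOM5 CAGE peak ID: {text!r}")
    start, end = int(match["start"]), int(match["end"])
    if end < start:
        raise ValueError(f"end before start in {text!r}")  # sanity check (assumption)
    return PeakId(match["chrom"], start, end, match["strand"])


def parse_peak_id_lines(lines: Iterable[str]) -> List[PeakId]:
    """One ID per line, as required for uploaded files; blank lines are skipped."""
    return [parse_peak_id(line) for line in lines if line.strip()]


print(parse_peak_id("chr1:8938736..8938756,-"))
```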


Select custom CAGE peaks

Finally, you may select CAGE peaks by pasting in or uploading a BED formatted file defining the CAGE peaks. Note that only the first 6 fields of the BED file format (up to and including the strand) are required. The score field is not used and can be set to 0. Clicking the Use sample custom CAGE peaks button fills the text box with an example BED formatted file. Use the Clear button to clear the text box. If you upload a file of custom CAGE peaks, the file must be formatted in plain text (no MS Word documents or other complex formats) with one CAGE peak per line.
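To check a custom file before uploading it, the six required BED fields can be read as in the Python sketch below. Skipping blank, "#", "track" and "browser" lines is an assumption borrowed from common BED usage, not a documented CAGEd-oPOSSUM behaviour; the score field is read but discarded, since it is not used.

```python
from typing import Iterable, List, NamedTuple


class CagePeak(NamedTuple):
    chrom: str
    start: int  # BED start coordinate (0-based)
    end: int    # BED end coordinate (exclusive)
    name: str
    strand: str


def parse_custom_peaks(lines: Iterable[str]) -> List[CagePeak]:
    """Read the first 6 BED fields (chrom, start, end, name, score, strand)."""
    peaks = []
    for n, line in enumerate(lines, start=1):
        line = line.strip()
        # Assumption: blank, comment and track/browser lines are skipped.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if len(fields) < 6:
            raise ValueError(f"line {n}: expected at least 6 BED fields")
        chrom, start, end, name, _score, strand = fields[:6]  # score is ignored
        if strand not in ("+", "-"):
            raise ValueError(f"line {n}: strand must be + or -")
        peaks.append(CagePeak(chrom, int(start), int(end), name, strand))
    return peaks


example = ["chr1\t100\t120\tpeak1\t0\t+", "", "chr2\t5\t9\tpeak2\t0\t-"]
print(parse_custom_peaks(example))
```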


Once you have selected your target CAGE peaks, click the Select target CAGE peak filters button to progress to the next step, which gives you the option to apply one or more filters to the selected CAGE peak data. If you are unhappy with the selections you have made, clicking the Reset button clears the selections from the form and allows you to start again (please note that any samples checked in the FANTOM5 sample ontology tree are not automatically cleared).


Select Target CAGE Peak Filters

This page allows you to optionally specify filters to apply to the selected CAGE peaks. Different filtering options are available depending on whether you selected FANTOM5 or user-defined (custom) CAGE peaks in the first step of the analysis.

You may apply multiple filtering criteria. The filters act cumulatively, such that only CAGE peaks and their resulting TFBS search regions which pass all filters are retained for the analysis. For example, if you select all 3 filters, then the CAGE peaks selected in the first step are filtered so that only those CAGE peaks which are classified as TSSs and which are associated with at least one of the genes in the list of gene identifiers are retained and used to compute an initial set of CAGE peak regions. These regions are then intersected with the filtering regions such that only those portions of the initial CAGE peak regions which overlap with the filtering regions are retained, forming the final set of CAGE peak regions searched for TFBS.
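The cumulative behaviour of the two peak-level filters (TSS status and gene association) can be sketched in Python as follows. The lookup tables is_tss and peak_genes and the function name filter_peaks are hypothetical; the third filter (genomic region intersection) acts on the flanked CAGE peak regions afterwards, as described above, and is not shown here.

```python
from typing import Dict, Iterable, List, Optional, Set


def filter_peaks(peaks: Iterable[str], is_tss: Dict[str, bool],
                 peak_genes: Dict[str, Set[str]], tss_only: bool = True,
                 genes: Optional[Set[str]] = None) -> List[str]:
    """Apply the TSS-status and gene filters cumulatively (a peak must pass both)."""
    kept = []
    for peak in peaks:
        if tss_only and not is_tss.get(peak, False):
            continue  # TSS filter (checked by default)
        if genes is not None and not (peak_genes.get(peak, set()) & genes):
            continue  # not associated with any of the listed genes
        kept.append(peak)
    return kept


is_tss = {"a": True, "b": True, "c": False}
peak_genes = {"a": {"TP53"}, "b": {"MYC"}, "c": {"TP53"}}
print(filter_peaks(["a", "b", "c"], is_tss, peak_genes, genes={"TP53"}))  # -> ['a']
```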

Please see the sub-sections below for a detailed description of each filter.


Filter FANTOM5 CAGE peaks by TSS status

This option only appears if you selected FANTOM5 CAGE peaks in the previous step. If this box is checked (it is checked by default), only those CAGE peaks selected in the previous step which are also predicted to be TSSs by the FANTOM5 TSS classifier are retained for the analysis. The process used to classify FANTOM5 CAGE peaks as TSSs is described in this document: TSSpredictionREADME.pdf


Filter FANTOM5 CAGE peaks to those associated with specific genes

This option only appears if you selected FANTOM5 CAGE peaks in the previous step. If a list of gene IDs/symbols is provided, then only those FANTOM5 CAGE peaks which are associated with one or more of the specified genes are retained for the analysis. You may provide a list of genes either by pasting the IDs/symbols into the text box or by uploading a file containing the IDs/symbols. For human, you may specify HGNC IDs/symbols, EntrezGene IDs or UniProt IDs. For mouse, you may specify either EntrezGene or UniProt IDs. Again, any file uploaded must be in plain text format with one gene ID/symbol per line.


Filter CAGE peaks to those which intersect with a specific set of genomic regions

This option applies to both FANTOM5 and user-defined CAGE peaks. These regions act to filter the search space used to search for TFBS. A list of genomic regions is provided either by pasting the regions into the text box or by uploading a file containing the regions in BED format. More explicitly, CAGE peak search regions are computed by applying flanking regions to each of the CAGE peaks defined in the first step. The amount of flanking region to apply to the 5' and 3' edges of the CAGE peaks is selected later, in the final step (TFBS parameter selection). This forms the set of CAGE peak regions used to search for TFBS. By defining filtering regions in this step, only those portions of the CAGE peak regions which overlap with these filtering regions are used to search for TFBS. That is, the TFBS search space is the intersection of the CAGE peak regions and the filtering regions. Note: only the first 3 fields of the BED file (chromosome, start and end) are required (any extra columns are ignored). Again, any file uploaded must be in plain text format with one filtering region specified per line.
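The search-space computation described above (flank each CAGE peak, merge overlapping regions as noted in the Summary section, then intersect with the filtering regions) can be sketched in Python. Half-open coordinates and strand-aware flanking (the upstream flank goes past the end coordinate of a - strand peak) are assumptions; the function names are hypothetical and this is not the tool's own code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates (assumption)


def flank(start: int, end: int, strand: str, upstream: int, downstream: int) -> Interval:
    # Assumption: upstream (5') is relative to the peak's strand, so for a
    # - strand peak the upstream flank is added past the end coordinate.
    if strand == "+":
        return max(0, start - upstream), end + downstream
    return max(0, start - downstream), end + upstream


def merge(intervals: Iterable[Interval]) -> List[Interval]:
    """Sort and merge overlapping or touching intervals."""
    merged: List[Interval] = []
    for s, e in sorted(intervals):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersect two sorted, merged interval lists with a two-pointer sweep."""
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        s, e = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if s < e:
            out.append((s, e))
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def search_space(peaks, filters=(), upstream=500, downstream=500) -> Dict[str, List[Interval]]:
    """peaks: (chrom, start, end, strand) tuples; filters: (chrom, start, end) tuples."""
    filters = list(filters)
    regions = defaultdict(list)
    for chrom, s, e, strand in peaks:
        regions[chrom].append(flank(s, e, strand, upstream, downstream))
    filter_regions = defaultdict(list)
    for chrom, s, e in filters:
        filter_regions[chrom].append((s, e))
    result = {}
    for chrom, ivs in regions.items():
        merged = merge(ivs)
        # With no filtering regions at all, the flanked, merged regions are used as-is.
        result[chrom] = intersect(merged, merge(filter_regions[chrom])) if filters else merged
    return result


peaks = [("chr1", 1000, 1010, "+"), ("chr1", 1200, 1210, "-")]
print(search_space(peaks, [("chr1", 1000, 1200)], upstream=100, downstream=50))
```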



Select Background CAGE Peaks

The same methods for providing target CAGE peaks are also available for selecting background CAGE peaks. Please see the Select Target CAGE Peaks section for details. Additionally, you may select to use a random background which is %GC composition and length matched to the target CAGE peak regions. This is the default and helps to reduce biases in the results which may appear when the %GC composition of the foreground and background regions differs. See below for more details.


Use a matched random background

This is selected by default. When this option is selected, a random background is generated to match as closely as possible the %GC composition and length distribution of the foreground CAGE peak regions. This is performed using the HOMER software. PLEASE NOTE that the regions generated by HOMER for the random background are not specifically CAGE peaks. Rather, HOMER pulls out genomic regions which closely match the sequence composition properties of the target CAGE peak regions. This helps to reduce any biases resulting from sequence composition differences between the foreground and background. For example, if the foreground regions have a higher %GC composition than the background regions, TFBS motifs which are more GC rich will tend to appear more enriched in the foreground. This bias can be assessed by examining the motif score vs. %GC composition plots generated as part of the results. Please refer to Worsley Hunt et al. for an in-depth discussion of this issue.
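The idea of a %GC and length matched background can be illustrated with a simple binning scheme: bin each target region by GC fraction and length, then draw the same number of candidate regions from each bin. The Python sketch below is only an illustration of that idea under these assumptions; it is not HOMER's algorithm, and the bin widths, function names and candidate pool are hypothetical.

```python
import random
from collections import defaultdict
from typing import List, Sequence


def gc_fraction(seq: str) -> float:
    """Fraction of G/C among unambiguous bases (N and other symbols are ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def matched_background(targets: Sequence[str], candidates: Sequence[str],
                       fold: int = 1, gc_width: float = 0.05,
                       len_width: int = 100, seed: int = 0) -> List[str]:
    """Draw candidate regions so that each (GC bin, length bin) of the targets is matched."""
    def bin_key(seq: str):
        return (int(gc_fraction(seq) / gc_width), len(seq) // len_width)

    pools = defaultdict(list)
    for seq in candidates:
        pools[bin_key(seq)].append(seq)
    needed = defaultdict(int)
    for seq in targets:
        needed[bin_key(seq)] += fold
    rng = random.Random(seed)
    background: List[str] = []
    for key, count in needed.items():
        pool = pools.get(key, [])
        # Sample without replacement; bins with too few candidates stay under-filled.
        background.extend(rng.sample(pool, min(count, len(pool))))
    return background
```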

As this is the best way to reduce biases due to %GC composition mismatches, this is the recommended way to provide the background. Please note, however, that generating the background is quite computationally intensive and will result in longer analysis times. An additional side effect of using a random background is that the background TFBS search must be performed on-the-fly rather than taking advantage of the pre-computed TFBS set, even when the pre-computed JASPAR profiles are selected later in the Select TFBS Profiles step. This also adds to the analysis time.

If you publish any work which includes analyses where HOMER generated random backgrounds were used, please also cite HOMER (see the Citing CAGEd-oPOSSUM section on the main help page).


Select by FANTOM5 sample expression levels

See the description of this option under the Select Target CAGE Peaks section


Select specific FANTOM5 CAGE peak IDs

See the description of this option under the Select Target CAGE Peaks section


Select custom CAGE peaks

See the description of this option under the Select Target CAGE Peaks section


Select Background CAGE Peak Filters

PLEASE NOTE that if you select a random background in the Select Background CAGE Peaks step, then CAGE peak filters do not apply and this step is skipped.

Otherwise, the same choices are provided for selecting background CAGE peak filters as for target CAGE peak filters. Please see the Select Target CAGE Peak Filters section for details.


Filter FANTOM5 CAGE peaks by TSS status

See the description of this option under the Select Target CAGE Peak Filters section


Filter FANTOM5 CAGE peaks to those associated with specific genes

See the description of this option under the Select Target CAGE Peak Filters section


Filter CAGE peaks to those which intersect with a specific set of genomic regions

See the description of this option under the Select Target CAGE Peak Filters section


Select Transcription Factor Binding Site Parameters

In this step you select various parameters affecting how the actual TFBS search is performed. This includes selecting which TFBS profiles to use in the analysis, the TFBS score threshold to use, how much flanking sequence to apply around the CAGE peaks (the TFBS search space) as well as how the results are displayed.


Select TFBS Profiles

You may select TFBS profiles by choosing them from the JASPAR collection of curated TFBS profiles or by supplying your own custom matrices. If you choose JASPAR profiles, the binding sites have been pre-computed and stored in the CAGEd-oPOSSUM database. If you use your own custom profile matrices, the binding sites are computed on-the-fly. Note, however, that other parameter selections may cause parts of the analysis to be computed on-the-fly regardless of whether pre-computed profiles are used. For example, choosing a random background causes the background TFBS search to be performed on-the-fly regardless of the TFBS profile selection method. Generally, analyses performed with custom profiles will take longer than analyses performed with the pre-computed JASPAR binding sites.


JASPAR CORE Profiles

The matrices used to pre-compute binding sites in CAGEd-oPOSSUM were obtained from the 2016 release of the JASPAR database. This pre-computed set includes all JASPAR CORE vertebrate profiles with a minimum information content (specificity) of 8 bits. You may use the entire pre-computed set or a selected subset of the profiles.

The default setting is to use the entire set of JASPAR 2016 CORE vertebrate profiles with a minimum specificity of 8 bits. The specificity is also known as the information content (IC). Loosely defined, the IC combines the length and the complexity of the motif into a value that describes what we know collectively about the sequences a TF recognizes. The minimum specificity of 8 bits was chosen because TF matrices with an IC of less than 8 bits are relatively uninformative for scoring DNA sequence. If you choose to threshold matrices with a higher specificity, you are limiting your analyses to matrices with stronger, more selective patterns. In general, it is not necessary to use a threshold higher than 8 bits unless you have specific requirements, as you may later filter your final results by specificity (the IC column of the results table). The only advantage of using a higher threshold is that it will generally result in a faster compute time, although other parameter choices can have a much greater impact on this.
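To make the notion of information content concrete, the Python sketch below computes the IC in bits of a count matrix as the sum, over positions, of 2 bits minus the Shannon entropy of the column. It assumes a uniform 0.25 background and applies no pseudocounts or small-sample correction, so its values may differ from the IC values reported by JASPAR or CAGEd-oPOSSUM.

```python
import math
from typing import Sequence


def information_content(pfm: Sequence[Sequence[float]]) -> float:
    """Total IC in bits of a 4-row (A, C, G, T) count matrix.

    Assumes a uniform 0.25 background and no pseudocounts or small-sample
    correction; JASPAR's reported IC values may be computed differently.
    """
    total = 0.0
    for j in range(len(pfm[0])):
        counts = [row[j] for row in pfm]
        col_sum = sum(counts)
        entropy = -sum((c / col_sum) * math.log2(c / col_sum) for c in counts if c > 0)
        total += 2.0 - entropy  # 2 bits is the maximum for a 4-letter alphabet
    return total


# A perfectly conserved 4-position motif (ACGT) carries 2 bits per position.
perfect = [[10, 0, 0, 0], [0, 10, 0, 0], [0, 0, 10, 0], [0, 0, 0, 10]]
print(information_content(perfect))  # -> 8.0
```

Under this definition, a profile passes the default 8 bit minimum when information_content(pfm) >= 8.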

You may also select a specific set of TFBS profiles by choosing one or more TFBS profiles from the scrollable list. Click on an individual TF to select it. You may select multiple profiles using the Ctrl or Shift keys. You can select a group of profiles by clicking on one, then holding down the Shift key and clicking another which selects all profiles between the two click points. By holding the Ctrl key down while clicking, each profile clicked will be selected.


Custom TFBS Profiles

To use custom TFBS profiles, you may either paste in or upload a file containing your own set of custom TFBS profile matrices. The matrices must be in one of the following formats, e.g.:


  > NFE2L2
  A  [10  0  0 20  0  6  5 16  0  0 15 ]
  C  [ 1  0  0  0 17  2 10  0  0 20  2 ]
  G  [ 9  0 19  0  1  1  1  2 20  0  2 ]
  T  [ 0 20  1  0  2 11  4  2  0  0  1 ]
  
or simply

  > NFE2L2
  10  0  0 20  0  6  5 16  0  0 15
   1  0  0  0 17  2 10  0  0 20  2
   9  0 19  0  1  1  1  2 20  0  2
   0 20  1  0  2 11  4  2  0  0  1
  

NOTE that the header line of the matrix format may contain just a name, or an ID and a name separated by a space, e.g.:

either
  > NFE2L2
or
  > MA0150 NFE2L2

The latter replicates the JASPAR format but is not necessary. If only a name is provided, the name will appear in both the "TF" and "TF ID" columns in the results table.
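The Python sketch below reads both matrix formats shown above, with either header style, and checks that each matrix has 4 rows of equal length. It assumes rows appear in A, C, G, T order (as in the examples) whether or not they carry a base label; the function name parse_matrices is hypothetical and this is not the parser CAGEd-oPOSSUM itself uses.

```python
import re
from typing import List, Tuple

Matrix = List[List[float]]  # 4 rows (A, C, G, T), one column per position


def parse_matrices(text: str) -> List[Tuple[str, str, Matrix]]:
    """Parse '>' headed count matrices, with or without 'A [ ... ]' row labels."""
    records: List[Tuple[str, str, Matrix]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("empty matrix header")
            # "> NAME" uses the name as the ID too; "> ID NAME" supplies both.
            tf_id = fields[0]
            name = fields[1] if len(fields) > 1 else fields[0]
            records.append((tf_id, name, []))
            continue
        if not records:
            raise ValueError("matrix row found before any '>' header")
        # Strip an optional leading base label and the square brackets.
        line = re.sub(r"^[ACGT]\s*", "", line)
        values = [float(v) for v in line.replace("[", " ").replace("]", " ").split()]
        records[-1][2].append(values)
    for tf_id, _name, rows in records:
        if len(rows) != 4 or len({len(r) for r in rows}) != 1:
            raise ValueError(f"{tf_id}: expected 4 rows of equal length")
    return records
```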


Select TFBS Search Criteria

Here you select the parameters used to search for TFBS including the matrix score threshold, the amount of upstream and downstream flanking sequence that is applied around the CAGE peaks to be searched for TFBS, the number and sort order of the results to display and whether to output binding site details for each TF.


Matrix score threshold

Putative TF binding sites are predicted by sliding the TFBS profile's position weight matrix (PWM) along a sequence and computing a score at each position. Positions where the score is above a given threshold are stored in the CAGEd-oPOSSUM database. The threshold is therefore the minimum relative (percent) score required to report a position as a putative TFBS. The pre-computed set of CAGEd-oPOSSUM binding sites was generated with a minimum score threshold of 80%, so you may choose any value of 80% or higher. The default threshold is 85%.

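The relative (percent) score of a putative binding site is computed from the raw matrix score as:

  relative_score = 100 * (site_raw_score - min_matrix_raw_score) / (max_matrix_raw_score - min_matrix_raw_score)

where min_matrix_raw_score and max_matrix_raw_score are the lowest and highest scores the matrix can possibly assign to any sequence.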
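The scan and the relative score formula above can be sketched in Python. The count-to-log-odds conversion here (uniform 0.25 background, a pseudocount of 1 spread evenly over the four bases) is an assumption for illustration; the weighting used to build CAGEd-oPOSSUM's pre-computed sites may differ. Only the + strand is scanned, and the threshold is expressed as a fraction (0.85 for 85%).

```python
import math
from typing import List, Sequence, Tuple

BASES = "ACGT"


def pfm_to_pwm(pfm: Sequence[Sequence[float]], pseudocount: float = 1.0) -> List[List[float]]:
    """Log2-odds PWM from a 4-row (A, C, G, T) count matrix, uniform background (assumption)."""
    n_cols = len(pfm[0])
    pwm = [[0.0] * n_cols for _ in range(4)]
    for j in range(n_cols):
        col_sum = sum(pfm[b][j] for b in range(4))
        for b in range(4):
            p = (pfm[b][j] + 0.25 * pseudocount) / (col_sum + pseudocount)
            pwm[b][j] = math.log2(p / 0.25)
    return pwm


def scan(seq: str, pwm: List[List[float]], threshold: float = 0.85) -> List[Tuple[int, float, float]]:
    """Return (0-based position, raw score, relative score) for + strand hits >= threshold."""
    width = len(pwm[0])
    min_score = sum(min(pwm[b][j] for b in range(4)) for j in range(width))
    max_score = sum(max(pwm[b][j] for b in range(4)) for j in range(width))
    if max_score == min_score:
        raise ValueError("uninformative matrix: relative score is undefined")
    hits = []
    seq = seq.upper()
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguous bases
        raw = sum(pwm[BASES.index(base)][j] for j, base in enumerate(window))
        rel = (raw - min_score) / (max_score - min_score)
        if rel >= threshold:
            hits.append((i, raw, rel))
    return hits


pwm = pfm_to_pwm([[10, 0, 0, 0], [0, 10, 0, 0], [0, 0, 10, 0], [0, 0, 0, 10]])
print(scan("TTACGTAA", pwm))  # one hit, at position 2 (the ACGT window)
```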

The default threshold of 85% is a commonly used threshold for TFBS analyses using PWMs. If you have prior knowledge of which TFs are of interest for your analyses and what their properties are, you may change this threshold based on that knowledge. For instance, if the matrix for a TF of interest has a low IC, then you may want to use a higher threshold, whereas for a TF with a high IC, you might try using a lower threshold. A threshold of 80% or 85% will generally provide you with satisfactory results, but if you are uncertain, we recommend trying multiple analyses with various thresholds.


Amount of upstream / downstream sequence

This refers to the amount of flanking sequence applied upstream and downstream of each CAGE peak. The default is 500 bp both upstream and downstream. The maximum amount of upstream / downstream sequence is 2000 bp / 2000 bp (the values used for the JASPAR pre-computed TFBS). Generally, the larger the flanking sequences used, the slower the compute time, due to the increased search space.


Number of results to return

You can specify the number of results to be returned. The default is to return all results, but you can choose to return only the top 5, 10 or 20 results. Alternatively, you can choose to return all results which score above given Z-score and Fisher score thresholds. When the number of top results is anything other than "All", the "Sort results by" option determines which results are returned. For example, if you choose to return the top 10 results and the sort option is set to Fisher score, then the results with the top 10 Fisher scores are returned; a result with a Z-score in the top 10 but a Fisher score outside the top 10 is lost. Whenever you choose anything other than the All option, the results that are not returned are lost. We therefore recommend using the default "All" option, as returning fewer results does not reduce the analysis time. Once you have the results, you can re-order them by Fisher score or Z-score (as well as by any other column).


Sort results by

This specifies the initial sort order of the results. Results can be initially sorted by either Z-score or Fisher score. The default is to sort by Fisher score. After the results are returned, the results table on the results web page can be re-ordered by clicking on the header of any column in the table. The downloadable tab delimited results text file will also be sorted by the initial sort order.
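The interaction between the sort order and the number of results returned can be seen in a small Python sketch. The result dictionaries, the function names and the use of "at or above" for the score thresholds are assumptions for illustration.

```python
from typing import Any, Dict, List, Optional

Result = Dict[str, Any]  # e.g. {"tf": "NFE2L2", "z": 4.2, "fisher": 7.9}


def rank_results(results: List[Result], sort_by: str = "fisher",
                 top_n: Optional[int] = None) -> List[Result]:
    """Sort from most to least significant, then optionally keep only the top N."""
    if sort_by not in ("fisher", "z"):
        raise ValueError("sort_by must be 'fisher' or 'z'")
    ranked = sorted(results, key=lambda r: r[sort_by], reverse=True)
    return ranked if top_n is None else ranked[:top_n]


def above_thresholds(results: List[Result], min_z: float, min_fisher: float) -> List[Result]:
    """Keep results scoring at or above both the Z-score and Fisher score thresholds."""
    return [r for r in results if r["z"] >= min_z and r["fisher"] >= min_fisher]


results = [
    {"tf": "A", "z": 10.0, "fisher": 1.0},
    {"tf": "B", "z": 1.0, "fisher": 5.0},
    {"tf": "C", "z": 2.0, "fisher": 3.0},
]
# Top 1 by Fisher score keeps B and drops A, even though A has the highest Z-score.
print([r["tf"] for r in rank_results(results, "fisher", 1)])  # -> ['B']
```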


Output binding site details for each TF

If this box is checked, details of the individual binding sites for each TF are output. Clicking on the values in either the "Target CAGE peak region hits" or "Target TFBS hits" columns in the main results table will open a new window which displays the chromosomal coordinates of all of the binding sites for this TF which fall within the merged (and optionally filtered) CAGE peak regions. Checking this option may result in slightly longer processing time.


Also run HOMER motif analysis

If this box is checked, a HOMER motif overrepresentation analysis is also performed. For this analysis the pre-defined HOMER motifs are used with their default thresholds. The main results page will contain a link to the HOMER results. For more information on HOMER motif analysis please visit the HOMER page at http://homer.salk.edu/homer.

If you perform any HOMER motif overrepresentation analyses and use the results in a published work, please also cite HOMER (see the Citing CAGEd-oPOSSUM section on the main help page).


Email address

Once you are satisfied with your chosen TFBS search parameters, you must enter your email address to receive notification when the analysis has completed. Your email address is used only to send you this notification. Once you have entered a valid email address, click the Perform analysis button to execute the analysis.

The Reset button will clear all TFBS search parameter values.




Analysis Summary

After you hit the Perform analysis button an analysis summary page is displayed in a new browser tab. This notifies you of your job ID and summarizes the criteria selected for the analysis.

Depending on the size of the target and background CAGE peak data sets, which filters and TFBS analysis parameters were selected, and the overall server load, the analysis may take anywhere from a few seconds to several minutes.




Notification Email

Once your analysis is complete, an email will be sent to the address you provided with a link to your analysis results. The email also gives a summary of the analysis parameters you provided, similar to what is displayed on the Analysis Summary page. If you do not receive the email notification within a reasonable time, it is possible that the job failed silently. In this case, please use the Contact page to notify us so we can investigate. Please provide the Job ID in your email correspondence.




Analysis Results

Clicking on the results link in the notification email brings you to the Analysis Results page. The results page contains an analysis summary section at the top of the page which displays information about the parameters used in the analysis, similar to the information displayed on the Analysis Summary page. The actual results are displayed below this summary in table format. Below the results table are links to the Fisher and Z-score vs. %GC composition plots. The results can also be downloaded in a tab delimited file format, and a link to this file is also provided under the main results table. See the sections below for more detailed information on each of these.


Summary

This section summarizes the analysis parameters that were chosen, including the source of the CAGE peaks, which transcription factors were selected, the matrix score threshold, the amount of flanking sequence, etc. It also provides the final number of CAGE peak regions analyzed after flanking regions are added to the individual CAGE peaks, the regions are merged and any optional filtering is performed. See Basic Algorithm for more details on this merging and filtering of CAGE peak regions.


Analysis Results Table

For a general explanation of what the oPOSSUM analysis results mean, please refer to the Understanding the Results section on the main CAGEd-oPOSSUM help page.

The results table contains the ranked list of overrepresented TFBS motifs. Results of both the Z-score and Fisher analyses appear in the last two columns. The results table is initially ordered by either Z-score or Fisher score (depending on which was selected in the "Sort results by" option) from most to least significant (higher to lower score). The table can be re-sorted by any other column by clicking on the column header. If there are no TFBS found in the background CAGE peak regions for a specific motif, this is highlighted in red and a general warning message will also appear before the table. Other factors that may affect the rankings of a particular TF in the results such as whether a TFBS profile has a particularly high information content (specificity) or a particularly high or low GC composition are also highlighted in red. The table columns are described in detail below.


TF

This column displays the name of the transcription factor, e.g. 'Nfe2l2'.


TF / JASPAR ID

The TFBS profile ID. If the TFBS profiles came from JASPAR then this column is named "JASPAR ID" and the JASPAR matrix ID is displayed here, e.g. 'MA0150.2'. Clicking on the matrix ID links to the JASPAR summary page for this TFBS profile.

If custom matrices were entered instead, this column is labelled "TF ID". If the header format of the custom matrices contained an ID and a name, the TF ID is shown here and the name is shown in the "TF" column. If only an ID or name was given then the "TF" and "TF ID" columns will contain the same information.


Class

The class of transcription factors to which this particular TF belongs, e.g. "Basic leucine zipper factors (bZIP)". This follows the hierarchical classification based on DNA-binding domain (DBD) characteristics as defined by TFClass. This only applies if you selected to use the pre-computed set of TFBS based on JASPAR profiles.


Family

The family of transcription factors to which this particular TF belongs, e.g. "Jun-related factors". This follows the hierarchical classification based on DBD characteristics as defined by the TFClass framework. This only applies if you selected to use the pre-computed set of TFBS based on JASPAR profiles.


Tax group

The taxonomic supergroup to which this TF belongs, e.g. "vertebrates". This only applies to JASPAR profiles.


IC

The information content (specificity) of this TFBS profile's position weight matrix. IC values less than 9 or greater than 19 are flagged red, as profiles with low specificity tend to be found more frequently by random chance, resulting in more noise, whereas profiles with very high IC have far fewer predicted binding sites overall, making the overrepresentation scores more susceptible to relatively small changes in the number of predicted target or background binding sites.


GC Content

The GC content of this TFBS profile. Values less than 0.33 or greater than 0.66 are flagged red. Profiles with low or high GC composition are more susceptible to %GC composition biases between the target and background sequences.
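The red-flag rules for the IC and GC Content columns can be expressed as a small Python check. Computing a profile's GC content as the G+C fraction of all counts in the matrix is an assumption; the thresholds (IC below 9 or above 19, GC below 0.33 or above 0.66) are the ones given in this section and the IC section above. The function names are hypothetical.

```python
from typing import List, Sequence


def profile_gc(pfm: Sequence[Sequence[float]]) -> float:
    """G+C fraction of all counts in a 4-row (A, C, G, T) count matrix (assumption)."""
    totals = [sum(row) for row in pfm]
    return (totals[1] + totals[2]) / sum(totals)


def profile_flags(ic: float, gc: float) -> List[str]:
    """Return which of the red-flag conditions described above apply."""
    flags = []
    if ic < 9 or ic > 19:
        flags.append("IC")
    if gc < 0.33 or gc > 0.66:
        flags.append("GC")
    return flags


print(profile_flags(ic=8.0, gc=0.5))  # -> ['IC']
```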


Target CAGE peak region hits

The number of CAGE peak regions in the target set which contained at least one predicted binding site for this TF. The Fisher score calculation compares the frequency with which the target and background CAGE peak regions contain at least one predicted binding site for this TF. If the option to output binding site details was selected, this is also a link to the detailed TFBS Hits table, which lists the actual genomic locations of the binding sites for this particular TF.


Target CAGE peak region non-hits

The number of CAGE peak regions in the target set which contained NO predicted binding sites for this TF.


Background CAGE peak region hits

The number of CAGE peak regions in the background set which contained at least one predicted binding site for this TF. Results with 0 background hits are flagged with a warning.


Background CAGE peak region non-hits

The number of CAGE peak regions in the background set which contained NO predicted binding sites for this TF.


Target TFBS hits

The total number of predicted binding sites for this TF within the target set of CAGE peak regions. The Z-score is computed by comparing the frequency of predicted binding sites for this TF in the target and background (technically, the frequency of the binding site nucleotides is used to "normalize" for size differences between the profiles). If the option to output binding site details was selected, this is also a link to the detailed TFBS Hits table for this TF, which lists the actual genomic locations of its binding sites.


Background TFBS hits

The total number of predicted binding sites for this TF within the background set of CAGE peak regions. Results with 0 background hits are flagged with a warning. In this case the Z-score calculation becomes either undefined (if the target set of CAGE peak regions also contains 0 TFBS hits) or infinite (if it contains one or more). This often occurs with high information content profiles, as they tend to have few binding sites overall, making the score calculations very susceptible to small fluctuations in the number of target or background binding sites. Such results may or may not be significant.


Target TFBS rate

The rate of occurrence of predicted binding sites for this TF within the target CAGE peak regions. The rate is equal to the number of times the site was predicted (target hits) multiplied by the width of the TFBS profile, divided by the total number of nucleotides comprising the target CAGE peak regions.


Background TFBS rate

The rate of occurrence of predicted binding sites for this TF within the background CAGE peak regions. The rate is equal to the number of times the site was predicted (background hits) multiplied by the width of the TFBS profile, divided by the total number of nucleotides comprising the background CAGE peak regions.


Z-score

The Z-score measures the likelihood that the total number of predicted TFBS in the target CAGE peak regions is significant when compared with the total number of predicted TFBS in the background CAGE peak regions. It computes the ratio of the total number of nucleotides within predicted binding sites to the total number of nucleotides in the CAGE peak regions for both the target and background sets and compares these two ratios. The Z-score is expressed in units of standard deviations. For a more detailed description of the Z-score calculation please see the Z-score description under the Statistical Analysis section of the main help page.
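The TFBS rates and the Z-score comparison described above can be sketched in Python using a standard normal approximation to the binomial, with the background rate as the expected rate of TFBS nucleotides in the target. This is an assumed formulation for illustration: the exact CAGEd-oPOSSUM formula (including any continuity correction) is the one given in the Statistical Analysis section of the main help page.

```python
import math


def tfbs_rate(hits: int, width: int, total_nt: int) -> float:
    """Rate = hits x profile width / total nucleotides in the regions."""
    return hits * width / total_nt


def z_score(t_hits: int, t_nt: int, b_hits: int, b_nt: int, width: int) -> float:
    """Normal approximation to the binomial (an assumption; see the main help page)."""
    x = t_hits * width                  # target nucleotides inside predicted sites
    p = tfbs_rate(b_hits, width, b_nt)  # background rate used as the expected rate
    if p == 0:
        # 0 background hits: undefined with 0 target hits, otherwise infinite.
        return math.nan if x == 0 else math.inf
    if p >= 1:
        raise ValueError("background rate must be below 1")
    mu = t_nt * p
    sigma = math.sqrt(t_nt * p * (1 - p))
    return (x - mu) / sigma
```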


Fisher score

The Fisher score measures the likelihood that the difference between the ratio of target CAGE peak regions containing at least one predicted binding site to those containing none, and the same ratio in the background CAGE peak regions, could have occurred by random chance. The Fisher score is given as the negative natural logarithm of the Fisher p-value. For a more detailed description of the Fisher calculation please see the Fisher Score description under the Statistical Analysis section of the main help page.
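The Fisher score can be illustrated by computing a Fisher exact test on the 2x2 table of target and background region hits and non-hits and taking the negative natural logarithm of the p-value. The Python sketch below assumes a one-tailed (over-representation) test; whether CAGEd-oPOSSUM uses exactly this tail is described on the main help page, not here. It requires Python 3.8 or later for math.comb.

```python
import math


def fisher_score(t_hits: int, t_non: int, b_hits: int, b_non: int) -> float:
    """-ln of a one-tailed Fisher exact p-value for over-representation in the target."""
    n_target = t_hits + t_non
    n_total = n_target + b_hits + b_non
    k_hits = t_hits + b_hits  # regions with at least one site, in either set
    # P(X >= t_hits) for X ~ Hypergeometric(n_total, k_hits, n_target), as exact integers.
    numerator = sum(
        math.comb(k_hits, k) * math.comb(n_total - k_hits, n_target - k)
        for k in range(t_hits, min(k_hits, n_target) + 1)
    )
    denominator = math.comb(n_total, n_target)
    # Taking logs of the exact integers avoids underflow for very small p-values.
    return math.log(denominator) - math.log(numerator)
```

For example, 5 of 5 target regions with a hit versus 0 of 5 background regions gives p = 1/252, so the score is ln(252).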




Binding Site Details page

If the option to output binding site details was checked on the "Select TFBS Parameters" page, then the numbers in either the "Target CAGE peak region hits" or the "Target TFBS hits" columns are links which, when clicked, open a new page which displays the genomic coordinates of all the predicted binding sites for that particular TF.

The table of predicted binding sites is preceded by a header section listing the TF name, TF/JASPAR ID, structural class and family, taxonomic supergroup, information content and GC content of the TFBS profile.


Binding Sites

The binding sites table contains the following columns:


Region

The CAGE peak region within which the TFBS was predicted in the form chr#:start-end

Chr

The name of the chromosome on which the TFBS was predicted.

Start

The chromosomal start position of the predicted TFBS.

End

The chromosomal end position of the predicted TFBS.

Strand

The strand on which the TFBS is predicted. Note that the chromosomal start and end positions are always given such that start is less than end, i.e. always reported for the + strand, regardless of whether the strand is reported as + or -.

Abs. Score

The absolute matrix score. This is the raw TFBS profile position weight matrix (PWM) score of the predicted binding site and is equal to the sum of the log-odds ratios of the specific nucleotide observed at each position in the binding site.

Rel. Score

The relative TFBS matrix score of the predicted binding site. This is given as a percentage and is calculated from the absolute matrix score: it is equal to the raw matrix score of the binding site minus the minimum possible matrix score, divided by the maximum possible matrix score minus the minimum possible matrix score.

Sequence

The predicted binding site sequence. Note that if the TFBS is predicted on the - strand, then the sequence displayed is the - strand sequence.
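Because start and end are always reported on the + strand, the sequence displayed for a - strand site is the reverse complement of the + strand sequence between those coordinates. A minimal Python sketch, assuming you already have that + strand slice; only A, C, G, T and N are handled, and the function names are hypothetical.

```python
# Complement table for unambiguous bases plus N (assumption: other IUPAC codes are not handled).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def displayed_site_sequence(plus_strand_seq: str, strand: str) -> str:
    """plus_strand_seq is the + strand sequence between the reported start and end."""
    if strand == "+":
        return plus_strand_seq
    if strand == "-":
        return reverse_complement(plus_strand_seq)
    raise ValueError("strand must be '+' or '-'")


print(displayed_site_sequence("TGAC", "-"))  # -> GTCA
```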