CommPath webtool tutorials

CommPath: inference and analysis of intercellular communications by pathway analysis

Overview

Here we introduce CommPath, an open-source R package and web server that infers and visualizes ligand-receptor (LR) associations and signaling pathway-driven cell-cell communications from scRNA-seq data.

Key Features

CommPath has two key features:

(i) it provides a manually curated, comprehensive database of LR interactions together with their currently accessible pathway annotations, including KEGG pathways, WikiPathways, Reactome pathways, and GO terms;

(ii) it prioritizes both LR pairs among cell types and cell type-specific signaling pathways mediating cell-cell communications.

Interface

0. Register and log in

CommPath requires registration. Through your CommPath account, you can easily check the analysis status and the results of your tasks.

1. Upload input files

CommPath requires input of an expression matrix of gene × cell produced from scRNA-seq experiments and a label vector indicating cell clusters.

For example, the layout below holds N genes × M cells in P clusters, with a header row and one row per cell:

Cells  Gene1  Gene2  ……  GeneN  Clusters
Cell1                           Cluster1
Cell2                           Cluster1
Cell3                           Cluster2
……                              ……
CellM                           ClusterP

An example input file is “HCC.tumor.3k.csv”.
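
As a minimal sketch of how such a file could be assembled in R, assume a normalized gene × cell matrix and a matching label vector are already in memory. The toy objects expr and label and the output file name are made up for illustration; only the column names Cells and Clusters follow the layout above.

```r
# Toy gene x cell matrix (3 genes, 4 cells); values are made up for illustration
expr <- matrix(c(0.0, 1.2, 0.5,
                 2.1, 0.0, 0.3,
                 0.0, 0.0, 1.8,
                 1.1, 0.7, 0.0),
               nrow = 3,
               dimnames = list(c("Gene1", "Gene2", "Gene3"),
                               c("Cell1", "Cell2", "Cell3", "Cell4")))
label <- c("Cluster1", "Cluster1", "Cluster2", "Cluster2")

# Labels must be in the same order as the columns (cells) of the matrix
stopifnot(length(label) == ncol(expr))

# Transpose so that one row = one cell, then append the cluster column
input <- data.frame(Cells = colnames(expr),
                    t(expr),
                    Clusters = label,
                    check.names = FALSE,
                    stringsAsFactors = FALSE)

# Write the table with a header row and without R row names
out.file <- file.path(tempdir(), "my.input.csv")  # hypothetical file name
write.csv(input, file = out.file, row.names = FALSE)
```

Writing without row names keeps the first column as the cell identifier, so the header row lines up with the layout shown above.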

2. Run your task

To run a task, four parameters should be selected. These parameters are:

(1) P value. Default = 0.05;
(2) Fold Change. Default = 3;
(3) Species. CommPath currently supports LR pairs for three species: human (hsapiens), mouse (mmusculus), and rat (rnorvegicus). More model organisms will be supported soon;
(4) Data file, which is the input file described in the “1. Upload input files” section.

Then click “START CommPath CALCULATION” to run the task.

CommPath will then jump automatically to the “Task list” section, where the top row shows the task status in real time.

When the task status changes from “In progression” to “Completed”, you can click “VIEW RESULT” to browse the results.

In the “Task list” section, you can also review the results of previously completed tasks.

3. Results

Results for all clusters

The default “Results” page (the “Circos” tab page) shows circos plots for all clusters. The widths of lines indicate the counts (left plot) or the overall interaction intensity (right plot) of LR pairs among clusters.

In the above circos plot, the directions of lines indicate the associations from ligands to receptors, and the widths of lines represent the counts of LR pairs among clusters.

Pathway enrichment analysis

CommPath conducts pathway analysis to identify dysregulated signaling pathways containing the marker ligands and receptors for each cluster.

The “GSVA” tab page shows differentially activated pathways for each cluster.

There are 7 columns stored in the variable ident.up.dat/ident.down.dat:

Columns cell.from, cell.to, ligand, receptor show the upstream and downstream clusters and the specific ligands and receptors for the LR associations;

Columns log2FC.LR, P.val.LR, P.val.adj.LR show the interaction intensity (measured by the product of log2FCs of ligands and receptors) and the corresponding original and adjusted P values for hypothesis tests of the LR pairs or the overall interaction intensity among clusters.

Results for specific cluster

When you want to focus on a specific cluster (e.g., Endothelial), select “Endothelial” from the “Cluster” drop-down menu. The circos plot in the “Circos” tab page will then update for “Endothelial”, as will the LR associations and signaling pathway-driven cell-cell communications (“LR pairs” tab page).

In the “Circos” tab page, users can highlight the interactions of a specific cluster. Here we take the Endothelial cells as an example:

Network graphs are then used to visualize the pathways and the associated functional LR interactions.

In the above network graph, the pie charts represent the activated pathways in the selected cells (here Endothelial cells) and the scatter points represent the LR pairs of which the receptors are included in the genesets of the linked pathways. Colors of scatter points indicate the upstream clusters releasing the corresponding ligands. Sizes of pie charts indicate their total in-degree and the proportions indicate the in-degree from different upstream clusters.

The legend of the above network graph is generally the same as that of the previous network plot, except that: (i) the scatter points represent the LR pairs of which the ligands are included in the genesets of the linked pathways; (ii) colors of scatter points indicate the downstream clusters expressing the corresponding receptors; (iii) sizes of pie charts indicate their total out-degree and the proportions indicate the out-degree to different downstream clusters.
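
To make the degree encoding of the pie charts concrete, the sketch below counts, for one pathway, how many LR pairs arrive from each upstream cluster and converts the counts into proportions. It assumes the in-degree is simply the number of incoming LR pairs; the table lr and all of its values are invented for illustration and are not CommPath output. The out-degree case is symmetric, grouping by the downstream cluster instead.

```r
# Toy LR pairs whose receptors fall in one pathway of the selected cluster
lr <- data.frame(
  cell.from = c("Macrophage", "Macrophage", "Fibroblast", "T cell"),
  ligand    = c("L1", "L2", "L3", "L4"),
  receptor  = c("R1", "R1", "R2", "R2"),
  stringsAsFactors = FALSE
)

# In-degree from each upstream cluster = number of incoming LR pairs (assumption)
in.degree <- table(lr$cell.from)

# The total in-degree sets the pie size; the proportions set the slices
total.in.degree <- sum(in.degree)
proportions <- prop.table(in.degree)
```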

A dot plot is used to investigate the upstream and downstream LR pairs involved in the specific pathways in the selected clusters:

Pathway-mediated cell-cell communication chain

For a specific cell cluster, here named B for demonstration, CommPath identifies the upstream cluster A sending signals to B, the downstream cluster C receiving signals from B, and the significantly activated pathways in B that mediate the A-B-C communication chain. More precisely, through the LR and pathway analyses described above, CommPath identifies LR pairs between A and B, LR pairs between B and C, and pathways activated in B. CommPath then screens for pathways in B that involve both the receptors interacting with A and the ligands interacting with C.

In the above line plot, the widths of lines between Upstream cluster and Receptor represent the overall interaction intensity between the upstream cluster and Endothelial cells via the specific receptors; the sizes and colors of dots in the Receptor column represent the average log2FC and -log10(P) from differential expression tests comparing the receptor expression in Endothelial cells to that in all other cells; the lengths and colors of bars in the Pathway annotation column represent the mean difference and -log10(P) from differential activation tests comparing the pathway scores in Endothelial cells to those in all other cells.

Comparison to another object

The “Comparison to another object” tab page provides utilities to compare cell-cell interactions between two conditions, namely a case group and a control group, such as disease and control. The case group is the current task, and the control group is defined as another task selected from the “Comparison to another object” drop-down menu. The differentially activated signaling pathway-driven cell-cell communications between the two CommPath objects are then shown.

Here, for example, we use CommPath to compare the cell-cell communication between cells from HCC tumor and normal tissues. We have pre-created the CommPath object for the normal samples following the above steps.

In the above line plot, the widths of lines between Upstream cluster and Receptor represent the overall interaction intensity between the upstream clusters and Endothelial cells via the specific receptors, and the colors indicate whether the interaction intensity is upregulated (red) or downregulated (blue) in tumor tissues (object.1) compared to that in normal tissues (object.2); the sizes and colors of dots in the Receptor column represent the average log2FC and -log10(P) of expression of receptors in Endothelial cells compared to all other cells in tumor tissues; the lengths and colors of bars in the Pathway annotation column represent the mean difference and -log10(P) of pathway scores of Endothelial cells in tumor tissues compared to those in normal tissues.

4. Download the results

All the results, provided as a zipped package, can be freely downloaded by clicking the “Download” button in the “Task list” section.

CommPath R package instructions

CommPath

CommPath is an R package for inference and analysis of ligand-receptor interactions from single cell RNA sequencing data.

Installation

The CommPath R package can be easily installed from GitHub using devtools:

devtools::install_github("yingyonghui/CommPath")
library(CommPath)

Dependencies

Tutorials

In this vignette we show CommPath’s steps and functionalities for inference and analysis of ligand-receptor interactions by applying it to an scRNA-seq dataset (GEO accession number: GSE156337) of cells from hepatocellular carcinoma (HCC) patients.

Brief description of CommPath object

We start CommPath analysis by creating a CommPath object, which is an S4 object consisting of six slots:
(i) data, a matrix containing the normalized expression values by gene * cell;
(ii) cell.info, a data frame containing the information of cells;
(iii) meta.info, a list containing some important parameters used during the analysis;
(iv) LR.marker, a data.frame containing the result of differential expression test of ligands and receptors;
(v) interact, a list containing the information of LR interaction among clusters;
(vi) pathway, a list containing the information of pathways related to the ligands and receptors.

CommPath input

The expression matrix and cell identity information are required as CommPath input. We downloaded the processed HCC scRNA-seq data from Mendeley Data. For a fast review and illustration of CommPath’s functionalities, we randomly selected the expression data of 3000 cells across the top 5000 highly variable genes from the tumor and normal tissues, respectively. The example data are available in figshare.
Here we illustrate the CommPath steps for data from the tumor tissues. Analysis of data from the normal tissues follows roughly the same steps.

# load(url("https://figshare.com/ndownloader/files/33926126"))
load("path_to_download/HCC.tumor.3k.RData")

This dataset consists of 2 variables which are required for CommPath input:
tumor.expr : expression matrix of gene * cell. Expression values are required to be first normalized by the library size and log-transformed;
tumor.label : a vector of labels indicating identity classes of cells in the expression matrix; the order of labels should match the order of cells in the expression matrix. Users may also provide a data frame containing the meta information of cells, with row names matching the cells in the expression matrix and a column named Cluster indicating the identity classes of cells.
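
If you start from raw counts, the normalization required for the expression matrix could look like the sketch below. The scale factor of 10,000 and the use of log1p are common conventions assumed here, not values stated by CommPath, and counts is a made-up toy matrix.

```r
# Toy raw count matrix, gene x cell (values are made up)
counts <- matrix(c(10, 0, 90,
                    5, 5, 40),
                 nrow = 3,
                 dimnames = list(c("GeneA", "GeneB", "GeneC"),
                                 c("Cell1", "Cell2")))

# Library size = total counts per cell (column sums)
lib.size <- colSums(counts)

# Divide each cell by its library size, scale, then log-transform
scale.factor <- 1e4  # assumed convention, not a CommPath requirement
norm.expr <- log1p(sweep(counts, 2, lib.size, "/") * scale.factor)
```

The result keeps the gene * cell orientation and the original dimnames, so a label vector ordered by the columns of counts still matches norm.expr.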

Identification of marker ligands and receptors

We start CommPath analysis by creating a CommPath object:

# Specify the species of the scRNA-seq experiment with the species parameter
# CommPath now enables the analysis of scRNA-seq experiments from human (hsapiens) and mouse (mmusculus).
tumor.obj <- createCommPath(expr.mat = tumor.expr, 
        cell.info = tumor.label, 
        species = 'hsapiens')

First, we identify marker ligands and receptors (ligands and receptors that are significantly highly expressed) for each identity class of cells in the expression matrix. CommPath provides findLRmarker to identify these markers by t.test or wilcox.test.

tumor.obj <- findLRmarker(object = tumor.obj, method = 'wilcox.test')

Identification of ligand-receptor (L-R) associations

# find significant L-R pairs
tumor.obj <- findLRpairs(object = tumor.obj,
        logFC.thre = 0, 
        p.thre = 0.05)

The counts of significant LR pairs and the overall interaction intensity among cell clusters are then stored in tumor.obj@interact[['InteractNumer']], and the detailed information of each LR pair is stored in tumor.obj@interact[['InteractGeneUnfold']].

Then you can visualize the interaction through a circos plot:

# Plot interactions for all clusters
circosPlot(object = tumor.obj)

In the above circos plot, the directions of lines indicate the associations from ligands to receptors, and the widths of lines indicate the counts of LR pairs among clusters.

# Plot interactions for all clusters
circosPlot(object = tumor.obj)

Now the widths of lines indicate the overall interaction intensity among clusters.

# Highlight the interaction of specific cluster
# Here we take the endothelial cell as an example
ident = 'Endothelial'
circosPlot(object = tumor.obj, ident = ident)

For a specific cluster of interest, CommPath provides the function findLigand (findReceptor) to find the upstream (downstream) clusters and the corresponding ligands (receptors) for a given cluster and receptor (ligand):

# For the selected cluster and selected receptor, find the upstream cluster
select.ident = 'Endothelial'
select.receptor = 'ACKR1'

ident.up.dat <- findLigand(object = tumor.obj, 
    select.ident = select.ident, 
    select.receptor = select.receptor)
head(ident.up.dat)

# For the selected cluster and selected ligand, find the downstream cluster
select.ident = 'Endothelial'
select.ligand = 'CXCL12'

ident.down.dat <- findReceptor(object = tumor.obj, 
    select.ident = select.ident, 
    select.ligand = select.ligand)
head(ident.down.dat)

There are 7 columns stored in the variable ident.up.dat/ident.down.dat:
Columns Cell.From, Cell.To, Ligand, Receptor show the upstream and downstream clusters and the specific ligands and receptors in the LR associations;
Columns Log2FC.LR, P.val.LR, P.val.adj.LR show the interaction intensity (measured by the product of Log2FCs of ligands and receptors) and the corresponding original and adjusted P values from the hypothesis test of each LR pair.
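
As a small worked example of the intensity measure, the product of the two log2 fold changes can be computed directly. The table lr.pairs, its column names for the individual fold changes, and all values are invented for illustration; only the product rule comes from the description above.

```r
# Made-up log2 fold changes for three LR pairs (values are for illustration only)
lr.pairs <- data.frame(
  ligand          = c("L1", "L2", "L3"),
  receptor        = c("R1", "R2", "R3"),
  log2FC.ligand   = c(1.5, 0.5, 2.0),    # ligand in the upstream cluster
  log2FC.receptor = c(2.0, 4.0, 0.25),   # receptor in the downstream cluster
  stringsAsFactors = FALSE
)

# Interaction intensity of each LR pair = product of the two log2FCs
lr.pairs$log2FC.LR <- lr.pairs$log2FC.ligand * lr.pairs$log2FC.receptor
```

With this measure, a pair scores highly only when both the ligand and the receptor are strongly upregulated in their respective clusters.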

CommPath also provides dot plots to investigate the upstream clusters which release specific ligands and the downstream clusters which express specific receptors:

# Investigate the upstream clusters which release specific ligands to the cluster of interest
dotPlot(object = tumor.obj, receptor.ident = ident)

# Investigate the downstream clusters which express specific receptors for the cluster of interest
dotPlot(object = tumor.obj, ligand.ident = ident)

Pathway analysis

CommPath conducts pathway analysis to identify signaling pathways involving the marker ligands and receptors for each cluster.

# Find pathways in which genesets show overlap with the marker ligands and receptors in the example dataset
# CommPath provides pathway annotations from KEGG pathways, WikiPathways, Reactome pathways, and GO terms
tumor.obj <- findLRpath(object = tumor.obj, category = 'kegg')

Now genesets showing overlap with the marker ligands and receptors are stored in tumor.obj@interact[['pathwayLR']]. Then we score the pathways to measure the activation level of each pathway in each cell.

# Compute pathway activation score by the gsva algorithm or an average manner
# For more information about gsva algorithm, see the GSVA package
tumor.obj <- scorePath(object = tumor.obj, method = 'gsva', min.size = 10, parallel.sz = 4)

After that, CommPath provides diffAllPath to perform pathway differential activation analysis for cells in each identity class and to find the receptors and ligands in each pathway:

# get significantly up-regulated pathways in each identity class
acti.path.dat <- diffAllPath(object = tumor.obj, only.posi = TRUE, only.sig = TRUE)
head(acti.path.dat)

There are several columns stored in the variable acti.path.dat:
Columns mean.diff, mean.1, mean.2, t, df, p.val, p.val.adj show the statistical results; description shows the name of the pathway;
Columns cell.up and ligand.up show the upstream identity classes which would release specific ligands to interact with the receptors from the current identity class;
Column receptor.in.path shows the marker receptors expressed by the current identity class and these receptors are included in the current pathway;
Column ligand.in.path shows the marker ligands released by the current identity class and these ligands are also included in the current pathway.

Then we use pathHeatmap to plot a heatmap of those differentially activated pathways for each cluster to display the highly variable pathways:

pathHeatmap(object = tumor.obj,
       acti.path.dat = acti.path.dat,
       top.n.pathway = 10,
       sort = "p.val.adj")

Cell-cell interaction flow via pathways

For a specific cell cluster, here named B for demonstration, CommPath identifies the upstream cluster A sending signals to B, the downstream cluster C receiving signals from B, and the significantly activated pathways in B that mediate the A-B-C communication flow. More precisely, through the LR and pathway analyses described above, CommPath identifies LR pairs between A and B, LR pairs between B and C, and pathways activated in B. CommPath then screens for pathways in B that involve both the receptors interacting with A and the ligands interacting with C.

# Identification and visualization of the identified pathways
# Plot to identify receptors and the associated activated pathways for a specific cluster
select.ident = 'Endothelial'
pathPlot(object = tumor.obj, 
    select.ident = select.ident, 
    acti.path.dat = acti.path.dat)

# Plot to identify receptors, the associated activated pathways, and the downstream clusters
pathInterPlot(object = tumor.obj, 
    select.ident = select.ident, 
    acti.path.dat = acti.path.dat)
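
The screening step behind the A-B-C flow can be pictured as a set operation: a pathway in B qualifies if its gene set contains at least one receptor used in the A-B pairs and at least one ligand used in the B-C pairs. The sketch below uses made-up gene sets and names; it illustrates the idea only and is not CommPath's internal code.

```r
# Made-up inputs for cluster B
receptors.from.A <- c("R1", "R2")   # receptors of B pairing with ligands of A
ligands.to.C     <- c("L5", "L6")   # ligands of B pairing with receptors of C
activated.paths  <- list(           # activated pathways in B and their gene sets
  PathwayX = c("R1", "G7", "L5"),
  PathwayY = c("R2", "G8"),
  PathwayZ = c("G9", "L6")
)

# Keep pathways that contain both a receptor (input from A) and a ligand (output to C)
mediates.chain <- vapply(activated.paths, function(genes) {
  any(genes %in% receptors.from.A) && any(genes %in% ligands.to.C)
}, logical(1))

chain.paths <- names(activated.paths)[mediates.chain]
```

Here only PathwayX passes, because PathwayY lacks a ligand directed at C and PathwayZ lacks a receptor fed by A.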

Compare cell-cell interactions between two conditions

CommPath also provides utilities to compare cell-cell interactions between two conditions, such as disease and control. Here, for example, we use CommPath to compare the cell-cell interactions between cells from HCC tumor and normal tissues. The example data from normal tissues are also available in figshare.

# load(url("https://figshare.com/ndownloader/files/33926129"))
load("path_to_download/HCC.normal.3k.RData")

We have pre-created the CommPath object for the normal samples following the above steps. This dataset consists of 3 variables:
normal.expr : expression matrix for cells from normal tissues;
normal.label : identity labels for cells from normal tissues;
normal.obj : CommPath object created from normal.expr and normal.label, and processed by CommPath steps described above.

To compare two CommPath objects, we first identify the differentially expressed ligands and receptors, and the differentially activated pathways, between the same cluster of cells in the two objects.

# Take endothelial cells as example
# Identification of differentially expressed ligands and receptors 
diff.marker.dat <- diffCommPathMarker(object.1 = tumor.obj, object.2 = normal.obj, select.ident = 'Endothelial')

# Identification of differentially activated pathways 
diff.path.dat <- diffCommPathPath(object.1 = tumor.obj, object.2 = normal.obj, select.ident = 'Endothelial', parallel.sz = 4)

Then we compare the differentially activated pathways and the cell-cell communication flow mediated by those pathways.

# To compare differentially activated pathways and the involved receptors between the selected clusters of two CommPath objects
pathPlot.compare(object.1 = tumor.obj, object.2 = normal.obj, select.ident = 'Endothelial', diff.marker.dat = diff.marker.dat, diff.path.dat = diff.path.dat)

# To compare the pathway-mediated cell-cell communication flow for a specific cluster between two CommPath objects
pathInterPlot.compare(object.1 = tumor.obj, object.2 = normal.obj, select.ident = 'Endothelial', diff.marker.dat = diff.marker.dat, diff.path.dat = diff.path.dat)


sessionInfo()

R version 4.0.3 (2020-10-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /home/luh/miniconda3/envs/seurat4/lib/libopenblasp-r0.3.17.so

locale:
 [1] LC_CTYPE=zh_CN.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=zh_CN.UTF-8        LC_COLLATE=zh_CN.UTF-8    
 [5] LC_MONETARY=zh_CN.UTF-8    LC_MESSAGES=zh_CN.UTF-8   
 [7] LC_PAPER=zh_CN.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=zh_CN.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] GSVA_1.38.2     ggplot2_3.3.5   dplyr_1.0.7     reshape2_1.4.4 
[5] circlize_0.4.13 CommPath_0.1.0 

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.7                  lattice_0.20-44            
 [3] digest_0.6.27               assertthat_0.2.1           
 [5] utf8_1.2.2                  R6_2.5.1                   
 [7] GenomeInfoDb_1.26.7         plyr_1.8.6                 
 [9] stats4_4.0.3                RSQLite_2.2.8              
[11] httr_1.4.2                  pillar_1.6.2               
[13] zlibbioc_1.36.0             GlobalOptions_0.1.2        
[15] rlang_0.4.11                annotate_1.68.0            
[17] blob_1.2.2                  S4Vectors_0.28.1           
[19] Matrix_1.3-4                labeling_0.4.2             
[21] BiocParallel_1.24.1         stringr_1.4.0              
[23] RCurl_1.98-1.4              bit_4.0.4                  
[25] munsell_0.5.0               DelayedArray_0.16.3        
[27] compiler_4.0.3              pkgconfig_2.0.3            
[29] BiocGenerics_0.36.1         shape_1.4.6                
[31] tidyselect_1.1.1            SummarizedExperiment_1.20.0
[33] tibble_3.1.3                GenomeInfoDbData_1.2.4     
[35] IRanges_2.24.1              matrixStats_0.60.1         
[37] XML_3.99-0.7                fansi_0.5.0                
[39] crayon_1.4.1                withr_2.4.2                
[41] bitops_1.0-7                grid_4.0.3                 
[43] xtable_1.8-4                GSEABase_1.52.1            
[45] gtable_0.3.0                lifecycle_1.0.0            
[47] DBI_1.1.1                   magrittr_2.0.1             
[49] scales_1.1.1                graph_1.68.0               
[51] stringi_1.7.4               cachem_1.0.6               
[53] farver_2.1.0                XVector_0.30.0             
[55] ellipsis_0.3.2              generics_0.1.0             
[57] vctrs_0.3.8                 tools_4.0.3                
[59] bit64_4.0.5                 Biobase_2.50.0             
[61] glue_1.4.2                  purrr_0.3.4                
[63] MatrixGenerics_1.2.1        parallel_4.0.3             
[65] fastmap_1.1.0               AnnotationDbi_1.52.0       
[67] colorspace_2.0-2            GenomicRanges_1.42.0       
[69] memoise_2.0.0             

CommPath ChangeLog

2022.2.22

  • Add Google Analytics.
  • Fix errors when submitting a new task with a new file.

2022.2.9

  • Add annotations for each plot.
  • Change appname to CommPath.
  • Remove Cluster2 in the result page.
  • Fix label errors for comparison-object.

2022.2.8

  • Fix width-height imbalance.

2022.2.7

  • Go online.

scRNA-HCC instructions

A Single-Cell Atlas of the Multicellular Ecosystem of Primary and Metastatic Hepatocellular Carcinoma

SUMMARY

Hepatocellular carcinoma (HCC) represents a paradigm of the relation between tumor microenvironment (TME) and tumor development. Here, we generated > 70,000 single-cell transcriptomes for 10 HCC patients from four relevant sites: primary tumor, portal vein tumor thrombus (PVTT), metastatic lymph node and non-tumor liver. We discovered a cluster of antitumor central memory T (TCM) cells enriched in intratumoral tertiary lymphoid structures (TLSs) of HCC. We found that chronic HBV/HCV infection increases the infiltration of CD8+ T cells in tumors but aggravates the exhaustion of tumor-infiltrating lymphocytes. We identified CD11b+ macrophages as terminally differentiated tumor-associated macrophages (TAMs), and two distinct differentiation trajectories are related to their accumulation. We further demonstrated that CD11b+ TAMs promote HCC cell invasion and migration, and angiogenesis. Our data also revealed the heterogeneous population of malignant hepatocytes and their potential multifaceted roles in shaping the immune microenvironment of HCC. Finally, we identified seven TME subtypes of HCC that can predict patient prognosis. Collectively, this large-scale, single-cell atlas deepens our understanding of the ecosystem in primary and metastatic HCCs, and may facilitate the development of new immune therapy strategies for this malignancy.

Droplet-based scRNA-seq and gene expression quantification

Single-cell suspensions were converted to barcoded scRNA-seq libraries by using the Chromium Single Cell 3’ Library, Gel Bead & Multiplex Kit and Chip Kit (10x Genomics), aiming for an estimated 5,000 cells per library and following the manufacturer’s instructions. Samples were processed using kits pertaining to V2 barcoding chemistry of 10x Genomics. Each sample was always processed in a single well of a PCR plate, allowing all cells from a sample to be treated with the same master mix and in the same reaction vessel. For each patient, all samples (NTL, PT, PVTT and MLN) were processed in parallel in the same thermal cycler. The generated scRNA-seq libraries were sequenced on a NovaSeq sequencer (Illumina). The Cell Ranger software (version 2.2.0; 10x Genomics) was used to perform sample demultiplexing, barcode processing and single-cell 3’ counting. Cell Ranger’s mkfastq function was used to demultiplex raw base call files from the sequencer into sample-specific fastq files. Afterward, fastq files for each sample were processed with Cell Ranger’s count function, which was used to align reads to the human genome (build hg38) and quantify gene expression levels in single cells.

Quality control and batch correction

To filter out low-quality cells and doublets (two cells encapsulated in a single droplet), cells with fewer than 200 unique molecular identifiers (UMIs), or with more than 8,000 or fewer than 200 expressed genes, were removed from each sample. To filter out dead or dying cells, cells with more than 10% of UMIs derived from the mitochondrial genome were also removed. This resulted in a total of 71,915 high-quality single-cell transcriptomes across all samples.
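
These thresholds translate into a simple per-cell filter. The sketch below applies them to a toy metadata table; the column names (nUMI, nGene, percent.mito) and all values are assumptions for illustration and are not the study's actual objects.

```r
# Toy per-cell QC metrics (values are made up)
qc <- data.frame(
  cell         = c("c1", "c2", "c3", "c4", "c5"),
  nUMI         = c(1500, 150, 30000, 2500, 4000),
  nGene        = c(800, 120, 9000, 1200, 1500),
  percent.mito = c(2.5, 1.0, 3.0, 15.0, 9.9),
  stringsAsFactors = FALSE
)

# Keep cells with >= 200 UMIs, 200 to 8,000 expressed genes,
# and <= 10% of UMIs from the mitochondrial genome
keep <- qc$nUMI >= 200 &
  qc$nGene >= 200 & qc$nGene <= 8000 &
  qc$percent.mito <= 10

high.quality <- qc[keep, ]
```

In this toy table, c2 fails the UMI and gene minimums, c3 exceeds the gene maximum (a likely doublet), and c4 exceeds the mitochondrial limit, so c1 and c5 remain.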

To further merge samples across tissues and patients, we ran a canonical correlation analysis (CCA) for batch correction using the RunMultiCCA function in R package Seurat v2. To calculate canonical correlation vectors (CCVs), variably expressed genes were selected for each sample as having a normalized expression between 0.125 and 3, and a quantile-normalized variance exceeding 0.5, and then combined across all samples. The resulting 2,773 non-redundant variable genes were summarized by CCA, and the first 15 CCVs were aligned to combine raw gene expression matrices generated per sample. The aligned CCVs were also used for tSNE dimensionality reduction using the RunTSNE function in Seurat.

Cell clustering

For cell clustering, we used the FindClusters function in Seurat v2, which implements a shared nearest neighbor (SNN) modularity optimization-based clustering algorithm, on 30 aligned CCVs with resolutions of 1–4, leading to 26–61 clusters. A resolution of 3 was chosen for the analysis, and a final set of 53 clusters was obtained.

RegVar tutorials

Brief introduction

RegVar is a deep neural network-based computational server for prioritizing the tissue-specific regulatory impact of human noncoding SNPs on their potential target genes. RegVar integrates the sequential, epigenetic and evolutionary conservation profiles of SNPs and their potential target genes in 17 human tissues, and gives tissue-specific predictions of the regulatory probabilities of the provided SNPs on the provided genes.

Input

Upload a file containing a list of SNPs and genes

A text file containing a list of SNP IDs and their possible target genes is required to be uploaded to the server for batch analysis.

The result will be generated based on all pairwise combinations of SNPs and genes. SNPs and genes lacking annotations are excluded and pairs of SNPs and genes that are located on different chromosomes are removed. The remaining pairs are referred to as valid pairs and RegVar would accept no more than 10,000 valid pairs.
Click here to see an example file
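
The pairing rule can be sketched as follows. The SNP and gene annotation tables are made up for illustration (IDs, chromosomes and column names are hypothetical), and the code mirrors only the steps described above, not RegVar's server code.

```r
# Hypothetical annotations; NA marks an entry lacking annotation
snps  <- data.frame(snp = c("rsA", "rsB", "rsC"),
                    chrom = c("chr1", "chr2", NA),
                    stringsAsFactors = FALSE)
genes <- data.frame(gene = c("GENE1", "GENE2"),
                    chrom = c("chr1", "chr2"),
                    stringsAsFactors = FALSE)

# Drop SNPs and genes lacking annotations
snps  <- snps[!is.na(snps$chrom), ]
genes <- genes[!is.na(genes$chrom), ]

# Form all pairwise combinations of SNPs and genes
idx <- expand.grid(s = seq_len(nrow(snps)), g = seq_len(nrow(genes)))
pairs <- data.frame(snp        = snps$snp[idx$s],
                    snp.chrom  = snps$chrom[idx$s],
                    gene       = genes$gene[idx$g],
                    gene.chrom = genes$chrom[idx$g],
                    stringsAsFactors = FALSE)

# Keep pairs located on the same chromosome: these are the valid pairs
valid <- pairs[pairs$snp.chrom == pairs$gene.chrom, c("snp", "gene")]

# The server accepts no more than 10,000 valid pairs
max.pairs <- 10000
within.limit <- nrow(valid) <= max.pairs
```

Because the limit applies to valid pairs rather than to the raw SNP-by-gene product, a large upload can still be accepted if most combinations fall on different chromosomes.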

Or type SNP ID(s) and gene(s) in the corresponding search boxes

SNP ID(s) (indels are currently not supported) and their possible target gene(s) are accepted as input in the SNP and Gene search box, respectively. Multiple SNP IDs or genes should be delimited by commas, spaces or tabs, and if so, the result will be generated based on all pairwise combinations of SNPs and genes.

Output

All results are listed on the result page, including the basic information of your query data and the regulatory probabilities calculated by RegVar. The basic information comprises the positions of the input SNPs, the TSSs of the genes, and the genomic distance between them, in GRCh37/hg19 genome coordinates; TSS positions are annotated from the GTEx eGene list (v7 release).

Raw probability scores come straight from the tissue-specific model, and are interpretable as the extent to which the SNP is likely to have an effect on the regulation of the corresponding gene in your selected tissue.

A result file containing the same information will be sent to your email address if you have provided one.

Model selection

The RegVar website computes RegVar scores based on the DHS-filtered models trained on GTEx datasets. We also provide scripts to train non-DHS-filtered models (or full models) on GTEx datasets and to train pathogenic RegVar models on the HGMD dataset. Click the following download link for more information.

Software download

The datasets and source code to run RegVar locally are freely available at the download page.

3dsnp v2.0 Data Download

All data from 3dsnp including high-order modified predictions can be accessed through FTP.

You can also click the links in the following table.

PS: If you have any questions or would like to access additional data, please leave a message.

Data              Format   Link
dbSNP154          Vcf      example: chr1
HGSVC2            Vcf      pangenie_merged_bi_nosnvs.integrated_callset.hg19
dbSNP153          BigBed   dbSnp153Common.bb
Gene annotations  GFF      GCF_000001405.25_GRCh37.p13_genomic
Gene annotations  RefSeq   ncbiRefSeq
Assembly          Fasta    hg19.fa
ENCODE            BigWig   example: Gm12878 H3k27ac
RepeatMasker      BigBed   repeats
Fixation index    Bed      example: chr1 AMR
xp-NSL            Bed      example: chr1 AMR
ClinVar           BigBed   clinvarCnv; clinvarMain
ClinGen           BigBed   clinGenHaplo; clinGenTriplo; clinGenGeneDisease
scATAC-fetal      BigWig   example: thymus_vascular_endothelial_cells
HiC loops         loop     raw: Ventricle_Right; mod: Ventricle_Right; target: chr17-42337882-DEL-540 Ventricle_Right

3dsnp v1.0 API

3dsnp for developers

3DSNP provides a more powerful way for users to access the data through an API. SNP data can be accessed in two ways: by SNP IDs or by genomic position.

Overview

URL

http://cbportal.org/3dsnp/api.do

Format supported

JSON/XML

HTTP request method

GET/POST

Login required

No

Data access restrictions

Frequency limit: No

Request

Request parameters

Parameter Required Type Information
id/position true string The SNP ID or genomic position; at least one of them is required. Multiple SNP IDs or positions should be separated by commas ‘,’. A position is written with a dash ‘-’, in the format ‘1000000-1000100’.
chrom false string The chromosome of the queried position; required when the parameter ‘position’ is used. When there is more than one position, the corresponding chromosomes should also be separated by ‘,’.
type true string Data types to search; multiple types should be separated by commas ‘,’. Available types are listed below.
format true string Format of the returned data. JSON and XML formats are supported.

Request data type

DataType Description
basic Basic information of SNP, including sequential facts and phenotype from 1000G project.
chromhmm Chromatin state information generated by the core 15-state ChromHMM models trained across a variety of cell types.
motif Transcription factor binding motifs altered by SNP.
tfbs Transcription factor binding sites in a variety of cell types.
eqtl Expression quantitative trait loci (eQTL).
3dgene Genes that interact with the query SNP through chromatin loops.
3dsnp SNPs that interact with the query SNP through chromatin loops. Not available for queries by position.
phylop PhyloP scores of the genomic region surrounding the query SNP.
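
A request URL can be assembled from these parameters as in the sketch below. The base URL and parameter names are taken from the tables above; the SNP ID, the chromosome value format, the helper name build.query and the chosen data types are placeholders. The code only builds the URL and does not send a request.

```r
# Base URL of the API as listed in the overview
base.url <- "http://cbportal.org/3dsnp/api.do"

# Helper (hypothetical name): join named parameters into a query string;
# multiple values for one parameter are separated by commas
build.query <- function(params) {
  values <- vapply(params, paste, character(1), collapse = ",")
  paste(names(params), values, sep = "=", collapse = "&")
}

# Query by SNP ID (placeholder ID) for two data types, returned as JSON
by.id <- paste0(base.url, "?", build.query(list(
  id     = "rs0000000",
  type   = c("basic", "eqtl"),
  format = "json"
)))

# Query by position: 'chrom' is required when 'position' is used
# (the "chr1" value format is an assumption)
by.position <- paste0(base.url, "?", build.query(list(
  position = "1000000-1000100",
  chrom    = "chr1",
  type     = "basic",
  format   = "json"
)))
```

Since the API accepts GET requests, the resulting strings could be opened in a browser or passed to any HTTP client.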

Response

Response parameters

Parameter Type DataType Description
id string basic SNP ID
chrom string basic Chromosome name
position string basic Location of the query
MAF string basic Minor allele frequency
Ref string basic Reference Allele
Alt string basic Alternative Allele
EAS string basic Allele frequency in the EAS populations
AMR string basic Allele frequency in the AMR populations
AFR string basic Allele frequency in the AFR populations
EUR string basic Allele frequency in the EUR populations
SAS string basic Allele frequency in the SAS populations
linearClosestGene string basic Linearly closest genes
data_gene JsonArray basic listed below in JsonArray Parameters
chromhmm string chromhmm Chromatin state from ChromHMM core 15-state model
data_chromhmm JsonArray chromhmm listed below in JsonArray Parameters
motif string motif Sequence motif altered by the query SNP
data_motif JsonArray motif listed below in JsonArray Parameters
tfbs string tfbs Transcription factor binding sites in which the query is located
data_tfbs JsonArray tfbs listed below in JsonArray Parameters
eqtl string eqtl Expression quantitative trait loci
data_eqtl JsonArray eqtl listed below in JsonArray Parameters
data_loop_gene JsonArray 3dgene listed below in JsonArray Parameters
data_loop_snp JsonArray 3dsnp listed below in JsonArray Parameters
physcores string physcores PhyloP scores of the query SNP and its +/-10 bp adjacent regions

JsonArray Parameters

Parameter Type JsonArray Description
geneID string data_gene RefSeq Gene ID
geneName string data_gene Official gene symbol
geneRelativePosition string data_gene Relative position of the closest gene to the query
geneDescription string data_gene Gene description
chromhmmCell string data_chromhmm Cell type of the corresponding chromatin state
chromhmmName string data_chromhmm Short name of chromatin state
chromhmmFullName string data_chromhmm Full name of chromatin state
chromhmmCellDescription string data_chromhmm Cell type description
chromhmmTissue string data_chromhmm Tissue of the cell type
motif string data_motif Motif ID in TRANSFAC or JASPAR
motifStrand string data_motif Strand of the motif
motifSource string data_motif Database source of the motif
motifMatchedSequence string data_motif Matched sequence for the motif
motifMatchedSequencePos string data_motif Relative position of the query to the sequence
motifRef string data_motif Reference allele
motifAlt string data_motif Alternative allele
tfbsCell string data_tfbs Cell type of the corresponding TFBS
tfbsFactor string data_tfbs Name of the transcription factor
tfbsCellTissue string data_tfbs Tissue of the cell type
tfbsDNAAccessibility string data_tfbs DNA accessibility of the TFBS
tfbsCellDescription string data_tfbs Description for the cell type
eqtlGene string data_eqtl Related gene of the eQTL
eqtlPValue string data_eqtl P-value of the eQTL
eqtlTissue string data_eqtl Tissue in which the eQTL was identified
eqtlEffect string data_eqtl Effect size of the eQTL
loopGene string data_loop_gene Genes interacting with the query SNP through chromatin loops
loopGeneID string data_loop_gene RefSeq Gene ID
loopGeneDescription string data_loop_gene Gene description
loopCell string data_loop_gene/data_loop_snp Cell type in which the chromatin loop was identified
loopCellTissue string data_loop_gene/data_loop_snp Tissue of the cell type
loopCellDescription string data_loop_gene/data_loop_snp Cell type description
loopStart string data_loop_gene/data_loop_snp Start genomic position of the chromatin loop
loopEnd string data_loop_gene/data_loop_snp End genomic position of the chromatin loop
loopType string data_loop_gene/data_loop_snp Type of the chromatin loop: “Within Loop” or “Anchor-to-Anchor”
loopSNP string data_loop_snp SNPs interacting with the query and in the same LD block through chromatin loops
loopLD string data_loop_snp r^2 in LD
loopPopulation string data_loop_snp Continental population (AFR, AMR, ASN, EUR and SAS)

Request with id

URL example1 : single snp and single data type in json format

Request URL :

http://3dsnp.cbportal.org/api.do?id=rs1000&format=json&type=basic

Response format :

[{
"id":"rs1000",
"position":"32153894",
"chrom":"chr6",
"AFR":"",
"AMR":"",
"Alt":"",
"EAS":"",
"EUR":"",
"Ref":"",
"SAS":"",
"MAF":"",
"linearClosestGene":"AGER,177,upstream-variant-2KB;PBX2,5089,utr-variant-3-prime",
"data_gene": [
    {
    "geneID":"177",
    "geneName":"AGER",
    "geneRelativePosition":"upstream-variant-2KB",
    "geneDescription":"advanced glycosylation end product-specific receptor"
    },
    {
    "geneID":"5089",
    "geneName":"PBX2",
    "geneRelativePosition":"utr-variant-3-prime",
    "geneDescription":"pre-B-cell leukemia homeobox 2"
    }
]}]
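
In the sample response, `linearClosestGene` packs the same information as `data_gene` into one string. A minimal sketch of unpacking it follows, assuming the layout seen in the sample (entries separated by ';', each holding gene name, gene ID and relative position separated by ','). That layout is inferred from the example rather than from a format specification, and `ClosestGeneParser` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of 3DSNP.
class ClosestGeneParser {
    // Splits e.g. "AGER,177,upstream-variant-2KB;PBX2,5089,utr-variant-3-prime"
    // into {geneName, geneID, geneRelativePosition} triples.
    // Assumption: the layout matches the sample response shown above.
    static List<String[]> parse(String linearClosestGene) {
        List<String[]> genes = new ArrayList<>();
        if (linearClosestGene == null || linearClosestGene.isEmpty()) {
            return genes;
        }
        for (String entry : linearClosestGene.split(";")) {
            String[] fields = entry.split(",", 3);
            if (fields.length == 3) { // ignore entries that do not fit the assumed layout
                genes.add(fields);
            }
        }
        return genes;
    }

    public static void main(String[] args) {
        String sample = "AGER,177,upstream-variant-2KB;PBX2,5089,utr-variant-3-prime";
        for (String[] gene : parse(sample)) {
            System.out.println(gene[0] + " (" + gene[1] + "): " + gene[2]);
        }
    }
}
```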

URL example2 : multiple snps and multiple data types in xml format

Request URL :

http://3dsnp.cbportal.org/api.do?id=rs1000,rs10&format=xml&type=basic,eqtl,motif

Response format :

<?xml version="1.0" encoding="utf-8"?>
<a>
<e class="object">
<AFR type="string" />
<AMR type="string" />
<Alt type="string" />
<EAS type="string" />
<EUR type="string" />
<MAF type="string" />
<Ref type="string" />
<SAS type="string" />
<chrom type="string">chr6</chrom>
<data_gene class="array">
<e class="object">
<geneDescription type="string">advanced glycosylation end product-specific receptor</geneDescription>
<geneID type="string">177</geneID>
<geneName type="string">AGER</geneName>
<geneRelativePosition type="string">upstream-variant-2KB</geneRelativePosition>
</e>
<e class="object">
<geneDescription type="string">pre-B-cell leukemia homeobox 2</geneDescription>
<geneID type="string">5089</geneID>
<geneName type="string">PBX2</geneName>
<geneRelativePosition type="string">utr-variant-3-prime</geneRelativePosition>
</e>
</data_gene>
<eqtl type="string" />
<id type="string">rs1000</id>
<linearClosestGene type="string">AGER,177,upstream-variant-2KB;PBX2,5089,utr-variant-3-prime</linearClosestGene>
<motif type="string" />
<position type="number">32153894</position>
</e>
<e class="object">
<AFR type="string">0.997</AFR>
<AMR type="string">0.9524</AMR>
<Alt type="string">C</Alt>
<EAS type="string">1</EAS>
<EUR type="string">0.9453</EUR>
<MAF type="string">A,0.019369</MAF>
<Ref type="string">A</Ref>
<SAS type="string">0.9949</SAS>
<chrom type="string">chr7</chrom>
<data_gene class="array">
<e class="object">
<geneDescription type="string">cyclin-dependent kinase 6</geneDescription>
<geneID type="string">1021</geneID>
<geneName type="string">CDK6</geneName>
<geneRelativePosition type="string">intron-variant</geneRelativePosition>
</e>
</data_gene>
<eqtl type="string" />
<id type="string">rs10</id>
<linearClosestGene type="string">CDK6,1021,intron-variant</linearClosestGene>
<motif type="string" />
<position type="number">92383887</position>
</e>
</a>
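
The XML form can be read with the DOM parser that ships with the JDK. The sketch below assumes the structure shown in the sample (a root `<a>` element whose direct `<e>` children each describe one SNP) and extracts `id`, `chrom` and `position` from each. `XmlResponseReader` is a hypothetical class name used only for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of 3DSNP.
class XmlResponseReader {
    // Returns one "id chrom:position" line per top-level <e> element under <a>.
    static List<String> summarize(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML response", e);
        }
        List<String> rows = new ArrayList<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace between elements
            }
            Element snp = (Element) node;
            rows.add(childText(snp, "id") + " " + childText(snp, "chrom")
                    + ":" + childText(snp, "position"));
        }
        return rows;
    }

    // Text of the first direct child element with the given tag ("" if absent).
    // Only direct children are inspected, so nested <e> entries are not mixed in.
    static String childText(Element parent, String tag) {
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE && c.getNodeName().equals(tag)) {
                return c.getTextContent().trim();
            }
        }
        return "";
    }

    public static void main(String[] args) {
        String xml = "<a><e class=\"object\"><chrom type=\"string\">chr7</chrom>"
                + "<id type=\"string\">rs10</id>"
                + "<position type=\"number\">92383887</position></e></a>";
        summarize(xml).forEach(System.out::println);
    }
}
```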

Request with position

URL example3 : single position and single data type in json format

Request URL :

http://3dsnp.cbportal.org/api.do?position=1000000-1100000&chrom=chr11&format=json&type=basic

Response format :

[{
"id":"rs544411125",
"position":"1000017",
"chrom":"chr11",
"AFR":"0",
"AMR":"0",
"Alt":"A",
"EAS":"0",
"EUR":"0",
"Ref":"G",
"SAS":"0.001",
"MAF":"A,0.000199681",
"linearClosestGene":"AP2A2,161,intron-variant",
"data_gene":[
    {
    "geneID":"161",
    "geneName":"AP2A2",
    "geneRelativePosition":"intron-variant",
    "geneDescription":"adaptor related protein complex 2 alpha 2 subunit"
    }]
},
{
"id":"rs561110574",
"position":"1000027",
"chrom":"chr11",
"AFR":"0.0015",
"AMR":"0",
"Alt":"T",
"EAS":"0",
"EUR":"0",
"Ref":"G",
"SAS":"0",
"MAF":"T,0.000399361",
"linearClosestGene":"AP2A2,161,intron-variant",
"data_gene":[
    {
    "geneID":"161",
    "geneName":"AP2A2",
    "geneRelativePosition":"intron-variant",
    "geneDescription":"adaptor related protein complex 2 alpha 2 subunit"
    }]}
]

URL example4 : single position and multiple data types in xml format

Request URL :

http://3dsnp.cbportal.org/api.do?position=100000-1000100&chrom=chr1&format=xml&type=eqtl,motif

Response format :

<?xml version="1.0" encoding="utf-8"?>
<a>
<e class="object">
<chrom type="string">chr1</chrom>
<data_motif class="array">
<e class="object">
<motif type="string">HEN1_02</motif>
<motifAlt type="string">G</motifAlt>
<motifMatchedSequence type="string">CAGGAAAGCAGCTGGGGGTCCA</motifMatchedSequence>
<motifMatchedSequencePos type="string">21</motifMatchedSequencePos>
<motifRef type="string">A</motifRef>
<motifSource type="string">Transfac</motifSource>
<motifStrand type="string">+</motifStrand>
</e>
</data_motif>
<eqtl type="string" />
<id type="string">rs537152617</id>
<motif type="string">Transfac,HEN1_02,+,CAGGAAAGCAGCTGGGGGTCCA,21</motif>
<position type="number">1000036</position>
</e>
<e class="object">
<chrom type="string">chr1</chrom>
<data_motif class="array">
<e class="object">
<motif type="string">MUSCLE_INI_B</motif>
<motifAlt type="string">T</motifAlt>
<motifMatchedSequence type="string">TCCCGTGGCCATTCAGGCGCC</motifMatchedSequence>
<motifMatchedSequencePos type="string">4</motifMatchedSequencePos>
<motifRef type="string">C</motifRef>
<motifSource type="string">Transfac</motifSource>
<motifStrand type="string">-</motifStrand>
</e>
<e class="object">
<motif type="string">MINI19_B</motif>
<motifAlt type="string">T</motifAlt>
<motifMatchedSequence type="string">TCCCGTGGCCATTCAGGCGCC</motifMatchedSequence>
<motifMatchedSequencePos type="string">4</motifMatchedSequencePos>
<motifRef type="string">C</motifRef>
<motifSource type="string">Transfac</motifSource>
<motifStrand type="string">-</motifStrand>
</e>
<e class="object">
<motif type="string">MINI20_B</motif>
<motifAlt type="string">T</motifAlt>
<motifMatchedSequence type="string">TCCCGTGGCCATTCAGGCGCC</motifMatchedSequence>
<motifMatchedSequencePos type="string">4</motifMatchedSequencePos>
<motifRef type="string">C</motifRef>
<motifSource type="string">Transfac</motifSource>
<motifStrand type="string">-</motifStrand>
</e>
</data_motif>
<eqtl type="string" />
<id type="string">rs573794673</id>
<motif type="string">Transfac,MUSCLE_INI_B,-,TCCCGTGGCCATTCAGGCGCC,4;Transfac,MINI19_B,-,TCCCGTGGCCATTCAGGCGCC,4;Transfac,MINI20_B,-,TCCCGTGGCCATTCAGGCGCC,4</motif>
<position type="number">1000090</position>
</e>
<e class="object">
<chrom type="string">chr11</chrom>
<eqtl type="string" />
<id type="string">rs544411125</id>
<motif type="string" />
<position type="number">1000017</position>
</e>
<e class="object">
<chrom type="string">chr11</chrom>
<eqtl type="string" />
<id type="string">rs561110574</id>
<motif type="string" />
<position type="number">1000027</position>
</e>
</a>
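
The `motif` string in the response above appears to flatten the `data_motif` entries into one value: entries are separated by ';' and each lists source, motif ID, strand, matched sequence and position, separated by ','. The following sketch parses that layout into typed objects. The layout is inferred by comparing the two fields in the sample, and `MotifHit` is a hypothetical class, so treat this as an illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical value class, not part of 3DSNP.
final class MotifHit {
    final String source;
    final String motif;
    final String strand;
    final String matchedSequence;
    final int matchedSequencePos;

    MotifHit(String source, String motif, String strand,
             String matchedSequence, int matchedSequencePos) {
        this.source = source;
        this.motif = motif;
        this.strand = strand;
        this.matchedSequence = matchedSequence;
        this.matchedSequencePos = matchedSequencePos;
    }

    // Parses "source,motif,strand,sequence,pos;source,motif,strand,sequence,pos;..."
    // Assumption: field order follows the sample response.
    static List<MotifHit> parseAll(String motifField) {
        List<MotifHit> hits = new ArrayList<>();
        if (motifField == null || motifField.isEmpty()) {
            return hits;
        }
        for (String entry : motifField.split(";")) {
            String[] f = entry.split(",");
            if (f.length != 5) {
                continue; // skip entries that do not fit the assumed layout
            }
            hits.add(new MotifHit(f[0], f[1], f[2], f[3], Integer.parseInt(f[4].trim())));
        }
        return hits;
    }

    public static void main(String[] args) {
        String sample = "Transfac,HEN1_02,+,CAGGAAAGCAGCTGGGGGTCCA,21";
        for (MotifHit hit : parseAll(sample)) {
            System.out.println(hit.motif + " (" + hit.source + ", strand " + hit.strand + ")");
        }
    }
}
```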

Sample code for java developers

JSON-lib, a Java library for transforming beans, maps, collections, Java arrays and XML to JSON and back again to beans and DynaBeans, is required for this example. It can be downloaded from https://sourceforge.net/projects/json-lib/

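import net.sf.json.JSONArray;
import net.sf.json.JSONObject;

// The class name ApiExample is arbitrary; any class can host this main method.
public class ApiExample {

public static void main(String[] args) {

    // POST the query and parse the JSON array returned by the API.
    String res_str = MyHttpRequest.sendPost("http://cbportal.org/3dsnp/api.do", "id=rs900012&format=json&type=basic,eqtl,motif");
    JSONArray res_array = JSONArray.fromObject(res_str);

    StringBuilder builder = new StringBuilder();

    // Build one summary line per returned SNP.
    for (int index_snp = 0; index_snp < res_array.size(); index_snp++) {
        JSONObject res_obj = res_array.getJSONObject(index_snp);
        // The data_* arrays can be missing when a SNP has no records of that type.
        int num_eqtl = res_obj.has("data_eqtl") ? res_obj.getJSONArray("data_eqtl").size() : 0;
        int num_motif = res_obj.has("data_motif") ? res_obj.getJSONArray("data_motif").size() : 0;
        builder.append("id : " + res_obj.getString("id") + " , chrom : " + res_obj.getString("chrom") + " , num_eqtl : " + num_eqtl + " , num_motif : " + num_motif + "\n");
    }

    System.out.println(builder.toString());
}
}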

The MyHttpRequest class is used to send the HTTP requests.

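import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.URL;
import java.net.URLConnection;
import java.util.List;
import java.util.Map;

public class MyHttpRequest {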

public static String sendGet(String url, String param) {
    String result = "";
    BufferedReader in = null;
    try {
        String urlNameString = url + "?" + param;
        URL realUrl = new URL(urlNameString);
        
        URLConnection connection = realUrl.openConnection();
        
        connection.setRequestProperty("accept", "*/*");
        connection.setRequestProperty("connection", "Keep-Alive");
        connection.setRequestProperty("user-agent",
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;SV1)");
        
        connection.connect();
        
        Map<String, List<String>> map = connection.getHeaderFields();
        
        for (String key : map.keySet()) {
            System.out.println(key + "--->" + map.get(key));
        }

        in = new BufferedReader(new InputStreamReader(connection.getInputStream(),"UTF-8"));
        String line;
        while ((line = in.readLine()) != null) {
            result += line;
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        try {
            if (in != null) {
                in.close();
            }
        } catch (Exception e2) {
            e2.printStackTrace();
        }
    }
    return result;
}

public static String sendPost(String url, String param) {
    PrintWriter out = null;
    BufferedReader in = null;
    String result = "";
    try {
        URL realUrl = new URL(url);
        URLConnection conn = realUrl.openConnection();
        
        conn.setRequestProperty("accept", "*/*");
        conn.setRequestProperty("connection", "Keep-Alive");
        conn.setRequestProperty("user-agent",
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;SV1)");
        
        conn.setDoOutput(true);
        conn.setDoInput(true);
        
        out = new PrintWriter(conn.getOutputStream());
        out.print(param);
        out.flush();
        
        in = new BufferedReader(
        new InputStreamReader(conn.getInputStream(),"UTF-8"));
        String line;
        while ((line = in.readLine()) != null) {
            result += line;
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        try{
            if(out!=null){
                out.close();
            }
            if(in!=null){
                in.close();
            }
        } catch(IOException ex){
            ex.printStackTrace();
        }
    }
    return result;
}
}
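
For readers on Java 11 or later, the same POST can be expressed with the built-in `java.net.http` client instead of `URLConnection`. This is an alternative sketch, not the approach used in the sample above; `ModernHttpRequest` is a hypothetical class name, and setting the form content type explicitly is a choice made here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical alternative to MyHttpRequest.sendPost, assuming Java 11+.
class ModernHttpRequest {
    // Builds the POST request without sending it.
    static HttpRequest buildPost(String url, String param) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(param))
                .build();
    }

    // Sends the request and returns the response body as a string.
    static String sendPost(String url, String param) {
        try {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(buildPost(url, param), HttpResponse.BodyHandlers.ofString());
            return response.body();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("request interrupted", e);
        }
    }

    public static void main(String[] args) {
        // Only builds the request here; call sendPost(...) to contact the server.
        HttpRequest request = buildPost("http://cbportal.org/3dsnp/api.do",
                "id=rs900012&format=json&type=basic,eqtl,motif");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Keeping request construction separate from sending makes it possible to inspect the request without a network connection.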

3dsnp v2.0 changelog

2022.4.11

  • Fix link to 3dsnp v1.0 (cbportal is outdated).

2022.3.31

  • Fix export errors for excel.
  • Add a secure alternative link: 3dsnp.omic.tech

2021.12.19

  • Fix export errors for igvtools.

2021.12.18

  • Add allele frequency information of major populations in the export data for the main table.
  • 3dsnp v2.0 has been published in NAR, so the citation information in the page footer is updated.
  • Fix excel export button for the main table.

2021.10.6

  • Add a pie chart for the nearest scATAC peaks
  • Add Zoom functions, point labels, and borders to the Umap of the nearest scATAC peaks

2021.10.4

  • Add all tables for SVs from ClinVar.
  • Add annotations about ClinVar in the 3dsnp v2 tutorials.
  • Update documentation links to the 3dsnp v2 tutorials and API.
  • Update scores for SVs from HGSVC.

2021.10.3

  • Add SV-data from ClinVar.
  • Add Pathogenicity-data from ClinVar for dbSNP v155.
  • Add Pathogenicity-track in the IGVtools.

2021.09.24

  • Add LD-data for SVs.
  • Add LD-data for SNPs in AFR population.
  • Add SNP affected loops for each tissue.
  • Fix circos plot errors: no LD snps.

2021.09.19

  • Update documentation links.
  • Add 3dsnp v2.0 documentation.
  • Add wordpress documentation for omic.tech.

2021.09.18

  • Add IGV tracks for Fst and xpNSL for each major population in IGSR 1000 Genomes.
  • Add paging to each table on the details page.

2021.09.16

  • Add LD-data check when the user clicks the LD-detail button.
  • Fix loading picture error.
  • Fix picture saving error of the scATAC-plot.

2021.09.15

  • Add IGVtools.
  • Add HGSVC2 structural variations and Hi-C structure predictions.
  • Add scATAC data.
  • Add cCRE scores from scRNA-seq data.
  • Add statistics of population genetics.
  • Update snp collections to dbSNP v154.
  • Fix foot location error.

3dsnp v2.0 API

3dsnp v2 for developers

3DSNP v2 extends the API of the previous version. The domain has changed and, more importantly, SV data and several new tables are included. Data can now be accessed in three ways: by SNP ID, SV ID or genomic position.

We recommend searching for variants by position, which returns both the SNPs and the SVs in the target region.

New features are marked with *.

Note: The original API remains available.

Overview

URL

https://omic.tech/3dsnpv2/api.do

Format supported

JSON/XML

HTTP request method

GET/POST

Login required

No

Data access restrictions

Frequency limit: No

Request

Request parameters

Parameter Required Type Information
id*/position true string The SNP/SV ID or genomic position; at least one of the two is required. Multiple IDs or positions should be separated by commas ‘,’. A position is a range written with a dash ‘-’, for example ‘1000000-1000100’. SV IDs can be found in HGSVC v2.
chrom false string The chromosome of the queried position; required when the ‘position’ parameter is used. When there is more than one position, the corresponding chromosomes should also be separated by commas ‘,’.
type true string Data types to search; multiple types should be separated by commas ‘,’. The available types are listed below.
format true string Format of the returned data. JSON and XML are supported.

Request data type

DataType Description
basic Basic information about the SNP, including sequence facts and phenotype from the 1000G project.
chromhmm Chromatin state information generated by the core 15-state ChromHMM models trained across a variety of cell types.
motif Transcription factor binding motifs altered by SNP.
tfbs Transcription factor binding sites in a variety of cell types.
eqtl Expression quantitative trait loci (eQTL).
3dgene Genes that interact with the query SNP through chromatin loops.
3dsnp SNPs that interact with the query SNP through chromatin loops. Not available for queries by position.
phylop PhyloP scores of genomic region surrounding the query SNP.
ccre* The status of open chromatin for over 750,000 candidate cis-regulatory elements (cCREs) in 54 distinct cell types.
genetics* Integrated haplotype scores (iHS) and fixation index (Fst) for five continental populations, obtained from 1000 Genomes Phase 3 (final phase).
clinvar* ClinVar aggregates information about genomic variation and its relationship to human health.

Response

Response parameters

Parameter Type DataType Description
id string basic SNP ID
chrom string basic Chromosome name
position string basic Location of the query
MAF string basic Minor allele frequency
Ref string basic Reference Allele
Alt string basic Alternative Allele
EAS string basic Allele frequency in the EAS populations
AMR string basic Allele frequency in the AMR populations
AFR string basic Allele frequency in the AFR populations
EUR string basic Allele frequency in the EUR populations
SAS string basic Allele frequency in the SAS populations
linearClosestGene string basic Linearly closest genes
data_gene JsonArray basic listed below in JsonArray Parameters
chromhmm string chromhmm Chromatin state from ChromHMM core 15-state model
data_chromhmm JsonArray chromhmm listed below in JsonArray Parameters
motif string motif Sequence motif altered by the query SNP
data_motif JsonArray motif listed below in JsonArray Parameters
tfbs string tfbs Transcription factor binding sites in which the query is located
data_tfbs JsonArray tfbs listed below in JsonArray Parameters
eqtl string eqtl Expression quantitative trait loci
data_eqtl JsonArray eqtl listed below in JsonArray Parameters
data_loop_gene JsonArray 3dgene listed below in JsonArray Parameters
data_loop_snp JsonArray 3dsnp listed below in JsonArray Parameters
physcores string physcores PhyloP scores of the query SNP and its +/-10 bp adjacent regions
ccre.position* string ccre The corresponding peak position of cCREs
mapping* string ccre Mapping rate of cCREs
Fol/Acn/Skm1/…/Swn_2* string ccre cCREs in 54 distinct cell types
Fst_EUR* string genetics Fixation index in EUR
Fst_SAS* string genetics Fixation index in SAS
Fst_EAS* string genetics Fixation index in EAS
Fst_AMR* string genetics Fixation index in AMR
Fst_AFR* string genetics Fixation index in AFR
iHS_EUR* string genetics Integrated haplotype score in EUR
iHS_SAS* string genetics Integrated haplotype score in SAS
iHS_EAS* string genetics Integrated haplotype score in EAS
iHS_AMR* string genetics Integrated haplotype score in AMR
iHS_AFR* string genetics Integrated haplotype score in AFR
xpnsl_EUR* string genetics cross-population NSL in EUR
xpnsl_SAS* string genetics cross-population NSL in SAS
xpnsl_EAS* string genetics cross-population NSL in EAS
xpnsl_AMR* string genetics cross-population NSL in AMR
xpnsl_AFR* string genetics cross-population NSL in AFR
ClinVarID* string clinvar the ClinVar Allele ID
CLNDN* string clinvar ClinVar’s preferred disease name for the concept specified by disease identifiers in CLNDISDB
CLNDISDB* string clinvar Tag-value pairs of disease database name and identifier
CLNREVSTAT* string clinvar ClinVar review status for the Variation ID
CLNSIG* string clinvar Clinical significance for this single variant
CLNSIGCONF* string clinvar Conflicting clinical significance for this single variant
CLNVC* string clinvar Variant type
CLNVCSO* string clinvar Sequence Ontology id for variant type
CLNVI* string clinvar the variant’s clinical sources reported as tag-value pairs of database and variant identifier
GENEINFO* string clinvar Gene(s) for the variant reported as gene symbol:gene id
MC* string clinvar comma separated list of molecular consequence in the form of Sequence Ontology ID|molecular_consequence
ORIGIN* string clinvar Allele origin

JsonArray Parameters

Parameter Type JsonArray Description
geneID string data_gene RefSeq Gene ID
geneName string data_gene Official gene symbol
geneRelativePosition string data_gene Relative position of the closest gene to the query
geneDescription string data_gene Gene description
chromhmmCell string data_chromhmm Cell type of the corresponding chromatin state
chromhmmName string data_chromhmm Short name of chromatin state
chromhmmFullName string data_chromhmm Full name of chromatin state
chromhmmCellDescription string data_chromhmm Cell type description
chromhmmTissue string data_chromhmm Tissue of the cell type
motif string data_motif Motif ID in TRANSFAC or JASPAR
motifStrand string data_motif Strand of the motif
motifSource string data_motif Database source of the motif
motifMatchedSequence string data_motif Matched sequence for the motif
motifMatchedSequencePos string data_motif Relative position of the query to the sequence
motifRef string data_motif Reference allele
motifAlt string data_motif Alternative allele
tfbsCell string data_tfbs Cell type of the corresponding TFBS
tfbsFactor string data_tfbs Name of the transcription factor
tfbsCellTissue string data_tfbs Tissue of the cell type
tfbsDNAAccessibility string data_tfbs DNA accessibility of the TFBS
tfbsCellDescription string data_tfbs Description for the cell type
eqtlGene string data_eqtl Related gene of the eQTL
eqtlPValue string data_eqtl P-value of the eQTL
eqtlTissue string data_eqtl Tissue in which the eQTL was identified
eqtlEffect string data_eqtl Effect size of the eQTL
loopGene string data_loop_gene Genes interacting with the query SNP through chromatin loops
loopGeneID string data_loop_gene RefSeq Gene ID
loopGeneDescription string data_loop_gene Gene description
loopCell string data_loop_gene/data_loop_snp Cell type in which the chromatin loop was identified
loopCellTissue string data_loop_gene/data_loop_snp Tissue of the cell type
loopCellDescription string data_loop_gene/data_loop_snp Cell type description
loopStart string data_loop_gene/data_loop_snp Start genomic position of the chromatin loop
loopEnd string data_loop_gene/data_loop_snp End genomic position of the chromatin loop
loopType string data_loop_gene/data_loop_snp Type of the chromatin loop: “Within Loop” or “Anchor-to-Anchor”
loopSNP string data_loop_snp SNPs interacting with the query and in the same LD block through chromatin loops
loopLD string data_loop_snp r^2 in LD
loopPopulation string data_loop_snp Continental population (AFR, AMR, ASN, EUR and SAS)

Request with position

URL example1 : single position and single data type in json format

Request URL :

https://www.omic.tech/3dsnpv2/api.do?position=1000000-1100000&chrom=chr11&format=json&type=basic

Response format :

[
    {
        "id":"chr11-1009478-INS-50",
        "position":"1009477",
        "chrom":"chr11",
        "AFR":"0",
        "AMR":"0",
        "Alt":"AACACGCAGCCCATGACCCCGCGCCAGGGTCTGGAGGGACGGCCCCGGGGG",
        "EAS":"0",
        "EUR":"0",
        "Ref":"A",
        "SAS":"0",
        "MAF":"INS,0.000000",
        "linearClosestGene":""
    },
    {
        "id":"rs544411125",
        "position":"1000017",
        "chrom":"chr11",
        "AFR":"0",
        "AMR":"0",
        "Alt":"A",
        "EAS":"0",
        "EUR":"0",
        "Ref":"G",
        "SAS":"0.001",
        "MAF":"A,0.000199681",
        "linearClosestGene":"AP2A2,161,intron-variant",
        "data_gene":[
            {
            "geneID":"161",
            "geneName":"AP2A2",
            "geneRelativePosition":"intron-variant",
            "geneDescription":"adaptor related protein complex 2 alpha 2 subunit"
            }]
        }
]
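
A position query in v2 can return SNPs and SVs side by side, so client code may need to tell the two kinds of ID apart. The sketch below assumes that SV IDs follow the chrom-position-type-length pattern seen in the sample responses (for example chr11-1009478-INS-50). That pattern is an inference from the examples, not a documented rule, and `VariantId` is a hypothetical helper.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of 3DSNP.
class VariantId {
    // Assumed SV ID layout, inferred from the sample responses:
    // chrom-position-type-length, e.g. "chr1-126241-DEL-38630".
    private static final Pattern SV_ID =
            Pattern.compile("(chr[0-9A-Za-z]+)-(\\d+)-([A-Z]+)-(\\d+)");

    static boolean isStructuralVariant(String id) {
        return SV_ID.matcher(id).matches();
    }

    // Returns {chrom, position, type, length} for an SV ID, or null for other IDs.
    static String[] splitSvId(String id) {
        Matcher m = SV_ID.matcher(id);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        for (String id : new String[] { "chr11-1009478-INS-50", "rs544411125" }) {
            System.out.println(id + " -> " + (isStructuralVariant(id) ? "SV" : "not an SV"));
        }
    }
}
```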

URL example2 : single position and multiple data types in xml format

Request URL :

https://www.omic.tech/3dsnpv2/api.do?position=100000-1000100&chrom=chr1&format=xml&type=eqtl,motif

Response format :

<a>
    <e class="object">
        <chrom type="string">chr1</chrom>
        <eqtl type="string"/>
        <id type="string">chr1-121118-INS-113</id>
        <motif type="string"/>
        <position type="string">121117</position>
    </e>
    <e class="object">
        <chrom type="string">chr1</chrom>
        <id type="string">chr1-126241-DEL-38630</id>
        <position type="string">126241</position>
        <data_motif class="array">
            <e class="object">
                <motif type="string">CEBPB_02</motif>
                <motifAlt type="string">DEL</motifAlt>
                <motifMatchedSequence type="string">TGATTGCACCACTG</motifMatchedSequence>
                <motifMatchedSequencePos type="string">16992</motifMatchedSequencePos>
                <motifRef type="string">.</motifRef>
                <motifSource type="string">Transfac</motifSource>
                <motifStrand type="string">-</motifStrand>
            </e>
            <e class="object">
                <motif type="string">ETS1_B</motif>
                <motifAlt type="string">DEL</motifAlt>
                <motifMatchedSequence type="string">GCAGGAAGTCAGGGA</motifMatchedSequence>
                <motifMatchedSequencePos type="string">-27799</motifMatchedSequencePos>
                <motifRef type="string">.</motifRef>
                <motifSource type="string">Transfac</motifSource>
                <motifStrand type="string">+</motifStrand>
            </e>
        </data_motif>
        <eqtl type="string"/>
        <motif type="string">Transfac,CEBPB_02,-,TGATTGCACCACTG,16992;Transfac,ETS1_B,+,GCAGGAAGTCAGGGA,-27799;Transfac,CEBPB_01,+,GGGTGAGGCAAGGG,-10490;Transfac,EBF_Q6,-,TTCCCTTGAGA,32414;Transfac,KROX_Q6,-,CTCGCCCCCTCCTC,4826;Transfac,CEBP_Q2_01,+,GTTGCCCAAGCT,-24111;Transfac,MTF1_Q4,-,ACTGCGCCCAGCCT,37618;Jaspar,SPI-1,-,CGGAAG,3705;Transfac,MYOD_Q6_01,-,TTGAAGCAGGTGATGGAG,24991;Transfac,TEL2_Q6,-,CCACTTCCTG,32686;Transfac,CRX_Q4,+,CCCGTAATCCCAG,-27209;Transfac,R_01,-,TGGGCCACCGGATGTGGTCCT,5445;Transfac,HNF4_01,-,ACGCGGACAGAGGTCAGCG,10966;Transfac,PAX4_01,+,GGAGGTGACCCGTGGGCAGCC,-6023;Transfac,PAX4_02,+,GAATAATTGCC,-1320;Transfac,PAX4_03,-,AGCCCCCACCCC,8402;Transfac,PAX4_04,+,AAAAATTAGCCGGGTGTGGTGGCACACACC,-3883;Transfac,IK3_01,+,TACTGGGAATGTC,-16898;Jaspar,SAP-1,-,ACCGGATGT,5439;Transfac,E2F1_Q4,+,CTTGGCGG,-33552;Transfac,HNF1_Q6,-,AGGTTAATAATTATCTCT,35228;Transfac,E2F1_Q3,+,CGTGGCGC,-28392;Transfac,AR_02,-,CGCCCACGATCAACGTGTTCTGTTCTG,8539;Transfac,ETF_Q6,+,GCGGCGG,-11412;Transfac,EN1_01,-,GTAGTGG,3310;Transfac,SREBP_Q3,-,CCCATCACCCCA,17405;Transfac,AP4_01,-,AGGATCACCTGAGGTCAG,3413;Transfac,HAND1E47_01,+,GGTGGTGTCTGGCACT,-5938;Transfac,E2F1_Q3_01,-,TGGGCGGCAGCAGGGC,6056;Transfac,STAT3_01,-,GGTGATTTCCAGGATGTGAGC,17822;Transfac,MYB_Q3,+,GGTGCCAGTTG,-7224;Transfac,HMEF2_Q6,-,GGCTAAAACTACCCCT,35670;Transfac,EGR2_01,-,TCACGTGGGCGG,6061;Transfac,E2F_Q2,-,GGCGCG,6794;Transfac,PAX8_01,-,CGGTGTCGAGTGAGG,13827;Transfac,RP58_01,-,AACACATCTGGA,37199;Transfac,CEBPGAMMA_Q6,-,CCCACTTCAGAGA,19517;Transfac,HEN1_01,+,TCGGTGCTCAGCTGAGTCTGCA,-2833;Transfac,E2_Q6_01,-,CCCACCGTCTCTGGTT,19989;Transfac,HEN1_02,-,CCTGGGCCCAGCTCCGTCCTCT,9184;Transfac,USF2_Q6,+,CACGCG,-11114;Transfac,SP1_Q6,+,CAAGGGCGGGGCC,-11202;Transfac,SMAD4_Q6,+,AGGATGCAGCCAGCT,-33630;Transfac,CIZ_01,+,GAAAAAGCC,-12404;Transfac,TAL1ALPHAE47_01,-,TTGGCCAGATGGGGTC,14330;Jaspar,deltaEF1,+,CACCTG,-3326;Transfac,POLY_C,-,GAGAAAACCCTCCTGCTG,8438;Jaspar,ARNT,+,CACGTG,-6055;Transfac,MEF3_B,-,TGCCCAGGTTTCA,28126;Transfac,GATA2_
01,+,GGGGATGGGG,-6520;Transfac,GR_01,+,GCAGCATGGGCAGGATGTTCTGCACAC,-7429;Transfac,CEBP_C,+,AGTGTGAGGCAAGACCTG,-12861;Jaspar,NF-kappaB,-,GGGAATTTCC,28429;Transfac,EGR3_01,+,CAGCGTGGGAGG,-10034;Transfac,TANTIGEN_B,+,GGGAGGCCGAGGCAGGCAG,-3797;Transfac,SRF_C,-,GCCTTTTTTGGCCCA,12574;Transfac,E4F1_Q6,-,CCTACGTCAC,13357;Jaspar,PPARgamma,-,AGAGGTCAGCGTGACCCCCT,9983;Transfac,HSF_Q6,+,TCCCAGGAGTTTC,-20707;Transfac,EGR1_01,-,TCACGTGGGCGG,6061;Transfac,ETS_Q4,-,TTCCACTTCCTG,32688;Transfac,USF_C,+,CCACGTGA,-6054;Transfac,E2_01,+,GAACCAGAGACGGTGG,-19973;Transfac,AHRHIF_Q6,-,CGCGTGCGG,11119;Transfac,RFX1_02,+,CTGTAGCCTAAGCAACAG,-22798;Transfac,BARBIE_01,-,TTCAAAAGGTGAGGG,28660;Transfac,FXR_IR1_Q6,+,GGATGAATGTCCC,-28051;Transfac,HNF3ALPHA_Q6,-,TGTTTGTTTTG,4737;Transfac,STRA13_01,-,GCCTCACGTGACTC,7198;Transfac,AHR_Q5,+,GTGGCGTGTGC,-21067;Transfac,ZF5_01,-,GGGCGCGG,6795;Jaspar,p65,-,GGGAATTTCC,28429;Transfac,FREAC3_01,-,GGCATGTAAATAAAGA,23069;Transfac,ATATA_B,+,GTATATAAGC,-31222;Transfac,ACAAT_B,+,GATTGGTGG,-26027;Transfac,AP4_Q5,+,CTCAGCTGGC,-13970;Transfac,AP4_Q6,+,CTCAGCTGGC,-13970;Jaspar,Yin-Yang,-,GCCATC,3377;Transfac,ZTA_Q2,-,TCACAGTGACTCA,14023;Transfac,E12_Q6,+,GGCAGGTGCCA,-7403;Transfac,ELK1_02,+,GCTGCCGGAAGGGA,-8752;Transfac,MYC_Q2,+,CACGTGG,-10864;Transfac,LBP1_Q6,-,CAGCTGC,2984;Transfac,TFIII_Q6,+,AGAGGGAGG,-19953;Transfac,LMO2COM_02,+,CAGATAGGG,-43;Transfac,LMO2COM_01,-,CCCCAGGTGTTG,7655;Transfac,SMAD_Q6,-,AGACTCCCC,9856;Transfac,MAF_Q6,+,TGAGGGCAAGTTGGCA,-34778;Jaspar,cEBP,-,TGGCGCAACCTT,38390;Jaspar,c-REL,+,GGGGAATTCC,-23710;Transfac,MUSCLE_INI_B,-,TCCCCCCACCACCCCCTCCCA,30643;Transfac,AP4_Q6_01,+,GCCAGCTGT,-36895;Transfac,DR3_Q4,+,CATCCCCTTCCTGACCCCTCC,-4972;Transfac,STAT5A_04,-,CACTTCCG,16011;Transfac,ATF4_Q2,-,GCTGACGCCACG,4915;Transfac,SPZ1_01,-,GGTGGAGGGATGGGG,16533;Jaspar,TCF11-MafG,+,CATGAC,-3852;Transfac,PAX2_02,+,CACAAACCC,-23836;Transfac,LUN1_01,+,TCCCAGCTACTTGGGAG,-3918;Transfac,PAX2_01,-,CCCTGTCACTCAGGATGGA,20254;Transfac,MAZR_01,-,TGGGGAGGGGCAC,27106;Transf
ac,MYOGNF1_01,+,AATCCTTTCAGTTTGGGACGGAGTAAGGC,-7790;Transfac,HSF2_01,-,GGAAGCTTCG,13805;Transfac,T3R_01,+,CTGGGAGGTCACGGCT,-21588;Transfac,ZIC3_01,+,TGGGGGGTC,-13048;Transfac,ISRE_01,+,CAGTTTCTCTTCCTG,-29546;Jaspar,Bsap,+,TGGTCAACGCAGCAGAGCGG,-6478;Transfac,CDXA_02,+,ATTACTG,-16382;Transfac,CREB_Q4_01,+,CCGTGACGTAG,-13346;Transfac,ARNT_02,+,CGAGAGTCACGTGAGGCTGA,-7182;Transfac,HOGNESS_B,-,GTGGTGGCTCACGCCTGTAATCCCAGCACT,8124;Transfac,ARNT_01,-,CAGCTCACGTGGGCGG,6065;Transfac,HIF1_Q3,-,GCCCGCGTGCGGCC,11122;Transfac,LFA1_Q6,-,GGGGTCAG,7534;Transfac,GR_Q6,-,GGGCCTCGCTCTGTTGTCC,27466;Transfac,TEF1_Q6,+,GGAATG,-1360;Transfac,BACH1_01,-,GCTATGAGTCACCAC,1540;Transfac,TBP_Q6,+,TTTATAC,-8715;Transfac,E47_02,-,AATTACAGGTGTACGC,21546;Transfac,CP2_02,+,GCTGGGCTGAGCCAC,-6680;Transfac,E47_01,-,AGGGCAGGTGGCTCC,5145;Transfac,MEIS1_01,+,GAGTGACAGGGC,-20244;Transfac,PR_01,-,TGTTGAGGAGAATGCTGTTCTCATTGT,36718;Jaspar,MZF_1-4,+,TGGGGA,-2671;Transfac,OCT1_07,+,TTTATGGTAATT,-31767;Jaspar,Androgen,-,TTTGGCACAGCATGTACCTGTC,34465;Transfac,ZID_01,+,CAGCTCCATCACC,-24971;Jaspar,Pax6,+,TTCACGCTTTAGTT,-2658;Transfac,AREB6_02,+,ACACACCTGTAG,-3906;Transfac,AREB6_03,-,GTGCACCTGTAG,1658;Transfac,PAX_Q6,+,CTGGAAATCAC,-14033;Transfac,RREB1_01,+,CCCCAAAAAACCCT,-1014;Transfac,MEF2_01,-,GGCTAAAACTACCCCT,35670;Transfac,LPOLYA_B,+,CAATAAAG,-22981;Transfac,MEF2_03,-,TAGGTGCCTATAAATAGCATAG,31727;Transfac,ER_Q6,-,AGAGGTCAGCGTGACCCCC,9983;Transfac,MYB_Q6,-,CCCAACTGGC,7236;Transfac,PPARG_02,+,TTCCAGGTGAAGGTGGCCCACTT,-5598;Transfac,HFH4_01,-,TTATGTTTGTTTA,382;Transfac,HEB_Q6,-,GCCAGCTG,13979;Transfac,PPAR_DR1_Q2,+,TGACCTCTGTCCA,-10853;Transfac,OLF1_01,+,CAAGGTTCCCTAGAGAAATGGC,-35076;Transfac,MYOD_01,+,ACACAGGTGGTG,-5933;Transfac,CREBP1_Q2,-,GCTGACGCCACG,4915;Transfac,NERF_Q2,+,TTGCAGGAAGTCAGGGAC,-27797;Transfac,IRF_Q6,+,GTCAGTTTCTCTTCC,-29544;Transfac,XPF1_Q6,+,TCTGGGCAAC,-32109;Transfac,GEN_INI3_B,-,CCTCATTC,17236;Transfac,STAT6_02,+,GCCTTCCT,-7817;Transfac,AR_01,+,GGTACATGCTGTGCC,-34448;Transfac,NFKAPPAB_01,-,GGGAA
TTTCC,28429;Jaspar,HNF-1,-,GGTTAATAATTATC,35227;Transfac,EGR_Q6,+,GTGGGGGCAAG,-11163;Transfac,LYF1_01,+,TTTGGGAGG,-3584;Transfac,PPARA_01,-,CTGCCCCAGGCCAAATTTCT,12377;Transfac,PPARA_02,-,TGGGGTCAGGCAGGGCTGG,7535;Transfac,COUP_DR1_Q6,+,GGACCTTTGGCTT,-38525;Transfac,GATA1_02,-,TTCTAGATAGGGGC,21667;Transfac,VDR_Q3,-,GAGGGAATGGGGAGA,8449;Transfac,T3R_Q6,+,CCTGTCCTC,-6382;Transfac,VDR_Q6,+,CTGCCTGACCCC,-7523;Transfac,LXR_Q3,-,TGGGGTGACCCTGGTGCG,5511;Jaspar,FREAC-4,+,GTAAACAT,-20345;Transfac,LXR_DR4_Q3,+,TGACCGTCATTAAACC,-8569;Transfac,YY1_02,-,CCTGTGCCATCCAGGCTGGA,14512;Transfac,SP1_01,+,AGGGCGGGGC,-11204;Transfac,AP2_Q6_01,+,CGGCCCCCAGGCC,-4872;Transfac,TCF11_01,-,GTCATTCAGGACC,33780;Transfac,TAL1BETAE47_01,-,GGGGACAGATGGCAGT,25058;Transfac,PAX6_Q2,-,CTGACCTTGAACTC,20070;Transfac,SP3_Q3,-,AGCACTGTGGGAGG,2620;Transfac,SEF1_C,+,GGCCCCCAGGCCTGCGTTC,-4873;Transfac,NFKB_Q6_01,+,GACAAGGAAATTCCCG,-28415;Transfac,ZIC2_01,+,AGGGTGGTC,-27629;Transfac,AREB6_01,-,TACTCACCTGAGT,8388;Transfac,AP2_Q6,+,GGCCCCCAGGCC,-4873;Transfac,HNF4_DR1_Q3,+,TGACCTCTGTCCA,-10853;Transfac,NMYC_01,-,TCCCACGTGGAC,10872;Transfac,AP2_Q3,-,GCCCCCAGCCTTAGGC,22344;Transfac,MYOGENIN_Q6,+,GGCAGCTG,-5067;Transfac,CAP_01,-,TCAGCCCC,36304;Jaspar,c-ETS,+,CTTCCG,-3700;Jaspar,Staf,-,GGTTTCCCAGGGGGCAGTGC,14095;Jaspar,n-MYC,+,CACGTG,-6055;Jaspar,MEF2,+,CTATTTATAG,-31711;Transfac,PAX9_B,-,GTCACCCAGGGTGGAGTGCAGTGA,21178;Transfac,ER_Q6_02,+,GAGGTCACGGC,-21592;Jaspar,HLF,-,GGTTACACAATT,21743;Jaspar,GATA-3,+,AGATAG,-44;Transfac,MZF1_01,+,AGTGGGGA,-6218;Jaspar,Irf-1,-,GATAGTGAAACC,21815;Transfac,E2_Q6,+,GAACCAGAGACGGTGG,-19973;Transfac,SP1_Q6_01,+,AGGGCGGGGC,-11204;Transfac,CREB_Q2,+,CGTGACGTAGGG,-13347;Transfac,CREB_Q3,-,CGTCAG,778;Transfac,NFKB_C,-,AGGGATTTTCCT,20047;Transfac,CREB_Q4,+,CGTGACGTAGGG,-13347;Transfac,SREBP1_01,+,GATCACCTGAG,-4565;Jaspar,Ahr-ARNT,+,CGCGTG,-9987;Jaspar,SRF,-,GCCCATATATGA,37496;Transfac,DR4_Q2,-,CGGCCTCTCCAGACCCA,11714;Transfac,SP1_Q4_01,+,CAAGGGCGGGGCC,-11202;Transfac,TTF1_Q6,+,CCCCCAAGTGTG,-
6842;Transfac,ATF_01,+,CCGTGACGTAGGGT,-13346;Transfac,HOXA3_01,+,CCTAATGGG,-35670;Transfac,POU6F1_01,+,GCATAATTTAT,-35917;Transfac,CREB_Q2_01,+,CTTGACGTCAGGAG,-38209;Transfac,GABP_B,-,CCGGGAAGAGCA,19270;Transfac,AHRARNT_01,+,GGAGGGTAGTGTGCCC,-27057;Transfac,DR1_Q3,-,TGGACAGAGGTCA,10865;Transfac,MZF1_02,-,TGGAGAGGGGCAA,19435;Transfac,P300_01,+,TCAAGGAGTGGGTG,-6194;Transfac,DELTAEF1_01,-,ACTCACCTGAG,8387;Jaspar,USF,+,CACGTGG,-10864;Transfac,CMYB_01,+,TACAAAGGCGGTTGGGAG,-11310;Transfac,PADS_C,-,TGTGGTCTC,4001;Jaspar,Chop-cEBP,-,GGGTGCAATGGC,21908;Transfac,DBP_Q6,+,AGCACAC,-6111;Transfac,NFKAPPAB65_01,-,GGGAATTTCC,28429;Transfac,AP2GAMMA_01,-,GCCTGGGGG,4883;Transfac,AHR_01,-,GCCCAGGCTGGAGTGCAA,18623;Transfac,TAL1BETAITF2_01,-,GGGGACAGATGGCAGT,25058;Transfac,PITX2_Q2,+,TGTAATCCCAA,-3780;Transfac,CAAT_C,+,GCCCAATAACCAGCTCCTCGCTGAT,-20432;Transfac,IK2_01,+,CTTTGGGAAGGC,-38457;Transfac,MIF1_01,+,TGGGTGCAGGGCCGCTGG,-7352;Transfac,IK1_01,+,GCTTGGGAAGGCC,-12009;Transfac,NFKB_Q6,+,ATGGGAATCTCCTC,-19067;Jaspar,Tal1beta-E47S,+,GGAACATCTGTT,-35130;Transfac,VJUN_01,+,GTGATGATGTCATTGC,-6140;Transfac,PAX5_02,+,GGAGTGCAATGTGAGCCGAGACCACACA,-3976;Transfac,PAX5_01,-,TCTTGGCTCACTGTAGTGTAGACTTCCC,18984;Transfac,BRACH_01,-,AGAATCACATGTAGGTGCCACAGT,16237;Transfac,CETS1P54_02,-,CCACCGGATGTGG,5441;Transfac,MAF_Q6_01,-,GGCTGAGTCAA,24942;Transfac,TAXCREB_02,+,GTGACCCACACCCTA,-28621;Jaspar,Pax-2,-,CGTCACGG,13353;Transfac,COMP1_01,+,TGTTATCAATGACAATGCGCGCCC,-28488;Transfac,CREL_01,+,GGGGAATTCC,-23710;Transfac,SP1_Q2_01,-,CCCCACCCCC,8399;Jaspar,c-MYB_1,+,GGCCGTTG,-11773;Transfac,SMAD3_Q6,-,TGTCTGTCT,16822;Transfac,E2A_Q6,+,CACCTGCC,-5136;Transfac,MYCMAX_03,+,CGAGAGTCACGTGAGGCTGA,-7182;Transfac,CHCH_01,+,CGGGGG,-6696;Transfac,E2A_Q2,-,GCACCTGCCTCAGT,7411;Transfac,BEL1_B,-,AAAGTGCTGAGATTACAGGCATAAGCCA,17103;Transfac,NRSE_B,+,CTCAGCACCTTGGCCAGCTCC,-24957;Transfac,MAZ_Q6,-,GGGGAGGG,16549;Transfac,ZIC1_01,+,TGGGGGGTC,-13048;Jaspar,RORalfa-1,+,TTCAAGGTCA,-20060;Transfac,NF1_Q6,+,TGCTGGCAGGCAGGCAGA,-1234
3;Transfac,MINI20_B,+,ACCTCCCACCATGGAGGAGGA,-5205;Transfac,VMW65_Q6,+,TCTCATTA,-25555;Transfac,NFKAPPAB50_01,+,GGGGAGTCCC,-5241;Jaspar,RREB-1,-,CCCCCCACCACCCCCTCCCA,30642;Jaspar,NRF-2,+,GCCGGAAGGG,-8755;Transfac,RFX1_01,+,TAGGCACCTAGTAACAG,-31718;Transfac,GNCF_01,+,CAGGAGTTCAAGGTCAGC,-20054;Jaspar,RXR-VDR,-,GGGTCACAGAGATCA,28627;Transfac,NRSF_01,+,CTCAGCACCTTGGCCAGCTCC,-24957;Transfac,USF_Q6_01,+,GCCCACGTGAGC,-6052;Transfac,P53_01,+,GGACATGGTGGCACATGTCT,-22689;Transfac,WHN_B,+,AGGGACGCCTT,-6534;Transfac,MINI19_B,-,GCAAGGAGCCACACAGCAGGA,13854;Transfac,GKLF_01,+,AAAGGAAGGAAGGG,-35999;Transfac,HNF4_01_B,+,GGGGGCAAAGGTAGG,-22339;Transfac,YY1_Q6,-,GCCATCTTG,18004;Jaspar,p53,-,CAGGACAAGTTCGAGCATCT,2978;Jaspar,p50,-,GGGGGTTCCCG,15798;Transfac,GATA2_02,-,GGAGATAAGA,33994;Transfac,GRE_C,+,GTCACACCCTGTCCTC,-6375;Transfac,FXR_Q3,+,CAAGGGCAGCAACC,-13934;Transfac,MYCMAX_B,-,GCCATGTGCC,30955;Transfac,NFE2_01,-,AGCTGAGGCAC,13976;Transfac,CACBINDINGPROTEIN_Q6,+,GGGGGTGGG,-8390;Transfac,MYOD_Q6,+,TGCACCTGTC,-6277;Transfac,STAF_02,+,ACATACCATCATGCCTGGCTA,-24189;Transfac,STAF_01,-,AGTTCCCGTAGTGCCTGACGGT,5931;Transfac,GATA3_02,-,GGAGATAAGA,33994;Jaspar,Myf,+,AGGCAGCAGGAG,-8418;Transfac,NRF2_01,+,GCCGGAAGGG,-8755;Transfac,GATA1_01,+,GGGGATGGGG,-6520;Transfac,ICSBP_Q6,+,GAAGAGAAACTG,-6711;Transfac,CETS1P54_01,-,ACCGGATGTG,5439;Transfac,TCF11MAFG_01,+,CTGTTGTGAGGCAGCAGTTGTG,-12574;Transfac,CACCCBINDINGFACTOR_Q6,+,AATCAGCTGGGTGTGG,-18121;Transfac,SMAD_Q6_01,+,TAGTCAGACAG,-34438;Transfac,GC_01,+,CAAGGGCGGGGCCT,-11202;Transfac,FOXM1_01,+,AGATGGAGT,-3171;Transfac,ARP1_01,-,TGAACTCCTGACCTCT,3835;Transfac,NGFIC_01,-,TCACGTGGGCGG,6061;Jaspar,Gklf,+,AAAGGGAAGG,-35981;Transfac,ERR1_Q2,+,AGTTCAAGGTCAGC,-20058;Jaspar,MZF_5-13,-,GGAGGGGGAG,8091</motif>
    </e>
</a>

Other examples can be found in the previous version.

3dsnp v1.0 tutorials

Access to new version

Overview

3DSNP is an integrated database for annotating the regulatory function of human noncoding SNPs by exploring their 3D interactions with genes and other SNPs mediated by chromatin loops. Models in which cis-acting DNA elements regulate gene expression through three-dimensional interactions mediated by chromatin loops have been established recently, and SNPs are frequently reported to be located in these elements. 3DSNP collects currently available Hi-C datasets from different studies. Two types of linkage are defined according to the spans of the loops: “Within Loop” or “Anchor-to-Anchor”. Allele frequencies were obtained for the SNPs in 1000 Genomes Phase 3 data, and pairwise LD was calculated for all pairs of SNPs within 200 kb in each continental population (AFR, AMR, ASN, EUR and SAS). 3DSNP also integrates chromatin state segments, transcription factor binding sites and DNA accessibility from the Roadmap Epigenomics and ENCODE projects, DNA-binding motifs from the TRANSFAC and JASPAR databases, eQTLs from the GTEx project and sequence conservation from the UCSC Genome Browser. Visualization tools were developed for 3DSNP to display interacting SNPs, genes and elements along with important epigenetic marks, and a comprehensive scoring system was developed to assess the functionality of SNPs in different aspects. Together, these provide an integrated resource for discovering the regulatory roles of noncoding SNPs mediated by 3D genome topology.

Data source

Sequence and genotyping data

3DSNP contains all 149,254,102 SNPs and small indels from NCBI dbSNP build 146. Among them, 84,801,880 SNPs were phased using 1000 Genomes Project Phase 3 (final phase) genotype data; for these SNPs, allele frequencies were obtained and pairwise LD was calculated for all pairs of SNPs within 200 kb in each continental population (AFR, AMR, ASN, EUR and SAS). In addition, the minor allele frequency (MAF) and the linearly closest gene were extracted from dbSNP. Gene annotations were obtained from the GRCh37/hg19 version of RefSeq genes from the UCSC Genome Browser.
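As an illustration of the pairwise LD step, the following minimal sketch computes the standard r² statistic for two biallelic SNPs from phased haplotypes and checks the 200 kb distance limit. It is not the 3DSNP pipeline: the function names, the 0/1 allele coding and the inclusive treatment of the 200 kb boundary are assumptions made for this example.

```python
MAX_LD_DISTANCE = 200_000  # 200 kb window described in the text


def r_squared(hap_a, hap_b):
    """Return r^2 between two biallelic SNPs.

    hap_a and hap_b hold one allele per phased haplotype
    (assumed coding: 0 = reference allele, 1 = alternative allele).
    Returns None when either SNP is monomorphic in the sample.
    """
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("haplotype vectors must be non-empty and of equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n                                   # ALT frequency at SNP A
    p_b = sum(hap_b) / n                                   # ALT frequency at SNP B
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n    # ALT-ALT haplotype frequency
    d = p_ab - p_a * p_b                                   # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None
    return d * d / denom


def within_ld_window(pos_a, pos_b, max_distance=MAX_LD_DISTANCE):
    """True if two positions on the same chromosome are close enough to be paired.

    Treating a distance of exactly 200 kb as inside the window is an assumption.
    """
    return abs(pos_a - pos_b) <= max_distance
```

In a per-population analysis, the same function would simply be applied to the haplotypes of each population separately.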

3D genome topology

Chromatin loops identified by Hi-C technology in multiple human cell types were collected from published Hi-C studies (Rao et al. 2015, Sanborn et al. 2015, Taberlay et al. 2016) to map the intrachromosomal interactions between distant genomic regions. In total, 3DSNP collected 75,362 intrachromosomal chromatin loops in twelve human cell types. Chromatin interactions are classified into two types based on the spans of the chromatin loops. For a chromatin loop shorter than 200 kb, the corresponding interaction type is ‘Within loop’, where genomic elements located within the loop can interact with each other. For a chromatin loop longer than 200 kb, the type is ‘Anchor-to-anchor’, where only elements located at the two anchors are supposed to interact with each other.
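The classification rule above can be sketched as a small function. This is an illustrative example only: the function name and the coordinate convention (span measured as loop end minus loop start, in base pairs) are assumptions, and because the text does not say how a loop of exactly 200 kb is handled, the sketch arbitrarily assigns it to ‘Anchor-to-anchor’.

```python
SPAN_THRESHOLD_BP = 200_000  # 200 kb threshold from the text


def classify_loop(loop_start, loop_end):
    """Classify a chromatin loop by its span in base pairs.

    Assumption: loops of exactly 200 kb are treated as 'Anchor-to-anchor';
    the tutorial only defines the strictly shorter and strictly longer cases.
    """
    span = loop_end - loop_start
    if span < 0:
        raise ValueError("loop_end must not be smaller than loop_start")
    if span < SPAN_THRESHOLD_BP:
        return "Within loop"
    return "Anchor-to-anchor"
```

For example, `classify_loop(1_000_000, 1_150_000)` gives `'Within loop'`, while `classify_loop(1_000_000, 1_450_000)` gives `'Anchor-to-anchor'`.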

Chromatin signature

A variety of chromatin signatures were used to annotate the regulatory functions of SNPs, including chromatin state (ChromHMM Core 15-state model), histone modifications (NarrowPeak), DNase I hypersensitivity sites and transcription factor binding sites from the Roadmap Epigenomics and ENCODE projects. To annotate SNPs that alter TF binding motifs, the TFM-Scan software was used to locate putative TFBSs across the genome using a set of position weight matrices (PWMs) collected from the TRANSFAC and JASPAR databases.

Conservation

The conservation of SNPs was measured by two PhyloP scores obtained from the UCSC Genome Browser. The two PhyloP scores were calculated from multiple alignments of 46 vertebrate genomes and 33 mammal genomes respectively. The absolute values of the PhyloP scores represent -log(P-values) under a null hypothesis of neutral evolution, and sites predicted to be conserved are assigned positive scores, while sites predicted to be fast-evolving are assigned negative scores.
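To make the interpretation of a PhyloP score concrete, the sketch below converts a score back into a P-value and a direction. It assumes the logarithm is base 10, which the text above does not state explicitly; the function names are hypothetical.

```python
def phylop_to_pvalue(score):
    """P-value under neutral evolution implied by a PhyloP score.

    Assumes |score| = -log10(P-value); the base of the logarithm is an
    assumption of this sketch.
    """
    return 10.0 ** (-abs(score))


def phylop_direction(score):
    """Positive scores indicate conservation, negative scores fast evolution."""
    if score > 0:
        return "conserved"
    if score < 0:
        return "fast-evolving"
    return "neutral"
```

Under this assumption, a score of 2.0 corresponds to a P-value of about 0.01 in the conserved direction, and a score of -3.0 to about 0.001 in the fast-evolving direction.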

eQTL

Correlations between genotype and tissue-specific gene expression levels will help annotate the effects of genetic variants on gene regulation. We collected a total of 19,582,729 significant SNP-gene pairs (FDR <= 0.05) in 44 human tissues from the GTEx project version 6. To measure the significance of the eQTLs, nominal eQTL p-values and the effect size were obtained for each SNP-gene pair. Nominal eQTL p-values were generated using a two-tailed t test, testing the alternative hypothesis that the beta deviates from the null hypothesis of beta=0. The effect size of the eQTLs is defined as the slope (‘beta’) of the linear regression, and is computed as the effect of the alternative allele (ALT) relative to the reference allele (REF) in the human genome.
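The effect size described above is an ordinary least-squares slope. The following minimal sketch, which is not the GTEx pipeline, fits expression against genotype and returns the slope together with its t statistic. Coding genotypes as the number of ALT alleles (0, 1 or 2) and omitting covariates are simplifying assumptions for illustration.

```python
import math


def eqtl_slope(genotypes, expression):
    """Fit expression = intercept + beta * genotype by ordinary least squares.

    genotypes: ALT allele counts per sample (assumed coding 0/1/2).
    expression: expression level of the gene in the same samples.
    Returns (beta, t_statistic), where t tests the null hypothesis beta = 0.
    """
    n = len(genotypes)
    if n != len(expression) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(genotypes) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    if sxx == 0:
        raise ValueError("genotype has no variation")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotypes, expression))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    # Residual variance with n - 2 degrees of freedom
    rss = sum((y - (intercept + beta * x)) ** 2 for x, y in zip(genotypes, expression))
    if rss == 0:
        raise ValueError("perfect fit: t statistic is undefined")
    se_beta = math.sqrt(rss / (n - 2) / sxx)
    return beta, beta / se_beta
```

A two-tailed nominal p-value then follows from the t distribution with n - 2 degrees of freedom, for example via `scipy.stats.t.sf` if SciPy is available.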

Visualization

Regional LD plot

A regional plot is used to display the set of SNPs in LD with the query, as shown in Figure 1. In this plot, the x-axis shows chromosome coordinates, the y-axis shows r² values, the size of each node represents its total score, and associated SNPs in the five populations are shown in different colors. Associated SNPs in each of the five populations can be removed from or added to the plot by clicking the corresponding circle in the legend. Users can also restrict the range of total scores displayed by adjusting the upper and lower bounds of the size bar at the right side of the plot. Clicking the node of a SNP opens a detail page for that SNP. The regional LD plots can be displayed in the browser and can also be downloaded as high-quality, publication-ready PNG files.

Figure 1. Regional LD plot of the associated SNPs of rs12740374. In the plot, the x-axis shows chromosome coordinates, the y-axis shows r² values, the size of each node represents its total score, associated SNPs in five populations (AFR: African, AMR: Admixed American, ASN: East Asian, EUR: European and SAS: South Asian) are displayed in different colors, and rs12740374 is displayed in black.

CIRCOS plot

Circos is a software package for visualizing genomic data and information. A customizable Circos plotting system was developed to display the 3D chromatin topology and a set of important chromatin marks surrounding the query SNP. Circos tracks from outside to inside represent: Chromatin states, RefSeq genes, DHS and histone modifications, TFBS, associated SNPs and chromatin loops, as shown in Figure 2. The color scheme of 15 chromatin states in ChromHMM model is shown in Table 1, and the color scheme of chromatin loops in twelve cell types is shown in Table 2 below.

Figure 2. Track annotation for Circos plot. In the plot, from outer to inner, the circle represents chromatin states, annotated genes, histone modification set (red), transcription factor set (blue), current SNP and associated SNPs, and 3D chromatin interactions, respectively.

Table 1. Color scheme of 15 chromatin states in ChromHMM core model

Color  State No.  Mnemonic  Description
       1          TssA      Active TSS
       2          TssAFlnk  Flanking Active TSS
       3          TxFlnk    Transcr. at gene 5′ and 3′
       4          Tx        Strong transcription
       5          TxWk      Weak transcription
       6          EnhG      Genic enhancers
       7          Enh       Enhancers
       8          ZNF/Rpts  ZNF genes & repeats
       9          Het       Heterochromatin
       10         TssBiv    Bivalent/Poised TSS
       11         BivFlnk   Flanking Bivalent TSS/Enh
       12         EnhBiv    Bivalent Enhancer
       13         ReprPC    Repressed PolyComb
       14         ReprPCWk  Weak Repressed PolyComb
       15         Quies     Quiescent/Low

Table 2. Color scheme of twelve cell types for the chromatin loop track in Circos plot

Color  Cell type  Description                               Tissue
       GM12878    Lymphoblastoid Cells                      Blood
       K562       K562 leukemia Cells                       Blood
       H1-hESC    Embryonic stem cells                      ESC
       IMR90      Fetal lung fibroblasts                    Lung
       HeLa-S3    Cervical carcinoma cells                  Cervix
       HUVEC      Umbilical vein endothelial cells          Blood vessel
       NHEK       Epidermal keratinocytes                   Skin
       HMEC       Mammary epithelial cells                  Breast
       KBM-7      Chronic myelogenous leukemia (CML) cells  Blood
       LNCaP      Prostate adenocarcinoma                   Prostate
       PC3        Prostate cancer cells                     Prostate
       PrEC       Prostate epithelial cell line             Prostate

You can customize the Circos tracks in the ‘Options’ block to display DHS and histone modifications of 127 cell lines from the Roadmap Epigenomics Project and the binding sites of CTCF and 166 other transcription factors in 91 cell lines from the ENCODE Project. The DHS and histone modification track is represented by red tiles, and the TFBS track is represented by blue tiles.

UCSC Genome Browser plot

Chromatin loops and chromatin signatures surrounding the query SNP can also be displayed in a figure generated by the UCSC Genome Browser with custom tracks. The location of the query SNP is marked by a red bar, and pairs of anchor sites of chromatin loops are displayed with colored bars for different cell types, as shown in Figure 3.

Figure 3. Track annotation for UCSC Genome Browser plot. In the plot, from top to bottom, the tracks are: genomic coordinates, chromatin interactions, current SNP, UCSC genes, RefSeq genes, histone modifications, CTCF binding sites, DNase Clusters and mammal conservation.

Scoring system

In 3DSNP, each SNP is scored based on its annotated records in six functional categories: 3D interacting genes, enhancer state, promoter state, transcription factor binding sites, sequence motifs altered and conservation score. Unlike the scoring scheme of RegulomeDB, which classifies SNPs into classes based on the combinatorial presence/absence status of functional categories, 3DSNP adopts a quantitative scoring system to evaluate the functional significance of a SNP in each category. For the first five categories, we used the number of annotated records (hits) to assign a score to a SNP in the corresponding category. Specifically, we fitted the numbers of hits of all SNPs in each chromosome to a Poisson distribution model. For a SNP with k hits in a functional category F, where λ is the fitted parameter of the corresponding Poisson model, the score of the SNP in this category is defined as follows:

For the conservation category (F’), we found that the PhyloP scores of all SNPs in a chromosome follow a Gaussian distribution. For a SNP with a conservation score of c, where μ and σ are the fitted parameters of the corresponding Gaussian model, the score of the SNP in the conservation category is defined as follows:

The total score of a SNP is the sum of scores of the six functional categories.
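The sketch below shows one way scores of this kind could be computed from the fitted parameters. It rests on an assumption that is not stated in the text above: that each category score is the negative log10 of an upper-tail probability, P(X ≥ k) under the fitted Poisson model for the hit-based categories and P(X ≥ c) under the fitted Gaussian model for conservation. Only the final step, summing the six category scores, is taken directly from the description. All function names are hypothetical.

```python
import math


def neg_log10(p, floor=1e-300):
    """-log10(p), with a floor to avoid taking the log of zero."""
    return -math.log10(max(p, floor))


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)          # P(X = 0)
    cdf = term
    for x in range(1, k):          # accumulate P(X = 1) .. P(X = k - 1)
        term *= lam / x
        cdf += term
    return max(1.0 - cdf, 0.0)


def hit_category_score(k, lam):
    """ASSUMED form: -log10 of the Poisson upper-tail probability of k hits."""
    return neg_log10(poisson_upper_tail(k, lam))


def conservation_score(c, mu, sigma):
    """ASSUMED form: -log10 of the Gaussian upper-tail probability of score c."""
    tail = 0.5 * math.erfc((c - mu) / (sigma * math.sqrt(2.0)))
    return neg_log10(tail)


def total_score(category_scores):
    """Total score: the sum of the six category scores, as stated in the text."""
    return sum(category_scores)
```

With this assumed form, a SNP with no hits in a category scores 0 there, and scores grow as the observed hit count becomes less likely under the fitted chromosome-wide model.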

Data format

Query format

SNP ID(s), a single genomic region or an official gene symbol can be used as a query in the search bar on the main page. Multiple SNP IDs should be delimited by commas or spaces, a genomic region should be written as chrN:start-end, and a gene should be written as gene:SYMBOL. Only one query type is allowed per search; mixed query formats are not supported. A query in the search bar may contain at most 100 SNP IDs.
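The three query formats can be told apart with a few simple checks. The sketch below is a hypothetical client-side validator, not the server's actual parser: the `rs` prefix for SNP IDs is inferred from the examples in this tutorial, and raising an error for more than 100 SNP IDs is an assumption about how the limit is enforced.

```python
import re

SNP_RE = re.compile(r"^rs\d+$", re.IGNORECASE)         # assumed SNP ID pattern
REGION_RE = re.compile(r"^chr(\w+):(\d+)-(\d+)$")      # chrN:start-end
MAX_QUERY_SNPS = 100                                   # limit stated in the text


def parse_query(text):
    """Return (query_type, value) for a search-bar query.

    query_type is 'gene', 'region' or 'snps'. Mixed queries raise ValueError.
    """
    text = text.strip()
    if text.startswith("gene:"):
        symbol = text[len("gene:"):].strip()
        if not symbol:
            raise ValueError("missing gene symbol")
        return ("gene", symbol)
    match = REGION_RE.match(text)
    if match:
        start, end = int(match.group(2)), int(match.group(3))
        if start > end:
            raise ValueError("region start must not exceed region end")
        return ("region", ("chr" + match.group(1), start, end))
    ids = [token for token in re.split(r"[,\s]+", text) if token]
    if not ids:
        raise ValueError("empty query")
    bad = [token for token in ids if not SNP_RE.match(token)]
    if bad:
        raise ValueError("mixed or unrecognised query terms: " + ", ".join(bad))
    if len(ids) > MAX_QUERY_SNPS:
        raise ValueError("at most 100 SNP IDs are allowed per query")
    return ("snps", ids)
```

For example, `parse_query("rs12740374")`, `parse_query("chr1:100-200")` and `parse_query("gene:PSRC1")` are classified as a SNP, region and gene query respectively.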

Upload file format

A text file containing a list of SNP IDs or genomic regions can be uploaded to the server for batch analysis by clicking the browser icon at the right side of the search bar. A file may contain at most 2000 SNPs and at most 10 regions; any data exceeding these limits will be ignored.
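To check a batch file against these limits before uploading, a small pre-check along the following lines may help. It is a sketch under several assumptions that the text does not confirm: one entry per line, regions recognised by the presence of a colon, and the first entries being the ones kept when a limit is exceeded.

```python
MAX_UPLOAD_SNPS = 2000     # limit stated in the text
MAX_UPLOAD_REGIONS = 10    # limit stated in the text


def apply_upload_limits(entries):
    """Split entries into SNP IDs and regions, dropping anything over the limits.

    entries: iterable of strings, assumed to be one SNP ID or one
    chrN:start-end region per line. Returns (snps, regions).
    """
    snps, regions = [], []
    for entry in entries:
        entry = entry.strip()
        if not entry:
            continue                      # skip blank lines
        if ":" in entry:                  # assumed marker of a genomic region
            if len(regions) < MAX_UPLOAD_REGIONS:
                regions.append(entry)
        elif len(snps) < MAX_UPLOAD_SNPS:
            snps.append(entry)
    return snps, regions
```

Comparing the lengths of the returned lists with the number of lines in the file shows how many entries would be ignored.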

Export format

All result tables can be exported in three ways: copied to the clipboard, saved as an MS Excel file, or saved as a PDF. Figures can be exported in PNG format.

Usage example

Type the full ID “rs12740374” in the search box; the SNP rs12740374 will then appear automatically in the main table below. The table shows its position and its reference/alternative sequence in the genome. The total functionality score of rs12740374 is 135.06, and its 3D interacting genes are PSRC1 and 7 other genes. Click the “+” icon on the left side of the ID to see a table containing a set of associated SNPs in the same LD block as rs12740374. A regional LD plot on the right side of the table shows their associations, as below:

Click the “AMR” circle in the legend to remove the associated SNPs in AMR population, as shown below:

You can also restrict the range of total scores displayed by adjusting the upper and lower bounds of the size bar at the right side of the plot. For example, you can show only SNPs whose total score is greater than 50, as below:

You can always click a node to see the details of that SNP in a new detail page.

Then, clicking the SNP ID “rs12740374” opens a new page containing all detailed information about this SNP, including:

  • A text paragraph summarizing its scores in the non-empty functional categories and a radar chart showing the distribution of the six scores, as below:

  • A Circos plot showing the chromatin loops and other 2D signatures:

You can switch the cell type of the CIRCOS plot to another cell type in the ‘Options’ block. For example, you can change the cell type to HepG2 of the liver tissue, as shown below.

  • A figure generated by UCSC Genome Browser with custom data:

  • And a series of tables containing the details of 3D interacting genes, 3D interacting SNPs, chromatin state, TFBS and conservation.