
3dsnp v2.0 changelog

2023.7.26

Switch the database source for both 3dsnp.omic.tech and www.omic.tech/3dsnpv2 to the local server through an frp proxy.

2023.5.16

Fix errors in displaying LD information in the IGV browser;
Fix errors in displaying summary information in the detail page.

2023.5.15

Update the format of Circos-plot to SVG;
Switch the database source (only when accessed through 3dsnp.omic.tech) to the local server through an frp proxy;
Fix errors in displaying allele and score information for SNPs without Hi-C interactions.

2022.4.11

Fix link to 3dsnp v1.0 (cbportal is outdated).

2022.3.31

Fix export errors for Excel.
Add a secure alternative link: 3dsnp.omic.tech.

2021.12.19

Fix export errors for igvtools.

2021.12.18

Add allele frequency information of major populations in the export data for the main table.
3dsnp v2.0 has been published in NAR, so the citation information in the page footer has been updated.
Fix excel export button for the main table.

2021.10.6

Add a pie chart for the nearest scATAC peaks.
Add zoom functions, point labels, and borders to the UMAP of the nearest scATAC peaks.

2021.10.4

Add all tables for SVs from ClinVar.
Add annotations about ClinVar in the 3dsnp v2 tutorials.
Update documentation links to the 3dsnp v2 tutorials and API.
Update scores for SVs from HGSVC.

2021.10.3

Add SV-data from ClinVar.
Add Pathogenicity-data from ClinVar for dbSNP v155.
Add Pathogenicity-track in the IGVtools.

2021.09.24

Add LD-data for SVs.
Add LD-data for SNPs in AFR population.
Add SNP affected loops for each tissue.
Fix circos plot errors: no LD snps.

2021.09.19

Update documentation links.
Add 3dsnp v2.0 documentation.
Add wordpress documentation for omic.tech.

2021.09.18

Add IGV tracks for Fst and xpNSL for each major population in IGSR 1000 Genomes.
Add pagination for each table in the details page.

2021.09.16

Add LD-data check when the user clicks the LD-detail button.
Fix loading picture error.
Fix picture saving error of the scATAC-plot.

2021.09.15

Add IGVtools.
Add HGSVC2 structural variations and Hi-C structure predictions.
Add scATAC data.
Add cCRE scores from scRNA-seq data.
Add statistics of population genetics.
Update snp collections to dbSNP v154.
Fix foot location error.

3dsnp v2.0 API


3dsnp v2 for developers

3DSNP v2 extends the API of the previous version. The domain has changed and, more importantly, SV data and several new tables are now included. Data can be accessed in three ways: by SNP ID, by SV ID, or by genomic position.

We recommend searching by position, which returns both SNPs and SVs in the target region.

We marked new features with *

Note: The original API remains open.

Overview

URL

https://omic.tech/3dsnpv2/api.do

Format supported

JSON/XML

HTTP request method

GET/POST

Login required

Data access restrictions

Frequency limit: No

Request

Request parameters

	Required	Type	Information
id^*/position	true	string	The SNP/SV ID or genomic position; at least one of them is required. Multiple SNP IDs or positions should be separated by commas ‘,’. The ‘position’ parameter is a range with start and end joined by a dash ‘-’, e.g. ‘1000000-1000100’. SV IDs can be found in HGSVC v2.
chrom	false	string	The chromosome of the queried position; required when the parameter ‘position’ is used. When more than one position is given, the corresponding chromosomes should also be separated by ‘,’.
type	true	string	Data types to search; multiple types should be separated by commas ‘,’. Available types are listed below.
format	true	string	Format of the returned data. JSON and XML are supported.

Request data type

DataType	Description
basic	Basic information of SNP, including sequential facts and phenotype from 1000G project.
chromhmm	Chromatin state information generated by the core 15-state ChromHMM models trained across a variety of cell types.
motif	Transcription factor binding motifs altered by SNP.
tfbs	Transcription factor binding sites in a variety of cell types.
eqtl	Expression quantitative trait loci (eQTL).
3dgene	Genes that interact with the query SNP through chromatin loops.
3dsnp	SNPs that interact with the query SNP through chromatin loops. Not available for queries by position.
phylop	PhyloP scores of genomic region surrounding the query SNP.
ccre^*	The status of open chromatin for over 750,000 candidate cis-regulatory elements (cCREs) in 54 distinct cell types.
genetics^*	Integrated haplotype scores (iHS) and fixation index (Fst) for five continental populations, obtained from 1000 Genomes Phase 3 (final phase).
clinvar^*	ClinVar aggregates information about genomic variation and its relationship to human health.
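
As a minimal sketch of how the request parameters above fit together, the following helper assembles a GET URL from them. The function name `build_query_url` and its validation rules are illustrative assumptions, not part of the 3DSNP API; only the base URL, parameter names, and data type names come from the tables above. Whether the server accepts `id` and `position` in the same request is not stated, so the sketch only checks that at least one is present.

```python
from urllib.parse import urlencode

# Base URL as listed in the Overview section.
API_URL = "https://omic.tech/3dsnpv2/api.do"

# Data types listed in the "Request data type" table.
DATA_TYPES = {
    "basic", "chromhmm", "motif", "tfbs", "eqtl", "3dgene",
    "3dsnp", "phylop", "ccre", "genetics", "clinvar",
}


def build_query_url(types, fmt="json", ids=None, positions=None,
                    chroms=None, base_url=API_URL):
    """Assemble a query URL (hypothetical helper, not an official client)."""
    types = list(types)
    unknown = [t for t in types if t not in DATA_TYPES]
    if not types or unknown:
        raise ValueError(f"need at least one known data type, got {types}")
    if fmt not in ("json", "xml"):
        raise ValueError("format must be 'json' or 'xml'")
    if not ids and not positions:
        raise ValueError("at least one of ids or positions is required")

    params = {}
    if ids:
        params["id"] = ",".join(ids)
    if positions:
        # 'chrom' is required with 'position', one chromosome per position.
        if not chroms or len(chroms) != len(positions):
            raise ValueError("one chromosome is needed per position")
        # The '3dsnp' type is documented as unavailable for position queries.
        if "3dsnp" in types:
            raise ValueError("type '3dsnp' is not available for position queries")
        params["position"] = ",".join(positions)
        params["chrom"] = ",".join(chroms)
    params["format"] = fmt
    params["type"] = ",".join(types)
    # Keep commas literal so multi-value parameters stay readable.
    return f"{base_url}?{urlencode(params, safe=',')}"


print(build_query_url(["basic"], positions=["1000000-1100000"], chroms=["chr11"]))
```

The resulting URL can be passed to any HTTP client, for example `urllib.request.urlopen`.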

Response

Response parameters

	Type	DataType	Description
id	string	basic	SNP ID
chr	string	basic	Chromosome name
position	string	basic	Location of the query
MAF	string	basic	Minor allele frequency
Ref	string	basic	Reference Allele
Alt	string	basic	Alternative Allele
EAS	string	basic	Allele frequency in the EAS populations
AMR	string	basic	Allele frequency in the AMR populations
AFR	string	basic	Allele frequency in the AFR populations
EUR	string	basic	Allele frequency in the EUR populations
SAS	string	basic	Allele frequency in the SAS populations
linearClosestGene	string	basic	Linearly closest genes
data_gene	JsonArray	basic	listed below in JsonArray Parameters
chromhmm	string	chromhmm	Chromatin state from ChromHMM core 15-state model
data_chromhmm	JsonArray	chromhmm	listed below in JsonArray Parameters
motif	string	motif	Sequence motif altered by the query SNP
data_motif	JsonArray	motif	listed below in JsonArray Parameters
tfbs	string	tfbs	Transcription factor binding sites the query locates
data_tfbs	JsonArray	tfbs	listed below in JsonArray Parameters
eqtl	string	eqtl	Expression quantitative trait loci
data_eqtl	JsonArray	eqtl	listed below in JsonArray Parameters
data_loop_gene	JsonArray	3dgene	listed below in JsonArray Parameters
data_loop_snp	JsonArray	3dsnp	listed below in JsonArray Parameters
physcores	string	physcores	PhyloP scores of the query SNP and its +/-10 bp adjacent regions
ccre.position^*	string	ccre	The corresponding peak position of cCREs
mapping^*	string	ccre	Mapping rate of cCREs
Fol/Acn/Skm1/…/Swn_2^*	string	ccre	cCREs in 54 distinct cell types
Fst_EUR^*	string	genetics	Fixation index in EUR
Fst_SAS^*	string	genetics	Fixation index in SAS
Fst_EAS^*	string	genetics	Fixation index in EAS
Fst_AMR^*	string	genetics	Fixation index in AMR
Fst_AFR^*	string	genetics	Fixation index in AFR
iHS_EUR^*	string	genetics	Integrated haplotype score in EUR
iHS_SAS^*	string	genetics	Integrated haplotype score in SAS
iHS_EAS^*	string	genetics	Integrated haplotype score in EAS
iHS_AMR^*	string	genetics	Integrated haplotype score in AMR
iHS_AFR^*	string	genetics	Integrated haplotype score in AFR
xpnsl_EUR^*	string	genetics	cross-population NSL in EUR
xpnsl_SAS^*	string	genetics	cross-population NSL in SAS
xpnsl_EAS^*	string	genetics	cross-population NSL in EAS
xpnsl_AMR^*	string	genetics	cross-population NSL in AMR
xpnsl_AFR^*	string	genetics	cross-population NSL in AFR
ClinVarID^*	string	clinvar	the ClinVar Allele ID
CLNDN^*	string	clinvar	ClinVar’s preferred disease name for the concept specified by disease identifiers in CLNDISDB
CLNDISDB^*	string	clinvar	Tag-value pairs of disease database name and identifier
CLNREVSTAT^*	string	clinvar	ClinVar review status for the Variation ID
CLNSIG^*	string	clinvar	Clinical significance for this single variant
CLNSIGCONF^*	string	clinvar	Conflicting clinical significance for this single variant
CLNVC^*	string	clinvar	Variant type
CLNVCSO^*	string	clinvar	Sequence Ontology id for variant type
CLNVI^*	string	clinvar	the variant’s clinical sources reported as tag-value pairs of database and variant identifier
GENEINFO^*	string	clinvar	Gene(s) for the variant reported as gene symbol:gene id
MC^*	string	clinvar	Comma-separated list of molecular consequences in the form of Sequence Ontology ID|molecular_consequence
ORIGIN^*	string	clinvar	Allele origin

JsonArray Parameters

	Type	JsonArray	Description
geneID	string	data_gene	RefSeq Gene ID
geneName	string	data_gene	Official gene symbol
geneRelativePosition	string	data_gene	Relative position of the closest gene to the query
geneDescription	string	data_gene	Gene description
chromhmmCell	string	data_chromhmm	Cell type of the corresponding chromatin state
chromhmmName	string	data_chromhmm	Short name of chromatin state
chromhmmFullName	string	data_chromhmm	Full name of chromatin state
chromhmmCellDescription	string	data_chromhmm	Cell type description
chromhmmTissue	string	data_chromhmm	Tissue of the cell type
motif	string	data_motif	Motif ID in TRANSFAC or JASPAR
motifStrand	string	data_motif	Strand of the motif
motifSource	string	data_motif	Database source of the motif
motifMatchedSequence	string	data_motif	Matched sequence for the motif
motifMatchedSequencePos	string	data_motif	Relative position of the query to the sequence
motifRef	string	data_motif	Reference allele
motifAlt	string	data_motif	Alternative allele
tfbsCell	string	data_tfbs	Cell type of the corresponding TFBS
tfbsFactor	string	data_tfbs	Name of the transcription factor
tfbsCellTissue	string	data_tfbs	Tissue of the cell type
tfbsDNAAccessibility	string	data_tfbs	DNA accessibility of the TFBS
tfbsCellDescription	string	data_tfbs	Description for the cell type
eqtlGene	string	data_eqtl	Related gene of the eQTL
eqtlPValue	string	data_eqtl	P-value of the eQTL
eqtlTissue	string	data_eqtl	Tissue in which the eQTL was identified
eqtlEffect	string	data_eqtl	Effect size of the eQTL
loopGene	string	data_loop_gene	Genes interacting with the query SNP through chromatin loops
loopGeneID	string	data_loop_gene	RefSeq Gene ID
loopGeneDescription	string	data_loop_gene	Gene description
loopCell	string	data_loop_gene/data_loop_snp	Cell type in which the chromatin loop was identified
loopCellTissue	string	data_loop_gene/data_loop_snp	Tissue of the cell type
loopCellDescription	string	data_loop_gene/data_loop_snp	Cell type description
loopStart	string	data_loop_gene/data_loop_snp	Start genomic position of the chromatin loop
loopEnd	string	data_loop_gene/data_loop_snp	End genomic position of the chromatin loop
loopType	string	data_loop_gene/data_loop_snp	Type of the chromatin loop: “Within Loop” or “Anchor-to-Anchor”
loopSNP	string	data_loop_snp	SNPs interacting with the query and in the same LD block through chromatin loops
loopLD	string	data_loop_snp	r^2 in LD
loopPopulation	string	data_loop_snp	Continental population (AFR, AMR, ASN, EUR and SAS)

Request with position

URL example1 : single position and single data type in json format

Request URL :

https://www.omic.tech/3dsnpv2/api.do?position=1000000-1100000&chrom=chr11&format=json&type=basic

Response format :

[
    {
        "id":"chr11-1009478-INS-50",
        "position":"1009477",
        "chrom":"chr11",
        "AFR":"0",
        "AMR":"0",
        "Alt":"AACACGCAGCCCATGACCCCGCGCCAGGGTCTGGAGGGACGGCCCCGGGGG",
        "EAS":"0",
        "EUR":"0",
        "Ref":"A",
        "SAS":"0",
        "MAF":"INS,0.000000",
        "linearClosestGene":""
    },
    {
        "id":"rs544411125",
        "position":"1000017",
        "chrom":"chr11",
        "AFR":"0",
        "AMR":"0",
        "Alt":"A",
        "EAS":"0",
        "EUR":"0",
        "Ref":"G",
        "SAS":"0.001",
        "MAF":"A,0.000199681",
        "linearClosestGene":"AP2A2,161,intron-variant",
        "data_gene":[
            {
                "geneID":"161",
                "geneName":"AP2A2",
                "geneRelativePosition":"intron-variant",
                "geneDescription":"adaptor related protein complex 2 alpha 2 subunit"
            }
        ]
    }
]
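
A position query like this one mixes SNPs and SVs in one array. The sketch below shows one possible way to tell them apart and to pull gene names out of `data_gene`. The sample is a trimmed copy of the response above. The SV ID pattern (`chrom-position-TYPE-length`) is inferred from the single ID shown here and is an assumption, as are the helper names `parse_sv_id` and `summarize`.

```python
import json
import re

# Trimmed copy of the JSON response shown above.
SAMPLE_RESPONSE = """
[
  {"id": "chr11-1009478-INS-50", "position": "1009477", "chrom": "chr11",
   "Ref": "A", "MAF": "INS,0.000000", "linearClosestGene": ""},
  {"id": "rs544411125", "position": "1000017", "chrom": "chr11",
   "Ref": "G", "Alt": "A", "MAF": "A,0.000199681",
   "linearClosestGene": "AP2A2,161,intron-variant",
   "data_gene": [{"geneID": "161", "geneName": "AP2A2",
                  "geneRelativePosition": "intron-variant"}]}
]
"""

# Assumed SV ID layout, inferred from the example: chrom-position-TYPE-length.
SV_ID = re.compile(r"^(chr[0-9A-Za-z]+)-(\d+)-([A-Z]+)-(\d+)$")


def parse_sv_id(variant_id):
    """Return (chrom, position, sv_type, length), or None for non-SV IDs."""
    match = SV_ID.match(variant_id)
    if match is None:
        return None
    chrom, pos, sv_type, length = match.groups()
    return chrom, int(pos), sv_type, int(length)


def summarize(records):
    """Reduce each record to its ID, kind, position, and nearby gene names."""
    rows = []
    for rec in records:
        is_sv = parse_sv_id(rec["id"]) is not None
        # 'data_gene' is absent when no gene annotation is returned.
        genes = [g["geneName"] for g in rec.get("data_gene", [])]
        rows.append({
            "id": rec["id"],
            "kind": "SV" if is_sv else "SNP",
            "position": int(rec["position"]),  # all values arrive as strings
            "genes": genes,
        })
    return rows


rows = summarize(json.loads(SAMPLE_RESPONSE))
for row in rows:
    print(row)
```

Note that every value in the response is a string, so numeric fields such as `position` need explicit conversion.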

URL example2 : single position and multiple data types in xml format

Request URL :

https://www.omic.tech/3dsnpv2/api.do?position=100000-1000100&chrom=chr1&format=xml&type=eqtl,motif

Response format :

<a>
    <e class="object">
        <chrom type="string">chr1</chrom>
        <eqtl type="string"/>
        <id type="string">chr1-121118-INS-113</id>
        <motif type="string"/>
        <position type="string">121117</position>
    </e>
    <e class="object">
        <chrom type="string">chr1</chrom>
        <id type="string">chr1-126241-DEL-38630</id>
        <position type="string">126241</position>
        <data_motif class="array">
            <e class="object">
                <motif type="string">CEBPB_02</motif>
                <motifAlt type="string">DEL</motifAlt>
                <motifMatchedSequence type="string">TGATTGCACCACTG</motifMatchedSequence>
                <motifMatchedSequencePos type="string">16992</motifMatchedSequencePos>
                <motifRef type="string">.</motifRef>
                <motifSource type="string">Transfac</motifSource>
                <motifStrand type="string">-</motifStrand>
            </e>
            <e class="object">
                <motif type="string">ETS1_B</motif>
                <motifAlt type="string">DEL</motifAlt>
                <motifMatchedSequence type="string">GCAGGAAGTCAGGGA</motifMatchedSequence>
                <motifMatchedSequencePos type="string">-27799</motifMatchedSequencePos>
                <motifRef type="string">.</motifRef>
                <motifSource type="string">Transfac</motifSource>
                <motifStrand type="string">+</motifStrand>
            </e>
        </data_motif>
        <eqtl type="string"/>
        <motif type="string">Transfac,CEBPB_02,-,TGATTGCACCACTG,16992;Transfac,ETS1_B,+,GCAGGAAGTCAGGGA,-27799;Transfac,CEBPB_01,+,GGGTGAGGCAAGGG,-10490;Transfac,EBF_Q6,-,TTCCCTTGAGA,32414;Transfac,KROX_Q6,-,CTCGCCCCCTCCTC,4826;Transfac,CEBP_Q2_01,+,GTTGCCCAAGCT,-24111;Transfac,MTF1_Q4,-,ACTGCGCCCAGCCT,37618;Jaspar,SPI-1,-,CGGAAG,3705;Transfac,MYOD_Q6_01,-,TTGAAGCAGGTGATGGAG,24991;Transfac,TEL2_Q6,-,CCACTTCCTG,32686;Transfac,CRX_Q4,+,CCCGTAATCCCAG,-27209;Transfac,R_01,-,TGGGCCACCGGATGTGGTCCT,5445;Transfac,HNF4_01,-,ACGCGGACAGAGGTCAGCG,10966;Transfac,PAX4_01,+,GGAGGTGACCCGTGGGCAGCC,-6023;Transfac,PAX4_02,+,GAATAATTGCC,-1320;Transfac,PAX4_03,-,AGCCCCCACCCC,8402;Transfac,PAX4_04,+,AAAAATTAGCCGGGTGTGGTGGCACACACC,-3883;Transfac,IK3_01,+,TACTGGGAATGTC,-16898;Jaspar,SAP-1,-,ACCGGATGT,5439;Transfac,E2F1_Q4,+,CTTGGCGG,-33552;Transfac,HNF1_Q6,-,AGGTTAATAATTATCTCT,35228;Transfac,E2F1_Q3,+,CGTGGCGC,-28392;Transfac,AR_02,-,CGCCCACGATCAACGTGTTCTGTTCTG,8539;Transfac,ETF_Q6,+,GCGGCGG,-11412;Transfac,EN1_01,-,GTAGTGG,3310;Transfac,SREBP_Q3,-,CCCATCACCCCA,17405;Transfac,AP4_01,-,AGGATCACCTGAGGTCAG,3413;Transfac,HAND1E47_01,+,GGTGGTGTCTGGCACT,-5938;Transfac,E2F1_Q3_01,-,TGGGCGGCAGCAGGGC,6056;Transfac,STAT3_01,-,GGTGATTTCCAGGATGTGAGC,17822;Transfac,MYB_Q3,+,GGTGCCAGTTG,-7224;Transfac,HMEF2_Q6,-,GGCTAAAACTACCCCT,35670;Transfac,EGR2_01,-,TCACGTGGGCGG,6061;Transfac,E2F_Q2,-,GGCGCG,6794;Transfac,PAX8_01,-,CGGTGTCGAGTGAGG,13827;Transfac,RP58_01,-,AACACATCTGGA,37199;Transfac,CEBPGAMMA_Q6,-,CCCACTTCAGAGA,19517;Transfac,HEN1_01,+,TCGGTGCTCAGCTGAGTCTGCA,-2833;Transfac,E2_Q6_01,-,CCCACCGTCTCTGGTT,19989;Transfac,HEN1_02,-,CCTGGGCCCAGCTCCGTCCTCT,9184;Transfac,USF2_Q6,+,CACGCG,-11114;Transfac,SP1_Q6,+,CAAGGGCGGGGCC,-11202;Transfac,SMAD4_Q6,+,AGGATGCAGCCAGCT,-33630;Transfac,CIZ_01,+,GAAAAAGCC,-12404;Transfac,TAL1ALPHAE47_01,-,TTGGCCAGATGGGGTC,14330;Jaspar,deltaEF1,+,CACCTG,-3326;Transfac,POLY_C,-,GAGAAAACCCTCCTGCTG,8438;Jaspar,ARNT,+,CACGTG,-6055;Transfac,MEF3_B,-,TGCCCAGGTTTCA,28126;Transfac,GATA2_
01,+,GGGGATGGGG,-6520;Transfac,GR_01,+,GCAGCATGGGCAGGATGTTCTGCACAC,-7429;Transfac,CEBP_C,+,AGTGTGAGGCAAGACCTG,-12861;Jaspar,NF-kappaB,-,GGGAATTTCC,28429;Transfac,EGR3_01,+,CAGCGTGGGAGG,-10034;Transfac,TANTIGEN_B,+,GGGAGGCCGAGGCAGGCAG,-3797;Transfac,SRF_C,-,GCCTTTTTTGGCCCA,12574;Transfac,E4F1_Q6,-,CCTACGTCAC,13357;Jaspar,PPARgamma,-,AGAGGTCAGCGTGACCCCCT,9983;Transfac,HSF_Q6,+,TCCCAGGAGTTTC,-20707;Transfac,EGR1_01,-,TCACGTGGGCGG,6061;Transfac,ETS_Q4,-,TTCCACTTCCTG,32688;Transfac,USF_C,+,CCACGTGA,-6054;Transfac,E2_01,+,GAACCAGAGACGGTGG,-19973;Transfac,AHRHIF_Q6,-,CGCGTGCGG,11119;Transfac,RFX1_02,+,CTGTAGCCTAAGCAACAG,-22798;Transfac,BARBIE_01,-,TTCAAAAGGTGAGGG,28660;Transfac,FXR_IR1_Q6,+,GGATGAATGTCCC,-28051;Transfac,HNF3ALPHA_Q6,-,TGTTTGTTTTG,4737;Transfac,STRA13_01,-,GCCTCACGTGACTC,7198;Transfac,AHR_Q5,+,GTGGCGTGTGC,-21067;Transfac,ZF5_01,-,GGGCGCGG,6795;Jaspar,p65,-,GGGAATTTCC,28429;Transfac,FREAC3_01,-,GGCATGTAAATAAAGA,23069;Transfac,ATATA_B,+,GTATATAAGC,-31222;Transfac,ACAAT_B,+,GATTGGTGG,-26027;Transfac,AP4_Q5,+,CTCAGCTGGC,-13970;Transfac,AP4_Q6,+,CTCAGCTGGC,-13970;Jaspar,Yin-Yang,-,GCCATC,3377;Transfac,ZTA_Q2,-,TCACAGTGACTCA,14023;Transfac,E12_Q6,+,GGCAGGTGCCA,-7403;Transfac,ELK1_02,+,GCTGCCGGAAGGGA,-8752;Transfac,MYC_Q2,+,CACGTGG,-10864;Transfac,LBP1_Q6,-,CAGCTGC,2984;Transfac,TFIII_Q6,+,AGAGGGAGG,-19953;Transfac,LMO2COM_02,+,CAGATAGGG,-43;Transfac,LMO2COM_01,-,CCCCAGGTGTTG,7655;Transfac,SMAD_Q6,-,AGACTCCCC,9856;Transfac,MAF_Q6,+,TGAGGGCAAGTTGGCA,-34778;Jaspar,cEBP,-,TGGCGCAACCTT,38390;Jaspar,c-REL,+,GGGGAATTCC,-23710;Transfac,MUSCLE_INI_B,-,TCCCCCCACCACCCCCTCCCA,30643;Transfac,AP4_Q6_01,+,GCCAGCTGT,-36895;Transfac,DR3_Q4,+,CATCCCCTTCCTGACCCCTCC,-4972;Transfac,STAT5A_04,-,CACTTCCG,16011;Transfac,ATF4_Q2,-,GCTGACGCCACG,4915;Transfac,SPZ1_01,-,GGTGGAGGGATGGGG,16533;Jaspar,TCF11-MafG,+,CATGAC,-3852;Transfac,PAX2_02,+,CACAAACCC,-23836;Transfac,LUN1_01,+,TCCCAGCTACTTGGGAG,-3918;Transfac,PAX2_01,-,CCCTGTCACTCAGGATGGA,20254;Transfac,MAZR_01,-,TGGGGAGGGGCAC,27106;Transf
ac,MYOGNF1_01,+,AATCCTTTCAGTTTGGGACGGAGTAAGGC,-7790;Transfac,HSF2_01,-,GGAAGCTTCG,13805;Transfac,T3R_01,+,CTGGGAGGTCACGGCT,-21588;Transfac,ZIC3_01,+,TGGGGGGTC,-13048;Transfac,ISRE_01,+,CAGTTTCTCTTCCTG,-29546;Jaspar,Bsap,+,TGGTCAACGCAGCAGAGCGG,-6478;Transfac,CDXA_02,+,ATTACTG,-16382;Transfac,CREB_Q4_01,+,CCGTGACGTAG,-13346;Transfac,ARNT_02,+,CGAGAGTCACGTGAGGCTGA,-7182;Transfac,HOGNESS_B,-,GTGGTGGCTCACGCCTGTAATCCCAGCACT,8124;Transfac,ARNT_01,-,CAGCTCACGTGGGCGG,6065;Transfac,HIF1_Q3,-,GCCCGCGTGCGGCC,11122;Transfac,LFA1_Q6,-,GGGGTCAG,7534;Transfac,GR_Q6,-,GGGCCTCGCTCTGTTGTCC,27466;Transfac,TEF1_Q6,+,GGAATG,-1360;Transfac,BACH1_01,-,GCTATGAGTCACCAC,1540;Transfac,TBP_Q6,+,TTTATAC,-8715;Transfac,E47_02,-,AATTACAGGTGTACGC,21546;Transfac,CP2_02,+,GCTGGGCTGAGCCAC,-6680;Transfac,E47_01,-,AGGGCAGGTGGCTCC,5145;Transfac,MEIS1_01,+,GAGTGACAGGGC,-20244;Transfac,PR_01,-,TGTTGAGGAGAATGCTGTTCTCATTGT,36718;Jaspar,MZF_1-4,+,TGGGGA,-2671;Transfac,OCT1_07,+,TTTATGGTAATT,-31767;Jaspar,Androgen,-,TTTGGCACAGCATGTACCTGTC,34465;Transfac,ZID_01,+,CAGCTCCATCACC,-24971;Jaspar,Pax6,+,TTCACGCTTTAGTT,-2658;Transfac,AREB6_02,+,ACACACCTGTAG,-3906;Transfac,AREB6_03,-,GTGCACCTGTAG,1658;Transfac,PAX_Q6,+,CTGGAAATCAC,-14033;Transfac,RREB1_01,+,CCCCAAAAAACCCT,-1014;Transfac,MEF2_01,-,GGCTAAAACTACCCCT,35670;Transfac,LPOLYA_B,+,CAATAAAG,-22981;Transfac,MEF2_03,-,TAGGTGCCTATAAATAGCATAG,31727;Transfac,ER_Q6,-,AGAGGTCAGCGTGACCCCC,9983;Transfac,MYB_Q6,-,CCCAACTGGC,7236;Transfac,PPARG_02,+,TTCCAGGTGAAGGTGGCCCACTT,-5598;Transfac,HFH4_01,-,TTATGTTTGTTTA,382;Transfac,HEB_Q6,-,GCCAGCTG,13979;Transfac,PPAR_DR1_Q2,+,TGACCTCTGTCCA,-10853;Transfac,OLF1_01,+,CAAGGTTCCCTAGAGAAATGGC,-35076;Transfac,MYOD_01,+,ACACAGGTGGTG,-5933;Transfac,CREBP1_Q2,-,GCTGACGCCACG,4915;Transfac,NERF_Q2,+,TTGCAGGAAGTCAGGGAC,-27797;Transfac,IRF_Q6,+,GTCAGTTTCTCTTCC,-29544;Transfac,XPF1_Q6,+,TCTGGGCAAC,-32109;Transfac,GEN_INI3_B,-,CCTCATTC,17236;Transfac,STAT6_02,+,GCCTTCCT,-7817;Transfac,AR_01,+,GGTACATGCTGTGCC,-34448;Transfac,NFKAPPAB_01,-,GGGAA
TTTCC,28429;Jaspar,HNF-1,-,GGTTAATAATTATC,35227;Transfac,EGR_Q6,+,GTGGGGGCAAG,-11163;Transfac,LYF1_01,+,TTTGGGAGG,-3584;Transfac,PPARA_01,-,CTGCCCCAGGCCAAATTTCT,12377;Transfac,PPARA_02,-,TGGGGTCAGGCAGGGCTGG,7535;Transfac,COUP_DR1_Q6,+,GGACCTTTGGCTT,-38525;Transfac,GATA1_02,-,TTCTAGATAGGGGC,21667;Transfac,VDR_Q3,-,GAGGGAATGGGGAGA,8449;Transfac,T3R_Q6,+,CCTGTCCTC,-6382;Transfac,VDR_Q6,+,CTGCCTGACCCC,-7523;Transfac,LXR_Q3,-,TGGGGTGACCCTGGTGCG,5511;Jaspar,FREAC-4,+,GTAAACAT,-20345;Transfac,LXR_DR4_Q3,+,TGACCGTCATTAAACC,-8569;Transfac,YY1_02,-,CCTGTGCCATCCAGGCTGGA,14512;Transfac,SP1_01,+,AGGGCGGGGC,-11204;Transfac,AP2_Q6_01,+,CGGCCCCCAGGCC,-4872;Transfac,TCF11_01,-,GTCATTCAGGACC,33780;Transfac,TAL1BETAE47_01,-,GGGGACAGATGGCAGT,25058;Transfac,PAX6_Q2,-,CTGACCTTGAACTC,20070;Transfac,SP3_Q3,-,AGCACTGTGGGAGG,2620;Transfac,SEF1_C,+,GGCCCCCAGGCCTGCGTTC,-4873;Transfac,NFKB_Q6_01,+,GACAAGGAAATTCCCG,-28415;Transfac,ZIC2_01,+,AGGGTGGTC,-27629;Transfac,AREB6_01,-,TACTCACCTGAGT,8388;Transfac,AP2_Q6,+,GGCCCCCAGGCC,-4873;Transfac,HNF4_DR1_Q3,+,TGACCTCTGTCCA,-10853;Transfac,NMYC_01,-,TCCCACGTGGAC,10872;Transfac,AP2_Q3,-,GCCCCCAGCCTTAGGC,22344;Transfac,MYOGENIN_Q6,+,GGCAGCTG,-5067;Transfac,CAP_01,-,TCAGCCCC,36304;Jaspar,c-ETS,+,CTTCCG,-3700;Jaspar,Staf,-,GGTTTCCCAGGGGGCAGTGC,14095;Jaspar,n-MYC,+,CACGTG,-6055;Jaspar,MEF2,+,CTATTTATAG,-31711;Transfac,PAX9_B,-,GTCACCCAGGGTGGAGTGCAGTGA,21178;Transfac,ER_Q6_02,+,GAGGTCACGGC,-21592;Jaspar,HLF,-,GGTTACACAATT,21743;Jaspar,GATA-3,+,AGATAG,-44;Transfac,MZF1_01,+,AGTGGGGA,-6218;Jaspar,Irf-1,-,GATAGTGAAACC,21815;Transfac,E2_Q6,+,GAACCAGAGACGGTGG,-19973;Transfac,SP1_Q6_01,+,AGGGCGGGGC,-11204;Transfac,CREB_Q2,+,CGTGACGTAGGG,-13347;Transfac,CREB_Q3,-,CGTCAG,778;Transfac,NFKB_C,-,AGGGATTTTCCT,20047;Transfac,CREB_Q4,+,CGTGACGTAGGG,-13347;Transfac,SREBP1_01,+,GATCACCTGAG,-4565;Jaspar,Ahr-ARNT,+,CGCGTG,-9987;Jaspar,SRF,-,GCCCATATATGA,37496;Transfac,DR4_Q2,-,CGGCCTCTCCAGACCCA,11714;Transfac,SP1_Q4_01,+,CAAGGGCGGGGCC,-11202;Transfac,TTF1_Q6,+,CCCCCAAGTGTG,-
6842;Transfac,ATF_01,+,CCGTGACGTAGGGT,-13346;Transfac,HOXA3_01,+,CCTAATGGG,-35670;Transfac,POU6F1_01,+,GCATAATTTAT,-35917;Transfac,CREB_Q2_01,+,CTTGACGTCAGGAG,-38209;Transfac,GABP_B,-,CCGGGAAGAGCA,19270;Transfac,AHRARNT_01,+,GGAGGGTAGTGTGCCC,-27057;Transfac,DR1_Q3,-,TGGACAGAGGTCA,10865;Transfac,MZF1_02,-,TGGAGAGGGGCAA,19435;Transfac,P300_01,+,TCAAGGAGTGGGTG,-6194;Transfac,DELTAEF1_01,-,ACTCACCTGAG,8387;Jaspar,USF,+,CACGTGG,-10864;Transfac,CMYB_01,+,TACAAAGGCGGTTGGGAG,-11310;Transfac,PADS_C,-,TGTGGTCTC,4001;Jaspar,Chop-cEBP,-,GGGTGCAATGGC,21908;Transfac,DBP_Q6,+,AGCACAC,-6111;Transfac,NFKAPPAB65_01,-,GGGAATTTCC,28429;Transfac,AP2GAMMA_01,-,GCCTGGGGG,4883;Transfac,AHR_01,-,GCCCAGGCTGGAGTGCAA,18623;Transfac,TAL1BETAITF2_01,-,GGGGACAGATGGCAGT,25058;Transfac,PITX2_Q2,+,TGTAATCCCAA,-3780;Transfac,CAAT_C,+,GCCCAATAACCAGCTCCTCGCTGAT,-20432;Transfac,IK2_01,+,CTTTGGGAAGGC,-38457;Transfac,MIF1_01,+,TGGGTGCAGGGCCGCTGG,-7352;Transfac,IK1_01,+,GCTTGGGAAGGCC,-12009;Transfac,NFKB_Q6,+,ATGGGAATCTCCTC,-19067;Jaspar,Tal1beta-E47S,+,GGAACATCTGTT,-35130;Transfac,VJUN_01,+,GTGATGATGTCATTGC,-6140;Transfac,PAX5_02,+,GGAGTGCAATGTGAGCCGAGACCACACA,-3976;Transfac,PAX5_01,-,TCTTGGCTCACTGTAGTGTAGACTTCCC,18984;Transfac,BRACH_01,-,AGAATCACATGTAGGTGCCACAGT,16237;Transfac,CETS1P54_02,-,CCACCGGATGTGG,5441;Transfac,MAF_Q6_01,-,GGCTGAGTCAA,24942;Transfac,TAXCREB_02,+,GTGACCCACACCCTA,-28621;Jaspar,Pax-2,-,CGTCACGG,13353;Transfac,COMP1_01,+,TGTTATCAATGACAATGCGCGCCC,-28488;Transfac,CREL_01,+,GGGGAATTCC,-23710;Transfac,SP1_Q2_01,-,CCCCACCCCC,8399;Jaspar,c-MYB_1,+,GGCCGTTG,-11773;Transfac,SMAD3_Q6,-,TGTCTGTCT,16822;Transfac,E2A_Q6,+,CACCTGCC,-5136;Transfac,MYCMAX_03,+,CGAGAGTCACGTGAGGCTGA,-7182;Transfac,CHCH_01,+,CGGGGG,-6696;Transfac,E2A_Q2,-,GCACCTGCCTCAGT,7411;Transfac,BEL1_B,-,AAAGTGCTGAGATTACAGGCATAAGCCA,17103;Transfac,NRSE_B,+,CTCAGCACCTTGGCCAGCTCC,-24957;Transfac,MAZ_Q6,-,GGGGAGGG,16549;Transfac,ZIC1_01,+,TGGGGGGTC,-13048;Jaspar,RORalfa-1,+,TTCAAGGTCA,-20060;Transfac,NF1_Q6,+,TGCTGGCAGGCAGGCAGA,-1234
3;Transfac,MINI20_B,+,ACCTCCCACCATGGAGGAGGA,-5205;Transfac,VMW65_Q6,+,TCTCATTA,-25555;Transfac,NFKAPPAB50_01,+,GGGGAGTCCC,-5241;Jaspar,RREB-1,-,CCCCCCACCACCCCCTCCCA,30642;Jaspar,NRF-2,+,GCCGGAAGGG,-8755;Transfac,RFX1_01,+,TAGGCACCTAGTAACAG,-31718;Transfac,GNCF_01,+,CAGGAGTTCAAGGTCAGC,-20054;Jaspar,RXR-VDR,-,GGGTCACAGAGATCA,28627;Transfac,NRSF_01,+,CTCAGCACCTTGGCCAGCTCC,-24957;Transfac,USF_Q6_01,+,GCCCACGTGAGC,-6052;Transfac,P53_01,+,GGACATGGTGGCACATGTCT,-22689;Transfac,WHN_B,+,AGGGACGCCTT,-6534;Transfac,MINI19_B,-,GCAAGGAGCCACACAGCAGGA,13854;Transfac,GKLF_01,+,AAAGGAAGGAAGGG,-35999;Transfac,HNF4_01_B,+,GGGGGCAAAGGTAGG,-22339;Transfac,YY1_Q6,-,GCCATCTTG,18004;Jaspar,p53,-,CAGGACAAGTTCGAGCATCT,2978;Jaspar,p50,-,GGGGGTTCCCG,15798;Transfac,GATA2_02,-,GGAGATAAGA,33994;Transfac,GRE_C,+,GTCACACCCTGTCCTC,-6375;Transfac,FXR_Q3,+,CAAGGGCAGCAACC,-13934;Transfac,MYCMAX_B,-,GCCATGTGCC,30955;Transfac,NFE2_01,-,AGCTGAGGCAC,13976;Transfac,CACBINDINGPROTEIN_Q6,+,GGGGGTGGG,-8390;Transfac,MYOD_Q6,+,TGCACCTGTC,-6277;Transfac,STAF_02,+,ACATACCATCATGCCTGGCTA,-24189;Transfac,STAF_01,-,AGTTCCCGTAGTGCCTGACGGT,5931;Transfac,GATA3_02,-,GGAGATAAGA,33994;Jaspar,Myf,+,AGGCAGCAGGAG,-8418;Transfac,NRF2_01,+,GCCGGAAGGG,-8755;Transfac,GATA1_01,+,GGGGATGGGG,-6520;Transfac,ICSBP_Q6,+,GAAGAGAAACTG,-6711;Transfac,CETS1P54_01,-,ACCGGATGTG,5439;Transfac,TCF11MAFG_01,+,CTGTTGTGAGGCAGCAGTTGTG,-12574;Transfac,CACCCBINDINGFACTOR_Q6,+,AATCAGCTGGGTGTGG,-18121;Transfac,SMAD_Q6_01,+,TAGTCAGACAG,-34438;Transfac,GC_01,+,CAAGGGCGGGGCCT,-11202;Transfac,FOXM1_01,+,AGATGGAGT,-3171;Transfac,ARP1_01,-,TGAACTCCTGACCTCT,3835;Transfac,NGFIC_01,-,TCACGTGGGCGG,6061;Jaspar,Gklf,+,AAAGGGAAGG,-35981;Transfac,ERR1_Q2,+,AGTTCAAGGTCAGC,-20058;Jaspar,MZF_5-13,-,GGAGGGGGAG,8091</motif>
    </e>
</a>

Other examples can be found in the documentation of the previous version.
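
The XML response in example 2 nests each variant in an `<e class="object">` element, with array fields such as `data_motif` holding further `<e>` children. The following is a minimal sketch of reading that layout with Python's standard library. The sample is a shortened copy of the response above, and the function name `parse_variants` is an illustrative assumption.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the XML response from example 2.
SAMPLE_XML = """<a>
    <e class="object">
        <chrom type="string">chr1</chrom>
        <id type="string">chr1-126241-DEL-38630</id>
        <position type="string">126241</position>
        <data_motif class="array">
            <e class="object">
                <motif type="string">CEBPB_02</motif>
                <motifAlt type="string">DEL</motifAlt>
                <motifMatchedSequence type="string">TGATTGCACCACTG</motifMatchedSequence>
                <motifSource type="string">Transfac</motifSource>
                <motifStrand type="string">-</motifStrand>
            </e>
        </data_motif>
        <eqtl type="string"/>
    </e>
</a>"""


def parse_variants(xml_text):
    """Turn the response into a list of dicts, one per variant."""
    root = ET.fromstring(xml_text)
    variants = []
    for entry in root.findall("e"):
        # Each motif hit becomes a flat dict of tag -> text.
        motifs = [
            {field.tag: (field.text or "") for field in hit}
            for hit in entry.findall("data_motif/e")
        ]
        variants.append({
            "id": entry.findtext("id"),
            "chrom": entry.findtext("chrom"),
            "position": entry.findtext("position"),
            "motifs": motifs,
        })
    return variants


variants = parse_variants(SAMPLE_XML)
print(variants[0]["id"], [m["motif"] for m in variants[0]["motifs"]])
```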

3dsnp v1.0 tutorials

Overview

3DSNP is an integrated database for annotating the regulatory function of human noncoding SNPs by exploring their 3D interactions with genes and other SNPs mediated by chromatin loops. Models of cis-acting DNA elements that regulate gene expression through three-dimensional interactions mediated by chromatin loops have been established recently, and SNPs have frequently been reported to be located in these elements. 3DSNP collects currently available Hi-C datasets from different studies. Two types of linkage are defined according to the spans of the loops: “Within Loop” or “Anchor-to-Anchor”. Allele frequencies were obtained for the SNPs in 1000 Genomes Phase 3 data, and pairwise LD was calculated for all pairs of SNPs within 200 kb in each continental population (AFR, AMR, ASN, EUR and SAS). 3DSNP also integrates chromatin state segments, transcription factor binding sites and DNA accessibility from the Roadmap Epigenomics and ENCODE projects, DNA-binding motifs from the TRANSFAC and JASPAR databases, eQTLs from the GTEx project and sequence conservation from the UCSC Genome Browser. Visualization tools were developed for 3DSNP to display interacting SNPs, genes and elements along with important epigenetic marks, and a comprehensive scoring system was developed to assess the functionality of SNPs in different aspects. 3DSNP provides an integrated database and visualization tools for discovering the regulatory roles of noncoding SNPs mediated by 3D genome topology.

Data source

Sequential and genotyping data

3DSNP contains all 149,254,102 SNPs and small indels from NCBI dbSNP build 146. Among them, 84,801,880 SNPs were phased using 1000 Genomes Project Phase 3 (final phase) genotype data; allele frequencies were obtained and pairwise LD was calculated for all pairs of SNPs within 200 kb in each continental population (AFR, AMR, ASN, EUR and SAS). In addition, minor allele frequency (MAF) and the linearly closest gene were extracted from dbSNP. Gene annotations were obtained from the GRCh37/hg19 version of RefSeq genes from the UCSC Genome Browser.
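
For readers unfamiliar with the LD statistic, here is a minimal sketch of the standard r² calculation between two biallelic SNPs from phased haplotypes. It illustrates the textbook formula only; the software and settings actually used to build 3DSNP are not described here, and the function name `ld_r2` is hypothetical.

```python
def ld_r2(hap_a, hap_b):
    """r^2 between two biallelic SNPs.

    hap_a and hap_b list the allele carried by each phased haplotype
    (0 = reference, 1 = alternative), in the same haplotype order.
    """
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("haplotype lists must be non-empty and equally long")
    n = len(hap_a)
    p_a = sum(hap_a) / n                      # alt allele frequency at SNP A
    p_b = sum(hap_b) / n                      # alt allele frequency at SNP B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a SNP is monomorphic")
    return d * d / denom


# Alleles that always travel together are in complete LD.
print(ld_r2([0, 1, 1, 0], [0, 1, 1, 0]))
# Alleles that combine independently show no LD.
print(ld_r2([0, 0, 1, 1], [0, 1, 0, 1]))
```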

3D genome topology

Chromatin loops identified by Hi-C technology in multiple human cell types were collected from published Hi-C studies (Rao et al. 2015, Sanborn et al. 2015, Taberlay et al. 2016) to map the intrachromosomal interactions between distant genomic regions. In total, 3DSNP collected 75,362 intrachromosomal chromatin loops in twelve human cell types. Chromatin interactions are classified into two types based on the spans of the chromatin loops. For a chromatin loop shorter than 200 kb, the interaction type is ‘Within loop’, where genomic elements located within the loop can interact with each other. For a chromatin loop longer than 200 kb, the type is ‘Anchor-to-anchor’, where only elements located at the two anchors are supposed to interact with each other.
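
The span rule above can be sketched in a few lines. This is an illustration only: it assumes the span is the difference between the loop's end and start coordinates (the `loopStart` and `loopEnd` fields of the API), and it assigns a loop of exactly 200 kb to ‘Anchor-to-anchor’, a boundary case the text does not specify. The function name `loop_type` is hypothetical.

```python
# Span threshold from the text, in base pairs.
WITHIN_LOOP_MAX_SPAN = 200_000


def loop_type(loop_start, loop_end):
    """Classify a chromatin loop by its span.

    Accepts ints or numeric strings, since the API returns all
    values as strings.
    """
    span = abs(int(loop_end) - int(loop_start))
    if span < WITHIN_LOOP_MAX_SPAN:
        return "Within Loop"
    # Assumption: a span of exactly 200 kb falls on this side.
    return "Anchor-to-Anchor"


print(loop_type(1_000_000, 1_150_000))   # span of 150 kb
print(loop_type("1000000", "1450000"))   # span of 450 kb
```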

Chromatin signature

A variety of chromatin signatures were used to annotate the regulatory functions of SNPs, including chromatin state (ChromHMM Core 15-state model), histone modifications (NarrowPeak), DNase I hypersensitivity sites and transcription factor binding sites from the Roadmap Epigenomics and ENCODE projects. To annotate SNPs that alter TF binding motifs, the TFM-Scan software was used to locate putative TFBSs across the genome using a set of position weight matrices (PWMs) collected from the TRANSFAC and JASPAR databases.

Conservation

The conservation of SNPs was measured by two PhyloP scores obtained from the UCSC Genome Browser. The two PhyloP scores were calculated from multiple alignments of 46 vertebrate genomes and 33 mammal genomes respectively. The absolute values of the PhyloP scores represent -log(P-values) under a null hypothesis of neutral evolution, and sites predicted to be conserved are assigned positive scores, while sites predicted to be fast-evolving are assigned negative scores.
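
As a small worked example of reading these scores, the sketch below converts a PhyloP score back into the P-value it encodes and labels its direction. It assumes the logarithm is base 10, which the text does not state explicitly; the function name `interpret_phylop` and the label strings are illustrative.

```python
def interpret_phylop(score):
    """Return (label, p_value) for one PhyloP score.

    Assumes |score| = -log10(P-value); the base of the logarithm
    is an assumption, not stated in the text.
    """
    p_value = 10 ** (-abs(score))
    if score > 0:
        label = "conserved"
    elif score < 0:
        label = "fast-evolving"
    else:
        label = "neutral"
    return label, p_value


for s in (2.0, -3.0, 0.0):
    print(s, interpret_phylop(s))
```

Under this assumption, a score of 2 or -2 both correspond to a P-value of 0.01; only the sign distinguishes conservation from acceleration.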

eQTL

Correlations between genotype and tissue-specific gene expression levels will help annotate the effects of genetic variants on gene regulation. We collected a total of 19,582,729 significant SNP-gene pairs (FDR <= 0.05) in 44 human tissues from the GTEx project version 6. To measure the significance of the eQTLs, nominal eQTL p-values and the effect size were obtained for each SNP-gene pair. Nominal eQTL p-values were generated using a two-tailed t test, testing the alternative hypothesis that the beta deviates from the null hypothesis of beta=0. The effect size of the eQTLs is defined as the slope (‘beta’) of the linear regression, and is computed as the effect of the alternative allele (ALT) relative to the reference allele (REF) in the human genome.
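
To make the definition of effect size concrete, the following sketch computes the slope of a simple least-squares regression of expression on the number of ALT alleles (0, 1 or 2) per sample. It is a simplified illustration: the GTEx pipeline also normalizes expression and adjusts for covariates, which this example omits, and the function name `eqtl_beta` and the toy data are made up.

```python
def eqtl_beta(alt_allele_counts, expression):
    """Slope of expression regressed on ALT allele count (no covariates)."""
    n = len(alt_allele_counts)
    if n != len(expression) or n < 2:
        raise ValueError("need two equally long lists with at least 2 samples")
    mean_g = sum(alt_allele_counts) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in alt_allele_counts)
    if sxx == 0:
        raise ValueError("all samples share the same genotype")
    sxy = sum((g - mean_g) * (e - mean_e)
              for g, e in zip(alt_allele_counts, expression))
    return sxy / sxx


# Toy data: each extra ALT allele raises expression by 0.5 units.
genotypes = [0, 1, 2, 1, 0]
expression = [1.0, 1.5, 2.0, 1.5, 1.0]
print(eqtl_beta(genotypes, expression))
```

A positive slope means the ALT allele is associated with higher expression relative to REF, and a negative slope with lower expression.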

Visualization

Regional LD plot

A regional plot is used to display the set of SNPs in LD with the query, as shown in Figure 1. In this plot, the x-axis shows chromosome coordinates, the y-axis shows r² values, the size of each node represents its total score, and associated SNPs in the five populations are shown in different colors. Associated SNPs in each of the five populations can be removed from or added to the plot by clicking the corresponding circle in the legend. Users can also restrict the range of total scores displayed by adjusting the upper and lower bounds of the size bar at the right side of the plot. Clicking the node of a SNP opens its detail page. The regional LD plots can be displayed in the browser and can also be downloaded as high-quality, publication-ready PNG files.

Figure 1. Regional LD plot of the associated SNPs of rs12740374. In the plot, the x-axis shows chromosome coordinates, the y-axis shows r² values, the size of each node represents its total score, associated SNPs in five populations (AFR: African, AMR: Admixed American, ASN: East Asian, EUR: European and SAS: South Asian) are displayed in different colors, and rs12740374 is displayed in black.

CIRCOS plot

Circos is a software package for visualizing genomic data and information. A customizable Circos plotting system was developed to display the 3D chromatin topology and a set of important chromatin marks surrounding the query SNP. Circos tracks from outside to inside represent: Chromatin states, RefSeq genes, DHS and histone modifications, TFBS, associated SNPs and chromatin loops, as shown in Figure 2. The color scheme of 15 chromatin states in ChromHMM model is shown in Table 1, and the color scheme of chromatin loops in twelve cell types is shown in Table 2 below.

Figure 2. Track annotation for Circos plot. In the plot, from outer to inner, the circle represents chromatin states, annotated genes, histone modification set (red), transcription factor set (blue), current SNP and associated SNPs, and 3D chromatin interactions, respectively.

Table 1. Color scheme of 15 chromatin states in ChromHMM core model

State NO.	Mnemonic	Description
1	TssA	Active TSS
2	TssAFlnk	Flanking Active TSS
3	TxFlnk	Transcr. at gene 5′ and 3′
4	Tx	Strong transcription
5	TxWk	Weak transcription
6	EnhG	Genic enhancers
7	Enh	Enhancers
8	ZNF/Rpts	ZNF genes & repeats
9	Het	Heterochromatin
10	TssBiv	Bivalent/Poised TSS
11	BivFlnk	Flanking Bivalent TSS/Enh
12	EnhBiv	Bivalent Enhancer
13	ReprPC	Repressed PolyComb
14	ReprPCWk	Weak Repressed PolyComb
15	Quies	Quiescent/Low

Table 2. Color scheme of twelve cell types for the chromatin loop track in Circos plot

Cell type	Description	Tissue
GM12878	Lymphoblastoid Cells	Blood
K562	K562 leukemia Cells	Blood
H1-hESC	Embryonic stem cells	ESC
IMR90	Fetal lung fibroblasts	Lung
HeLa-S3	Cervical carcinoma cells	Cervix
HUVEC	Umbilical vein endothelial cells	Blood vessel
NHEK	Epidermal keratinocytes	Skin
HMEC	Mammary epithelial cells	Breast
KBM-7	Chronic myelogenous leukemia (CML) cells	Blood
LNCaP	Prostate adenocarcinoma	Prostate
PC3	Prostate cancer cells	Prostate
PrEC	Prostate epithelial cell line	Prostate

You can customize the Circos tracks in the ‘Options’ block to display DHS and histone modifications of 127 cell lines from the Roadmap Epigenomics Project, and the binding sites of CTCF and 166 other transcription factors in 91 cell lines from the ENCODE Project. The DHS and histone modification track is represented by red tiles, and the TFBS track is represented by blue tiles.

UCSC Genome Browser plot

Chromatin loops and chromatin signatures surrounding the query SNP can also be displayed in a figure generated by the UCSC Genome Browser with custom tracks. The location of the query SNP is marked by a red bar, and pairs of anchor sites of chromatin loops are displayed as colored bars for different cell types, as shown in Figure 3.

Figure 3. Track annotation for UCSC Genome Browser plot. In the plot, from top to bottom, the tracks are: genomic coordinates, chromatin interactions, current SNP, UCSC genes, RefSeq genes, histone modifications, CTCF binding sites, DNase Clusters and mammal conservation.

Scoring system

In 3DSNP, each SNP is scored based on its annotated records in six functional categories: 3D interacting genes, enhancer state, promoter state, transcription factor binding sites, sequence motifs altered and conservation score. Unlike the scoring scheme of RegulomeDB, which classifies SNPs into classes based on the combinatorial presence/absence status of functional categories, 3DSNP adopts a quantitative scoring system to evaluate the functional significance of a SNP in each category. For the first five categories, we used the number of annotated records (hits) to assign a score to a SNP in the corresponding category. Specifically, we fitted the numbers of hits of all SNPs in each chromosome to a Poisson distribution model. For a SNP with k hits in a functional category F, where λ is the fitted parameter of the corresponding Poisson model, the score of the SNP in this category is defined as follows:

For the conservation category (F’), we found that the PhyloP scores of all SNPs in a chromosome follow a Gaussian distribution. For a SNP with a conservation score of c, where μ and σ are the fitted parameters of the corresponding Gaussian model, the score of the SNP in the conservation category is defined as follows:

The total score of a SNP is the sum of scores of the six functional categories.
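The two formulas referred to above are not reproduced in this text, so the following is only a rough illustration of how such a scoring scheme could work. It assumes that each category score is the negative log10 of an upper-tail probability under the fitted model, P(X ≥ k) for the Poisson categories and P(X ≥ c) for conservation. That form, the function names and the example parameters are assumptions made for this sketch, not the documented 3DSNP definition.

```python
import math

def poisson_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(1.0 - cdf, 0.0)

def gaussian_tail(c: float, mu: float, sigma: float) -> float:
    """P(X >= c) for X ~ Normal(mu, sigma)."""
    return 0.5 * math.erfc((c - mu) / (sigma * math.sqrt(2.0)))

def neg_log10(p: float, floor: float = 1e-300) -> float:
    """-log10(p), with a floor to avoid log(0). The floor is an assumption."""
    return -math.log10(max(p, floor))

def total_score(hits, lams, c, mu, sigma) -> float:
    """Sum of five Poisson-based category scores and one Gaussian-based score.

    hits and lams hold the hit counts and fitted lambda values of the
    five count-based categories (assumed scoring form, see lead-in).
    """
    count_part = sum(neg_log10(poisson_tail(k, lam)) for k, lam in zip(hits, lams))
    return count_part + neg_log10(gaussian_tail(c, mu, sigma))
```

Under this assumed form, a SNP with no hits in a category contributes 0 to the total, and rarer hit counts contribute progressively larger scores.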

Data format

Query format

SNP ID(s), a single genomic region or an official gene symbol can be used as a query in the search bar on the main page. Multiple SNP IDs should be delimited by commas or spaces, a genomic region should be written as chrN:start-end, and a gene should be written as gene:SYMBOL. Only one query type is allowed per search; mixed query formats are not supported. A query in the search bar can contain at most 100 SNP IDs.
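The query rules above can be checked on the client side before submitting a search. The sketch below is a minimal, hypothetical validator: the function name, the regular expressions (for example, which chromosome names and gene-symbol characters are accepted) and the choice to raise an error on oversized queries are assumptions, not the server's actual behavior.

```python
import re

MAX_SNP_IDS = 100  # limit stated for the search bar

# Assumed patterns; the server may accept a wider or narrower syntax.
_SNP = re.compile(r"rs\d+")
_REGION = re.compile(r"(chr(?:\d{1,2}|X|Y|M)):(\d+)-(\d+)")
_GENE = re.compile(r"gene:([A-Za-z0-9._-]+)")

def parse_query(text: str):
    """Classify a search-bar query as ('snp', ids), ('region', ...) or ('gene', symbol)."""
    text = text.strip()
    match = _GENE.fullmatch(text)
    if match:
        return ("gene", match.group(1))
    match = _REGION.fullmatch(text)
    if match:
        start, end = int(match.group(2)), int(match.group(3))
        if start > end:
            raise ValueError("region start must not exceed region end")
        return ("region", (match.group(1), start, end))
    # SNP IDs may be separated by commas and/or spaces.
    ids = [token for token in re.split(r"[,\s]+", text) if token]
    if ids and all(_SNP.fullmatch(token) for token in ids):
        if len(ids) > MAX_SNP_IDS:
            raise ValueError(f"at most {MAX_SNP_IDS} SNP IDs per query")
        return ("snp", ids)
    raise ValueError("unsupported or mixed query format")
```

For example, parse_query("rs12740374") returns ("snp", ["rs12740374"]), while a query that mixes a gene symbol with SNP IDs raises ValueError.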

Upload file format

A text file containing a list of SNP IDs or genomic regions can be uploaded to the server for batch analysis by clicking the browser icon on the right side of the search bar. A file may contain at most 2000 SNPs or at most 10 regions; any entries beyond these limits are ignored.
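To see in advance which entries of an upload file would be kept, the limits can be applied locally. This sketch assumes one entry per line and distinguishes SNP IDs from regions by their "rs" and "chr" prefixes; both the file layout and the function name are assumptions, since the exact upload format is not specified here.

```python
MAX_SNPS = 2000    # upload limit for SNP IDs
MAX_REGIONS = 10   # upload limit for genomic regions

def apply_upload_limits(lines):
    """Return (snps, regions) that fit within the upload limits.

    Entries beyond the limits are dropped, mirroring the statement
    that data exceeding the limits is ignored.
    """
    snps, regions = [], []
    for raw in lines:
        entry = raw.strip()
        if not entry:
            continue  # skip blank lines
        if entry.startswith("rs"):
            if len(snps) < MAX_SNPS:
                snps.append(entry)
        elif entry.startswith("chr"):
            if len(regions) < MAX_REGIONS:
                regions.append(entry)
    return snps, regions
```

A typical use is apply_upload_limits(open(path).read().splitlines()), where path is the local file to be uploaded.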

Export format

All result tables can be exported in three ways: copy to clipboard, MS Excel or PDF. Figures can be exported in PNG format.

Usage example

Type the full ID “rs12740374” in the search box, and the SNP rs12740374 will appear automatically in the main table below. You can see its position and reference/alternative sequence in the genome. The total functionality score of rs12740374 is 135.06, and its 3D interacting genes are PSRC1 and 7 other genes. Click the “+” icon on the left side of the ID to see a table containing the set of associated SNPs in the same LD block as rs12740374. A regional LD plot on the right side of the table shows their associations, as below:

Click the “AMR” circle in the legend to remove the associated SNPs in AMR population, as shown below:

You can also restrict the range of total scores displayed by adjusting the upper and lower bounds of the size bar on the right side of the plot. For example, you can show only SNPs whose total score is > 50, as below:

You can always click on a node to see the details of that SNP in a new detail page.

Then, clicking the SNP ID “rs12740374” opens a new page containing all detailed information about this SNP, which contains:

A text paragraph summarizing its scores in the non-empty functional categories and a radar chart showing the distribution of the six scores, as below:

A Circos plot showing the chromatin loops and other 2D signatures:

You can switch the cell type of the CIRCOS plot to another cell type in the ‘Options’ block. For example, you can change the cell type to HepG2 of the liver tissue, as shown below.

A figure generated by UCSC Genome Browser with custom data:

And a series of tables containing the details of 3D interacting genes, 3D interacting SNP, Chromatin state, TFBS and Conservation.

3dsnp v2.0 tutorials

Overview

We systematically updated our 3DSNP server. We systematically annotated SVs across a full spectrum of functions, especially their potential effects on three-dimensional chromatin structures. We also evaluated the chromatin accessibility surrounding the variants in a variety of tissues at single-cell resolution. Finally, we updated all the major contents of 3DSNP, including Hi-C, dbSNP and expression quantitative trait loci (eQTL).

NEW FEATURES AND CONTENTS UPDATE

Functional annotation of structural variation

To annotate SVs, we first obtained a total of 107,590 SVs from the Human Genome Structural Variation Consortium (HGSVC v2). All these SVs were robustly identified using fully phased genome assemblies. Similar to our SNP annotation strategy, SVs were systematically annotated from a variety of aspects, including 3D interacting genes, expression quantitative trait loci (eQTL), chromatin state, transcription factor binding site (TFBS), sequence motif altered and evolutionary conservation. We calculated the linkage disequilibrium (LD)-associated SNPs and SVs of the query SV using the HGSVC SV genotypes called from the 1000G project samples. Notably, the SV-eQTLs were identified by HGSVC using the transcriptomes of 427 donors, testing all SVs within a window of 1 Mbp around each gene. To generate a functional overview of SVs, each SV is scored based on its annotated records in six functional categories: 3D interacting genes, enhancer state, promoter state, transcription factor binding sites, sequence motifs altered and conservation score.

In particular, the new platform enables the measurement of the potential effects of SVs on local chromatin architecture. Using a published algorithm that predicts the alteration of chromatin architecture by SVs, we calculated three different sets of chromatin loops genome-wide for each Hi-C map of the 49 cell types:

i) the normal chromatin loops without considering any SV in the genome (e.g. track: H1-ESC),

ii) the altered chromatin loops considering all possible SVs located in the same topologically associating domain (TAD) at the population scale (e.g. track: H1-ESC_mod), and

iii) the altered chromatin loops considering only the presence of the query SV, to block any interference from adjacent SVs (e.g. track: H1-ESC_mod_target), as shown in Figure 1.
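The three loop sets above follow a simple naming pattern in the H1-ESC example. The helper below assumes that the same suffixes (_mod and _mod_target) apply to the other cell types as well; the function names and the labels "normal", "all_svs" and "query_sv_only" are hypothetical and introduced only for illustration.

```python
def loop_track_names(cell_type: str) -> dict:
    """Build the three loop track names for one cell type (assumed naming pattern)."""
    return {
        "normal": cell_type,                         # i) loops without any SV
        "all_svs": f"{cell_type}_mod",               # ii) loops altered by all SVs in the TAD
        "query_sv_only": f"{cell_type}_mod_target",  # iii) loops altered by the query SV only
    }

def classify_loop_track(track: str) -> tuple:
    """Split a track name into (cell_type, loop_set); check the longest suffix first."""
    if track.endswith("_mod_target"):
        return track[: -len("_mod_target")], "query_sv_only"
    if track.endswith("_mod"):
        return track[: -len("_mod")], "all_svs"
    return track, "normal"
```

For instance, classify_loop_track("H1-ESC_mod_target") yields ("H1-ESC", "query_sv_only").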

Figure 1

Annotation of SNP or SV target cell types/subtypes using scATAC-seq data

Measuring local chromatin accessibility at the single-cell level greatly facilitates the determination of the functional target cell types/subtypes of a genetic variant. We therefore collected two publicly available scATAC-seq datasets, comprising 53 scATAC-seq samples from 15 human fetal tissues and 70 samples from 25 adult tissues, respectively. For the fetal scATAC-seq dataset, a total of 1,001,437 ATAC peaks were identified genome-wide in 86,685 single cells, which were classified into 126 cell types/subtypes. The nearest ATAC peak (if any) within a 100 kb window surrounding a query SNP or SV was derived, and its accessibility states across all the cells were plotted using Uniform Manifold Approximation and Projection (UMAP) (Figure 2). One can easily identify the target cell types/subtypes of the queried SNP or SV by looking for cell clusters with highly accessible chromatin states. For the adult scATAC-seq dataset, a total of 756,414 open chromatin regions, termed candidate cis-regulatory elements (cCREs), were obtained in 472,373 single cells, which were classified into 54 cell types/subtypes. Since the raw peaks × cells matrix is not available yet, we used the average cCRE score of each peak across cells within the same cell type/subtype. For each SNP or SV query, the potentially functional cell types were sorted based on the accessibility scores of the cCREs overlapping the variant.
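The nearest-peak lookup described above can be sketched as follows. The sketch assumes that all peaks lie on the same chromosome as the variant and interprets the "100 kb window surrounding" the variant as 50 kb on each side; that interpretation, the function names and the example coordinates are assumptions.

```python
def distance_to_peak(pos: int, start: int, end: int) -> int:
    """Distance from a position to a peak interval; 0 if the position lies inside it."""
    if start <= pos <= end:
        return 0
    return start - pos if pos < start else pos - end

def nearest_peak(pos: int, peaks, half_window: int = 50_000):
    """Return the closest (start, end) peak within the window, or None if there is none.

    peaks is an iterable of (start, end) tuples on the variant's chromosome.
    """
    best, best_dist = None, None
    for start, end in peaks:
        dist = distance_to_peak(pos, start, end)
        if dist <= half_window and (best_dist is None or dist < best_dist):
            best, best_dist = (start, end), dist
    return best

# Hypothetical peak coordinates, for illustration only.
example_peaks = [(1_000, 1_500), (60_000, 60_500), (200_000, 200_400)]
closest = nearest_peak(2_000, example_peaks)
```

A linear scan is used for clarity; with around a million peaks, an interval index or a sorted list with binary search would be the more practical choice.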

Figure 2

Population genetic statistics

With a better understanding of multi-variant adaptation, recent studies have discovered that SNPs and SVs can jointly participate in the evolution of the genome and contribute to environmental adaptation. In order to avoid recombination with maladapted genomic backgrounds, the candidate adaptive variants discovered so far often occur in non-coding regions. Combining SNP genotypes from 1000G and SV genotypes from HGSVC, we searched for signals of positive natural selection among the five major populations (AMR, EAS, EUR, AFR, and SAS). Population genetic statistics were calculated between each population and the others, including the fixation index (F_ST), the integrated haplotype homozygosity score (iHS) and the cross-population number of segregating sites by length (XPNSL).
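As an illustration of the first statistic, the sketch below computes a per-site F_ST for one population against the pooled others, using the classical definition F_ST = (H_T - H_S) / H_T for a biallelic variant with equal weighting of the two groups. The estimator actually used by 3DSNP is not stated here, so this choice, the function names and the example frequencies are assumptions.

```python
def heterozygosity(p: float) -> float:
    """Expected heterozygosity 2p(1 - p) at a biallelic site."""
    return 2.0 * p * (1.0 - p)

def fst_one_vs_rest(p_focal: float, p_rest: float) -> float:
    """Per-site F_ST between a focal population and the pooled other populations.

    p_focal and p_rest are alternative-allele frequencies. The two groups
    are weighted equally (an assumption made for simplicity).
    """
    p_mean = (p_focal + p_rest) / 2.0
    h_total = heterozygosity(p_mean)
    if h_total == 0.0:
        return 0.0  # monomorphic site: no differentiation can be measured
    h_within = (heterozygosity(p_focal) + heterozygosity(p_rest)) / 2.0
    return (h_total - h_within) / h_total
```

With allele frequencies of 0.8 in the focal population and 0.2 in the others, H_T = 0.5 and H_S = 0.32, giving F_ST = 0.36; identical frequencies give 0, and fixed differences give 1.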

Contents update

Additional cell types/tissues with Hi-C data. The number of available cell types with Hi-C data increased from 12 to 49, so that substantially more 3D interacting genes can be identified for a query variant in the updated version.

Latest dbSNP version. dbSNP database was updated from version 146 to 154, with the number of annotated SNPs increasing from 149,254,102 to 700,385,017.

Latest GTEx database. GTEx database was updated from version 6 to 8, with the number of significant SNP-gene pairs increasing from 19,582,729 to 71,478,528.

INTERFACE

New interactive interface for visualizing chromatin architecture and accessibility

The former version of 3DSNP provided a series of user-friendly interfaces for users to search, browse, visualize and export the results. To facilitate the visualization of the local chromatin architecture and accessibility surrounding the query SNPs or SVs, we used the IGV.js plugin to replace the former UCSC Genome Browser screenshot, as shown in Figure 3. Six major annotation categories are integrated in this visualization platform: basic annotation, Hi-C loops, scATAC-seq, variants, pathogenicity, and population genetics.

(i) Basic annotation class includes: RefSeq genes, DNase sites and RepeatMasker;

(ii) Hi-C loops class includes: the normal loops (e.g. H1-ESC), the loops altered by all SVs (e.g. H1-ESC_mod) and the loops altered only by the query SV (e.g. H1-ESC_mod_target) in 49 human cell types;

(iii) scATAC-seq class includes: averaged chromatin accessibility profiles of 126 cell types/subtypes in 15 fetal tissues;

(iv) Variants class includes the dbSNP and HGSVC2 databases;

(v) Pathogenicity class includes ClinVar and ClinGen;

(vi) Population genetics class includes F_ST, iHS, and XPNSL.

Users can easily select tracks of interest by clicking the Track Selector button at the bottom of the tracks.

Figure 3

Data format

Query format

In addition to SNP ID(s), SV ID(s) can now be used to search the database, in the form chrN-start-SVtype-SVlength. Note that we use the HGSVC naming of SV IDs. SV IDs in HGSVC are defined on GRCh38, which differs from the position information provided in the main table (GRCh37). To make SVs from HGSVC easy to find, we did not change the naming of their SV IDs.
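An SV ID in the chrN-start-SVtype-SVlength form can be split into its parts as in the sketch below. The class and function names are hypothetical, the example ID is made up for illustration, and the validation rules (a "chr" prefix and purely numeric start and length) are assumptions about the format.

```python
from typing import NamedTuple

class SVId(NamedTuple):
    chrom: str
    start: int    # GRCh38 coordinate for HGSVC IDs, per the note above
    sv_type: str
    length: int

def parse_sv_id(sv_id: str) -> SVId:
    """Parse an SV ID of the form chrN-start-SVtype-SVlength."""
    parts = sv_id.strip().split("-")
    if len(parts) != 4:
        raise ValueError(f"expected 4 dash-separated fields, got {len(parts)}")
    chrom, start, sv_type, length = parts
    if not chrom.startswith("chr") or not start.isdigit() or not length.isdigit():
        raise ValueError(f"malformed SV ID: {sv_id!r}")
    return SVId(chrom, int(start), sv_type, int(length))

# Made-up ID for illustration; it does not refer to a real SV.
example = parse_sv_id("chr1-1000000-DEL-350")
```

Because the start field of an HGSVC ID is a GRCh38 coordinate, it should not be compared directly with the GRCh37 positions shown in the main table.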

In addition, we extracted some highly reliable SVs detected by short-read sequencing from ClinVar. The naming of these SVs is consistent with their positions in GRCh37. We therefore recommend searching for SVs by genomic region; target SVs in the region will automatically appear at the top of the main table.

Other annotations can be found in the documentation of the previous version.