Mouse SNPs Track Settings
 
Annotated SNPs from mouse strain comparison analysis

Haplotype sorting display

Enable Haplotype sorting display
Haplotype sorting order:
using middle variant in viewing window as anchor.
If this mode is selected and genotypes are phased or homozygous, then each genotype is split into two independent haplotypes. These local haplotypes are clustered by similarity around a central variant. Haplotypes are reordered for display using the clustering tree, which is drawn in the left label area. Local haplotype blocks can often be identified using this display.
To anchor the sorting to a particular variant, click on the variant in the genome browser, and then click on the 'Use this variant' button on the next page.
using the order in which samples appear in the underlying VCF file
Allele coloring scheme:
reference alleles invisible, alternate alleles in black
reference alleles in blue, alternate alleles in red
first base of allele (A = red, C = blue, G = green, T = magenta)
Haplotype sorting display height:

Filters

Exclude variants with Quality/confidence score (QUAL) less than:
Exclude variants with these FILTER values:
 
PASS (All filters passed)
StrandBias (Min P-value for strand bias (INFO/PV4) [0.0001])
EndDistBias (Min P-value for end distance bias (INFO/PV4) [0.0001])
MaxDP (Maximum read depth (INFO/DP or INFO/DP4) [])
BaseQualBias (Min P-value for baseQ bias (INFO/PV4) [0])
MinMQ (Minimum RMS mapping quality for SNPs (INFO/MQ) [20])
MinAB (Minimum number of alternate bases (INFO/DP4) [5])
Qual (Minimum value of the QUAL field [10])
VDB (Minimum Variant Distance Bias (INFO/VDB) [0])
GapWin (Window size for filtering adjacent gaps [3])
MapQualBias (Min P-value for mapQ bias (INFO/PV4) [0])
SnpGap (SNP within INT bp around a gap to be filtered [2])
RefN (Reference base is N [])
MinDP (Minimum read depth (INFO/DP or INFO/DP4) [5])
Het (Genotype call is heterozygous (low quality) [])
Minimum minor allele frequency (if INFO column includes AF or AC+AN):
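
The filters above can be read as a single keep-or-drop test applied to each VCF record. The following is a minimal sketch of that test, assuming a tab-separated VCF data line; the function names `minor_allele_freq` and `keep_variant` and their parameters are hypothetical and do not reflect how the Genome Browser implements its filters. For simplicity, only the first alternate allele is considered when computing the frequency.

```python
def minor_allele_freq(info):
    """Return the minor allele frequency from an INFO string, or None
    if neither AF nor AC+AN is present."""
    fields = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
    if "AF" in fields:
        # Assumption: use the first alternate allele only.
        af = float(fields["AF"].split(",")[0])
    elif "AC" in fields and "AN" in fields and int(fields["AN"]) > 0:
        af = int(fields["AC"].split(",")[0]) / int(fields["AN"])
    else:
        return None
    return min(af, 1.0 - af)


def keep_variant(line, min_qual=None, excluded_filters=(), min_maf=None):
    """Return True if a VCF data line passes all three filters."""
    cols = line.rstrip("\n").split("\t")
    qual, filt, info = cols[5], cols[6], cols[7]
    # QUAL threshold; a missing QUAL (".") is not filtered.
    if min_qual is not None and qual != "." and float(qual) < min_qual:
        return False
    # FILTER may hold several semicolon-separated values.
    if set(filt.split(";")) & set(excluded_filters):
        return False
    # Minor allele frequency, applied only when INFO provides AF or AC+AN.
    if min_maf is not None:
        maf = minor_allele_freq(info)
        if maf is not None and maf < min_maf:
            return False
    return True
```

For example, `keep_variant(line, min_qual=10, excluded_filters=["Het", "MinDP"])` would drop records with QUAL below 10 or with either of the two named FILTER values.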


Assembly: Mouse Dec. 2011 (GRCm38/mm10)
Data last updated at UCSC: 2016-11-08

Description

This track shows single nucleotide variants (SNVs) from the Mouse Genomes Project.

Display Conventions

In "dense" mode, a vertical line is drawn at the position of each variant. In "pack" mode, since these variants have been phased, the display shows a clustering of haplotypes in the viewed range, sorted by similarity of alleles weighted by proximity to a central variant. The clustering view can highlight local patterns of linkage.

In the clustering display, each sample's phased diploid genotype is split into two independent haplotypes. Each haplotype is placed in a horizontal row of pixels; when the number of haplotypes exceeds the number of vertical pixels for the track, multiple haplotypes fall in the same pixel row and pixels are averaged across haplotypes.
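
The splitting and pixel averaging described above can be sketched as follows. This is an illustration only, not the Genome Browser's drawing code: the function names are hypothetical, genotypes are assumed to be phased strings such as "0|1" with no missing calls, and each allele is reduced to reference (0) or non-reference (1) before averaging.

```python
def split_haplotypes(genotypes_by_sample):
    """Split each sample's phased genotypes (one "a|b" string per variant)
    into two haplotypes, each a list of integer allele indices."""
    haplotypes = []
    for genotypes in genotypes_by_sample:
        pairs = [gt.split("|") for gt in genotypes]
        haplotypes.append([int(p[0]) for p in pairs])
        haplotypes.append([int(p[1]) for p in pairs])
    return haplotypes


def pixel_rows(alleles, height):
    """For one variant, map each haplotype's allele to a pixel row and
    average the values of haplotypes that share a row.

    Returns one intensity per row: 0.0 = all reference, 1.0 = all non-reference.
    """
    n = len(alleles)
    if n <= height:
        # Enough rows: each haplotype gets its own row.
        return [1.0 if a else 0.0 for a in alleles]
    sums = [0.0] * height
    counts = [0] * height
    for i, allele in enumerate(alleles):
        row = i * height // n  # several haplotypes land in the same row
        sums[row] += 1.0 if allele else 0.0
        counts[row] += 1
    return [s / c for s, c in zip(sums, counts)]
```

With four haplotypes and two pixel rows, for instance, the first two haplotypes share row 0 and the last two share row 1.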

Each variant is a vertical bar with white (invisible) representing the reference allele and black representing the non-reference allele(s). Tick marks are drawn at the top and bottom of each variant's vertical bar to make the bar more visible when most alleles are reference alleles. The vertical bar for the central variant used in clustering is outlined in purple. In order to avoid long compute times, the range of alleles used in clustering may be limited; alleles used in clustering have purple tick marks at the top and bottom.

The clustering tree is displayed to the left of the main image. It does not represent relatedness of individuals; it simply shows the arrangement of local haplotypes by similarity. When a rightmost branch is purple, it means that all haplotypes in that branch are identical, at least within the range of variants used in clustering.
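
One way to picture the ordering of haplotypes by similarity is a small agglomerative clustering over a proximity-weighted mismatch count. The sketch below is an assumption-laden illustration, not the algorithm used by the Genome Browser: the weight `1 / (1 + distance from the central variant)`, the average-linkage rule, and the function names are all hypothetical choices made for this example.

```python
def weighted_distance(h1, h2, center):
    """Sum mismatches between two haplotypes, down-weighting variants
    that are far (in variant index) from the central variant."""
    return sum(
        1.0 / (1 + abs(i - center))
        for i, (a, b) in enumerate(zip(h1, h2))
        if a != b
    )


def cluster_order(haplotypes, center):
    """Greedy agglomerative clustering (average linkage).

    Returns haplotype indices in the order of the resulting tree's leaves.
    """
    # Each cluster is a list of haplotype indices in display order.
    clusters = [[i] for i in range(len(haplotypes))]

    def linkage(c1, c2):
        # Average pairwise distance between members of two clusters.
        total = sum(
            weighted_distance(haplotypes[i], haplotypes[j], center)
            for i in c1
            for j in c2
        )
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        a, b = min(
            (
                (x, y)
                for x in range(len(clusters))
                for y in range(x + 1, len(clusters))
            ),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters[0] if clusters else []
```

Identical haplotypes have distance zero and are merged first, which corresponds to the purple branches described above. This exhaustive pairwise search is slow for many haplotypes, which hints at why the range of alleles used in clustering may be limited.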

Methods

Listed below are the strain names as they appear in the VCF header, the full strain names, the sex of the samples sequenced, and the approximate sequence fold-coverage of the genome, based on the number of read bases mapped to the reference genome (excluding reads marked as PCR duplicates).

VCF header name     Strain name         Sex  Sequence fold-coverage
129P2_OlaHsd        129P2/OlaHsd        F    52
129S1_SvImJ         129S1/SvImJ         F    68
129S5SvEvBrd        129S5SvEvBrd        F    22
A_J                 A/J                 F    52
AKR_J               AKR/J               F    57
BALB_cJ             BALB/cJ             F    62
BTBR_T+_Itpr3tf_J   BTBR T+ Itpr3tf/J   M    85
BUB_BnJ             BUB/BnJ             M    49
C3H_HeH             C3H/HeH             F    14
C3H_HeJ             C3H/HeJ             F    63
C57BL_10J           C57BL/10J           M    37
C57BL_6NJ           C57BL/6NJ           F    61
C57BR_cdJ           C57BR/cdJ           M    51
C57L_J              C57L/J              M    64
C58_J               C58/J               M    55
CAST_EiJ            CAST/EiJ            F    53
CBA_J               CBA/J               F    56
DBA_1J              DBA/1J              M    49
DBA_2J              DBA/2J              F    56
FVB_NJ              FVB/NJ              F    73
I_LnJ               I/LnJ               M    45
KK_HiJ              KK/HiJ              M    55
LEWES_EiJ           LEWES/EiJ           F    19
LP_J                LP/J                F    54
MOLF_EiJ            MOLF/EiJ            M    40
NOD_ShiLtJ          NOD/ShiLtJ          F    66
NZB_B1NJ            NZB/B1NJ            M    47
NZO_HlLtJ           NZO/HlLtJ           F    72
NZW_LacJ            NZW/LacJ            M    58
PWK_PhJ             PWK/PhJ             F    53
RF_J                RF/J                M    54
SEA_GnJ             SEA/GnJ             M    49
SPRET_EiJ           SPRET/EiJ           F    67
ST_bJ               ST/bJ               M    81
WSB_EiJ             WSB/EiJ             F    51
ZALENDE_EiJ         ZALENDE/EiJ         M    19

All SNP and indel calls are relative to the reference mouse genome C57BL/6J (GRCm38/mm10). The reference genome used for the alignment can be found here: ftp-mouse.sanger.ac.uk/ref/. Gene models from Ensembl release 78 were used to predict the functional consequences of the SNPs and indels. SNPs and indels are annotated with rs IDs from dbSNP Build 142. The dbSNP data was downloaded from ftp.ncbi.nlm.nih.gov/snp/organisms/mouse_10090/VCF/, and the 'vcf-annotate' Perl utility from the VCFtools package (Danecek et al., 2011) was used to add the rs IDs to calls in this release (see References for VCFtools information). For SNPs, the position, reference allele and alternative alleles were all compared:

e.g.
vcf-annotate -c CHROM,POS,ID,REF,ALT

For indels, only the positions were matched:
e.g.
vcf-annotate -c CHROM,POS,ID
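
The difference between the two matching rules can be illustrated with a small lookup. This is a sketch of the matching logic only, not of how vcf-annotate works internally; the function names and the tuple layout `(chrom, pos, id, ref, alt)` are hypothetical. SNPs are matched on chromosome, position, reference allele and alternative allele(s), while indels are matched on chromosome and position alone.

```python
def is_snp(ref, alt):
    """True if the reference and every alternative allele are single bases."""
    return len(ref) == 1 and all(len(a) == 1 for a in alt.split(","))


def annotate_ids(calls, dbsnp):
    """Copy rs IDs from dbSNP records onto matching calls.

    calls, dbsnp: lists of (chrom, pos, id, ref, alt) tuples.
    """
    snp_ids = {}    # keyed on CHROM, POS, REF, ALT
    indel_ids = {}  # keyed on CHROM, POS only
    for chrom, pos, rsid, ref, alt in dbsnp:
        if is_snp(ref, alt):
            snp_ids[(chrom, pos, ref, alt)] = rsid
        else:
            indel_ids[(chrom, pos)] = rsid

    annotated = []
    for chrom, pos, vid, ref, alt in calls:
        if is_snp(ref, alt):
            rsid = snp_ids.get((chrom, pos, ref, alt), vid)
        else:
            rsid = indel_ids.get((chrom, pos), vid)
        # Calls without a match keep their original ID.
        annotated.append((chrom, pos, rsid, ref, alt))
    return annotated
```

Under these rules, a SNP call at a dbSNP position with a different alternative allele keeps its original ID, whereas an indel call at a dbSNP indel position picks up the rs ID even if the alleles are written differently.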

Sequencing was performed using the Illumina HiSeq platform. All reads are 100 bp paired-end reads, except for strains 129P2 and 129S5, for which the sequence data included reads of 75 bp or shorter. In addition, a small amount of the sequence data for MOLF_EiJ comes from single-end sequencing.

In version 3 all variant data was obtained from sequencing of female mice only. In version 4, 10 new strains were included in which all data was obtained from sequencing of male mice. The data for an additional 8 strains included in this release (version 5) was obtained from sequencing of male mice for 5 strains, and female mice for the remaining 3 strains. As such, the SNP and indel VCF files contain calls on chromosomes 1-19, MT, X and Y. The BAM files used to call SNPs and indels are located here: ftp-mouse.sanger.ac.uk/REL-1502-BAM/.

Reads were aligned to the reference genome (GRCm38/mm10) using BWA-MEM v0.7.5-r406 (Li and Durbin, 2009; Li, 2013).

Reads were realigned around indels using GATK realignment tool v3.0.0 (McKenna et al., 2010) with default parameters.

SNP and indel discovery was performed with SAMtools v1.1 with parameters:
samtools mpileup -t DP,DV,DP4,SP,DPR,INFO/DPR -E -Q 0 -pm3 -F0.25 -d500
and calling was performed with BCFtools call v1.1 with parameters:
bcftools call -mv -f GQ,GP -p 0.99
Indels were then left-aligned and normalized using bcftools norm v1.1 with parameters:
bcftools norm -D -s -m+indels

The vcf-annotate function in the VCFtools package was used to soft-filter the SNP and indel calls.

The Variant Effect Predictor software from Ensembl (McLaren et al., 2010) was used to predict the functional consequences of SNPs and indels, queried against Ensembl release 78 mouse gene models.

Definitions of consequence types can be found here: http://www.ensembl.org/info/genome/variation/predicted_data.html#consequences.

SNP calling was performed for each strain independently. These strain-specific VCF files can be found on the FTP site: ftp-mouse.sanger.ac.uk/REL-1505-SNPs_Indels/strain_specific_vcfs/.

A single list of all polymorphic sites across the genome was then produced from all of the 36 strains' SNP calls. This list was then used to call SNPs again, this time across all 36 strains simultaneously, using the 'samtools mpileup -l' option. The calls from all 36 strains were then merged into a single VCF file. All strain specific information was retained in the sample columns for each strain. For indels, the same approach was taken with the addition of the indel normalisation step after the initial variant calling. The merged SNP VCF and indel VCF for version 5 can be found here: ftp-mouse.sanger.ac.uk/REL-1505-SNPs_Indels/.
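
The first step of this two-pass approach, building one list of polymorphic sites from the per-strain calls, can be sketched as below. The function names and the input layout (a mapping from strain name to a list of `(chrom, pos)` pairs) are assumptions made for illustration, not the project's actual scripts. The output uses the two-column chromosome and 1-based position layout that `samtools mpileup -l` accepts as a position list.

```python
def union_sites(per_strain_sites):
    """Merge per-strain (chrom, pos) calls into one sorted,
    de-duplicated list of sites."""
    sites = set()
    for strain_sites in per_strain_sites.values():
        sites.update(strain_sites)
    # Note: chromosome names sort as strings here ("10" before "2").
    return sorted(sites)


def format_positions(sites):
    """Render sites as tab-separated 'chrom<TAB>pos' lines."""
    return "".join(f"{chrom}\t{pos}\n" for chrom, pos in sites)
```

A site called in any one strain therefore appears once in the merged list, so that every strain is genotyped at that site in the second pass.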

Information regarding the filtering of SNP and indel calls can be found in the VCF file headers in the '##FILTER' and '##source' lines.

Credits

Thanks to the Mouse Genomes Project for supplying the data for this track.

See also: ftp-mouse.sanger.ac.uk/REL-1505-SNPs_Indels/README.

References

Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST et al. The variant call format and VCFtools. Bioinformatics. 2011 Aug 1;27(15):2156-8. PMID: 21653522; PMC: PMC3137218

Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at http://arxiv.org/pdf/1303.3997v2.pdf. 2013.

Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009 Jul 15;25(14):1754-60. PMID: 19451168; PMC: PMC2705234

Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009 Aug 15;25(16):2078-9. PMID: 19505943; PMC: PMC2723002

Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 2011 Nov 1;27(21):2987-93. PMID: 21903627; PMC: PMC3198575

McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010 Sep;20(9):1297-303. PMID: 20644199; PMC: PMC2928508

McLaren W, Pritchard B, Rios D, Chen Y, Flicek P, Cunningham F. Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor. Bioinformatics. 2010 Aug 15;26(16):2069-70. PMID: 20562413; PMC: PMC2916720