DNA sequence alignments used in Caswell et al.

"Analysis of chimpanzee history based on genome sequence alignments"
Caswell J.L., Mallick S., Gnerre S., Richter D.J., Schirmer C., Reich D. PLoS Genetics (in press)

Main datasets

dataset		README
C1C2WH	fasta	filtered_data_sets
C1C2WHM	fasta	filtered_data_sets
ChBH	fasta	filtered_data_sets
CWBH	fasta	filtered_data_sets
CWBHM	fasta	filtered_data_sets
ECWH	fasta	filtered_data_sets
ECWHM	fasta	filtered_data_sets
W1W2CH	fasta	filtered_data_sets
W1W2CHM	fasta	filtered_data_sets
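
The data sets above are distributed as FASTA. As a minimal sketch, assuming the standard FASTA layout (a header line starting with ">" followed by one or more sequence lines), the files could be read as follows. The function name read_fasta is hypothetical, and the exact record naming in each file is described in its README, not here.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} dictionary.

    Assumes standard FASTA: '>' starts a header, following lines hold sequence.
    If a header is repeated, the later record replaces the earlier one.
    """
    records: Dict[str, List[str]] = {}
    name: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)  # sequence may span several lines
    return {header: "".join(chunks) for header, chunks in records.items()}
```

An open file handle can be passed directly, for example read_fasta(open(path)) where path points to one of the downloaded FASTA files.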

Incomplete lineage sorting datasets

To test empirically for incomplete lineage sorting among humans, chimpanzees and bonobos, we generated a new data set consisting of an alignment of more than 12 million base pairs of sequence from chimpanzee, bonobo, human, orangutan and macaque (CBHOM). Orangutan was added so that the data set contains two outgroup species, orangutan and macaque, both of which diverged before the common ancestor of humans, chimpanzees and bonobos.

Preparation of the CBHOM data set

We implemented a new procedure that used genome assemblies wherever possible. This allowed longer regions of alignment, which in some regions provided enough data to distinguish between potential genealogical trees. For details, see Note S9 of the paper.
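
To illustrate how aligned sites can discriminate between genealogical trees, here is a minimal sketch. It is not the procedure of Note S9. It assumes five aligned sequences of equal length for chimpanzee (C), bonobo (B), human (H), orangutan (O) and macaque (M), treats the base shared by both outgroups as the ancestral state, and counts sites where exactly two of C, B and H share a derived base. The function name count_tree_support and the labels "CB", "CH" and "BH" are hypothetical.

```python
from typing import Dict


def count_tree_support(c: str, b: str, h: str, o: str, m: str) -> Dict[str, int]:
    """Count sites at which exactly two ingroup species share a derived base.

    "CB" sites group chimpanzee with bonobo, "CH" chimpanzee with human,
    "BH" bonobo with human. Sites with gaps, ambiguity codes, or
    disagreeing outgroups are skipped.
    """
    if not (len(c) == len(b) == len(h) == len(o) == len(m)):
        raise ValueError("all sequences must have the same aligned length")

    counts = {"CB": 0, "CH": 0, "BH": 0}
    valid = set("ACGT")
    columns = zip(c.upper(), b.upper(), h.upper(), o.upper(), m.upper())
    for sc, sb, sh, so, sm in columns:
        if not {sc, sb, sh, so, sm} <= valid:
            continue  # gap or ambiguous base
        if so != sm:
            continue  # outgroups disagree, ancestral state unclear
        ancestral = so
        if sc != ancestral and sc == sb and sh == ancestral:
            counts["CB"] += 1
        elif sc != ancestral and sc == sh and sb == ancestral:
            counts["CH"] += 1
        elif sb != ancestral and sb == sh and sc == ancestral:
            counts["BH"] += 1
    return counts
```

Under this simplification, an excess of "CB" sites is expected from the species tree, while "CH" and "BH" sites are the kind of pattern that incomplete lineage sorting (or recurrent mutation) can produce.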

Data sets

CBHOM

alignments

snps

Raw sequence data for bonobo and eastern chimpanzee

We extracted shotgun sequence data for western chimpanzee, central chimpanzee, human, orangutan and macaque from public databases, as described in the main text. For bonobo and eastern chimpanzee, no public sequence data were available, so we generated our own. All the sequencing reads we generated are publicly available at the NCBI Trace Archive (http://www.ncbi.nlm.nih.gov/Traces); to access them, run the following queries:

Bonobo (Pan paniscus), 28,416 passing reads
CENTER_NAME='WIBR' and CENTER_PROJECT='G743'

Eastern chimpanzee (Pan troglodytes schweinfurthii), 39,168 passing reads
CENTER_NAME='WIBR' and CENTER_PROJECT='G870'
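
Both queries follow the same pattern. As a small sketch, the query text can be assembled from the center name and project code listed above. The function name trace_query is hypothetical, and the code only builds the query string; it does not contact the Trace Archive.

```python
def trace_query(center_name: str, center_project: str) -> str:
    """Build the query text in the form used above (string only, no network access)."""
    return f"CENTER_NAME='{center_name}' and CENTER_PROJECT='{center_project}'"


# Center name and project codes as listed in this README.
QUERIES = {
    "bonobo": trace_query("WIBR", "G743"),
    "eastern chimpanzee": trace_query("WIBR", "G870"),
}
```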