DNA sequence alignments used in Caswell et al.
"Analysis of chimpanzee history based on genome sequence alignments"
Caswell J.L., Mallick S., Gnerre S., Richter D.J., Schirmer C., Reich D.
PLoS Genetics (in press)
Main datasets
Incomplete lineage sorting datasets
To test empirically for incomplete lineage sorting among humans, chimpanzees and bonobos, we generated a
new data set: an alignment of more than 12 million base pairs of sequence from chimpanzee, bonobo,
human, orangutan and macaque (CBHOM). Orangutan was added so that two outgroup species (orangutan and macaque) are available for humans, chimpanzees and bonobos.
Preparation of the CBHOM data set
We implemented a new procedure that uses genome assemblies wherever
possible. This allows longer regions of alignment, and in some of those regions
provides enough data to distinguish between the possible genealogical trees. For details see Note S9 of the paper.
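As an illustration of how aligned sites can discriminate between genealogical trees, the sketch below classifies single alignment columns by the tree they support. It is not the procedure of Note S9: it assumes a column is a five-letter string in the order C, B, H, O, M, treats the allele shared by both outgroups as ancestral, and counts a site as informative only when exactly two of C, B and H share the same derived allele. The function names `classify_site` and `count_topologies` are hypothetical.

```python
from collections import Counter

# Assumed column order for this sketch: chimpanzee, bonobo, human,
# orangutan, macaque (C, B, H, O, M).
TOPOLOGY_BY_DERIVED_PAIR = {
    frozenset("CB"): "((C,B),H)",
    frozenset("CH"): "((C,H),B)",
    frozenset("BH"): "((B,H),C)",
}


def classify_site(column):
    """Return the tree supported by one alignment column, or None."""
    column = column.upper()
    # Skip columns with gaps, ambiguity codes, or the wrong width.
    if len(column) != 5 or any(base not in "ACGT" for base in column):
        return None
    c, b, h, o, m = column
    # Require both outgroups to agree so the ancestral allele is clear.
    if o != m:
        return None
    ancestral = o
    ingroup = dict(zip("CBH", (c, b, h)))
    derived = {name for name, base in ingroup.items() if base != ancestral}
    # Informative sites have exactly two ingroup species carrying
    # the same derived allele.
    if len(derived) != 2:
        return None
    if len({ingroup[name] for name in derived}) != 1:
        return None
    return TOPOLOGY_BY_DERIVED_PAIR[frozenset(derived)]


def count_topologies(columns):
    """Tally the trees supported across an iterable of columns."""
    trees = (classify_site(col) for col in columns)
    return Counter(tree for tree in trees if tree is not None)
```

For example, `classify_site("TTAAA")` supports `((C,B),H)`, the species tree, whereas `classify_site("ATTAA")` supports `((B,H),C)`, a pattern compatible with incomplete lineage sorting (or with recurrent mutation, which this sketch does not model).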
Data sets
Raw sequence data for bonobo and eastern chimpanzee
We extracted shotgun sequence data for western chimpanzees, central
chimpanzees, human, orangutan and macaque from public databases, as described in
the main text. For bonobos and eastern chimpanzees, no public sequence data were
available, so we generated our own. All the sequencing reads we generated are
publicly available at the NCBI Trace Archive
(http://www.ncbi.nlm.nih.gov/Traces); to access them, run the following
queries:
Bonobo (Pan paniscus), 28,416 passing reads
CENTER_NAME='WIBR' and CENTER_PROJECT='G743'
Eastern chimpanzee (Pan troglodytes schweinfurthii), 39,168 passing reads
CENTER_NAME='WIBR' and CENTER_PROJECT='G870'
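For scripted use, the two queries can be kept in one place and generated from the center and project identifiers listed above. The sketch below only builds the query strings; it makes no network request and assumes nothing about the Trace Archive interface beyond the query syntax shown. The names `DATASETS`, `trace_query` and `QUERIES` are hypothetical.

```python
# Center name and project ID for each data set, copied from the
# queries listed above.
DATASETS = {
    "bonobo": ("WIBR", "G743"),
    "eastern_chimpanzee": ("WIBR", "G870"),
}


def trace_query(center_name, center_project):
    """Build a query string in the syntax shown above."""
    return (
        f"CENTER_NAME='{center_name}' and "
        f"CENTER_PROJECT='{center_project}'"
    )


# One query string per data set, ready to paste into the search form.
QUERIES = {name: trace_query(*ids) for name, ids in DATASETS.items()}

if __name__ == "__main__":
    for name, query in QUERIES.items():
        print(f"{name}: {query}")
```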