Genome sequence errors produce signals of accelerated evolution in chimpanzee (in submission)


(1) Alignments of ref. [1]
      (1.1) Procedure
      (1.2) Legend for alignments
            (1.2.1) lines
            (1.2.2) annotation
      (1.3) Alignments
      (1.4) Alignment comparisons with Ensembl

(2) Alignments of ref. [2]
      (2.1) Procedure
      (2.2) Legend for alignments
            (2.2.1) lines
            (2.2.2) annotation
      (2.3) Alignments

(3) Comparison of assemblies used in this work with publicly available assemblies
      (3.1) Alignment comparisons with Ensembl

(4) Resequencing data
      (4.1) Chimp forward reads
      (4.2) Chimp reverse reads
      (4.3) Human forward reads
      (4.4) Human reverse reads

(5) References
      (5.1) Alignments of resequencing data




(1) Alignments of ref. [1]

(1.1) Procedure

To compare an alignment of a set of sequences from one analysis with an alignment of a set of sequences from another analysis, we generate a meta-alignment, which is simply an alignment of alignments. We took the human DNA sequence from our alignments and from the alignments kindly provided by the authors of ref. [1], and globally aligned the two by dynamic programming (using needle from EMBOSS [3]). This pairwise alignment is then used as a guide to stitch the two alignments together.
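
The stitching step described above can be sketched as follows. This is a minimal illustration, not the code used for the paper: the function name `stitch`, its arguments, and the toy sequences are all hypothetical, and the guide (the gapped pairwise alignment of the two human sequences, such as needle would produce) is assumed to be supplied by the caller.

```python
def stitch(aln_a, aln_b, guide_a, guide_b, human_a=0, human_b=0, gap="-"):
    """Merge two alignments column by column, guided by a pairwise
    alignment (guide_a, guide_b) of their ungapped human sequences.

    aln_a, aln_b: lists of equal-length gapped strings (one per row).
    human_a, human_b: index of the human row in each alignment.
    """
    if len(guide_a) != len(guide_b):
        raise ValueError("guide rows must have equal length")
    if guide_a.replace(gap, "") != aln_a[human_a].replace(gap, ""):
        raise ValueError("guide_a does not match the human row of aln_a")
    if guide_b.replace(gap, "") != aln_b[human_b].replace(gap, ""):
        raise ValueError("guide_b does not match the human row of aln_b")

    cols_a = list(zip(*aln_a))          # alignment A as a list of columns
    cols_b = list(zip(*aln_b))          # alignment B as a list of columns
    pad_a = (gap,) * len(aln_a)         # all-gap column for A
    pad_b = (gap,) * len(aln_b)         # all-gap column for B
    out = []
    i = j = 0

    def flush_insertions():
        # Columns where the human row is a gap have no counterpart in the
        # guide, so they are emitted on their own, padded on the other side.
        nonlocal i, j
        while i < len(cols_a) and cols_a[i][human_a] == gap:
            out.append(cols_a[i] + pad_b)
            i += 1
        while j < len(cols_b) and cols_b[j][human_b] == gap:
            out.append(pad_a + cols_b[j])
            j += 1

    for ga, gb in zip(guide_a, guide_b):
        flush_insertions()
        col_a, col_b = pad_a, pad_b
        if ga != gap:                   # guide consumes a human base from A
            col_a = cols_a[i]
            i += 1
        if gb != gap:                   # guide consumes a human base from B
            col_b = cols_b[j]
            j += 1
        out.append(col_a + col_b)
    flush_insertions()
    return ["".join(row) for row in zip(*out)]


# Toy example: two rows per alignment, human first in each.
merged = stitch(["ACGT-A", "ACGTTA"], ["ACTA", "ACTA"], "ACGTA", "AC-TA")
for row in merged:
    print(row)
```

Rows of the first alignment come out on top and rows of the second below, so corresponding columns can be compared by eye.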

(1.2) Legend for alignments

(1.2.1) lines

There are 14 lines in each meta-alignment.
Line 1 is the reference Ensembl human protein sequence for the gene.
Lines 2-4 show the translations of our own DNA alignments for human, chimpanzee and macaque respectively, which are shown in lines 5-7.
Lines 8-10 show the corresponding alignment published in ref. [1].
Line 11 shows the codon position of the alignments of ref. [1].
Lines 12-14 show the protein translation of the alignments of ref. [1].

Positions within the protein alignment that do not match the protein consensus are highlighted.

(1.2.2) annotation

Sequence differences are highlighted above the alignment with the letters H and C above the 10 lines, and are either bracketed (synonymous) or unbracketed (non-synonymous). Divergent sites that are removed by filtering are shown by an asterisk. Macaque divergent sites are not highlighted.
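
The bracketed/unbracketed distinction can be illustrated with a short sketch that classifies a codon difference under the standard genetic code. The function names, the use of round brackets, and the example codons are assumptions made for illustration; they are not taken from the annotation code used for the alignments.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_difference(ref_codon, alt_codon):
    """Return 'identical', 'synonymous', 'non-synonymous' or 'unknown'."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    if ref_codon not in CODON_TABLE or alt_codon not in CODON_TABLE:
        return "unknown"            # gaps, Ns or incomplete codons
    if ref_codon == alt_codon:
        return "identical"
    if CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]:
        return "synonymous"
    return "non-synonymous"


def site_label(species, ref_codon, alt_codon):
    """Bracket the species letter for synonymous changes (assumed style)."""
    kind = classify_codon_difference(ref_codon, alt_codon)
    if kind == "synonymous":
        return f"({species})"
    if kind == "non-synonymous":
        return species
    return ""


print(site_label("H", "CTT", "CTC"))  # Leu -> Leu, bracketed
print(site_label("C", "GAT", "GAA"))  # Asp -> Glu, unbracketed
```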

(1.3) Alignments

n   gene_id       msa_link
1   ALPK3         msa.html
2   ANKRD11       msa.html
3   ARID1A        msa.html
4   BVES          msa.html
5   CDC5L         msa.html
6   CR025         msa.html
7   CSF2RB        msa.html
8   CTTNBP2       msa.html
9   DRP2          msa.html
10  EML5          msa.html
11  GLCCI1        msa.html
12  GRPR          msa.html
13  HECW1         msa.html
14  HELZ          msa.html
15  IPO9          msa.html
16  JOSD3         msa.html
17  K1434         msa.html
18  KRTHA6        msa.html
19  LANCL3        msa.html
20  LSM14A        msa.html
21  MARK1         msa.html
22  MCF2L2        msa.html
23  MYH9          msa.html
24  NP_060227.1   msa.html
25  NP_078947.2   msa.html
26  NP_115923.1   msa.html
27  NP_955806.1   msa.html
28  NRIP1         msa.html
29  NT5DC4        msa.html
30  PADI6         msa.html
31  PALLD         msa.html
32  PCNT          msa.html
33  PGBD4         msa.html
34  PLEKHG3       msa.html
35  POLR3B        msa.html
36  Q9NSI3        msa.html
37  RNF130        msa.html
38  SORBS1        msa.html
39  SUV39H2       msa.html
40  TAIP2         msa.html
41  TARBP1        msa.html
42  TEC           msa.html
43  TRERF1        msa.html
44  TTC21B        msa.html
45  UCHL5         msa.html
46  USP44         msa.html
47  XPO7          msa.html
48  XRCC1         msa.html
49  XRCC4         msa.html

(1.4) Alignment comparisons with Ensembl

Genomic alignments for primates are also available at Ensembl [4]. Here we make a 3-way comparison between i) our alignments, ii) those of ref. [1], and iii) Ensembl. In all cases, the positive selection indicated in ref. [1] is present neither in our own alignments nor in the Ensembl alignments.

gene_id       msa_link
HELZ          msa.html
NP_115923.1   msa.html
POLR3B        msa.html
TEC           msa.html

(2) Alignments of ref. [2]

(2.1) Procedure

To compare an alignment of a set of sequences from one analysis with an alignment of a set of sequences from another analysis, we generate a meta-alignment, which is simply an alignment of alignments. We took the human DNA sequence from our alignments and from the alignments kindly provided by the authors of ref. [2], and globally aligned the two by dynamic programming (using needle from EMBOSS [3]). This pairwise alignment is then used as a guide to stitch the two alignments together.

(2.2) Legend for alignments

(2.2.1) lines

There are 14 lines in each meta-alignment.
Line 1 is the reference Ensembl human protein sequence for the gene.
Lines 2-4 show the translations of our own DNA alignments for human, chimpanzee and macaque respectively, which are shown in lines 5-7.
Lines 8-10 show the corresponding alignment published in ref. [2].
Line 11 shows the codon position of the alignments of ref. [2].
Lines 12-14 show the protein translation of the alignments of ref. [2].

Positions within the protein alignment that do not match the protein consensus are highlighted.

(2.2.2) annotation

Sequence differences are highlighted above the alignment with the letters H and C above the 10 lines, and are either bracketed (synonymous) or unbracketed (non-synonymous). Divergent sites that are removed by filtering are shown by an asterisk. Macaque divergent sites are not highlighted.

(2.3) Alignments

n   gene_id     msa_link
1   PRM1        msa.html
2   IRF7        msa.html
3   BCDIN3      msa.html
4   KRT36       msa.html
5   TMCO5       msa.html
6   DLEC1       msa.html
7   C14orf121   msa.html
8   NPFFR2      msa.html
9   FLJ46210    msa.html
10  PARD3-007   msa.html

(3) Comparison of assemblies used in this work with publicly available assemblies

genome               availability  coverage  total contig length                      N50 contig length  n_contigs  notes
chimp (panTro2)      public        6x        2.97 Gb (no gaps)                        29 kb              265882     -
chimp (constructed)  on request    7.26x     2.83 Gb (no gaps)                        43.5 kb            145414     -
macaque (rheMac2)    public        5.1x      2.87 Gb (no gaps), 3.01 Gb (with gaps)   25.7 kb            -          20.1e6 used in assembly
macaque (ours)       on request    6.17x     2.92 Gb (no gaps)                        34.9 kb            179250     -

- Higher coverage in principle yields more accurate local assemblies and a higher-quality consensus, which is ideal for scans of positive selection.
- Full assemblies for both chimpanzee and macaque are available on request.
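
For readers unfamiliar with the N50 contig length reported in the table, the sketch below computes it under the usual definition: the length of the contig at which the running total, taken from the longest contig downwards, first reaches half of the total contig length. The function name and the toy lengths are illustrative only; the assembly statistics above were not necessarily produced this way.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest; N50 is the length of the
    contig at which the cumulative length first reaches half of the total.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contigs given")


# Toy example: total length 54, so half is 27; 10 + 9 + 8 reaches it.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```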

(3.1) Alignment comparisons with Ensembl

Genomic alignments for primates are also available at Ensembl [4]. Here we make a 3-way comparison between i) our alignments, ii) those of ref. [2], and iii) Ensembl. For C14orf121, our alignments show a signal of positive selection, which we filter out because of its proximity to the end of a genomic alignment (indicated by the splice site), where misalignments may occur. The Ensembl alignments indicate that this is the correct thing to do. Ref. [2] shows a false signal of positive selection here, due to misalignment. For IRF7, the same situation occurs. Here, however, the filtering is over-aggressive.

gene_id     msa_link
C14orf121   msa.html
IRF7        msa.html
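
The kind of end-of-alignment filter mentioned in this section can be sketched as below. The exact criteria used in this work are not stated here, so the function name, the inclusive coordinate convention, and in particular the `window` value are hypothetical placeholders.

```python
def split_sites_by_block_end(sites, block_start, block_end, window=10):
    """Separate divergent sites that lie close to an alignment block end.

    sites: positions of divergent sites within the block.
    block_start, block_end: first and last position of the genomic
        alignment block (inclusive).
    window: sites fewer than this many positions from either end are
        filtered (hypothetical threshold, chosen for illustration).
    Returns (kept, filtered).
    """
    kept, filtered = [], []
    for pos in sites:
        distance_to_end = min(pos - block_start, block_end - pos)
        if distance_to_end < window:
            filtered.append(pos)    # too close to a block boundary
        else:
            kept.append(pos)
    return kept, filtered


kept, filtered = split_sites_by_block_end([3, 50, 97], 0, 99, window=10)
print(kept, filtered)
```

A larger window removes more potential misalignment artefacts at the cost of discarding genuine differences, which is the trade-off noted above for IRF7.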

(4) Resequencing data

We performed PCR amplification and bi-directional sequencing on an ABI 3730 sequencer at these sites in a panel of 8 humans and 8 chimpanzees (including Clint, the chimpanzee used for the chimpanzee reference sequence). We obtained clear results for 10 loci. For each individual, forward and reverse sequence names are given for each locus, in the format: [loci]_[org][seq]. Sequences have been submitted to GenBank (awaiting accession numbers; these should be searchable by the title of the paper), and are also available from: seq
or from: reich_datasets
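
The read names listed below can be split into their parts with a small parser. This sketch assumes that the [seq] part consists of a direction letter (F for forward, R for reverse) followed by a number, as the names in sections (4.1) to (4.4) suggest; the function names are illustrative.

```python
import re

# [loci]_[org][seq], with [seq] assumed to be a direction letter plus a number.
READ_NAME = re.compile(
    r"^(?P<locus>[A-Za-z0-9]+)_(?P<org>chimp|human)(?P<direction>[FR])(?P<number>\d+)$"
)


def parse_read_name(name):
    """Split a read name such as 'B11_chimpF57' into its components."""
    match = READ_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognised read name: {name!r}")
    fields = match.groupdict()
    fields["number"] = int(fields["number"])
    return fields


def mate_name(name):
    """Return the name of the opposite-direction read for the same locus."""
    fields = parse_read_name(name)
    other = "R" if fields["direction"] == "F" else "F"
    return f"{fields['locus']}_{fields['org']}{other}{fields['number']}"


print(parse_read_name("B11_chimpF57"))
print(mate_name("B0_humanF85"))
```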

(4.1) Chimp forward reads

Chimpanzee: Clara  Clint  Gina  Masuku  NA03448  NA03450  Noemie  Yvonne

./Clara:
B0_chimpF61  B2_chimpF63  B5_chimpF65  B7_chimpF67  B11_chimpF69  G0_chimpF71
B1_chimpF62  B3_chimpF64  B6_chimpF66  B8_chimpF68  B13_chimpF70  G1_chimpF72

./Clint:
B0_chimpF49  B2_chimpF51  B5_chimpF53  B7_chimpF55  B11_chimpF57  G0_chimpF59
B1_chimpF50  B3_chimpF52  B6_chimpF54  B8_chimpF56  B13_chimpF58  G1_chimpF60

./Gina:
B0_chimpF37  B2_chimpF39  B5_chimpF41  B7_chimpF43  B11_chimpF45  G0_chimpF47
B1_chimpF38  B3_chimpF40  B6_chimpF42  B8_chimpF44  B13_chimpF46  G1_chimpF48

./Masuku:
B0_chimpF73  B2_chimpF75  B5_chimpF77  B7_chimpF79  B11_chimpF81  G0_chimpF83
B1_chimpF74  B3_chimpF76  B6_chimpF78  B8_chimpF80  B13_chimpF82  G1_chimpF84

./NA03448:
B0_chimpF1  B2_chimpF3  B5_chimpF5  B7_chimpF7  B11_chimpF9   G0_chimpF11
B1_chimpF2  B3_chimpF4  B6_chimpF6  B8_chimpF8  B13_chimpF10  G1_chimpF12

./NA03450:
B0_chimpF25  B2_chimpF27  B5_chimpF29  B7_chimpF31  B11_chimpF33  G0_chimpF35
B1_chimpF26  B3_chimpF28  B6_chimpF30  B8_chimpF32  B13_chimpF34  G1_chimpF36

./Noemie:
B0_chimpF13  B2_chimpF15  B5_chimpF17  B7_chimpF19  B11_chimpF21  G0_chimpF23
B1_chimpF14  B3_chimpF16  B6_chimpF18  B8_chimpF20  B13_chimpF22  G1_chimpF24

./Yvonne:
B0_chimpF85  B2_chimpF87  B5_chimpF89  B7_chimpF91  B11_chimpF93  G0_chimpF95
B1_chimpF86  B3_chimpF88  B6_chimpF90  B8_chimpF92  B13_chimpF94  G1_chimpF96

(4.2) Chimp reverse reads

Chimpanzee: Clara  Clint  Gina  Masuku  NA03448  NA03450  Noemie  Yvonne

./Clara:
B0_chimpR61  B2_chimpR63  B5_chimpR65  B7_chimpR67  B11_chimpR69  G0_chimpR71
B1_chimpR62  B3_chimpR64  B6_chimpR66  B8_chimpR68  B13_chimpR70  G1_chimpR72

./Clint:
B0_chimpR49  B2_chimpR51  B5_chimpR53  B7_chimpR55  B11_chimpR57  G0_chimpR59
B1_chimpR50  B3_chimpR52  B6_chimpR54  B8_chimpR56  B13_chimpR58  G1_chimpR60

./Gina:
B0_chimpR37  B2_chimpR39  B5_chimpR41  B7_chimpR43  B11_chimpR45  G0_chimpR47
B1_chimpR38  B3_chimpR40  B6_chimpR42  B8_chimpR44  B13_chimpR46  G1_chimpR48

./Masuku:
B0_chimpR73  B2_chimpR75  B5_chimpR77  B7_chimpR79  B11_chimpR81  G0_chimpR83
B1_chimpR74  B3_chimpR76  B6_chimpR78  B8_chimpR80  B13_chimpR82  G1_chimpR84

./NA03448:
B0_chimpR1  B2_chimpR3  B5_chimpR5  B7_chimpR7  B11_chimpR9   G0_chimpR11
B1_chimpR2  B3_chimpR4  B6_chimpR6  B8_chimpR8  B13_chimpR10  G1_chimpR12

./NA03450:
B0_chimpR25  B2_chimpR27  B5_chimpR29  B7_chimpR31  B11_chimpR33  G0_chimpR35
B1_chimpR26  B3_chimpR28  B6_chimpR30  B8_chimpR32  B13_chimpR34  G1_chimpR36

./Noemie:
B0_chimpR13  B2_chimpR15  B5_chimpR17  B7_chimpR19  B11_chimpR21  G0_chimpR23
B1_chimpR14  B3_chimpR16  B6_chimpR18  B8_chimpR20  B13_chimpR22  G1_chimpR24

./Yvonne:
B0_chimpR85  B2_chimpR87  B5_chimpR89  B7_chimpR91  B11_chimpR93  G0_chimpR95
B1_chimpR86  B3_chimpR88  B6_chimpR90  B8_chimpR92  B13_chimpR94  G1_chimpR96

(4.3) Human forward reads

Humans: NA07348  NA10831  NA10863  NA12753  NA18502  NA18870  NA19201  NA19238

./NA07348:
B0_humanF73  B2_humanF75  B5_humanF77  B7_humanF79  B11_humanF81  G0_humanF83
B1_humanF74  B3_humanF76  B6_humanF78  B8_humanF80  B13_humanF82  G1_humanF84

./NA10831:
945625_humanF95  B1_humanF86  B3_humanF88  B6_humanF90  B8_humanF92   B13_humanF94
B0_humanF85      B2_humanF87  B5_humanF89  B7_humanF91  B11_humanF93  G1_humanF96

./NA10863:
B0_humanF49  B2_humanF51  B5_humanF53  B7_humanF55  B11_humanF57  G0_humanF59
B1_humanF50  B3_humanF52  B6_humanF54  B8_humanF56  B13_humanF58  G1_humanF60

./NA12753:
945601_humanF71  B1_humanF62  B3_humanF64  B6_humanF66  B8_humanF68   B13_humanF70
B0_humanF61      B2_humanF63  B5_humanF65  B7_humanF67  B11_humanF69  G1_humanF72

./NA18502:
B0_humanF1  B2_humanF3  B5_humanF5  B7_humanF7  B11_humanF9   G0_humanF11
B1_humanF2  B3_humanF4  B6_humanF6  B8_humanF8  B13_humanF10  G1_humanF12

./NA18870:
B0_humanF37  B2_humanF39  B5_humanF41  B7_humanF43  B11_humanF45  G0_humanF47
B1_humanF38  B3_humanF40  B6_humanF42  B8_humanF44  B13_humanF46  G1_humanF48

./NA19201:
B0_humanF25  B2_humanF27  B5_humanF29  B7_humanF31  B11_humanF33  G0_humanF35
B1_humanF26  B3_humanF28  B6_humanF30  B8_humanF32  B13_humanF34  G1_humanF36

./NA19238:
B0_humanF13  B2_humanF15  B5_humanF17  B7_humanF19  B11_humanF21  G0_humanF23
B1_humanF14  B3_humanF16  B6_humanF18  B8_humanF20  B13_humanF22  G1_humanF24

(4.4) Human reverse reads

Humans: NA07348  NA10831  NA10863  NA12753  NA18502  NA18870  NA19201  NA19238

./NA07348:
B0_humanR73  B2_humanR75  B5_humanR77  B7_humanR79  B11_humanR81  G0_humanR83
B1_humanR74  B3_humanR76  B6_humanR78  B8_humanR80  B13_humanR82  G1_humanR84

./NA10831:
B0_humanR85  B2_humanR87  B5_humanR89  B7_humanR91  B11_humanR93  G0_humanR95
B1_humanR86  B3_humanR88  B6_humanR90  B8_humanR92  B13_humanR94  G1_humanR96

./NA10863:
B0_humanR49  B2_humanR51  B5_humanR53  B7_humanR55  B11_humanR57  G0_humanR59
B1_humanR50  B3_humanR52  B6_humanR54  B8_humanR56  B13_humanR58  G1_humanR60

./NA12753:
B0_humanR61  B2_humanR63  B5_humanR65  B7_humanR67  B11_humanR69  G0_humanR71
B1_humanR62  B3_humanR64  B6_humanR66  B8_humanR68  B13_humanR70  G1_humanR72

./NA18502:
B0_humanR1  B2_humanR3  B5_humanR5  B7_humanR7  B11_humanR9   G0_humanR11
B1_humanR2  B3_humanR4  B6_humanR6  B8_humanR8  B13_humanR10  G1_humanR12

./NA18870:
B0_humanR37  B2_humanR39  B5_humanR41  B7_humanR43  B11_humanR45  G0_humanR47
B1_humanR38  B3_humanR40  B6_humanR42  B8_humanR44  B13_humanR46  G1_humanR48

./NA19201:
B0_humanR25  B2_humanR27  B5_humanR29  B7_humanR31  B11_humanR33  G0_humanR35
B1_humanR26  B3_humanR28  B6_humanR30  B8_humanR32  B13_humanR34  G1_humanR36

./NA19238:
B0_humanR13  B2_humanR15  B5_humanR17  B7_humanR19  B11_humanR21  G0_humanR23
B1_humanR14  B3_humanR16  B6_humanR18  B8_humanR20  B13_humanR22  G1_humanR24

(5) References

[1] Bakewell M., Shi P., Zhang J. (2007) More genes underwent positive selection in chimpanzee evolution than in human evolution. Proc Natl Acad Sci USA 104: 7489-7494.
[2] Rhesus Macaque Genome Sequencing and Analysis Consortium (2007) Evolutionary and biomedical insights from the rhesus macaque genome. Science 316: 222-234.
[3] Rice P., Longden I., Bleasby A. (2000) EMBOSS: The European Molecular Biology Open Software Suite. Trends in Genetics 16(6): 276-277.
[4] Ensembl: www.ensembl.org

(5.1) Alignments of resequencing data

source        gene                  protein alignment   resequencing alignment
Bakewell 0    HELZ                  msa                 msa
Bakewell 1    NP_115923.1           msa                 msa
Bakewell 2    POLR3B                msa                 msa
Bakewell 3    TEC                   msa                 msa
Bakewell 5    C18orf25 (CR025)      msa                 msa
Bakewell 6    HECW1                 msa                 msa
Bakewell 7    ARID1A                msa                 msa
Bakewell 8    EML5                  msa                 msa
Bakewell 11   GLCCI1                msa                 msa
Bakewell 13   BVES                  msa                 msa
Gibbs 0       IRF7                  msa                 msa
Gibbs 1       LRRC16B (C14orf121)   msa                 msa