
File formats for the main datasets are as follows:

--------------------
Fasta header format:
--------------------
The fasta headers look like:

>human_pseudo_read8527 chr17 49920726-49922795
>central_S217P6750RG5.T0 chr17 49920726-49921527 3
>western_pzs36b02.g1 chr17 49921565-49922204 0
>western_qey69a08.b1 chr17 49921305-49921914 0
>western_G591P64063RB8.T0 chr17 49921458-49922294 0

The ">human" header has three fields: the read_id, the chromosome (chr), and the position, given as a start-end range.

Non-human header lines have one additional field, the individual ID, which is assigned as follows:

Western chimps:
Clint = 0
S221 = 4
S222 = 5

Central chimps:
Clara/S215 = 1
Masuku/S216 = 2
Noemie/S217 = 3
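
As an illustration, the header layout described above could be parsed with a short Python sketch like the one below. It assumes the fields are separated by whitespace, as in the example headers, and that the position is always written as start-end. The names parse_header, FastaHeader and INDIVIDUALS are hypothetical and are not part of the dataset or any accompanying tool.

```python
from typing import NamedTuple, Optional

# Individual IDs as listed above: id -> (name, population)
INDIVIDUALS = {
    0: ("Clint", "western"),
    1: ("Clara/S215", "central"),
    2: ("Masuku/S216", "central"),
    3: ("Noemie/S217", "central"),
    4: ("S221", "western"),
    5: ("S222", "western"),
}


class FastaHeader(NamedTuple):
    read_id: str
    chrom: str
    start: int
    end: int
    individual_id: Optional[int]  # None for ">human" headers


def parse_header(line: str) -> FastaHeader:
    """Split one fasta header line into its fields.

    Assumption: fields are whitespace-separated and the position
    field has the form "start-end".
    """
    fields = line.strip().lstrip(">").split()
    if len(fields) not in (3, 4):
        raise ValueError(f"unexpected number of fields: {line!r}")
    read_id, chrom, position = fields[:3]
    start, end = (int(x) for x in position.split("-"))
    # Only non-human headers carry the fourth (individual ID) field.
    individual_id = int(fields[3]) if len(fields) == 4 else None
    return FastaHeader(read_id, chrom, start, end, individual_id)


if __name__ == "__main__":
    h = parse_header(">central_S217P6750RG5.T0 chr17 49920726-49921527 3")
    print(h.read_id, h.chrom, h.start, h.end, INDIVIDUALS[h.individual_id])
```

Returning None for the individual ID on human reads keeps the two header variants in a single record type.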



-----------------------
Filtered_data_set files
-----------------------

Fields:
1 cluster_id
2 chromosome (human ref sequence)
3 position (human ref sequence)
4 event_class

The event_class may be something like '12C', which refers to a mutation specific to individuals 1 and 2 (Clara/S215 and Masuku/S216, both central chimps). The individual IDs are listed above.
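
A minimal Python sketch of how an event_class value might be decoded is given below. It assumes that the leading digits are single-digit individual IDs (IDs run from 0 to 5) and that whatever follows them is a suffix whose meaning is not described here, so it is returned unchanged. The function name parse_event_class is hypothetical.

```python
import re
from typing import List, Tuple


def parse_event_class(event_class: str) -> Tuple[List[int], str]:
    """Split an event_class such as '12C' into individual IDs and suffix.

    Assumption: each leading digit is one individual ID; the remaining
    characters are kept as-is because their meaning is not documented.
    """
    match = re.fullmatch(r"(\d+)(.*)", event_class.strip())
    if match is None:
        raise ValueError(f"no leading individual IDs in {event_class!r}")
    ids = [int(digit) for digit in match.group(1)]
    return ids, match.group(2)


if __name__ == "__main__":
    ids, suffix = parse_event_class("12C")
    print(ids, suffix)
```

The returned IDs can be looked up in the individual ID list given under the fasta header format.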