6.3 Output Files
Next we discuss the output files, whose names can be specified in the
parameter file. The contents of each file are listed below.
§
indoutfilename specifies the following information for all the
samples analyzed:
o Indiv_Id
o Gender
o Status
o Num_valid_genotypes
§
snpoutfilename specifies the following information for all the
markers analyzed:
o Snp_Id
o Chromosome_num
o Genetic_pos
o Physical_pos
o Pop_A_variant_allele_count
o Pop_A_ref_allele_count
o Pop_B_variant_allele_count
o Pop_B_ref_allele_count
o Case_genotype_count
o Control_genotype_count
§
thetafilename specifies the following information for all the
analyzed samples:
o Indiv_index
o Indiv_id
o θ_true: “true” value of θ or M, printed only in the simulation mode
o θ_mean: population A ancestry for the autosomes, averaged over all the iterations for a particular individual
o θ_sdev: standard deviation of θ_mean
o θX_true: “true” value of θX or MX, printed only in the simulation mode
o θX_mean: population A ancestry for the X chromosome, averaged over all the iterations for a particular individual
o θX_sdev: standard deviation of θX_mean
o Status
§
lambdafilename specifies the following information for all the
analyzed samples:
o Indiv_index
o Indiv_Id
o λ_true: “true” value of λ, printed only in the simulation mode
o λ_mean: λ for the autosomes, averaged over all the iterations for a particular individual
o λ_sdev: standard deviation associated with λ_mean
o λX_true: “true” value of λX, printed only in the simulation mode
o λX_mean: λ for the X chromosome, averaged over all the iterations for a particular individual
o λX_sdev: standard deviation associated with λX_mean
§
freqfilename specifies the following information for all the
markers analyzed:
o SNP_Index: index internal to the program for the snp
o SNP_ID
o chromosome_num
o atrue: “true” reference allele frequency in population
A, valid only in simulation mode
o anaive: naïve frequency of the reference allele in
population A using the ancestral genotype data
o amean: calculated frequency of the reference allele in
population A averaged over all the iterations
o asdev: standard deviation associated with amean
o btrue: “true” reference allele frequency in population
B, valid only in simulation mode
o bnaive: naïve frequency of the reference allele in
population B using the ancestral genotype data
o bmean: calculated frequency of the reference allele in
population B averaged over all the iterations
o bsdev: standard deviation associated with bmean
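As an illustration of the naïve frequencies listed above, the following Python sketch computes a reference allele frequency from raw allele counts such as those reported in snpoutfilename. It assumes that "naïve" means the simple sample proportion ref / (ref + variant); the program itself may use a different estimator. The function name and the counts are hypothetical.

```python
def naive_ref_freq(ref_count, variant_count):
    """Naive reference-allele frequency from raw allele counts.

    Assumption: "naive" is taken to mean the simple sample proportion
    ref / (ref + variant). The program may use a different estimator.
    """
    total = ref_count + variant_count
    if total == 0:
        return None  # no ancestral genotype data for this marker
    return ref_count / total


# Hypothetical counts for one marker (not taken from any real data set).
a_naive = naive_ref_freq(ref_count=30, variant_count=10)  # population A
b_naive = naive_ref_freq(ref_count=5, variant_count=35)   # population B
print(a_naive, b_naive)
```

Comparing such values with amean and bmean shows how far the averaged estimates have moved from the raw ancestral proportions.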
§
ethnicfilename specifies the following information for all the
markers:
o SNP_Index
o chromosome_num
o SNP_ID
o Avg_ethnicity: average θ or M over all iterations and over all individuals at a particular marker
§
pubxfile: contains ancestry estimates for either a single
marker or a single individual, depending on the usage. In either case it
outputs the probabilities of having 0, 1 or 2 PopB chromosomes in
the columns G[0], G[1] and G[2].
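The three probabilities can be summarized as an expected number of PopB chromosomes, 0·G[0] + 1·G[1] + 2·G[2]. The Python sketch below shows this arithmetic; the function name and the example values are hypothetical, and it assumes the three columns have already been read as numbers that sum to 1.

```python
def expected_popb_copies(g0, g1, g2, tol=1e-6):
    """Expected number of PopB chromosomes (between 0 and 2).

    g0, g1, g2 are the probabilities of carrying 0, 1 or 2 PopB
    chromosomes, i.e. the G[0], G[1] and G[2] columns.
    """
    total = g0 + g1 + g2
    if abs(total - 1.0) > tol:
        raise ValueError("probabilities should sum to 1, got %r" % total)
    return g1 + 2.0 * g2


# Hypothetical row: G[0] = 0.25, G[1] = 0.5, G[2] = 0.25
copies = expected_popb_copies(0.25, 0.5, 0.25)
popb_fraction = copies / 2.0  # expected PopB ancestry proportion
print(copies, popb_fraction)
```

Dividing the expected count by 2 gives the expected PopB ancestry proportion at that marker or for that individual.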
§
localoutfilename: contains the scores for all the markers:
o SNP_Index
o Chromosome_Num
o Physical_Pos
o Genetic_Pos
o Log Genome Score
o Case Control Score
o G(Case) : Average ancestry for all cases at that
marker
o G(control) : Average ancestry for all controls at
that marker
o rpower: Information content
§
output: the main output file, which contains the following information
for each Markov chain Monte Carlo (MCMC) iteration:
o Iteration_Num
o θ_mean
o θx_mean
o θ_corr
o λ_mean
o λx_mean
o λ_corr
o t(popA)
o t(popB)
o log score
o log score averaged over iterations
Note that if this file name is not specified in the parameter file, the
above is written to the standard output.
If the program is run with checkit = YES, the results of the data check
programs mentioned in Section 5 are also directed to the standard output.
As detailed in the paper, we feel that 100 burn-in iterations and 200
follow-on iterations should be sufficient for most analyses. These are the
suggested numbers of iterations for most exploratory runs, and the user can
increase them in order to confirm the results. One can plot the genome-wide
score as a function of iteration number to see how well the score converges.
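A minimal Python sketch of such a convergence check is given below. It is only an illustration under stated assumptions: each iteration row is taken to hold the 11 whitespace-separated numeric columns listed above, in that order; iterations are taken to be numbered from 1 with the burn-in iterations first; and the function names and sample rows are hypothetical. The actual layout of the file should be checked before relying on it.

```python
def read_log_scores(lines):
    """Return (iteration, log_score) pairs from the iteration rows.

    Assumption: a row holds 11 whitespace-separated numeric fields in
    the order listed for the output file; any other line is skipped.
    """
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) != 11:
            continue
        try:
            values = [float(f) for f in fields]
        except ValueError:
            continue  # header or free-text line
        rows.append((int(values[0]), values[9]))  # column 10 = log score
    return rows


def running_mean_after_burnin(rows, burnin):
    """Running mean of the log score over the post-burn-in iterations."""
    means, total, count = [], 0.0, 0
    for iteration, score in rows:
        if iteration <= burnin:
            continue
        total += score
        count += 1
        means.append((iteration, total / count))
    return means


# Made-up rows for illustration only (not real program output).
sample = [
    "some header text",
    "1 0.2 0.2 0.1 6.0 6.0 0.1 10 10 1.0 1.0",
    "2 0.2 0.2 0.1 6.0 6.0 0.1 10 10 2.0 1.5",
    "3 0.2 0.2 0.1 6.0 6.0 0.1 10 10 3.0 2.0",
    "4 0.2 0.2 0.1 6.0 6.0 0.1 10 10 5.0 2.75",
]
for iteration, mean in running_mean_after_burnin(read_log_scores(sample), 2):
    print(iteration, mean)
```

The resulting (iteration, running mean) pairs can be passed to any plotting tool; a curve that flattens out as the iteration number grows suggests that the score has converged.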
The following two files are written only when the program is run in the
simulation mode:
§
Genotoyoutfilename: specifies genotype data for all the markers and
simulated individuals in simulation mode:
o SNP_ID
o Indiv_ID
o Vart_allele_count
§
Indtoyoutfilename: specifies the following information for the
simulated individuals in simulation mode:
o Indiv_ID
o Gender
o Population