Example 1. Mitochondrial DNA haplogroups or mtSNPs
This example shows how to estimate the statistical power of a case-control association study that targets several mitochondrial DNA (mtDNA) haplogroups or mtDNA SNPs (mtSNPs). Such a study involves 2 x k contingency tables, where k is the number of haplogroups or mtSNPs.
Note that the statistical model does not take into account the phylogenetic relationships among the different haplogroups/mtSNPs. The method assumes that there is no hierarchical relationship between the haplogroups/mtSNPs considered; hence, their frequencies should sum to 1. When hierarchical phylogenetic relationships do exist (and pending more sophisticated statistical models), a reasonable solution is to consider only those haplogroup/mtSNP categories that are not hierarchically related, for instance, only the haplogroups/mtSNPs at the tips of the phylogeny.
For instance, let’s consider a scenario where haplogroup H and the following four sub-haplogroups of U are genotyped (we assume that all our U haplotypes fall into one of these four categories; we used Phylotree Build 16 as the reference phylogeny for this example): U1, U2’3’4’7’8’9, U5, and U6. Therefore, the number of haplogroups that could be used for the estimation of the statistical power is five (haplogroup H plus the four sub-haplogroups of U); thus, the frequencies of the five categories should sum to 1.
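The tip rule can be sketched in a few lines of Python. This is a minimal illustration, assuming the phylogeny is stored as a parent-to-children dictionary; the function names, the placeholder `root` node and the toy tree (restricted to the haplogroups named in this example, not the full Phylotree Build 16) are hypothetical and not part of the tool.

```python
import math


def tip_categories(children):
    """Return the nodes with no children in a parent -> children mapping, sorted."""
    nodes = set(children)
    for kids in children.values():
        nodes.update(kids)
    # A node is a tip if it never appears as a parent with at least one child.
    return sorted(node for node in nodes if not children.get(node))


def frequencies_sum_to_one(freqs, tol=1e-9):
    """True if the category frequencies form a complete partition (sum to 1)."""
    return math.isclose(sum(freqs.values()), 1.0, abs_tol=tol)


# Hypothetical toy tree covering only this example: H plus four sub-haplogroups of U.
# "root" is a placeholder node, not a real haplogroup name.
EXAMPLE_TREE = {
    "root": ["H", "U"],
    "U": ["U1", "U2'3'4'7'8'9", "U5", "U6"],
}

categories = tip_categories(EXAMPLE_TREE)
print(categories)       # ['H', 'U1', "U2'3'4'7'8'9", 'U5', 'U6']
print(len(categories))  # 5
```

Only the tips are kept, so U itself is dropped in favour of its four sub-haplogroups, and the frequencies supplied for these five categories are the ones that must sum to 1.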
In the example above, we could also be interested in testing mtSNPs instead of haplogroups. To estimate the statistical power when employing mtSNPs, a reasonable solution is to use the number of haplogroups at the tips of the phylogeny defined by the genotyped mtSNPs. In the example given above (and regardless of the number of mtSNPs genotyped), we could again use five as the number of categories for the estimation of the statistical power.
Now, let us consider a case-control study on five non-hierarchical haplogroups. These are the parameters we would have to enter:
Parameters
Total number of categories = 5
Frequency of the risky variant = 0.1
Odds ratio for the risky variant = 1.1;1.2;1.3;1.4;1.5;1.6;1.7;1.8;1.9;2
Sample size in controls = 5000
Sample size in cases = 4000
Number of simulations = 1000
Significance level = 0.05
Statistical test = Asymptotic
Graphical output = TRUE
• The values above indicate that we wish to estimate the statistical power of a case-control study that considers five different categories (haplogroups or mtSNPs), with 5000 controls and 4000 cases.
• A range of ORs from 1.1 to 2 in steps of 0.1 is entered (single values can also be entered).
• The frequency of the risky haplogroup is 10% (one could try other frequencies).
• The number of simulations and the significance level were left at their default values (1000 and 0.05, respectively).
• The asymptotic statistical test was selected because it is computationally convenient: it is fast to evaluate in each of the simulated data sets.
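To make the simulation behind these parameters concrete, the following Python sketch estimates power for one OR value. It is not the tool's own implementation and rests on stated assumptions: the risky category has frequency p in controls and the remaining k - 1 categories share 1 - p equally; the risky frequency in cases is p·OR / (1 - p + p·OR); and "Asymptotic" is taken to mean a Pearson chi-square test on the 2 x k table with k - 1 degrees of freedom. All function names are hypothetical. Only the standard library is used, so the chi-square p-value comes from the closed-form expressions valid for integer degrees of freedom.

```python
import math
import random


def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with integer df >= 1 (closed forms)."""
    if x <= 0.0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2-1} (x/2)^i / i!
        half = x / 2.0
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: 2*Q(chi) + 2*phi(chi) * sum_{r=1}^{(df-1)/2} chi^(2r-1) / (1*3*...*(2r-1))
    chi = math.sqrt(x)
    total, term = 0.0, chi
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    phi = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    return math.erfc(chi / math.sqrt(2.0)) + 2.0 * phi * total


def pearson_chi2(row_a, row_b):
    """Pearson chi-square statistic and df for a 2 x k table given as two rows."""
    n_a, n_b = sum(row_a), sum(row_b)
    n = n_a + n_b
    stat, used = 0.0, 0
    for a, b in zip(row_a, row_b):
        col = a + b
        if col == 0:
            continue  # a category never observed contributes nothing
        used += 1
        for obs, row_total in ((a, n_a), (b, n_b)):
            expected = row_total * col / n
            stat += (obs - expected) ** 2 / expected
    return stat, used - 1


def category_frequencies(k, p_risk):
    """Risky category first; ASSUMPTION: the other k - 1 share 1 - p_risk equally."""
    return [p_risk] + [(1.0 - p_risk) / (k - 1)] * (k - 1)


def case_frequency(p_control, odds_ratio):
    """Risky-category frequency in cases implied by the control frequency and the OR."""
    return odds_ratio * p_control / (1.0 - p_control + odds_ratio * p_control)


def sample_counts(rng, freqs, n):
    """Draw one multinomial sample of size n and return per-category counts."""
    counts = [0] * len(freqs)
    for c in rng.choices(range(len(freqs)), weights=freqs, k=n):
        counts[c] += 1
    return counts


def simulate_power(k, p_risk, odds_ratio, n_controls, n_cases,
                   n_sim=1000, alpha=0.05, seed=1):
    """Fraction of simulated 2 x k tables whose asymptotic test gives p < alpha."""
    rng = random.Random(seed)
    ctrl_freqs = category_frequencies(k, p_risk)
    case_freqs = category_frequencies(k, case_frequency(p_risk, odds_ratio))
    hits = 0
    for _ in range(n_sim):
        controls = sample_counts(rng, ctrl_freqs, n_controls)
        cases = sample_counts(rng, case_freqs, n_cases)
        stat, df = pearson_chi2(cases, controls)
        if df > 0 and chi2_sf(stat, df) < alpha:
            hits += 1
    return hits / n_sim
```

For instance, `simulate_power(5, 0.1, 1.5, 5000, 4000)` runs 1000 replicates for OR = 1.5, and looping over OR = 1.1, 1.2, ..., 2.0 covers the grid entered above. Pure-Python sampling of 9000 individuals per replicate is slow, so a vectorized multinomial sampler (for example NumPy's `Generator.multinomial`) would be preferable in practice.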