- Introduction
mitPower is a tool for estimating statistical power in mitochondrial DNA (mtDNA) case-control association studies of disease. It also offers other utilities: (i) a posteriori estimation of the statistical power, (ii) estimation of the number of controls needed to reach a given statistical power, and (iii) estimation of the minimum deviation from the null hypothesis (of no association) detectable at a given statistical power (expressed as OR and haplogroup frequency in cases). mitPower supports two calibration procedures, asymptotic and permutation.
These utilities are organized into four modules.
- Common features to several modules
2.1. On how to introduce haplogroup frequencies
It is quite common to consider statistical power assuming two different haplogroup categories, the “risky haplogroup” vs. the “remaining haplogroups” (as a single category). Note however that statistical power decreases as the number of haplogroups increases. This fact is considered in the algorithms implemented in mitPower.
For the sake of clarity on how mitPower performs, let us consider the module “Power estimation” and the following scenarios:
(i) “Total number of haplogroups (NH)”: NH = 2. In this scenario one must introduce the frequency of the risky haplogroup (p0); the “Frequency of the remaining haplogroups” is then set automatically to (1-p0). Users can introduce this value themselves, but note that the software will not run unless the frequencies of the two haplogroups sum to 1.
(ii) NH = 3. In this scenario one must introduce p0 first; there are then two options for the “Frequency of the remaining haplogroups”. If the box is left empty, the software sets the frequency of each of the two remaining haplogroups to (1-p0)/(NH-1) (that is, the remaining frequency is shared equally among these haplogroups). Alternatively, the user can enter the frequencies of the remaining (NH-1) haplogroups as decimal values separated by semicolons “;”. Again, the frequencies of the three haplogroups must sum to 1. For instance, if p0 = 0.3, then a possible combination for the “Frequency of the remaining haplogroups” would be “0.4;0.3”.
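The two scenarios above can be sketched in a few lines of Python. This is only an illustration of the rules as described in this section, not the code used by mitPower; the function name haplogroup_frequencies and its arguments are hypothetical.

```python
import math


def haplogroup_frequencies(p0, n_haplogroups, remaining=""):
    """Return [p0, f2, ..., fNH] following the rules described above.

    `remaining` mimics the "Frequency of the remaining haplogroups" box:
    an empty string, or decimal values separated by semicolons.
    (Hypothetical helper; names are assumptions, not mitPower's API.)
    """
    if n_haplogroups < 2:
        raise ValueError("the total number of haplogroups must be at least 2")
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 must lie strictly between 0 and 1")

    text = remaining.strip()
    if text:
        # Values supplied by the user, e.g. "0.4;0.3"
        rest = [float(token) for token in text.split(";")]
    else:
        # Empty box: share (1 - p0) equally among the NH - 1 other haplogroups
        rest = [(1.0 - p0) / (n_haplogroups - 1)] * (n_haplogroups - 1)

    if len(rest) != n_haplogroups - 1:
        raise ValueError("expected NH - 1 frequencies for the remaining haplogroups")

    frequencies = [p0] + rest
    # Compare with a tolerance, since decimal fractions are not exact in floating point
    if not math.isclose(sum(frequencies), 1.0, abs_tol=1e-9):
        raise ValueError("haplogroup frequencies must sum to 1")
    return frequencies
```

For example, haplogroup_frequencies(0.3, 3, "0.4;0.3") accepts the combination given in the text, while haplogroup_frequencies(0.3, 3, "0.4;0.4") raises an error because the frequencies sum to 1.1.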
2.2. Sample sizes in cases and controls
In some modules it is possible to introduce single values for the number of cases/controls or alternatively, sample sizes can be introduced as vectors; the latter allows obtaining estimates for different values of sample sizes.
A vector is specified with three parameters: the minimum sample size (e.g. Nca,min), the maximum sample size (e.g. Nca,max), and the Step value. The Step value sets the spacing between the sample sizes considered from the minimum to the maximum. For computational reasons, the number of steps is limited: the Step value has to be above 5% of the difference between the maximum and the minimum sample size, i.e. 0.05 × (Nca,max - Nca,min).
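As an illustration, the vector of sample sizes and the Step restriction could be sketched as follows. The function name sample_size_vector is hypothetical, and the sketch assumes that a Step exactly equal to 5% of the range is accepted; mitPower may treat that boundary differently.

```python
def sample_size_vector(n_min, n_max, step):
    """Expand (minimum, maximum, Step) into the list of sample sizes.

    Hypothetical helper. Assumption: a Step equal to exactly 5% of
    (n_max - n_min) is allowed, so at most 21 sample sizes are produced.
    """
    if n_min > n_max:
        raise ValueError("the minimum sample size exceeds the maximum")
    if step <= 0:
        raise ValueError("Step must be a positive integer")
    # Integer form of: step >= 0.05 * (n_max - n_min), avoiding float rounding
    if step * 20 < (n_max - n_min):
        raise ValueError("Step is below 5% of (maximum - minimum)")
    return list(range(n_min, n_max + 1, step))
```

With n_min = 100 and n_max = 500, a Step of 100 yields 100, 200, 300, 400 and 500, whereas a Step of 10 is rejected because it falls below 0.05 × 400 = 20.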
2.3. Other features
(i) All boxes should be filled in with appropriate values as indicated in mitPower. Otherwise, mitPower will report an error.
(ii) By default, the number of simulations and the number of permutations are both set to 1000, and the alpha value is set to 0.05. Users can change these values but, in order to avoid computational problems, mitPower does not allow more than 5000 simulations or 5000 permutations. Note that mitPower will take more time if the user introduces complex parameters, such as vectors of sample sizes and ORs. Users with special requirements can send us an email for help.
(iii) For most of the cases, the asymptotic and the permutation method will give similar estimates. The asymptotic method is however much faster. More information in Pardo-Seco et al. (2013).
(iv) In some modules the different estimates can be obtained for several values of the odds ratio (OR). Instead of providing a single value, users can provide a vector of OR values separated by semicolons; e.g. 1.5;1.6;1.7;1.8;1.9;2. This option is also available for the control-case ratio considered in the sample size module (see below).
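A semicolon-separated vector of this kind could be parsed as sketched below. The function parse_value_vector is a hypothetical helper written for this illustration; how mitPower itself validates the input is not documented here.

```python
def parse_value_vector(text):
    """Parse a semicolon-separated list of positive numbers, e.g. OR values.

    Hypothetical helper: a single value such as "1.5" yields a one-element list.
    """
    values = []
    for token in text.split(";"):
        token = token.strip()
        if not token:
            raise ValueError("empty value in vector")
        # float() raises ValueError if another separator (e.g. ':') was used
        value = float(token)
        if value <= 0:
            raise ValueError("values must be positive")
        values.append(value)
    return values
```

Here parse_value_vector("1.5;1.6;1.7;1.8;1.9;2") returns six values, while an entry that mixes in a different separator, such as "1.7:1.8", is rejected.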
- Power estimation
This is the main application: it estimates the statistical power of a case-control study. First, the user has to introduce the total number of haplogroups and the frequencies of the risky haplogroup and of the remaining haplogroups (see section 2).
A realistic odds ratio (OR) value (according to the disease model) for the risky haplogroup must be given. A vector of OR values can be considered as well if the user wants to obtain power estimates for different ORs (see section 2).
The sample size in cases is also mandatory. Then, the user can select the number of simulations, the significance level (usually 0.05) and the statistic, Chi-square or Fisher's Exact Test. For the Chi-square, two calibration methods are possible, the asymptotic and the permutation approach (1000 permutations by default). Our results indicate that there are no relevant differences between the asymptotic and the permutation methods. The computational requirements of the permutation approach can become an issue when a large number of iterations is considered, so the asymptotic calibration method might be more convenient. A reasonable number of permutations is between 500 and 1000. The Fisher's Exact Test can be used for 2 x 2 tables. For 2 x k tables (where k > 2) the Fisher's Exact Test can be computationally very demanding, but it can be approximated using a Monte Carlo approach (1000 Monte Carlo simulations by default).
The graphical output is generated in TIFF and PDF formats.
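The idea behind simulation-based power estimation can be sketched for the simplest case (NH = 2, Chi-square with asymptotic calibration). The code below is a minimal illustration under stated assumptions, not mitPower's implementation: all function names are hypothetical, counts are drawn as independent binomial samples, and the case frequency p1 is derived from p0 and the OR with the standard relation p1 = OR × p0 / (1 - p0 + OR × p0).

```python
import math
import random


def case_frequency(p0, odds_ratio):
    """Frequency of the risky haplogroup in cases implied by p0 and the OR."""
    return odds_ratio * p0 / (1.0 - p0 + odds_ratio * p0)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        return 0.0  # degenerate table: no evidence against the null
    return n * (a * d - b * c) ** 2 / margins


def p_value_df1(stat):
    """Asymptotic p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def simulate_power(p0, odds_ratio, n_cases, n_controls,
                   n_sim=1000, alpha=0.05, seed=0):
    """Fraction of simulated studies in which the null hypothesis is rejected."""
    rng = random.Random(seed)
    p1 = case_frequency(p0, odds_ratio)
    rejections = 0
    for _ in range(n_sim):
        # Number of carriers of the risky haplogroup in cases and in controls
        cases_risky = sum(rng.random() < p1 for _ in range(n_cases))
        controls_risky = sum(rng.random() < p0 for _ in range(n_controls))
        stat = chi_square_2x2(cases_risky, n_cases - cases_risky,
                              controls_risky, n_controls - controls_risky)
        if p_value_df1(stat) < alpha:
            rejections += 1
    return rejections / n_sim
```

Calling simulate_power(0.2, 2.0, 200, 200) returns the estimated power for p0 = 0.2 and OR = 2 with 200 cases and 200 controls. With OR = 1 the returned value should stay close to the significance level, which is a quick sanity check of the calibration.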
- A posteriori power estimation
In this module, the user can estimate the a posteriori power by introducing the frequency of the risky haplogroup in controls and cases, and sample sizes in cases and controls.
Again, the user can select a number of options, such as the number of simulations, the significance level (0.05 by default), and the statistical test (Chi-square or Fisher's Exact Test). See section 3 for more information.
- Sample size
This module has been designed to investigate the sample size needed to reach a desired power. Two different alternatives have been considered:
5.1. Minimum number of controls given the number of cases
It is generally harder to recruit patients than controls; it might therefore be desirable to investigate whether a given statistical power can be reached by increasing the number of controls for a fixed number of cases.
5.2. Number of controls and cases given a control-case ratio
It might be interesting to estimate the sample size for different control:case ratios (as this is often a criterion in study design). This function allows graphical outputs.
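The question of whether a target power is reachable at all with a fixed number of cases can be illustrated with a closed-form approximation. The sketch below does not reproduce mitPower's simulation procedure: it swaps in the normal approximation for a two-sided comparison of two proportions (NH = 2), and every function name is hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def case_frequency(p0, odds_ratio):
    """Frequency of the risky haplogroup in cases implied by p0 and the OR."""
    return odds_ratio * p0 / (1.0 - p0 + odds_ratio * p0)


def approx_power(p0, odds_ratio, n_cases, n_controls, alpha=0.05):
    """Normal-approximation power of a two-sided test of two proportions."""
    p1 = case_frequency(p0, odds_ratio)
    se = (p1 * (1.0 - p1) / n_cases + p0 * (1.0 - p0) / n_controls) ** 0.5
    z = abs(p1 - p0) / se
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return _STD_NORMAL.cdf(z - z_crit) + _STD_NORMAL.cdf(-z - z_crit)


def min_controls(p0, odds_ratio, n_cases, target_power,
                 alpha=0.05, max_controls=100000):
    """Smallest number of controls reaching target_power, or None if unreachable."""
    if approx_power(p0, odds_ratio, n_cases, max_controls, alpha) < target_power:
        return None  # even max_controls is not enough with this many cases
    low, high = 1, max_controls
    # Binary search is valid because the power grows with the number of controls
    while low < high:
        mid = (low + high) // 2
        if approx_power(p0, odds_ratio, n_cases, mid, alpha) >= target_power:
            high = mid
        else:
            low = mid + 1
    return low
```

Under this approximation, with p0 = 0.2, OR = 2 and 100 cases, adding controls raises the power only up to roughly 0.81, so min_controls(0.2, 2.0, 100, 0.95) returns None: the number of cases, not the number of controls, is the limiting factor.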
- Minimum difference
This module has been designed for those interested in estimating the minimum deviation from the null hypothesis (of no association) detectable at a given statistical power (expressed as OR and haplogroup frequency in cases). Again, various options are available. Sample sizes can be introduced as point values or as vectors (see section 2) indicating the minimum and the maximum sample size and the Step value. The algorithm also considers the total number of haplogroups tested (always a positive integer greater than 1).
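To make the notion of a minimum detectable deviation concrete, the sketch below searches by bisection for the smallest case frequency p1 above p0 that reaches the target power, and reports it together with the corresponding OR. It is an illustration only: it assumes NH = 2, considers only risk effects (OR > 1), uses a normal approximation rather than mitPower's procedure, and all names are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def approx_power_from_freqs(p0, p1, n_cases, n_controls, alpha=0.05):
    """Normal-approximation power for control frequency p0 and case frequency p1."""
    se = (p1 * (1.0 - p1) / n_cases + p0 * (1.0 - p0) / n_controls) ** 0.5
    z = abs(p1 - p0) / se
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return _STD_NORMAL.cdf(z - z_crit) + _STD_NORMAL.cdf(-z - z_crit)


def min_detectable_difference(p0, n_cases, n_controls,
                              target_power=0.8, alpha=0.05, tol=1e-6):
    """Return (p1, OR) for the smallest detectable risk effect, or None."""
    low, high = p0, 1.0 - 1e-9
    if approx_power_from_freqs(p0, high, n_cases, n_controls, alpha) < target_power:
        return None  # no case frequency reaches the target with these sample sizes
    # Bisection: the approximate power increases with p1 when p1 > p0
    while high - low > tol:
        mid = (low + high) / 2.0
        if approx_power_from_freqs(p0, mid, n_cases, n_controls, alpha) >= target_power:
            high = mid
        else:
            low = mid
    p1 = high
    odds_ratio = p1 * (1.0 - p0) / (p0 * (1.0 - p1))
    return p1, odds_ratio
```

For p0 = 0.2 with 200 cases and 200 controls, this approximation gives a minimum detectable case frequency of about 0.32, corresponding to an OR of about 1.9, at a power of 0.8 and a significance level of 0.05.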
- Citation
When making use of mitPower, please cite the following paper:
Pardo-Seco J, Amigo J, González-Manteiga W, Salas A (2012) A generalized model to estimate the statistical power in mitochondrial disease studies. Submitted.
- Acknowledgments
The research leading to these results has received funding from the “Ministerio de Ciencia e Innovación” (SAF2008-02971) and from the Plan Galego IDT, Xunta de Galicia (EM 2012/045) given to Antonio Salas.