Guide to Essential BioStatistics VI: Designing and implementing experiments – sample size & replication
In this sixth article in the LabCoat Guide to BioStatistics series, we learn about Sample Size and Replication.
In the previous articles in this series, we explored the Scientific Method and Proposing Hypotheses and Type-I and Type-II errors.
Future articles will cover: Designing and implementing experiments (Significance, Power, Effect, Variance, Replication and Randomization), Critically evaluating experimental data (Q-test; SD, SE and 95%CI), and Concluding whether to accept or reject the hypothesis (F- and T-tests, Chi-square, ANOVA and post-ANOVA testing).
Calculating sample size using the coefficient of variance.
Replication is the repeated application of treatments to multiple independently assigned experimental units. The number of independently assigned experimental units that receive the same treatment is the sample size.
To obtain a rough estimate of the number of replicates required to generate a data set with a specific effect size, significance level and power, the estimated coefficient of variance for the experiment first needs to be determined:
Coefficient of Variance
Experimental variability may be expressed as the percent data spread relative to the mean and is termed the Coefficient of Variance or Coefficient of Variation (CoV), also known as the Relative Standard Deviation (RSD).
Calculation of the CoV should only be performed on data measured on the ratio scale (e.g., mass, length, duration, disease severity, etc.), and on Normally distributed data (see previous article).
Note also that when the mean is small (approaching zero), the CoV may become extremely large and is sensitive to small changes in the mean. Accordingly, the CoV for crop protection trials, for example, is typically calculated at the ED50 (the dose giving a 50% effect).
The coefficient of variance may be calculated as:
CoV= (SD/MEAN)*100
The coefficient of variance describes the size of the standard deviation or data spread relative to the mean (or average). The Standard Deviation and Mean may be estimated from pilot trials, from similar prior experiments, or from the assumptions for biological trials presented in Figure 1.
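As a minimal sketch of the calculation, the CoV can be computed from a set of replicate measurements as shown here. The efficacy values are hypothetical and chosen only for illustration, and the function name `cov_percent` is likewise an assumption, not part of any library. The second call shifts the same values towards zero to show how a small mean inflates the CoV even though the spread is unchanged.

```python
import statistics


def cov_percent(values):
    """Return the coefficient of variation in percent: (SD / mean) * 100."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CoV is undefined when the mean is zero")
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    return sd / mean * 100


# Hypothetical pilot-trial efficacy values (%), for illustration only
pilot = [76, 79, 80, 81, 84]
cov_pilot = cov_percent(pilot)

# Same spread, but shifted so that the mean is close to zero
shifted = [v - 75 for v in pilot]
cov_shifted = cov_percent(shifted)

print(f"CoV of pilot data:   {cov_pilot:.1f}%")
print(f"CoV of shifted data: {cov_shifted:.1f}%")
```

Both data sets have the same standard deviation, so the difference in CoV comes entirely from the change in the mean.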
Accordingly, in our proposed experiment, in which plants sprayed with a phytotoxic insecticide with and without added herbicide safener are to be tested, the results of an initial pilot trial gave a data range of 84% efficacy (highest value) minus 76% efficacy (lowest value) = 8 (see previous article). Applying the Range Rule of Thumb (SD is roughly the range divided by 4) allows us to estimate the SD as 8/4 = 2. With a mean efficacy of 80%, we may estimate the Coefficient of Variation as CoV = (2/80)*100 = 2.5%.
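The arithmetic of this pilot-trial estimate can be reproduced in a few lines. This is only a sketch of the Range Rule of Thumb as used above. The helper name `sd_from_range` is hypothetical, and the mean of 80% is the value used in the calculation in the text.

```python
def sd_from_range(highest, lowest):
    """Range Rule of Thumb: estimate the SD as one quarter of the data range."""
    return (highest - lowest) / 4


highest, lowest = 84.0, 76.0   # pilot-trial efficacy extremes (%)
mean_efficacy = 80.0           # mean used in the worked example

sd_estimate = sd_from_range(highest, lowest)
cov_estimate = sd_estimate / mean_efficacy * 100

print(f"Estimated SD:  {sd_estimate:.1f}")
print(f"Estimated CoV: {cov_estimate:.1f}%")
```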
▶︎ A biological rule of thumb is that a coefficient of variance of 20% or less is considered acceptable.
As variance increases for an experiment, it becomes more difficult to detect a significant difference, and a larger sample size will be required. In the absence of pilot trials or similar prior experiments, assumptions of CoV may be used together with Power and Effect Size (see earlier articles) for a rough estimate of required sample size when designing experiments.
Biological assumptions for estimating number of replicates
- a coefficient of variance of 20% or less is considered acceptable.
- a Power of 80% is considered sufficient, meaning that when a true difference in efficacy exists between the treatments, there is only a 20% chance of erroneously concluding that there is none (a Type II error).
- an Effect Size (treatment effect or improvement in efficacy) of 20% is considered economically viable.
- a 5% significance level, meaning that we accept a 5% risk of making a Type I error (a false positive), i.e. rejecting the Null hypothesis despite its being true.
- a Normal distribution may be assumed as a valid working approximation.
- two-sided statistics are considered appropriate for biological trials.
Figure 1: Assumptions for biological trials.
With an indication of the relative data spread or variability, the number of replicates required to achieve typical levels of significance, power and effect size may be estimated from the following tables. The single-sample case applies when one treatment is compared against a fixed value (e.g., is the treatment effect greater than 0, the expected effect of an untreated control, UTC?).
For comparing two treatments (e.g., treated and untreated control), the following table may be used to conveniently estimate the number of replicates needed based on CoV and Effect size, assuming 80% power and 5% significance:
Figure 2: Sample sizes for a range of CoV and effect sizes for comparing two treatments, two-sided and assuming 80% power and 5% significance. Based on “Statistical Rules of Thumb” by Gerald van Belle.
Consider our proposed experiment, in which plants sprayed with a phytotoxic insecticide with and without added herbicide safener are to be tested.
In the absence of any pilot trial data or data from similar trials, we may start with some underlying assumptions: a treatment effect of 20% (the safener-treated plants will need to show at least 20% less phytotoxicity to make further development worthwhile) and a variability (CoV) of 20% as a realistic estimate. We can now roughly estimate that each of the treatments will require 13 replicates (Figure 2).
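For readers without the table to hand, a rough estimate can also be computed directly. The sketch below uses one rule of thumb commonly attributed to van Belle for comparing two means on a relative (log) scale, n = 16 * CoV^2 / (ln(ratio of means))^2 per treatment, where the factor 16 approximates two-sided 5% significance and 80% power. Whether Figure 2 was built from exactly this formula is an assumption, although it gives the same answer for the example above. The function name `replicates_per_treatment` is hypothetical.

```python
import math


def replicates_per_treatment(cov, relative_change):
    """Rule-of-thumb replicates per treatment for a two-treatment comparison.

    cov             -- coefficient of variation as a fraction (0.20 for 20%)
    relative_change -- expected change in the mean as a fraction
                       (-0.20 for a 20% reduction)
    Assumes two-sided 5% significance and 80% power (the factor 16).
    """
    ratio = 1 + relative_change
    if ratio <= 0 or relative_change == 0:
        raise ValueError("relative_change must be non-zero and greater than -1")
    n = 16 * cov ** 2 / math.log(ratio) ** 2
    return math.ceil(n)  # round up to whole replicates


# Worked example from the text: CoV of 20%, 20% less phytotoxicity
n_replicates = replicates_per_treatment(cov=0.20, relative_change=-0.20)
print(f"Replicates per treatment: {n_replicates}")
```

Because the formula works on the log scale, a 20% reduction and a 20% increase do not give the same result, so the direction of the expected change should match the hypothesis being tested.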
For trials with a greater number of treatments or “experimental units”, the number of replicates required may also be determined from the minimum accepted degrees of freedom for a biological trial. This will be covered in the next article.
A little about myself
I am a Plant Scientist with a background in Molecular Plant Biology and Crop Protection.
20 years ago, I worked at Copenhagen University and the University of Adelaide on plant responses to biotic and abiotic stress in crops.
At that time, biology-based crop protection strategies had not taken off commercially, so I transitioned to conventional (chemical) crop protection R&D at Cheminova, later FMC.
During this period, public opinion, as well as increasing regulatory requirements, gradually closed the door of opportunity for conventional crop protection strategies, while the biological crop protection technology I had contributed to earlier began to reach commercial viability.