Guide to Essential BioStatistics V: Designing and implementing experiments – Variance
In this fifth article in the LabCoat Guide to BioStatistics series, we learn about Variance in designing and implementing experiments.
In the previous articles in this series, we explored the Scientific Method and Proposing Hypotheses and Type-I and Type-II errors.
Future articles will cover: Designing and implementing experiments (Significance, Power, Effect, Variance, Replication and Randomization), Critically evaluating experimental data (Q-test; SD, SE and 95%CI), and Concluding whether to accept or reject the hypothesis (F- and T-tests, Chi-square, ANOVA and post-ANOVA testing).
An experiment comprises both an independent and a dependent variable. The independent variable (in our example, the herbicide safener) can be changed or controlled (included or eliminated), and the effect it has on the dependent variable (insecticide phytotoxicity) can be recorded and analyzed.
A third group of variables is termed controlled variables: those which the researcher holds constant and quantifies (records) during an experiment.
Temperature is a typical controlled variable – as temperature could affect the degree of phytotoxicity, it is important that it is held constant during the experiment. Other controlled variables relevant for biological trials include light, humidity and the variability within laboratory equipment.
Variance is the final factor which determines the required sample size (number of treatments or number of replicates) when designing an experiment. For the estimation of variance in a specific trial, the experimental standard deviation will need to be determined.
Variance is a measure of how far a data set is spread out from its mean value. It is defined as the average of the squared differences between the observed values and the mean (note that for a sample, rather than a whole population, the sum of squared differences is conventionally divided by n − 1 rather than n).
A biological data set will typically have a high variance, and experimental variance may arise from a number of sources including variance within the biological material being tested, researcher variance, equipment variance, and climatic variance.
Standard deviation (SD) is calculated as the square root of the variance, and approximates the average difference, or “spread” of the data set from the mean.
Why bother calculating the square root of the variance? Simple – the standard deviation is expressed in the same units as the mean (x̄), whereas variance is expressed in squared units. Standard deviation is thus a more intuitive method of indicating the spread of our data around the mean.
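As a minimal sketch of these two definitions, the following computes variance and standard deviation by hand for a small, hypothetical set of phytotoxicity scores (the data values are illustrative, not from the trial described here):

```python
import math

data = [76, 78, 79, 80, 80, 81, 82, 84]  # hypothetical phytotoxicity scores (%)
mean = sum(data) / len(data)

# Variance: average of the squared differences from the mean (squared units)
variance = sum((x - mean) ** 2 for x in data) / len(data)

# Standard deviation: square root of the variance (same units as the mean)
sd = math.sqrt(variance)

print(f"mean = {mean:.2f}, variance = {variance:.2f}, SD = {sd:.2f}")
```

Note how the variance (here in squared percentage points) is awkward to interpret directly, while the SD reads in the same percent units as the mean.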
Figure 1: For an approximately Normal data set, the two-tailed values within one, two and three standard deviations of the mean account for about 68%, 95% and 99.7% of the set, respectively.
From the above, we can see that about 68%, 95% and 99.7% of the values in a Normal distribution lie within one, two and three standard deviations of the mean, respectively. For practical purposes, scientists assume that their data is derived from an approximately Normal data set.
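The 68–95–99.7 figures can be checked directly from the standard Normal distribution, for instance with Python's built-in statistics module:

```python
from statistics import NormalDist

z = NormalDist()  # standard Normal: mean 0, SD 1

# Fraction of values falling within k standard deviations of the mean
for k in (1, 2, 3):
    within = z.cdf(k) - z.cdf(-k)
    print(f"within {k} SD: {within:.3f}")
```

This prints approximately 0.683, 0.954 and 0.997, matching the percentages quoted above.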
One-tailed vs. two-tailed tests
In Figure 1, the standard deviation intervals are based on a two-tailed (or two-sided) test, which distributes the possibility of an effect in both directions: positive and negative.
Stated in other terms: a two-tailed test splits your significance level between the two directions, so each tail carries only half the significance of a one-tailed test, in which the whole significance level is applied in a single direction.
▶︎ Two-sided Rule of Thumb: for the biological sciences, use two-sided statistics, unless there is reason (and you have the statistical insight) to do otherwise.
Normal data distributions are symmetrical and follow a bell-shaped density curve (see Figure 1), with data distribution denser in the center (around the mean) and less dense in the tails.
▶︎ Normality Rule of Thumb: for biological data, a Normal distribution may be assumed as a valid working approximation.
For biological data, the assumption of Normality is generally a valid approximation, as the Normal distribution of biological data is surprisingly ubiquitous. If the data is clearly not Normally distributed, it may need to be transformed – this will be discussed in a later article.
Estimating Standard Deviation
Standard deviation may be estimated from pilot trials or similar prior experiments. Often, as a researcher, you will be handed a set of data from a similar trial and be expected to give an on-the-spot evaluation of its variance.
A useful trick to have up your sleeve is the ability to give a rough estimate of the standard deviation of a data set using the Range Rule of Thumb:
▶︎ Range Rule of Thumb: SD ≈ Range/4 where Range = (maximum value) – (minimum value).
The Range Rule becomes more accurate when written as SD ≈ Range/√n, where n is the number of observations. So, to give a quick estimate of the standard deviation by glancing at a table of data, we may use:
Figure 2: The Range Rule for estimating Standard Deviation (SD)
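To illustrate, here is a small sketch (using hypothetical efficacy scores) comparing the two quick Range Rule estimates with a directly calculated sample SD:

```python
import math
import statistics

data = [76, 77, 78, 79, 80, 80, 81, 82, 83, 84]  # hypothetical efficacy scores (%)
rng = max(data) - min(data)
n = len(data)

quick = rng / 4                  # Range Rule of Thumb: SD ~ Range/4
refined = rng / math.sqrt(n)     # n-adjusted version: SD ~ Range/sqrt(n)
actual = statistics.stdev(data)  # sample SD, for comparison

print(f"Range/4 = {quick:.2f}, Range/sqrt(n) = {refined:.2f}, sample SD = {actual:.2f}")
```

Both rough estimates land in the right neighborhood of the calculated SD, which is all the Range Rule promises: a quick ballpark figure, not a substitute for the full calculation.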
For rapid estimates of Standard Deviation in efficacy (0-100%) datasets, the following graphical representation may be a useful reference:
Figure 3: Standard Deviation estimates for selected efficacy (%) ranges and data set sizes.
Let us consider our proposed experiment, in which plants sprayed with a phytotoxic insecticide plus an added herbicide safener are to be compared with plants sprayed with the phytotoxic insecticide alone (no safener).
In a previous pilot trial using the phytotoxic insecticide against the same plant species we intend to use, the results of 19 replicates were distributed as shown in Figure 1 (left).
The range is thus 84 – 76 = 8, and applying the Range Rule of Thumb allows us to quickly estimate the SD as 8/4 = 2. In this example, a full calculation of the SD gives the same result.
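The pilot-trial arithmetic above can be checked in a couple of lines, including the n-adjusted form of the Range Rule for the 19 replicates:

```python
import math

rng = 84 - 76  # range of the pilot-trial results
n = 19         # number of replicates

print(rng / 4)             # Range Rule of Thumb estimate: 2.0
print(rng / math.sqrt(n))  # n-adjusted estimate: ~1.8
```

With n = 19, √n ≈ 4.36, so the two forms of the rule agree closely here, both rounding to an SD of about 2.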
We will examine the usefulness of the standard deviation, coefficient of variation and effect size in more detail in the later section: Critically Evaluating Experimental Data.
Thanks for reading – please feel free to read and share my other articles in this series!
The first two books in the LABCOAT GUIDE TO CROP PROTECTION series are now published and available in eBook and Print formats!
Aimed at students, professionals, and others wishing to understand basic aspects of Pesticide and Biopesticide Mode Of Action & Formulation and Strategic R&D Management, this series is an easily accessible introduction to essential principles of Crop Protection Development and Research Management.
A little about myself
I am a Plant Scientist with a background in Molecular Plant Biology and Crop Protection.
20 years ago, I worked at Copenhagen University and the University of Adelaide on plant responses to biotic and abiotic stress in crops.
At that time, biology-based crop protection strategies had not taken off commercially, so I transitioned to conventional (chemical) crop protection R&D at Cheminova, later FMC.
During this period, public opinion, as well as increasing regulatory requirements, gradually closed the door of opportunity for conventional crop protection strategies, while the biological crop protection technology I had contributed to earlier began to reach commercial viability.