Basic Statistics for dentists
Really, really basic stats. Enough to let you interpret data and perhaps to think about how to present your own data. These notes are intended only as a guide. Understanding statistics is your responsibility.
Why do dentists need stats at all? Because data without statistics are meaningless. Consider the word meaningless: MEANING LESS; devoid of meaning. To say that a drug or intervention caused a 50% increase in the survival rate of a disease sounds fantastic and is perfect for a tabloid headline but, without any statistical indication of how good the data are, the statement is worthless. The MMR "scandal", in which a link was made between immunisation and autism, is a perfect example of a poorly designed study that yielded meaningless results that nevertheless caused great alarm when the data were reported in the press in the complete absence of objective statistical analysis. Nobody with any biomedical or statistical understanding would have been in the least impressed or worried by the original paper. Tragically, this argument still rumbles on and the impact of the misinformation from newspapers and elsewhere on vaccination programmes has effectively reintroduced measles into this country.
What follows are (almost) maths-free definitions of all the key statistical terms that I think you might need, plus a few others that you need to know about in order to make sense of the rest. Writing in this way is always a compromise between giving too much information (and so losing the message) and missing things out or simplifying to the point of nonsense. Please let me know if you think I have missed anything or generated too much nonsense. Links are provided to relevant articles in Wikipedia and elsewhere for anyone who wants to know more.
Definitions: Data, Probability and the Null Hypothesis
- Population: Everything. All possible items from which samples may be drawn. If it were possible to work with populations then we wouldn't need statistics. You can't (ever!) work with the entire population of anything so instead you take a sample of the population and use statistics to prove beyond reasonable doubt whether the sample does or does not represent a particular population.
- Sample: Part of a population about which you have made observations. For example, if you were interested in the distribution of eye colour in the UK you couldn't possibly check the eye colour of every individual, but you could look at a sample of (for example) a few hundred and argue (statistically) that this sample was truly representative of the population as a whole. Similarly, if you wanted to know how tall people in the UK were then you could measure the heights of a sample of the population, and so on. Making sure that you choose a representative and fair sample of the population can be the most important part of any study. Bias in sample selection will invalidate the entire study. Most examples of bias are the accidental outcome of poor study design or inappropriate sampling techniques. A guaranteed way of invalidating any study is to determine the sampling criteria after collecting the data.
- Probability: Chance in a random universe. Usually given as a fraction, decimal (less than 1) or percentage. A 0.5 probability is a 50% chance or a 1/2 chance that an event will occur. Scientists and statisticians regard any event with a probability of 0.05 (5%, 1/20, one chance in twenty) to be unlikely to occur by chance and so use this level of probability to determine whether or not to reject the Null Hypothesis. Experimental scientists are generally sceptical about data (except their own). A 1 in 20 chance may be a bad bet, but it is one that does come off from time to time (about 1 time in 20) and so we are much more impressed by a probability of 1/100 (0.01, 1%) or even 1/1000 (0.001, 0.1%) and may even refer to such data as being extremely statistically significant.
Incidentally, accepting a 1/20 chance as "significant" implies that as many as 1 in 20 studies may be in error. This is a strong argument against "cherry picking" and in favour of formal meta-analysis
- Odds: Nearly the same as probability, but with what may be an important difference. Odds are the probability of an event occurring divided by the probability of it not occurring.
For example, open a page-a-day diary to any page at random. The probability that you open it to a page showing Thursday is 1/7 or 0.143, there being 7 days in a week. The odds of you opening the diary to a Thursday are the probability of the page showing Thursday divided by the probability that the page does NOT show Thursday. The probability of picking any other day is 6/7 or 0.857 (assuming that the diary has only pages showing days of the week). The odds of choosing Thursday are therefore 1/7 divided by 6/7..... which works out at 1/6. Algebraically, if the probability of an event is denoted by "p" then odds = p / (1-p). If it helps, when probabilities are small, for example 0.001 or 1/1000, then probability and odds will be almost the same because the probability of the event not occurring (in this case, 1 - 0.001 or 0.999) is very close to 1. The effect of dividing by nearly 1 is almost the same as dividing by 1.
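The odds = p / (1-p) conversion above can be sketched in a few lines of Python (not mentioned in these notes, but freely available; the function name is my own):

```python
def odds_from_probability(p):
    """Odds are the probability of an event occurring divided by the
    probability of it not occurring: p / (1 - p)."""
    return p / (1 - p)

# The page-a-day diary example: probability of a Thursday is 1/7,
# so the odds are (1/7) / (6/7) = 1/6.
print(odds_from_probability(1 / 7))

# For small probabilities, odds and probability are almost identical:
print(odds_from_probability(0.001))  # very close to 0.001
```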
- Data types
The distinction between data types, and how you deal with them, is obvious at the extremes. For example, you wouldn't try to average eye colours (what is the average of blue and brown?), but it is possible to work statistically with such qualitative data simply by counting how often each type occurs (how many people have blue eyes?). Some discrete data types, e.g. exam scores, come as numbers. One way or another your sample data must end up as numbers so that you can then interpret them using statistical tools.
- Continuous: data that you measure e.g. height, weight, intracellular calcium concentration etc.
- Discrete or discontinuous: data that you count. e.g. eye colour, petals on a flower etc.
- Simple Statistical tools
- Mean: (arithmetic mean, average): Add up all the items in your sample and divide by the number of items. The symbol for the population mean is μ; the symbol for the sample mean is x̄ (pronounced "x-bar"). You don't need to know about geometric or harmonic means.... (unless you develop a sudden interest in compound rates of change).
- Mode: The most commonly occurring value in a discrete data set. For example, the mode of (1,2,3,3,3,4) is 3. Any data set may contain more than one mode, for example the modes of (1,2,2,3,3,4) are 2 and 3. Mode is a useful concept in discrete data sets, but not necessarily in continuous data sets where every observation may be unique. It is possible to determine a mode for a continuous data set by grouping the observations (turn them into a discrete data set). For example, the continuous data set (0.35, 0.34, 1.15, 1.55, 1.81, 2.11, 6.11) could be grouped into unit-wide bins, each observation replaced by its bin midpoint, giving (0.5, 0.5, 1.5, 1.5, 1.5, 2.5, 6.5), which has a mode of 1.5. The question you must ask is not "is it possible to calculate the mode of this data set" but rather "is there any point in doing so".
- Median: The middle value of a data set arranged in numerical order. In an even numbered data set, the mean of the two middle values.
- Quartiles: The medians of the two halves of the data set, divided at the median. Rank the data and find the median, then find the middle value of the lower half of the data (lower quartile) and the middle value of the upper half of the data (upper quartile). The median is the same as the middle quartile.
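If you have access to Python (an assumption; these notes themselves only mention calculators and Excel), the simple tools above are all in its standard `statistics` module, sketched here with the data sets used in the Mode definition:

```python
import statistics

data = [1, 2, 3, 3, 3, 4]

print(statistics.mean(data))    # add up and divide by the count: 16/6
print(statistics.median(data))  # even-numbered set: mean of the two middle values
print(statistics.mode(data))    # most common value: 3

# A data set may have more than one mode (Python 3.8+):
print(statistics.multimode([1, 2, 2, 3, 3, 4]))  # [2, 3]
```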
- Null Hypothesis: (By-and-large) The hypothesis that two observations are not different. If you were testing a drug or treatment your Null Hypothesis would be that the drug or treatment had no effect on the condition. In statistical parlance, the simplest Null Hypothesis is that two samples belong to the same population. Wherever there is a Null Hypothesis, there is also an Alternative Hypothesis.... which usually boils down to whether the two samples belong to different populations. Statistical testing (see below) gives a probability that two (or more) samples belong to the same population. If your statistical test shows that the probability that two samples belong to the same population is 0.05 (5%) or less, then you get all excited because you can say that there is a significant difference between the samples, i.e. that the chance that they belong to the same population is less than 1/20, and you may therefore reject the Null Hypothesis. Just because your data are statistically significant does not guarantee that you are right to accept or reject the null hypothesis; it means that you are likely to be right. If your data lead you to incorrectly reject the null hypothesis, this gives a false positive or a type I error.... e.g. the 1 time in 20 that you shouldn't have rejected the Null Hypothesis. A type II error is the other way around, a false negative.
The Normal Distribution (and friends)
- Normal Distribution: (Gaussian distribution) If you plot the frequency with which an observation occurs against the value of the observation and get a bell-shaped curve that centres on the mean of the sample then this is (probably) a normal distribution. Many, but not all, observations of biological (including clinical) phenomena are normally distributed. You may choose to believe that the reason that these data are normally distributed is a) magic or b) something to do with The Central Limit Theorem. Either explanation is equally useful in the context of these notes.
- Standard Deviation: The most commonly used measure of the statistical dispersion of data. If standard deviation is small then the data are clustered closely around the mean. If standard deviation is large then the data are widely spread. It is harder to draw statistically interesting conclusions from samples or populations with a wide dispersal (big standard deviation). For normally distributed data, 68.27% of observations will occur within 1 standard deviation either side of the mean, 95.45% will occur within 2 standard deviations of the mean and 99.73% within 3.
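As a sketch (using Python's standard `statistics` module and a made-up data set), note that there are two flavours of standard deviation: the population form, which divides by n, and the sample form, which divides by n-1:

```python
import statistics

# Hypothetical sample of eight observations
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(statistics.pstdev(data))  # population standard deviation (divide by n): 2.0
print(statistics.stdev(data))   # sample standard deviation (divide by n-1): about 2.14
```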
The Z-statistic (aka Z-score, Z-value, normal score, standard score) is a measure of where a sample (or value) falls within a population. It is calculated by subtracting the population mean (μ) from the sample mean (x̄) and dividing the result by the population standard deviation (σ): (x̄ - μ)/σ. In other words, how many standard deviations the sample mean is away from the population mean. If the answer is 2 (or more) then this sample is not representative of 95% of the population (remember, 95% of the data in a normally distributed population fall within 2 standard deviations of the population mean). In fact, a Z-score greater than 2 is usually taken as evidence that the sample does not belong to this population at all.
Calculating a Z-value does require you to know the population mean (μ) and population standard deviation (σ). If you have large samples (>50) then you can use the sample standard deviation (s or SD) as an approximation of the population standard deviation and get an estimate of the Z-statistic.
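The (x̄ - μ)/σ calculation is simple enough to sketch directly (the IQ-style numbers below are assumed for illustration, not taken from these notes):

```python
def z_score(value, mu, sigma):
    """How many population standard deviations a value (or a sample
    mean) lies from the population mean: (value - mu) / sigma."""
    return (value - mu) / sigma

# Assumed example: scores with population mean 100 and SD 15.
# A score of 130 is 2 SDs above the mean, so it falls outside the
# central 95% of a normally distributed population.
print(z_score(130, 100, 15))  # 2.0
print(z_score(85, 100, 15))   # -1.0, i.e. 1 SD below the mean
```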
- Student's t distribution: Almost the same as the normal distribution. There are two interesting things about the t distribution. The useful one is that it is a better fit for small samples than is the normal distribution. (Why? More magic). At sample sizes >30, t and normal are practically the same. The other interesting thing about the t distribution is that it was invented by William Sealy Gosset whilst working at the Guinness Brewery in Dublin. Before his work, statistical tests were designed by and used by biometricians who had hundreds of observations and no inclination to design tools specially to work with small samples. In experimental biology (e.g. brewing, cellular physiology etc.), you only ever get small samples. Thank you Mr. Gosset. The t-distribution is of course the basis for the t-test.
Data should be presented in such a way as to make clear the observation and the degree of statistical certainty of the observation. There are various ways in which this may be achieved. Mean +/- SEM (n=number of observations) is one common way. Odds Ratios and confidence intervals are another. The important thing is to use the most appropriate technique for the data.
- Confidence Interval: Usually, the 95% confidence interval.... the interval between two numbers in which you are 95% confident that the mean (or other appropriate statistic) lies. Data may be written 0.33 (0.13 - 0.82), i.e. mean (lower limit - upper limit). This makes data very easy to interpret.... Are 0.33 (0.13 - 0.82) and 1.68 (0.87 - 3.27) likely to be the means of samples from the same population? Answer no. The 95% confidence intervals of the two means do not overlap, therefore we can be at least 95% certain that these two means are significantly different. Confidence Intervals are calculated using the t or normal distribution as appropriate. Confidence Intervals are a natural accompaniment to Odds Ratios.... As may be seen in the paper about dental treatment and infectious endocarditis.
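The "do the intervals overlap?" check described above is mechanical enough to sketch (a minimal illustration; the function name is my own):

```python
def intervals_overlap(ci_a, ci_b):
    """True if two (lower, upper) confidence intervals overlap."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]

# The two means from the notes: 0.33 (0.13 - 0.82) and 1.68 (0.87 - 3.27).
# No overlap, so we can be at least 95% certain the means differ.
print(intervals_overlap((0.13, 0.82), (0.87, 3.27)))  # False
```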
- Odds Ratio: is the ratio of the odds of an event happening in one group compared to the odds of it occurring in another. Odds ratios may be used to assess the effectiveness of medical treatment. Suppose you have a wonder drug that stops people developing a particular sort of cancer. Find two samples of people with an equal chance of contracting this cancer. Treat one group with the drug and the other with a placebo. Observe the fraction that develop disease in each group (the observed chance or odds of contracting the disease) and then divide the odds of the test group by that of the control (placebo) group. An Odds Ratio of 1 indicates that the treatment had no effect. An Odds Ratio (significantly) less than 1 indicates that the treatment reduced the chance of contracting the disease (hurrah). An Odds Ratio greater than 1 indicates that the treatment made things worse. Odds Ratios are often used to assess risk factors, for example those relating to dental treatment and infectious endocarditis.
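The wonder-drug recipe above can be sketched numerically (the trial counts here are invented for illustration):

```python
def odds_ratio(events_a, n_a, events_b, n_b):
    """Odds of the event in group a divided by the odds in group b.
    Odds within a group = events / non-events."""
    odds_a = events_a / (n_a - events_a)
    odds_b = events_b / (n_b - events_b)
    return odds_a / odds_b

# Hypothetical trial: 10 of 100 treated patients develop the disease,
# against 20 of 100 in the placebo group.
# Odds: (10/90) / (20/80) - well below 1, so the treatment helped.
print(odds_ratio(10, 100, 20, 100))  # about 0.44
```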
- Standard Error (of the mean): (SE, SEM) can be a bit confusing, but it is much beloved by experimental scientists because it is easy to calculate and it is a value that you can make sense of on a graph. What it really represents, the standard deviation of the sample mean, is harder to get your head round. If you sample a population, you can calculate a sample mean from your observations. If you did this lots of times you would obtain lots of sample means.... If you then made a frequency plot of the sample means, it would be normally distributed.... with a Standard Deviation estimated by the sample Standard Error of the Mean. In other words, the Standard Error of the Mean is an estimate of how well the sample mean reflects the population mean. Mean +/- SEM (n) is commonly used to present data from laboratory studies (See the results section of this paper on Sjogren's syndrome).
[Figure: Mean +/- SEM measured under control conditions and in the presence of test substances 1 and 2. The three data sets contained 23, 10 & 14 observations respectively. Test substance 2 caused a statistically significant (P<0.05, determined using a two-tailed Student's t test, indicated by *) reduction in the test mean compared to that of the control condition.]
It doesn't really matter whether you understand this or not; you do need to know that Standard Error is an appropriate statistic to use when reporting experimental results in biology and medicine. Whenever you look at a chart showing SEM, if the error bars overlap then there is NO significant difference between the means. If there is a gap between the error bars of the sample means equal to the combined size of the error bars then they may be significantly different. To demonstrate a statistical difference between two means, an appropriate statistical test, such as a t-test, must be applied to the data. The results of a statistical test are often indicated on a chart using "*": one "*" indicates a significant difference at the 5% level; "**" indicates a significant difference at the 1% level, and so on. Information on the level of significance (compared to what) and the statistical test used should always be included in the figure legend.
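The Standard Error of the Mean described above is simply the sample standard deviation divided by the square root of the number of observations. A minimal sketch, using Python's `statistics` module and a made-up data set:

```python
import math
import statistics

def sem(sample):
    """Standard Error of the Mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Hypothetical set of eight observations
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sem(data))  # about 0.76; report as mean +/- SEM (n=8)
```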
- t-test: The t-statistic is usually calculated from two samples of normally distributed data. The t-statistic and the number of observations may be used together to calculate the probability that the samples belong to the same population. In other words, the t-test tells you whether there is a statistically significant difference between your calculated means. If there is, then you may reject the Null Hypothesis.
t-tests come in a variety of flavours
I have deliberately included an example where it is not clear which test should be applied to the data. This is part of the fun. It may be very difficult to know which is the best statistical analysis for any given set of data. The golden rule is "keep it simple". A definitive answer to the question "which test should I use" usually requires input from a statistician. To be absolutely sure may require input from several statisticians. On the other hand, if you have good data sets that show clear effects (or clearly show a lack of effect) any "appropriate" statistical test will give essentially the same answer (Please don't tell any statisticians that I said this). Technical note: The t-test (and most other statistical tests) only works with samples with the same Standard Deviation (equivalent degree of spread). Therefore, you should test your data first (perhaps using an F-test) to see whether it is appropriate to perform a t-test.
- 2-sample t-test: The bog-standard t-test. You have two independent sets of observations obtained under different conditions. The t-test will tell you the probability that the means of the observations are the same. Either way, you should be able to draw conclusions about the effects of the "different conditions".
- 2-sample t-test with paired data: As above, except that one set of data is dependent on the other. In other words, the data come in pairs. For example, data before and after treatment.
- 1-sample t-test: Compares a sample mean to the population mean. Offers an alternative way of analysing before and after treatment data. Express the "after" data as a % of the "before" data and then apply a 1-sample t-test against a population mean of 100%. The 1-sample t-test is very similar to the Z-test, except that you don't need to know the population standard deviation (you still need a population mean to compare against). With large samples the t-value and the z-value will be practically identical.
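As a sketch of what sits underneath the 2-sample flavour, here is the pooled-variance (equal spread) t-statistic computed by hand in Python, with invented control and test measurements. Converting t to a probability then needs the t distribution with n1 + n2 - 2 degrees of freedom (tables or software such as Excel or GraphPad):

```python
import math
import statistics

def two_sample_t(a, b):
    """Pooled-variance (equal-variance) two-sample t statistic:
    difference of means over its pooled standard error."""
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * statistics.variance(a) +
           (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

# Hypothetical measurements under control and test conditions
control = [5.1, 4.9, 5.3, 5.0, 5.2]
treated = [4.2, 4.0, 4.4, 4.1, 4.3]
print(two_sample_t(control, treated))  # a large t, so a small probability
```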
- (Pearson's) χ2 (Chi squared) test: The t-test only works for data that are normally distributed. The χ2 test is the simplest statistical test that is not based on the normal distribution. χ2 will also work on normally distributed data because the test makes no assumptions about the frequency distribution of the data. This branch of statistics is called non-parametric statistics. χ2 is most commonly used to determine how well data fit a particular model. For example, you can make a prediction about the frequency with which any one number on a 6-sided dice should occur (1/6). This is your model. You could then test the model by rolling the dice and checking your observations using χ2. Applying the χ2 test returns a χ2-statistic, analogous to the t-statistic. You can use the χ2-statistic to calculate a probability value which you can then use to accept or reject the Null Hypothesis.
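The dice example above translates directly into code. The χ2 statistic is just the sum of (observed - expected)² / expected over the categories (the roll counts below are invented for illustration):

```python
def chi_squared(observed, expected):
    """Pearson's chi-squared statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical 60 rolls of a six-sided dice; the model (probability 1/6
# per face) predicts each face 10 times.
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6
print(chi_squared(observed, expected))  # about 1.0: look up against χ2 with 5 df
```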
- Mann-Whitney test: The closest non-parametric alternative to a 2-sample t-test. May be used for data that are not normally distributed. It is no better than the t-test in dealing with 2 samples with very different Standard Deviations. Applying the Mann-Whitney test returns a U-statistic, analogous to the t-statistic. You can use the U-statistic to calculate a probability value which you can then use to accept or reject the Null Hypothesis.
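For the curious, the U-statistic itself is easy to compute by brute force: count, over every pair of observations drawn one from each sample, how often the first sample's value wins (a minimal sketch with made-up data; statistical packages then convert U to a probability):

```python
def mann_whitney_u(a, b):
    """Brute-force U statistic for sample a: the number of pairs (x, y),
    x from a and y from b, where x exceeds y; ties count as half."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

# Hypothetical observations from two small samples
print(mann_whitney_u([1, 4, 6, 7], [2, 3, 5, 8]))  # 8.0
```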
How to perform statistical calculations and tests
If you are really interested in what underlies this brief run through elementary statistics then follow any of the links into Wikipedia and go from there. The people writing these articles know far more about statistical processes than I do. As for calculating Standard Deviation etc. from data..... look closely at your calculator.... the σn and σn-1 buttons will calculate the population and sample Standard Deviation respectively (dividing by n or by n-1). Better yet, use MS Excel (the spreadsheet...). Enter data into (for example) column A and then use the basic statistical tools. To calculate the mean of data contained in rows 1-20 enter "=AVERAGE(A1:A20)" into any empty cell. Similarly "=STDEV(A1:A20)" returns the sample standard deviation and "=COUNT(A1:A20)" returns the number of observations..... Always start a formula in Excel with "=". Excel will calculate (some) probabilities for you if you ask it nicely. A simpler alternative is one of the on-line statistical testing packages. The one offered by GraphPad is particularly good. If all you want is a simple t-test then try my home-grown software. I've compiled the most commonly used formulae in a simple spreadsheet.
Laboratory-based experimental biologists (and physicists) are suspicious of complex statistics because a) we don't understand them and b), to paraphrase Ernest Rutherford, "If your experiment needs (complex) statistics, then you ought to have done a better experiment". Non-laboratory-based disciplines (most social sciences, epidemiology etc.) depend more heavily on statistical analyses because they can't easily perform experiments.
Learning support documents, such as this one, can only get better with feedback from users. Please give feedback. Positive or negative.