Tests of Normality

· Q 1. I want to get a rough idea of whether the data for my continuous variable follow a normal distribution or are skewed. How can I do this using SPSS?

A. Try plotting a histogram and fitting a Normal curve to your data on the same plot. The instructions for this are available here: Creating histograms in SPSS.

However, if you are using this method to help you decide which is the right test for comparing two or more groups, then it would be best to proceed to Q 4, below, and its accompanying answer.
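For readers who want to see what the histogram-with-Normal-curve check involves outside SPSS, here is a minimal sketch in Python using only the standard library. It is not the SPSS procedure itself: the function name `histogram_with_normal`, the choice of six equal-width bins and the sample values are all assumptions made for illustration. It compares the observed count in each bin with the count expected under a Normal curve fitted using the sample mean and standard deviation.

```python
from statistics import NormalDist, mean, stdev


def histogram_with_normal(values, n_bins=6):
    """Return (bin_start, bin_end, observed_count, expected_count) per bin.

    The expected counts come from a Normal distribution whose mean and
    standard deviation are estimated from the sample itself.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; no histogram to draw")
    fitted = NormalDist(mean(values), stdev(values))
    width = (hi - lo) / n_bins
    rows = []
    for i in range(n_bins):
        start = lo + i * width
        end = lo + (i + 1) * width
        is_last = i == n_bins - 1
        # Bins are [start, end); the largest value is placed in the last bin.
        observed = sum(
            1 for v in values if start <= v < end or (is_last and v == hi)
        )
        expected = len(values) * (fitted.cdf(end) - fitted.cdf(start))
        rows.append((start, end, observed, expected))
    return rows


# Illustrative values only (not real patient data).
sample = [4.1, 4.8, 5.0, 5.2, 5.5, 5.6, 5.9, 6.0,
          6.1, 6.3, 6.4, 6.8, 7.0, 7.4, 8.2]
rows = histogram_with_normal(sample)
for start, end, observed, expected in rows:
    bar = "#" * observed
    print(f"{start:5.2f} to {end:5.2f} | {bar:<6} "
          f"observed={observed} expected={expected:.1f}")
```

Large gaps between the observed and expected counts, especially in one tail only, are the numerical counterpart of a histogram that visibly departs from the fitted curve.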

· Q 2. What are parametric data?

A. Within the context of hypothesis testing, parametric data may be understood as data which follow a known distribution. Usually, the known distribution is the Normal distribution, which is why we often hear of the dichotomy between tests for Normally distributed data and tests for non-parametric data (or non-parametric tests). Examples of this dichotomy are provided in the two flowcharts on the page Some Useful Flowcharts of the current WordPress site. These flowcharts are designed to assist you in choosing the right test(s) to address your study questions based on the type of data you have and your underlying hypotheses.

· Q 3. Why are tests of Normality so essential to parametric testing?

A. Often, when we perform a hypothesis test we are trying to refute a statement of no association or difference, called the Null Hypothesis. For example, if we were comparing two means for two samples, we would start with the null hypothesis that the means for the populations from which these samples were obtained are actually the same.

We might take this approach, for example, when comparing pain levels for two different groups of patients following use of a particular analgesic, provided the data are on a measurement scale (e.g. from 1 to 60) which is not limited to only a few categories.

In order to refute the null hypothesis we would need to have obtained a test statistic with a value which would be sufficiently extreme. But what is a test statistic, you may well ask?

Well, first of all each test statistic is dependent on a choice of test and therefore if we are going to draw the correct conclusion for our data, we had better be sure that we have chosen the correct hypothesis test.

For the example I have mentioned above, you would therefore use a t-test only if it was correct to assume that the two samples were taken from populations which were approximately Normally distributed. There are many ways of testing this assumption.

It is true that sometimes we can transform the data using logs or other functions to force the data to be Normally distributed but often this does not work. If this were the case with our example above, but the two samples did nevertheless come from populations with similar distributions, then you would perform a different test called the Mann-Whitney U-test.

Now back to the test statistic. A test statistic is a statistic which is calculated using the sample data and for which the value decides whether or not to reject the null hypothesis. The formula for calculating the test statistic depends on the distributions of the data from which our samples were taken. There is one test statistic for the independent samples t-test and another for the Mann-Whitney U-test when simply comparing for a difference between our samples. It is therefore important to check the distributions from which these samples were taken before calculating the test statistic.
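To make the idea of "one test statistic per test" concrete, the sketch below computes both statistics by hand for the same two samples, using only the Python standard library. It assumes the equal-variance (pooled) form of the independent samples t-test and the pairwise-counting definition of the Mann-Whitney U statistic. The function names and the two small data sets are hypothetical and serve only as an illustration.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Independent samples t statistic, assuming equal population variances."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    standard_error = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / standard_error


def mann_whitney_u(a, b):
    """U statistic for sample a: the number of (x, y) pairs with x > y.

    A tied pair contributes 0.5. The U statistic for sample b is
    len(a) * len(b) minus this value.
    """
    return sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x in a
        for y in b
    )


# Illustrative pain scores only (not real patient data).
group_a = [12.0, 15.0, 11.0, 18.0, 14.0]
group_b = [22.0, 19.0, 25.0, 17.0, 21.0]

t_value = pooled_t_statistic(group_a, group_b)
u_value = mann_whitney_u(group_a, group_b)
print(f"t = {t_value:.3f}")
print(f"U (group a) = {u_value}")
```

The t statistic is built from means and variances, whereas U depends only on how the observations in one sample rank against those in the other. This is why the Normality assumption matters for the former but not for the latter.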

To see an example of the application of a test statistic in practice, have a look at Chapter 7. The t tests of the e-book Statistics at Square One and consider in particular how the test statistics, t, are calculated for different types of t-test.

However, I cannot stress too highly how useful the text Medical Statistics at a Glance would be in helping you get to grips with these essentials.  If you are registered with the University of Edinburgh, you can consult the electronic version of this book via the University’s library discovery system, DiscoverEd.
Here are some reference details:

    • Title: Medical statistics at a glance
    • Authors: Aviva Petrie, Caroline Sabin
    • Publisher: Hoboken : Wiley
    • Publication Date: 2013
    • Edition: Third edition

· Q 4. I am considering comparing time to hospital discharge data for groups of patients defined according to how long they waited to see the consultant. Where can I find more comprehensive information on tests of Normality for each group (including advice on more precise tests)?

A. Please refer to the resource

Testing data for Normality to assist in deciding whether to carry out a parametric or non-parametric test

NB. If you wish to test for Normality for one or more variables but these variables are in separate columns because the data are related across columns, you can still use the instructions in the tutorial. However, you will not have to enter a variable into the SPSS dialogue box for groups (the cell salvage variable in the example provided). Put simply, this means you don’t have to enter anything in the box Columns when creating a histogram or in the box Factor List when running the Shapiro-Wilk test. You should deal with each variable separately when creating histograms. However, when running the Shapiro-Wilk test, you can enter several variables into the box Dependent List, which will save you a little time. In this context, you should refer to slide 53 in the first instance.

· Q 5. I have encountered a grey area in at least one of the following senses: a) the results of the Shapiro-Wilk test are not consistent with those which I would have anticipated on the basis of examination of my histograms; b) according to my findings, it appears that one of my groups is Normally distributed whilst the other is not. Are there any further tests which I could perform to help me develop cumulative evidence to decide for or against Normality?

A. Yes, there are various possible approaches which you can take here. The first of these is to examine the box-plots which were generated when you carried out the Shapiro-Wilk test. Are there any extreme outliers? If so, are they sufficiently reliable to be kept in? If not, try removing them and re-generating your results. Please refer to the resource on box-plots to see how extreme outliers (denoted by asterisks rather than ‘o’s) can be identified. In a box-plot, the ‘o’s may be regarded as fairly harmless outliers and it is best to try to avoid removing them. The same resource also illustrates how to use box-plots as a crude preliminary test for assessing the Normality of your data.

You will also have generated some statistics for kurtosis and skewness (definitions provided here). Divide each of these statistics by its standard error (which you will also have generated) to obtain a z-score for kurtosis and a z-score for skewness, and then take the absolute value of each. The journal article Statistical notes for clinical researchers: assessing normal distribution (2) using skewness and kurtosis will guide you in interpreting these absolute values as a means of assessing Normality.
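The division described above can be sketched in a few lines of Python. The formulas below are the standard bias-adjusted sample skewness and excess kurtosis together with their usual standard errors. These are assumed here to match what SPSS reports, so please check the output against your own SPSS table before relying on it. The function name and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def skew_kurtosis_z(values):
    """Return skewness, excess kurtosis, their standard errors and z-scores."""
    n = len(values)
    if n < 4:
        raise ValueError("at least 4 observations are needed")
    m, s = mean(values), stdev(values)
    sum_z3 = sum(((v - m) / s) ** 3 for v in values)
    sum_z4 = sum(((v - m) / s) ** 4 for v in values)

    # Bias-adjusted sample skewness and excess kurtosis.
    skew = n / ((n - 1) * (n - 2)) * sum_z3
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum_z4
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

    # Standard errors depend only on the sample size.
    se_skew = sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * sqrt((n * n - 1) / ((n - 3) * (n + 5)))

    return {
        "skew": skew, "se_skew": se_skew, "abs_z_skew": abs(skew / se_skew),
        "kurt": kurt, "se_kurt": se_kurt, "abs_z_kurt": abs(kurt / se_kurt),
    }


# Illustrative values only (not real patient data).
data = [3.0, 4.0, 4.0, 5.0, 5.0, 6.0, 7.0, 9.0, 12.0, 20.0]
result = skew_kurtosis_z(data)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

The two absolute z-scores printed at the end are the quantities to compare against the cut-offs discussed in the journal article cited above.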

Q-Q plots – You will have also generated quantile-quantile plots, abbreviated Q-Q plots. These are highly recommended in helping you make your final decision when you encounter a grey area.

Q-Q plots are plots in which the observed values, arranged in increasing order, are provided along the x-axis. The values along the y-axis are z-scores. For the plotted point (x,y) corresponding to the observed value of rank ‘k’ under the above ordering, the y-co-ordinate is the quantile of the standard Normal distribution (the z-score) for the cumulative proportion p = (k – 0.5)/n, where ‘n’ denotes the sample size. If the points lie approximately on the 45° straight line provided in this plot, then the data approximate to Normality; otherwise, the original sample data are non-Normal. The plot may also be used to identify outliers.
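As a rough illustration of how the plotted points can be obtained, the sketch below computes Q-Q co-ordinates in Python using only the standard library. It takes the proportion (k – 0.5)/n given above and assumes that the z-score is the standard Normal quantile at that proportion. SPSS may use a slightly different proportion formula by default, so small numerical differences from your SPSS plot are to be expected. The function name and the data are hypothetical.

```python
from statistics import NormalDist


def normal_qq_points(values):
    """Return (observed value, expected z-score) pairs for a Normal Q-Q plot.

    Observed values are sorted in increasing order; the value of rank k
    (k = 1, ..., n) is paired with the standard Normal quantile at
    (k - 0.5) / n. Tied values simply receive consecutive ranks here.
    """
    n = len(values)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return [
        (x, standard_normal.inv_cdf((k - 0.5) / n))
        for k, x in enumerate(sorted(values), start=1)
    ]


# Illustrative values only (not real patient data).
observed = [6.1, 4.8, 5.5, 7.4, 5.9]
points = normal_qq_points(observed)
for x, z in points:
    print(f"observed = {x:4.1f}   expected z-score = {z:6.3f}")
```

Plotting these pairs and checking how closely they follow a straight line is the same judgement that you make by eye on the SPSS output.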

Please take time out to consider Yearsley’s examples on interpreting Q-Q plots and a hand-picked video cross-examining Q-Q plots alongside histograms to train yourself in making the correct judgements.

· Q 7. I would like to test some transformations on my ESR data in an attempt to see if I can Normalize it. The reason for this is that for all tests of Normality I have considered, one of my patient groups (those with septic arthritis) has Normal data but the other (those without septic arthritis) has skewed data. I would like to try to make the data more uniform in distribution across the two groups. Can you suggest suitable functions for this purpose?

A. Finding a suitable function to transform your data in such circumstances can be challenging and you should allow plenty of time for exploration. A good first step is to use the natural logarithm function (ln) or the function for the logarithm to the base 10. Have a look at the PowerPoint presentation Computing Transformations to obtain some highly relevant information on techniques for transforming data in SPSS. The resource Tips for Recognizing and Transforming Non-normal Data is also a useful reference for considering more of the relevant theory.
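The effect of a logarithmic transformation on skewness can be explored with a short Python sketch such as the one below, which uses only the standard library. The values are invented for illustration and are not real ESR measurements, the helper `sample_skewness` is hypothetical, and the approach assumes that every value is strictly positive, since the logarithm is undefined otherwise.

```python
from math import log, log10
from statistics import mean, stdev


def sample_skewness(values):
    """Bias-adjusted sample skewness (0 for a perfectly symmetric sample)."""
    n = len(values)
    m, s = mean(values), stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((v - m) / s) ** 3 for v in values)


# Invented, right-skewed values for illustration only.
raw = [5.0, 6.0, 8.0, 10.0, 12.0, 15.0, 20.0, 28.0, 45.0, 90.0]

ln_values = [log(v) for v in raw]        # natural logarithm
log10_values = [log10(v) for v in raw]   # logarithm to the base 10

print(f"skewness of raw values:   {sample_skewness(raw):.3f}")
print(f"skewness after ln:        {sample_skewness(ln_values):.3f}")
print(f"skewness after log10:     {sample_skewness(log10_values):.3f}")
```

The two logarithms differ only by a constant factor, so they give the same skewness and the same shape of distribution. Choosing between them is therefore a matter of convenience when interpreting the transformed scale. Whichever you use, apply the same transformation to both groups and then repeat your tests of Normality on the transformed data.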

· Q 8. Should I include the full results of my tests of Normality in my write-up?

A. No, this is not usually appropriate as these tests involve exploratory analysis of your data to help you decide on the correct hypothesis test(s). It would be a good idea, however, to indicate in the methods section which tests of Normality you used and in exactly what contexts.


CC BY-NC-ND 4.0 Tests of Normality by Margaret MacDougall is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
