Hypothesis Tests for Comparing Two Groups of Measurement or Ordinal Data

· Q. 1. I am aware that there are different kinds of t-test. How can I familiarize myself with these and gain some understanding as to which of these tests, if any, might be right for my study?

A. A useful primer for this purpose is the Wikipedia page Student’s t-test. At this site you can learn a little history about these tests, including why any one of them is occasionally referred to as Student’s t-test. You should also be aware, however, of the need to test for Normality when deciding on the right test for your data.

You are also recommended to have a good look through the StatsforMedics WordPress page TESTS OF NORMALITY (paying particular attention to the solutions to Q. 3 and Q. 4).

· Q. 2. I need to compare the national average depression score in elderly patients (n = 3000+) with the depression score of my small group of patients (n < 16). Is there a specific test I can use to see if the patients collected in my data are significantly more depressed, considering the small sample size I have?

A. It is best not to compare the reference population with your sample as though they were two samples. Instead, it would make sense to perform a one-sample t-test with your sample data prepared in an SPSS spreadsheet. I would recommend that you don’t decide in advance that you are looking for an increase in mean, but that you perform a two-sided (or, two-tailed) test, which simply tests for a difference in either direction. This is a tougher test and less susceptible to picking up a difference due to chance. (You can do a one-tailed test, however, if, prior to seeing your sample mean, you had every reason to believe it would be higher than the population mean and to believe anything else would not have made sense.) To further enhance your findings, you should consider constructing a confidence interval for the difference between your sample mean and the population mean.

In carrying out the above steps, you are assuming that your data already satisfy the Normality conditions for the one-sample t-test. Have you checked that they do?

All of these areas are covered very well in the resource One sample t-test. (In the table on p. 3, ‘Lower’ and ‘Upper’ refer to the bounds of the above confidence interval. The results can be expressed as “the mean difference (sample mean – population mean) was 26.86 (95% CI: (11.27, 42.46); p = 0.002)”.)
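To make the arithmetic behind the one-sample t-test and its confidence interval concrete, here is a minimal Python sketch using only the standard library. It assumes that you look up the two-sided critical value t_crit in a t table yourself (the standard library has no t distribution, so no p-value is computed); the function name and the example numbers are hypothetical and are not taken from the resource above.

```python
import math
import statistics


def one_sample_t(sample, population_mean, t_crit):
    """Hypothetical helper: one-sample t statistic and CI for
    (sample mean - population mean).

    t_crit is the two-sided critical value for n - 1 degrees of freedom,
    looked up in a t table by the user (e.g. about 2.131 for 15 df at 95%).
    """
    n = len(sample)
    mean_diff = statistics.mean(sample) - population_mean
    std_error = statistics.stdev(sample) / math.sqrt(n)  # SD uses n - 1
    t_stat = mean_diff / std_error
    ci = (mean_diff - t_crit * std_error, mean_diff + t_crit * std_error)
    return mean_diff, t_stat, n - 1, ci


# Made-up scores for illustration only; 4.303 is the 95% two-sided
# critical value for 2 degrees of freedom.
diff, t, df, ci = one_sample_t([5, 7, 9], population_mean=5, t_crit=4.303)
print(diff, round(t, 3), df, ci)
```

The interval is centred on the mean difference, and its half-width shrinks as the sample size grows, which is why larger samples give narrower CIs.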

· Q 3. For a population of neonatal women, I wish to test whether there is a significant difference in babies’ birth weights across the two cohorts for which mothers are aged below 35 and mothers are aged 35 or over. Which test should I use and can I perform this test using SPSS?

A. Provided that you can verify that the two groups are approximately Normally distributed or are both of size at least 30, you should perform an unpaired (or, independent) samples t-test. This test is designed to investigate whether the arithmetic mean difference between your two groups is statistically significant. The procedure for performing this test with SPSS is explained in detail under SPSS TUTORIAL: INDEPENDENT SAMPLES T-TEST. For completeness, however, you should use this resource in conjunction with the one provided under the solution to Q. 4 below. The outcome of Levene’s test determines which row of your table of output you should be referring to in obtaining the results of the t-test.
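As a rough illustration of what SPSS calculates in the row Equal variances assumed, the sketch below computes the pooled-variance (Student’s) independent samples t statistic from two lists of raw values. It is a minimal sketch using only the Python standard library; the function name and the birth weights are hypothetical, and the p-value would still need to be obtained from t tables or statistical software.

```python
import math
import statistics


def pooled_t(group1, group2):
    """Hypothetical helper: independent samples t statistic assuming
    equal population variances (pooled variance)."""
    n1, n2 = len(group1), len(group2)
    mean_diff = statistics.mean(group1) - statistics.mean(group2)
    # Pooled variance: each sample variance weighted by its degrees of freedom
    pooled_var = ((n1 - 1) * statistics.variance(group1)
                  + (n2 - 1) * statistics.variance(group2)) / (n1 + n2 - 2)
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return mean_diff, mean_diff / std_error, n1 + n2 - 2


# Made-up birth weights (kg) for illustration only
under_35 = [3.4, 3.1, 3.6, 3.3]
over_35 = [3.0, 3.2, 2.9, 3.1]
print(pooled_t(under_35, over_35))
```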

If you are intending to work with two independent groups of data, you can get to grips with the instructions in this solution by applying them to the practice data in the spreadsheets Work6 and Prac6.

In this case, the first column is your factor or group column and you can choose any of the remaining columns to represent your dependent or measurement variable. You can think of the data as representing time to purchase (in days) for individual cars according to whether they are manual or automatic. The groups are independent, as the manual and automatic cars are to be thought of as unrelated.

The idea is to test for a difference in time to purchase between manual and automatic cars.

You should also have a look at the solution to Q. 4 below for a more complete picture of what you need to know in performing the independent samples t-test and reporting 95% CIs for the mean difference.

· Q 4. I have carried out an independent samples t-test using SPSS and discovered that the results for Levene’s test (for equality of variances) are contained in the same table as the t-test results. There are a lot of statistics in the table and I do not know which one to select to draw the appropriate conclusions for my t-test. Where can I find an illustration of how such a table is used in practice?

A. Such an illustration is provided in the resource SPSS TUTORIAL: INDEPENDENT SAMPLES T-TEST referred to in the solution to Q. 3.

For your information, if, having consulted this resource, you discover that you cannot assume equal variances for the two groups you are comparing (although you have already checked that both groups are either a) Normally distributed or b) contain measurement data and are each of size at least 30), then you may wish to consider Welch’s t-test instead as a test for comparing the means for the two groups. Welch’s t-test is a special version of the t-test which can be used when there is statistical evidence for a true difference in variances across the two groups you wish to compare but the other requirements for an independent samples t-test are satisfied. The results for Welch’s t-test are provided in the second row of output for the t-test, labelled Equal variances not assumed.
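For readers who would like to see how Welch’s t-test differs from the pooled version, the sketch below computes the Welch t statistic and the Welch-Satterthwaite approximation to the degrees of freedom from summary statistics (means, SDs and group sizes). It is a minimal, hypothetical helper using only the Python standard library and made-up numbers; it explains why SPSS usually reports non-integer degrees of freedom in the row Equal variances not assumed.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Hypothetical helper: Welch's t statistic and Welch-Satterthwaite
    degrees of freedom from summary statistics."""
    var_mean1 = sd1 ** 2 / n1  # squared standard error of mean 1
    var_mean2 = sd2 ** 2 / n2  # squared standard error of mean 2
    t_stat = (mean1 - mean2) / math.sqrt(var_mean1 + var_mean2)
    df = (var_mean1 + var_mean2) ** 2 / (
        var_mean1 ** 2 / (n1 - 1) + var_mean2 ** 2 / (n2 - 1)
    )
    return t_stat, df


# Made-up summary data with clearly unequal spreads
print(welch_t(mean1=52.0, sd1=4.0, n1=12, mean2=47.0, sd2=11.0, n2=20))
```

When the two SDs and group sizes are equal, this gives the same t statistic and degrees of freedom as the pooled test, so the two rows of SPSS output only diverge noticeably when the variances (or group sizes) differ.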

The 95% confidence interval for the mean difference and how to interpret it

It is also important to include a 95% confidence interval (CI) for the mean difference. If reporting on the results for the example at the above link, you might wish to say something like ‘the mean difference in scores (males – females) was 43.00 (95% CI (18.25, 67.75))’. The lower and upper bounds of this CI were extracted from the SPSS table Independent Samples Test by referring to the section 95% Confidence Interval of the Difference within this table. In this case, the CI tells us that we can be 95% confident that the true mean difference lies somewhere between 18.25 and 67.75, which is a rather wide CI. Thus, in order to gain a better hold on the difference in scores between males and females, it is recommended that the sample size be increased considerably in a future study. (The larger the sample size, the narrower the CI.) Also, when seeking to further interpret a CI for the difference between two means, it is useful to check whether the CI includes zero (the value representative of no difference). Zero would be included if the lower and upper bounds of the CI differed in sign, and its presence in this example would have indicated that we cannot be 95% confident which of males and females perform better on average. The inclusion of zero in a 95% CI coincides with the p-value for the independent samples t-test being greater than 0.05, indicating a lack of statistical significance for the mean difference. In the example provided, the 95% CI excludes zero, consistent with p < 0.05.
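The checks described above reduce to a few lines of arithmetic. The Python sketch below is a hypothetical helper which assumes a symmetric t-based interval (as produced by SPSS): it recovers the point estimate as the midpoint of the CI, reports the width and flags whether zero lies inside the interval, using the CI quoted above.

```python
def summarise_ci(lower, upper):
    """Hypothetical helper for a symmetric CI for a mean difference."""
    midpoint = (lower + upper) / 2  # equals the mean difference for a t-based CI
    width = upper - lower  # narrower with larger samples
    includes_zero = lower <= 0 <= upper  # zero = no difference
    return midpoint, width, includes_zero


# 95% CI for (males - females) quoted in the text above
print(summarise_ci(18.25, 67.75))  # (43.0, 49.5, False)
```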

· Q. 5. My project focuses primarily on the quantity of radiation that children are exposed to when undergoing surgery for scoliosis. There are two surgical techniques: free hand and image guidance. My study hypothesis is that the free hand technique exposes children to less radiation than the image guidance technique. I have chosen my hypothesis to be one-sided as, based on my knowledge of these techniques, it is impossible for the discrepancy to be in the opposite direction. However, I would like to assess both the clinical and the statistical significance of the study results. I understand that, from the Central Limit Theorem, I can proceed to the independent samples t-test. Additionally, however, I would like advice on how I should adapt the advice on use of SPSS provided under the solution to Q. 4, above, to reflect the fact that I am performing a one-tailed test.

A. The solution is straightforward. Just follow the instructions while noting the following changes:

On selecting the recommended dialogue box options, choose a confidence level of 90% rather than 95%. Use the button Options to achieve this.
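Once you have selected the appropriate p-value for a two-tailed test using the advice in the solution to Q. 4, halve this p-value to get the p-value you require. (This applies provided that the observed mean difference lies in the direction of your study hypothesis; if it lies in the opposite direction, the one-tailed p-value is 1 minus half the two-tailed p-value.)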
In using the upper limit you have generated to obtain a CI, note that this is a 95% limit for a one-tailed test. You are choosing the upper limit based on the direction of your study hypothesis, although you should check that the direction of the subtraction of the means by SPSS is consistent with this (namely ‘mean for free hand – mean for image guidance’), in which case you are expecting this difference to be negative. Note that it is very easy in SPSS to control the direction in which the means are subtracted. When using the button ‘Define the groups’ within the dialogue for the independent samples t-test, enter the codes for your group categories in the order in which you want the corresponding group means to be subtracted. For example, if you enter ‘2’ then ‘1’, the mean for the group coded ‘1’ will be subtracted from the mean for the group coded ‘2’ and the CI limits calculated accordingly.
You should examine the mean difference in radiation levels and decide whether you think the absolute value of this difference is clinically significant before commenting on statistical significance based on your p-value.
You should also comment on a) the value below which the true mean difference lies with 95% certainty (the right-hand CI limit) in terms of your view on clinical significance and b) how helpful it is to know that the mean difference is likely to be below this particular value. (You can discard the lower limit.)
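The conversion from a two-tailed to a one-tailed p-value depends on whether the observed difference lies in the hypothesised direction. The following Python sketch is a hypothetical helper illustrating this rule; it assumes a symmetric test statistic such as t, and the p-value and mean differences in the example are made up.

```python
def one_tailed_p(two_tailed_p, mean_difference, expected_sign):
    """Hypothetical helper: convert a two-tailed t-test p-value to a
    one-tailed p-value.

    expected_sign is -1 if the study hypothesis predicts a negative
    difference (e.g. free hand - image guidance < 0), +1 if positive.
    """
    if mean_difference * expected_sign > 0:
        # Observed difference is in the hypothesised direction
        return two_tailed_p / 2
    # Observed difference is zero or in the opposite direction
    return 1 - two_tailed_p / 2


print(one_tailed_p(0.08, mean_difference=-3.2, expected_sign=-1))  # 0.04
print(one_tailed_p(0.08, mean_difference=3.2, expected_sign=-1))
```

In the second call the observed difference contradicts the hypothesis, so the one-tailed p-value is large rather than half of 0.08.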

· Q. 6. How does Levene’s test compare with Bartlett’s test when testing for homogeneity of variance?

A. To find out, have a look here.

· Q 7. I don’t have access to SPSS but would like to compare mean haemoglobin levels for males and females. Having performed tests of Normality, I recognize that the appropriate test is the independent samples t-test. How can I find out more about this test and can I use MS Excel to perform it? I am using Office 2013.

A. In previous versions of MS Office, the appropriate procedure to have followed would have been to install the MS Excel add-in Analysis ToolPak – VBA (available under the Tools menu). You could then use MS Excel to perform the above test (also known as Student’s t-test). The following guide provides some very useful advice on how to proceed:

Two Sample (independent groups) t-test Using Microsoft Excel.

If you are using MS Office 2013, you will experience a different menu layout. In particular, under the menu Home, there is a tab Add-ins with a drop-down menu enabling you to search the Excel Apps Store. Using this facility, search for the free App XLMiner Analysis ToolPak. Once you have installed this App, you will see that there is a considerable list of statistical procedures available for you to choose from. The above guide for the old add-in provides very sound statistical advice which can be carried over to the new App, particularly in terms of testing for equality of variances before deciding between the two versions of the independent samples t-test for your data.

Among the procedures listed for XLMiner Analysis ToolPak, you will find ANOVA: Single Factor. The method corresponding to this title is useful if, for example, you have three (not just two) treatment groups and you wish to assess whether variation in haemoglobin levels can be explained in terms of treatment received.
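To give a feel for what ANOVA: Single Factor computes, here is a minimal Python sketch of the one-way ANOVA F statistic using only the standard library. The function name and the haemoglobin values are hypothetical, and the p-value would still have to come from F tables or software. With only two groups, F equals the square of the pooled independent samples t statistic, which shows how this method extends the t-test to three or more groups.

```python
import statistics


def one_way_anova_f(*groups):
    """Hypothetical helper: F statistic for a single-factor ANOVA."""
    all_values = [x for group in groups for x in group]
    grand_mean = statistics.mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-groups sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: spread of values around their own group mean
    ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up haemoglobin levels (g/dL) for three hypothetical treatment groups
print(one_way_anova_f([13.1, 12.8, 13.5], [12.2, 12.0, 12.6], [13.9, 14.1, 13.6]))
```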

· Q. 8. For my patient cohort (n = 229), I would like to compare the mean SF-36 score with that obtained for normative data. Is there a suitable test I can employ? (I should add that I do not have raw data for the normative group, and am concerned that this may be an issue.)

A. The appropriate test to employ is the independent samples t-test. Regarding not having raw data for the normative group, don’t worry as the statistical package Minitab can assist here by allowing you to work with summary data rather than raw data. You can find out all you need to know by accessing Minitab and following the instructions below.

Go to the menu Stat and select the sequence of options Basic Statistics –> 2 Sample-t...
Tick Summarized data and enter your summary data as the corresponding boxes suggest.
Use the button Help in the same dialogue box to find out more about how to interpret your output by comparison with the example provided.

It is usual practice to assume a significance level of 0.05. In interpreting your output, please note that evidence for a true difference in means is represented by a p-value which is less than or equal to 0.05. If the p-value is greater than 0.05 there is insufficient evidence for a true difference between the two means.

· Q 9. I wish to perform an independent samples t-test. Is there a sample size calculation which I can use to estimate the minimum required sample size?

A. A good sample size calculator for this purpose can be accessed from Sample size calculator for comparing two means (independent samples) on selecting the option Average, Two Sample from the left-hand menu.

However, before considering performing a sample size calculation, please very carefully read all of the information provided under the section General Advice within the resource Sample size calculations. This will help you ensure that you are fully aware of the conditions under which such a calculation makes sense and what estimates you need to be able to provide in order for this calculation to be possible.

Also, if you are using data from a previous study to inform your sample size calculation, please be careful to check which version of the t-test applies for these data (see the solution to Q. 4, above). Calculators or software for performing sample size calculations in connection with Welch’s t-test are not standardly available and you may need to request specialist assistance from a statistician.
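To show which estimates such a calculator needs, the sketch below applies the standard normal-approximation formula n = 2(z₁₋α/₂ + z₁₋β)²σ²/δ² per group, where δ is the smallest clinically important difference and σ the common SD. It is a hypothetical helper, not the calculator linked above, and it assumes equal group sizes and equal variances; calculators based on the t distribution typically give one or two more participants per group.

```python
import math
from statistics import NormalDist


def n_per_group(difference, sd, alpha=0.05, power=0.80):
    """Hypothetical helper: approximate sample size per group for a
    two-sided independent samples comparison of means (normal approximation,
    equal group sizes, common SD)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)  # about 0.84 for 80% power
    n = 2 * ((z_alpha + z_beta) * sd / difference) ** 2
    return math.ceil(n)  # round up to whole participants


# Detecting a difference of half an SD with 80% power at the 5% level
print(n_per_group(difference=5, sd=10))  # 63
```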

· Q 10. For my continuous variable, my two sample groups exhibit a considerable degree of non-Normality and at least one of my groups is of size less than 30. Are there alternatives to the unpaired samples t-test which I can use to test for a significant difference between the two population groups?

A. Here are some conditions and corresponding options for you:

if the sample data for either of the two groups are highly skewed to the right or are exponentially distributed, you should try a log transformation of these data (either to the base e or the base 10);
if the transformed data are Normally distributed, you should perform an unpaired (or, independent) samples t-test using these data but be careful with the interpretation of the results;
in taking the anti-log of your mean difference and of the limits of your confidence interval for the mean difference, you will arrive at a ratio of (geometric) means; that is, the interpretation of your findings in terms of the scale of your original data (not the log-transformed data) is according to a ratio of means, not a difference of means, as is explained in more detail in the accessible BMJ paper Log transformation of data;
otherwise, provided your two groups have similar distributions, you could consider the Mann-Whitney U-test.
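The back-transformation described in the list above can be checked numerically. The minimal Python sketch below (hypothetical helpers, made-up data, natural logs) shows that the anti-log of the difference in mean log values equals the ratio of the geometric means of the two groups, and that CI limits are back-transformed in the same way. If base-10 logs are used instead, the anti-log is 10 raised to the power of the value, and the resulting ratio is the same.

```python
import math
import statistics


def ratio_of_geometric_means(group1, group2):
    """Hypothetical helper: back-transformed difference in mean log values
    (natural logs), i.e. geometric mean of group1 / geometric mean of group2."""
    mean_log1 = statistics.mean([math.log(x) for x in group1])
    mean_log2 = statistics.mean([math.log(x) for x in group2])
    return math.exp(mean_log1 - mean_log2)


def back_transform_ci(lower_log, upper_log):
    """Anti-log the CI limits for the difference in mean logs."""
    return math.exp(lower_log), math.exp(upper_log)


# Made-up right-skewed values (all must be positive to take logs)
print(ratio_of_geometric_means([1.0, 10.0, 100.0], [1.0, 1.0, 1.0]))
```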

Details of how to perform the Mann-Whitney U-test using SPSS
Please note that in your output, you should look for the p-value provided under the header Asymp. Sig. (2-tailed) if you have a large sample size with groups of roughly equal sizes. The p-value provided under the header Exact Sig. (2-sided) is more appropriate when the sizes of the groups you are comparing are markedly different or the sum of the two group sizes is small. In order to obtain this second p-value you need to click the button ‘Exact’ in the corresponding dialogue box when running the instructions for the Mann-Whitney U-test.

Your p-value will give an indication of whether there is sufficient evidence (p less than or equal to 0.05) for a true difference between the two groups you are comparing.

In interpreting the direction of any difference between the two groups, it is helpful to compare the mean ranks generated from the output. Note that, contrary to a traditional rumour, it is not safe to assume that the Mann-Whitney U-test is a test of a difference in medians just as the independent samples t-test is a test of a difference in means. For the Mann-Whitney U-test to be a test of a difference in medians, the frequency distributions for the two groups need to be so similar that one distribution can simply be translated on to the other with almost perfect overlap. It is in fact possible to produce data which illustrate that two groups can have identical medians but, on comparison via the Mann-Whitney U-test, yield statistical significance!
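To see where the mean ranks in the SPSS output come from, the Python sketch below pools both groups, ranks all values (tied values share the average of their positions), and derives U for each group from the rank sums. It is a minimal, hypothetical helper using only the standard library; it computes the statistic and mean ranks only, not the p-value that SPSS reports.

```python
def rank_with_ties(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend 'end' over a run of tied values
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for idx in order[start:end + 1]:
            ranks[idx] = average_rank
        start = end + 1
    return ranks


def mann_whitney_u(group1, group2):
    """Hypothetical helper: U for each group plus the two mean ranks."""
    n1, n2 = len(group1), len(group2)
    ranks = rank_with_ties(list(group1) + list(group2))
    rank_sum1 = sum(ranks[:n1])
    u1 = rank_sum1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    return u1, u2, rank_sum1 / n1, sum(ranks[n1:]) / n2


# Made-up ordinal scores for two small groups
print(mann_whitney_u([3, 5, 4, 6, 2], [7, 5, 8, 9, 6]))
```

A higher mean rank indicates that the group tends to take larger values. The smaller of the two U values is the one usually quoted, although conventions differ between packages.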

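In terms of obtaining an appropriate estimate of effect size to represent the extent to which your two groups differ, Rosenthal’s r (r = Z/√N, where Z is the standardized test statistic and N is the total number of observations across both groups) is a useful choice.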

The square of this statistic also has an interpretation which is easier to communicate to non-statisticians than some other effect size estimates for non-parametric data. In particular, it can help us understand to what extent the variability in the dependent variable can be explained in terms of the group variable.
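A minimal Python sketch of the calculation is given below. The preferred input is the Z value reported by SPSS (which includes a correction for ties); the helper z_from_u is a hypothetical fallback using the usual normal approximation to U without a tie correction, so its Z may differ slightly from SPSS when there are ties. The example Z and N are made up.

```python
import math


def z_from_u(u, n1, n2):
    """Normal approximation to the Mann-Whitney U distribution
    (no correction for ties)."""
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - mean_u) / sd_u


def rosenthal_r(z, n_total):
    """Rosenthal's r = |Z| / sqrt(N), returned with its square."""
    r = abs(z) / math.sqrt(n_total)
    return r, r ** 2


# Hypothetical: SPSS reports Z = -2.5 for 40 observations in total
print(rosenthal_r(-2.5, 40))
```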

Here is a comprehensive tutorial to support better understanding and application of the Mann-Whitney U test, together with sound reporting of corresponding output from SPSS:

The Mann-Whitney U test: Interpretation of the null hypothesis and findings based on a real-life scenario


· Q 11. Is it possible to do a power calculation to estimate the required sample size before applying the Mann-Whitney U-test?

A. Yes. Have a look at:

Estimating sample sizes for binary, ordered categorical, and continuous outcomes in two group comparisons

As explained, the equation you require is equation 6 in the appendix of the article. However, you should read the article very carefully first to get a handle on the notation!

· Q 12. I wish to compare blood pressure readings before and after exercise. Which test should I use and can I perform this test using SPSS?

A. Provided that you can verify that the differences between the paired readings are approximately Normally distributed, or that you have at least 30 pairs, you should perform a paired samples t-test. This test is designed to investigate whether the arithmetic mean of the paired differences is statistically significantly different from zero.

Details of how to perform this test using SPSS are provided under Paired Sample t-Test: Assessing Differences Between Correlated Group Means. (Look out for the button ‘Next’ at the bottom of the page.) If you would like to view the complementary videos at the above link, you may find the instructions How to unblock Flash Player when using Google Chrome helpful.

You may also find the instructions on the Laerd Statistics page for the paired samples t-test useful. Note that in this resource, the paired samples t-test is referred to as the dependent t-test. This is slightly unusual nomenclature, but don’t be put off by this!

For advice on obtaining and presenting a 95% CI for the mean difference, refer to the solution to Q. 4, above, where it is straightforward to form the required analogies.

(With the support of an illustrative dataset, the underlying theory, including how to create a confidence interval for the sample mean difference and how to estimate the required sample size for your test for a given level of statistical power, may be found at paired samples.)
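Because the paired samples t-test works on the within-pair differences, it amounts to a one-sample t-test of those differences against zero. The minimal Python sketch below makes this explicit; as before, it assumes that you supply the two-sided critical value t_crit from a t table, and the function name and blood pressure readings are hypothetical.

```python
import math
import statistics


def paired_t(before, after, t_crit):
    """Hypothetical helper: paired samples t-test on (after - before)."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    std_error = statistics.stdev(diffs) / math.sqrt(n)  # SD of the differences
    t_stat = mean_diff / std_error
    ci = (mean_diff - t_crit * std_error, mean_diff + t_crit * std_error)
    return mean_diff, t_stat, n - 1, ci


# Made-up systolic readings (mmHg); 4.303 is the 95% critical value for 2 df
print(paired_t([120, 130, 125], [125, 133, 132], t_crit=4.303))
```

Note that the Normality check applies to the list of differences, not to the before and after readings separately.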

Alternatively, you may be looking for tests for pairwise comparisons to follow on from applying the Friedman test (the non-parametric counterpart of a within-subjects ANOVA), or you may be looking for an alternative to the paired samples t-test having found that your data do not satisfy the requirements for Normality. In either case, you should use

a) the Wilcoxon matched pairs signed ranks test provided each of your distributions appears symmetric in shape (check the individual histograms)

b) the sign test if it is not clear that individual distributions are each symmetric in shape.
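The sign test is simple enough to compute by hand, which helps in seeing what it does and does not use: only the direction of each difference, not its size. The Python sketch below is a hypothetical helper for the exact two-sided sign test, assuming the conventional practice of discarding pairs with a zero difference; the readings are made up.

```python
import math


def sign_test(before, after):
    """Hypothetical helper: exact two-sided sign test on paired data.
    Pairs with zero difference are discarded."""
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    k = sum(1 for d in diffs if d > 0)  # number of positive differences

    def binom_prob(lo, hi):
        # P(lo <= X <= hi) for X ~ Binomial(n, 0.5)
        return sum(math.comb(n, i) for i in range(lo, hi + 1)) / 2 ** n

    # Double the smaller tail probability, capped at 1
    p_value = min(1.0, 2 * min(binom_prob(0, k), binom_prob(k, n)))
    return k, n, p_value


# Made-up paired readings: 9 of the 10 differences are positive
before = [10, 12, 9, 11, 13, 10, 12, 11, 9, 14]
after = [12, 13, 11, 12, 15, 11, 14, 12, 10, 13]
print(sign_test(before, after))
```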

Symmetry is rare with modest sample sizes. For your interest, you may wish to refer to A good example of symmetric data displayed via a histogram. While, as you well know, symmetry can appear in many alternative shapes, the above example should guide you in seeing at a glance whether your data are symmetric for any one group. If you are in doubt, however, you may like to consider creating a symmetry plot using the software package Minitab. You can find advice on this simple procedure by using the Minitab menu Stat, from which you should select the sequence of menu options Quality Tools > Symmetry Plot. Now, select the button Help.

The menus ‘How to’, ‘Example’ and ‘Data’ in the resultant information box guide you through the relevant steps for creating and interpreting such a plot.

Whichever route you choose to assessing symmetry, some level of subjective judgement is involved and you should take responsibility for the final decision. For a thorough explanation of the Sign test in language which is not highly complex, please consult


Instructions on how to perform the Wilcoxon matched pairs signed ranks test using SPSS may be accessed at Wilcoxon Signed-Rank test using SPSS. The corresponding instructions for the Sign test are entirely similar; just select ‘Sign’ instead of ‘Wilcoxon’ for your test type!

Please bear in mind that if you are carrying out multiple pairwise comparisons, you will need to apply a correction to the p-values to mitigate the risk of achieving statistical significance due to chance. The best known of these corrections is the Bonferroni correction. For a total of k multiple comparisons, this correction involves multiplying each p-value obtained from such comparisons by k (capping any adjusted value at 1).

The choice for the value of k should be made at the study design stage when you are clear why it makes sense to choose particular multiple comparisons to satisfy your study aims and objectives, not retrospectively through trying to decide how high k can be while still preserving statistical significance!

The Bonferroni correction is rather conservative, though. You can read more about this using the resource Adjusted p-values. The same resource provides advice on a more liberal correction, the Šidák correction (which replaces each p-value p by 1 − (1 − p)^k), which tends to be the preferred choice. As you should see from the resource, the relevant calculations for adjusting each p-value can easily be performed using arithmetic functions in Excel without resorting to programming in SPSS. When reporting the results of hypothesis testing involving multiple comparisons, it is good practice to present a table including the original and adjusted p-values in separate columns.
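As an alternative to Excel, the same adjustments take a few lines of Python. The sketch below is a hypothetical pair of helpers in which k is passed explicitly, reflecting the advice that k should be fixed at the design stage rather than inferred from the p-values you happen to have; the p-values are made up. The printed columns mirror the suggested reporting table (original, Bonferroni-adjusted and Šidák-adjusted p-values).

```python
def bonferroni(p_values, k):
    """Multiply each p-value by k, capping at 1."""
    return [min(1.0, p * k) for p in p_values]


def sidak(p_values, k):
    """Sidak adjustment: 1 - (1 - p) ** k for each p-value."""
    return [1 - (1 - p) ** k for p in p_values]


# Hypothetical p-values from k = 3 planned pairwise comparisons
raw = [0.01, 0.03, 0.4]
for p, pb, ps in zip(raw, bonferroni(raw, 3), sidak(raw, 3)):
    print(f"{p:.3f}  {pb:.3f}  {ps:.3f}")
```

The Šidák-adjusted values are never larger than the Bonferroni-adjusted ones, which is why the Šidák correction is described as the more liberal of the two.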

If you are intending to work with paired data, you can get to grips with the instructions in this resource by applying them to the practice data in the spreadsheets Work6 and Prac6. In this case, only the last two columns are of relevance. You can think of them as representing time to purchase (in days) of individual cars with and without advertising. The data might be thought of as paired if identical cars are for sale prior to and during advertising. The idea is to test for a difference in time to purchase when advertising is involved.

· Q 13. How can I estimate the required sample size for a paired two samples t-test?

A. The statistical package Minitab is a helpful resource here. Please go to the menu Stat and select the following options:

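Power and Sample Size –> 1-Sample t

A dialogue box will then appear. Select the button Help within this dialogue box and the necessary guidance will appear. (The 1-Sample t option is appropriate because a paired samples t-test is a one-sample t-test applied to the within-pair differences.)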

However, before considering performing a Power Calculation, please very carefully read all of the information provided on the StatsforMedics WordPress page SAMPLE SIZE CALCULATIONS to ensure that you are fully aware of the conditions under which such a calculation makes sense and what estimates you need to be able to provide in order for this calculation to be possible.
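For a rough cross-check of the Minitab output, the normal-approximation formula for paired data is n = ((z₁₋α/₂ + z₁₋β) σ_d / δ)², where σ_d is the SD of the within-pair differences (not of the raw readings) and δ the mean difference you wish to detect. The Python sketch below is a hypothetical helper based on this assumption; Minitab’s t-based calculation will typically give one or two more pairs.

```python
import math
from statistics import NormalDist


def n_pairs(mean_diff, sd_diff, alpha=0.05, power=0.80):
    """Hypothetical helper: approximate number of pairs for a two-sided
    paired samples t-test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) * sd_diff / mean_diff) ** 2)


# Detecting a mean difference of 5 when the SD of the differences is 10
print(n_pairs(mean_diff=5, sd_diff=10))  # 32
```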

Hypothesis Tests for Comparing Two Groups of Measurement or Ordinal Data by Margaret MacDougall is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

StatsforMedics