Hypothesis Tests for Categorical Data

· Q 1. a) I wish to compare the relative proportions of individuals falling into two categories (male and female, say) of a nominal variable (gender, say) to check whether these proportions differ significantly from what would be expected by chance alone. Which statistical test should I perform?

A. The binomial test is the appropriate test for this kind of investigation and you can use the software package Minitab very conveniently for this purpose using summary data. Have a look at the solution to Q. 4 (especially the last paragraph) on the StatsforMedics WordPress page Confidence Intervals.
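For readers who would like to see the arithmetic behind the binomial test, here is a minimal sketch in Python (not Minitab). The counts are hypothetical, the function name is my own, and the two-sided p-value is obtained by summing the probabilities of all outcomes that are no more likely than the observed one, which is one common convention; software packages may use a slightly different rule.

```python
from math import comb

def binomial_test_two_sided(k, n, p0=0.5):
    """Exact two-sided binomial test p-value for k 'successes' in n cases."""
    # Probability of every possible count 0..n under the null proportion p0.
    probs = [comb(n, i) * p0**i * (1 - p0)**(n - i) for i in range(n + 1)]
    observed = probs[k]
    # Add up all outcomes that are at least as unlikely as the observed one.
    p_value = sum(pr for pr in probs if pr <= observed * (1 + 1e-9))
    return min(1.0, p_value)

# Hypothetical summary data: 14 males among 40 participants, null proportion 0.5.
p_value = binomial_test_two_sided(14, 40)
print(f"two-sided p-value = {p_value:.4f}")
```

A p-value of 0.05 or less would suggest that the two proportions differ by more than chance alone would explain.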

b) What happens when I have more than two categories (e.g. in relation to tumour morphology, the proportions of tumours which fall into the categories ‘diffuse’, ‘nodular’, ‘scattered’, and ‘central scar’) and I want to test for a difference in the proportions of tumours falling into each of these categories?

A. In such cases, the binomial test generalizes to a test known as the chi-square goodness-of-fit test. If you have a similar question, you might find it useful to find out more about this test and read about its use with SPSS by consulting chi-square goodness-of-fit test: reference 1. This reference assumes as a null hypothesis that the proportions of cases in each category are the same and that you have arranged your categories in a single column with each row representing an individual case (e.g. an individual patient). When following the instructions for SPSS, you should take the extra step of selecting Legacy Dialogs within the menu Analyze in order to select the above test. The corresponding sequence of steps for choosing the test is therefore
Analyze–>Non-parametric Tests–>Legacy Dialogs–>Chi-square…. This reflects the requirements of the most recent version of SPSS.

What to do with data in summary form
If you only have your data in summary form (in terms of how many cases fall into each category), please refer to the resource Chi-Square Goodness-of-Fit Test. After laying out your data in the way recommended in this resource, make sure that you select the option Weight cases from the SPSS menu Data and then choose to weight cases by the variable ‘Frequency’ before you perform the analysis using the instructions provided. This brief step is missing from the resource and is provided here so that you do not need to sign up for the “enhanced chi-square goodness-of-fit guide”.

c) Although I have more than two categories, my null hypothesis is not of the form ‘the proportions in each category are equal’. Instead, I am carrying out a survey and a colleague has made an estimate of the proportions of individuals who will select a particular response to one of the questions in my questionnaire. I am therefore more concerned with checking whether the actual proportions are significantly different from the expected ones. What test is best in this case?

A. You can still use the chi-square goodness-of-fit test, but in this case, your null hypothesis is that the proportions of individuals falling into each category are just as expected. You need to explicitly enter your expected proportions when running the test. Clearly, if you discover a significant difference, there is evidence to suggest that the actual responses were different from those expected! To find out more, have a look at Chi-Square Goodness-of-Fit with SPSS.
When following the instructions for SPSS, you should take the extra step of selecting Legacy Dialogs within the menu Analyze in order to select the above test. The corresponding sequence of steps for choosing the test is therefore
Analyze–>Non-parametric Tests–>Legacy Dialogs–>Chi-square…. This reflects the requirements of the most recent version of SPSS.
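To make the goodness-of-fit calculation concrete, here is a minimal sketch in Python of what the test computes, both for equal expected proportions (part b) and for proportions you specify yourself (part c). The tumour counts and the expected proportions are hypothetical, and the function names are my own rather than anything from SPSS.

```python
from math import erfc, exp, gamma, sqrt

def chi2_sf(x, df):
    """Area to the right of x under the chi-square curve (integer df >= 1)."""
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, exp(-z)          # closed form for 2 degrees of freedom
    else:
        a, q = 0.5, erfc(sqrt(z))    # closed form for 1 degree of freedom
    while a < df / 2.0:              # step up two degrees of freedom at a time
        q += z**a * exp(-z) / gamma(a + 1.0)
        a += 1.0
    return q

def goodness_of_fit(observed, expected_props=None):
    """Chi-square goodness-of-fit statistic and p-value from summary counts."""
    n, k = sum(observed), len(observed)
    if expected_props is None:       # null hypothesis: all proportions equal
        expected_props = [1.0 / k] * k
    expected = [n * p for p in expected_props]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, chi2_sf(stat, k - 1)

# Hypothetical counts: diffuse, nodular, scattered, central scar.
counts = [30, 25, 25, 20]
stat_equal, p_equal = goodness_of_fit(counts)
# Hypothetical expected proportions supplied by a colleague.
stat_given, p_given = goodness_of_fit(counts, [0.4, 0.3, 0.2, 0.1])
print(f"equal proportions: chi-square = {stat_equal:.2f}, p = {p_equal:.4f}")
print(f"given proportions: chi-square = {stat_given:.2f}, p = {p_given:.4f}")
```

The expected proportions must add up to 1, and the usual caution about small expected counts applies here just as it does in SPSS.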

· Q 2. I have two categorical variables, each of which has two or more categories. One variable ranges over attitudes in GP practice towards use of the Christo inventory (‘used with particular patient’, ‘used in the practice but not with particular patient’ or ‘never used in the practice’). The other variable ranges over the answers (‘yes’ or ‘no’) as to whether the particular patient has been tested for HIV and Hepatitis B. Is there a suitable hypothesis test to determine whether there is an association between the two variables?

A. Yes, you should try the chi-square test of association. A useful starting point in learning about this test is to refer to Topics 24 and 25 of the electronic version of the book Medical Statistics at a Glance.

If you are registered with the University of Edinburgh, you can consult the electronic version of this book via the University’s library discovery system, DiscoverEd.

Here are some reference details:

- Title: Medical statistics at a glance

- Authors: Aviva Petrie and Caroline Sabin

- Publisher: Hoboken : Wiley

- Publication Date: 2013

- Edition: Third edition

N.B.

A comprehensive presentation is available which consolidates different categories of information on what you need to know about the chi-square test of association and related procedures.

This presentation is provided for efficiency and with your best interests in mind. It combines background theory and practical steps in SPSS, while filling in gaps which appear in textbooks, thus saving you disappointments in the longer term! In using the presentation, you may find it useful to bear in mind that use of Fisher’s Exact test is not restricted to the 2 x 2 case, although strictly speaking, the extension of this test to the m x n case, where at least one of the integers m and n is greater than 2, is referred to as the Fisher-Freeman-Halton test.

The presentation relies on a syntax file to enable you to construct percentage stacked bar-charts using IBM SPSS.

Here is the link to the syntax :

Syntax for creating percentage stacked bar-chart using IBM SPSS 19.0 .

N.B. You will need to copy-paste this syntax into a new SPSS syntax file before following the instructions recommended below. When in SPSS, just select the menu sequence File –> New –> Syntax. You can then save the new syntax file from within SPSS. The extension attached to this file will be sps.

Instructions for editing and using the syntax file are provided within slides 30 – 33 of the presentation
The chi-square test of association, the percentage stacked bar chart, Fisher’s exact test, odds ratios and relative risks (PowerPoint version: designed to help you learn step by step)

To assist you in engaging effectively with the above resource, please use the accompanying sample data. This ought to build up your confidence as you replicate the findings from the worked examples and visualize the findings through doing the work yourself.

Handling summary data

If, as is less common, you have your data arranged in summary form with the frequencies calculated in advance, the instructions for use with summary data are available here:

Obtaining relative risks and odds ratios using SPSS and a note on the chi-square test of association (PowerPoint version: designed to help you learn step by step)

· Q 3. I am testing for an association between current psychiatric diagnosis and adherence (‘yes’ or ‘no’) to ARV (antiretroviral) treatment. I have five categories of current psychiatric diagnosis (none, anxiety disorder, mood disorder, psychiatric disorder and other). With the support of the tutorial in the solution to Q. 2, I have found a significant association between psychiatric diagnosis and adherence to ARV treatment. I would now like to probe deeper by making comparisons for adherence across pairs of psychiatric diagnoses. However, I am struggling to identify how to manage my data to perform the relevant calculations and generate odds ratios and corresponding 95% confidence intervals in the sense explained in the above tutorial. How should I proceed?

A. You should decide in advance of your calculations which pairs of diagnostic groups you wish to compare. Do you wish to make all possible comparisons or does it make better sense for your study to focus only on certain ones? Your decision will vary according to the clinical scenario, which, of course, is not limited to the one described here. Once you have decided how many comparisons to make, you should take into consideration the importance of correcting for chance. Doing lots of comparisons to test for statistical significance can become a little like a fishing expedition where you need to catch a p-value that is less than or equal to 0.05! Therefore, it makes best sense to adjust the p-values from your pairwise comparisons. The best-known correction for this purpose is the Bonferroni correction. This is a simple correction which you can do by hand or using Excel if you prefer. Here is what to do:

1. Carry out the pairwise comparisons you have decided on.
2. Count how many pairwise comparisons there were (n).
3. Multiply every p-value from your pairwise comparisons by n, treating any result greater than 1 as 1.
4. Report your original and Bonferroni-adjusted p-values for each comparison in a single table, if you like, so that the reader can gain a clear picture of your account of statistical significance.
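If you would rather script the Bonferroni steps than use Excel, a minimal sketch in Python follows. The comparison labels and p-values are hypothetical, and the function name is my own.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capping at 1."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]

# Hypothetical p-values from three pre-planned pairwise comparisons.
comparisons = ["none vs anxiety", "none vs mood", "anxiety vs mood"]
original = [0.012, 0.030, 0.400]
adjusted = bonferroni(original)

# Report original and adjusted p-values side by side, as in a single table.
for label, p, p_adj in zip(comparisons, original, adjusted):
    print(f"{label:<18} original p = {p:.3f}   adjusted p = {p_adj:.3f}")
```

It is the adjusted p-values that should then be compared with 0.05.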

Now for a little help with using SPSS. SPSS has a really handy piece of functionality, known as Select Cases, which allows you to filter two categories at a time. To find out how to use this functionality, just hop over to Q. 6 and the accompanying solution on the StatsforMedics page WORKING WITH SUBSETS OF THE ORIGINAL DATASET – FILTERING DATA. Once you have mastered filtering pairs of categories, you can use the tutorial in the solution to Q. 2 as recommended above together with steps 1. to 4., above.

· Q 4. When carrying out the chi-square test of association, I obtain a chi-square statistic and a p-value. Can these be interpreted graphically and would it be possible to see how the chi-square statistic is calculated?

A. There is a family of chi-square distributions, one for each number of degrees of freedom. These are summarized graphically in the StatPrimer document Cross-Tabulated Counts and Independent Proportions. The p-value is the area under the relevant curve to the right of the chi-square statistic. At the conventional 5% significance level, this area must be less than or equal to 0.05 for the null hypothesis to be rejected. In the same file, you will discover just how easy it is to calculate the chi-square statistic and what it is really designed to do.
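As a rough illustration of how the chi-square statistic is built up from observed and expected counts, here is a minimal sketch in Python. The 3 x 2 table is hypothetical (three categories of inventory use by tested ‘yes’/‘no’), and the function name is my own. With 2 degrees of freedom, as here, the area to the right of the statistic happens to have the simple closed form exp(-statistic/2); other degrees of freedom need a different calculation.

```python
from math import exp

def chi_square_statistic(table):
    """Chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical counts: rows are the three inventory-use categories,
# columns are tested 'yes' and tested 'no'.
table = [[20, 10],
         [15, 15],
         [10, 20]]
stat, df = chi_square_statistic(table)
p_value = exp(-stat / 2.0)   # right-tail area; valid only because df == 2 here
print(f"chi-square = {stat:.3f}, df = {df}, p = {p_value:.4f}")
```

The statistic grows as the observed counts move further from the counts expected under no association, which is why a large statistic corresponds to a small area in the right-hand tail.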

· Q 5. I have heard that there are two corrections to the chi-square test of association – Yates’s Correction and Fisher’s Exact test. How do I know which to apply, if any?

A. Yates’s continuity correction is a correction to the standard calculations for the chi-square test of association which is only applied where you have two categories for both variables (the 2 x 2 case). The correction takes into consideration the fact that, while the p-value for your hypothesis test is derived from a continuous distribution (the chi-square distribution), your data are counts and so are very different from continuous (or measurement) data. Find out more about how this correction is applied. As for Fisher’s Exact test, see Topics 24 and 25 of the electronic version of the book Medical Statistics at a Glance.

If you are registered with the University of Edinburgh, you can consult the electronic version of this book via the University’s library discovery system, DiscoverEd.

Here are some reference details:

- Title: Medical statistics at a glance

- Authors: Aviva Petrie and Caroline Sabin

- Publisher: Hoboken : Wiley

- Publication Date: 2013

- Edition: Third edition
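To see what Yates’s correction in the answer to Q 5 does to the numbers, here is a minimal sketch in Python for a 2 x 2 table with cell counts a, b (first row) and c, d (second row). The counts are hypothetical and the function name is my own. The correction subtracts n/2 from |ad - bc| before squaring, which reduces the statistic and so increases the p-value.

```python
from math import erfc, sqrt

def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square statistic and p-value (1 df) for a 2 x 2 table of counts."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Continuity correction: shrink the discrepancy by n/2, never below zero.
        diff = max(0.0, diff - n / 2.0)
    stat = n * diff**2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = erfc(sqrt(stat / 2.0))   # right-tail area for 1 degree of freedom
    return stat, p_value

# Hypothetical 2 x 2 table.
stat_plain, p_plain = chi_square_2x2(12, 5, 6, 14, yates=False)
stat_yates, p_yates = chi_square_2x2(12, 5, 6, 14, yates=True)
print(f"uncorrected: chi-square = {stat_plain:.3f}, p = {p_plain:.4f}")
print(f"with Yates:  chi-square = {stat_yates:.3f}, p = {p_yates:.4f}")
```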

· Q 6. Can you recommend a good reference for estimating the required sample size for the Chi-square test for association?

A. Have a look at the section on binary data within the following publication:

Campbell MJ, Julious SA, Altman DG (1995) Estimating sample sizes for binary, ordered categorical, and continuous outcomes in two group comparisons. BMJ, 311(7013):1145-8.

You may also find it helpful to consider using the sample size calculator for testing a difference between two proportions which is available via the statistical software package Minitab. From the menu Stat, select Power and Sample Size, followed by 2 Proportions… . The button ‘Help’ in the dialogue box is a useful reference point. However, it is particularly important for you to know that the calculation (which has a formula behind it) requires you to provide an estimate of the proportion of cases you anticipate being in one of your groups (Baseline proportion (p2)), at least one value for the required statistical power for your test (under Power values) and at least one value for the proportion that might occur in the remaining group (Comparison proportions (p1)). (After clicking ‘Help’, choose the highlighted text ‘minimum difference’ to see an example in which Baseline proportion (p2) is chosen as 0.25 and a difference of 0.03 is assumed in either direction between the two groups. Comparison proportions (p1) is therefore given the two values 0.22 and 0.28.)

Note also that when you first enter the dialogue box, you will find a button entitled ‘Options…’. On selecting this button, you will find that the default significance level is 0.05. This is the usual choice of value for statistical significance, and it is best not to change it without a good reason. Also under Alternative Hypothesis, the default choice is Not equal. This represents the assumption that you are carrying out a two-tailed test, according to which you are not clear about the direction of the difference in proportions between your two groups. This assumption explains why, for the above example, you would supply the above two proportions for Comparison proportions (p1). If you choose this option, then the sample size provided on running the calculation will refer to the minimum number of individuals required for each group.
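To give a feel for the kind of formula involved, here is a minimal sketch in Python using one widely taught normal-approximation formula for comparing two proportions with a two-tailed test. This is an assumption on my part about the general approach, not a reproduction of Minitab’s calculation, so the result may differ somewhat from Minitab’s output. The function name and the choice of 80% power are also my own.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(p1, p2, power=0.8, alpha=0.05):
    """Approximate sample size per group for a two-tailed test of p1 vs p2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 when alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # about 0.84 when power = 0.8
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# Baseline proportion 0.25 with comparison proportions 0.22 and 0.28.
for comparison in (0.22, 0.28):
    n = n_per_group(comparison, 0.25)
    print(f"comparison proportion {comparison}: about {n} per group")
```

Notice how a difference as small as 0.03 demands several thousand individuals per group, which is why sound estimates of the proportions matter so much.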

Some important advice

As you can see, in order to perform a sample size calculation, you are expected to provide rather a lot of information.

Usually, such information is provided with hindsight through knowledge of related pilot studies. If you cannot provide sound estimates based on informed studies, please don’t try a guesstimate; otherwise you may well be left wriggling later when you try to make sense of your study findings based on a required sample size which is completely off the mark.

· Q 7. I am considering various staging classification systems for cancer of the liver. For each such system I wish to test for a linear trend relationship between mortality rates and cancer staging. Which test would you recommend?

A. Here you have an n x 2 structure in your data, where n > 2 represents the number of stages in the staging system while ‘2’ is the number of mortality outcomes (corresponding to presence or absence of mortality). Having checked the word of caution under ‘N.B.’, below, you may therefore wish to consider the chi-square test of linear trend (also referred to as the chi-square test of linearity). This test is for a monotonic (relatively consistent upward or downward) trend in the percentage data across n categories, where n > 2 and where for each of the n categories there are only two possible outcomes (e.g. death and survival). It is not the same test as the chi-square test of association, which simply tests for an association between two variables. Details of the background to this test and how to perform it using SPSS are provided within the section A Correlational Approach of a more general resource on analysing categorical data. An example is provided in the same section. To appreciate the analogy with the problem in the current FAQ, think of the event death as being categorized as 1 and the event survival as being categorized as 0. For the staging systems, each stage can be ranked 0, 1, 2, … according to the level of severity it represents.

N.B. If, on tabulation, your percentage data do not show evidence of a relatively consistent upward or downward trend (across your n categories), the chi-square test of linear trend is not appropriate for your data and through applying it, you run the risk of obtaining a meaningless and misleading result by violating a key assumption of the test.
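One common way of computing a chi-square test for linear trend follows the correlational idea directly: code death as 1 and survival as 0, score the stages 0, 1, 2, …, find the correlation r between the two, and refer (N - 1) x r² to the chi-square distribution with 1 degree of freedom. The sketch below in Python assumes this ‘linear-by-linear association’ form; the counts are hypothetical and the function name is my own.

```python
from math import erfc, sqrt

def linear_trend_test(deaths, totals, scores=None):
    """Linear-by-linear association statistic (N - 1) * r**2 on 1 df."""
    if scores is None:
        scores = list(range(len(totals)))       # stages ranked 0, 1, 2, ...
    n = sum(totals)
    mean_x = sum(s * t for s, t in zip(scores, totals)) / n
    mean_y = sum(deaths) / n                    # overall proportion who died
    sxx = sum(t * (s - mean_x) ** 2 for s, t in zip(scores, totals))
    syy = n * mean_y * (1 - mean_y)             # outcome is coded 1 or 0
    sxy = sum(dth * s for dth, s in zip(deaths, scores)) - n * mean_x * mean_y
    r = sxy / sqrt(sxx * syy)
    stat = (n - 1) * r ** 2
    return stat, erfc(sqrt(stat / 2.0))         # right-tail area for 1 df

# Hypothetical data: deaths and patient totals for three stages.
deaths = [5, 10, 20]
totals = [50, 50, 50]
trend_stat, trend_p = linear_trend_test(deaths, totals)
print(f"trend chi-square = {trend_stat:.3f}, p = {trend_p:.5f}")
```

Here the mortality percentages (10%, 20%, 40%) rise consistently across the stages, so the key assumption noted above is met.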

· Q 8. I am conducting a study within the field of medical education and having familiarized myself with the material under the solution to Q. 2, above, I recognize that the chi-square test of association may prove relevant in testing for an association between whether a student obtains a specified level of sleep and whether they pass their exam. However, I suspect that gender may also matter in determining the result of this investigation and therefore I wish to take a more sophisticated approach whereby I correct for gender as a confounding factor. Can you recommend how best to proceed?

A. Ultimately, you may wish to consider the Mantel-Haenszel procedure. To read up on the rationale behind this approach, please refer to Chapter 18 of the text Essential Medical Statistics.

If you are registered with the University of Edinburgh, you can consult the electronic version of this book via the University’s library discovery system, DiscoverEd. Two editions of this book are of interest in this respect.

Here are some reference details:

- Title: Essential medical statistics

- Author: Betty R. Kirkwood

- Publisher: Blackwell Pub.

- Publication Date: 2012

Edition: 3rd

- Title: Essential medical statistics

- Authors: Betty R. Kirkwood and Jonathan A. C Sterne

- Publisher: Malden, Mass. : Blackwell Science

- Publication Date: 2003

Edition: 2nd

You would be best advised in the first instance, however, to verify that you do not need to use Fisher’s Exact test (see solution to Q. 2, above) in testing for your initial association; otherwise, you would be spreading your data far too thinly in attempting to include a new variable to express combinations of categories. If Fisher’s Exact test is not required, then you ought to take a simple exploratory approach with your data by including gender as a layer variable under the usual dialogue box for the chi-square test of association. This will prompt SPSS to carry out your chi-square test of association separately for each gender, thus offering you some crude results indicating whether the presence or absence of evidence for an association differs depending on gender. If at this stage you find you are appealing to Fisher’s Exact test for one of the gender categories, this might be a good place to stop; otherwise, you can proceed to the Mantel-Haenszel methodology and the corresponding instructions for performing the required analyses in SPSS:

Introductory video on the Mantel-Haenszel procedure – recommended
More detailed video on the Mantel-Haenszel procedure – important content for gaining a better grasp of the test results from the Mantel-Haenszel procedure generated via SPSS; look out, though: at the start of the video, the presenter appears to hover over the wrong p-value from the table for the 2×2 chi-square test of association (see solution to Q. 2, above, for advice on use of Yates’s correction).
PowerPoint presentation with worked examples demonstrating how to interpret results generated from applying the Mantel-Haenszel procedure – recommended reading: Confounding, Effect Modification, and Stratification.

Please note that, although the instructions in the first video assume that your data are in summary form, they are also useful in cases where your data are recorded in separate rows per patient. Just bear in mind that in the latter case, the step of weighting cases using the SPSS menu entitled ‘Data’ does not apply. Also, to add to the content of the tutorial, look out for the 95% confidence intervals for odds ratios in the output and interpret these appropriately. For this, you can proceed just as in the tutorial for the chi-square test of association (please check), except that here you have more than one odds ratio, one for each gender, which is not a big deal!

More generally, please note that it is a pre-requisite for use of the Mantel-Haenszel procedure that the initial chi-square test of association (prior to introducing a layer variable) involves a 2 x 2 case.
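To show what ‘correcting for gender’ amounts to numerically, here is a minimal sketch in Python of the Mantel-Haenszel pooled odds ratio, computed from one 2 x 2 table per gender. The counts are hypothetical and the function names are my own; the sketch covers only the pooled estimate, not the accompanying test or confidence interval that SPSS reports.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a single 2 x 2 table with rows (a, b) and (c, d)."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Pooled odds ratio across strata; each stratum is a tuple (a, b, c, d)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# Hypothetical tables: (adequate sleep & pass, adequate sleep & fail,
#                       inadequate sleep & pass, inadequate sleep & fail).
male = (20, 10, 10, 20)
female = (30, 10, 15, 20)

print(f"odds ratio, males:   {odds_ratio(*male):.2f}")
print(f"odds ratio, females: {odds_ratio(*female):.2f}")
pooled = mantel_haenszel_or([male, female])
print(f"Mantel-Haenszel pooled odds ratio: {pooled:.2f}")
```

When the odds ratios for the separate genders are similar, as in this made-up example, a single pooled value is a reasonable summary; when they differ markedly, the separate values are more informative.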

Paired or related data

· Q. 9. I would like to compare the proportions of individuals who obtained the correct answer for different questions but I wish to use the same cohort throughout. Can you recommend a test for comparing performance in one test versus performance in another?

A. The sign test is an appropriate test for comparing test performance either in terms of the scores obtained (continuous data) or in terms of whether the question was answered correctly or not (recorded as ‘yes’ and ‘no’, or as ‘1’ and ‘0’, say). The second type of data (two-valued categorical data) is often referred to as binary data, and the sign test remains the right test for data in this form. Please consult Sign Test – Statistics Solutions for instructions on how to perform the sign test using SPSS.

N.B. While a p-value of less than or equal to 0.05 would normally provide evidence for a difference in performance across the two questions, if you choose to perform the sign test to make multiple pairwise comparisons, you need to adjust your p-values to correct for chance. One common correction for making this adjustment is known as the Bonferroni correction. This correction requires you to multiply each p-value by the total number of pairwise comparisons you have chosen to run (treating any result greater than 1 as 1). For example, if you compared performance in Q. 1 vs performance in Q. 2, performance in Q. 1 vs performance in Q. 3 and performance in Q. 2 vs performance in Q. 3, you would have carried out 3 pairwise comparisons. You would then need to multiply each of the 3 p-values you obtained from running the sign test by 3. It is the adjusted p-values which you must then examine to see if they are less than or equal to 0.05. Unfortunately, SPSS does not compute corresponding confidence intervals for the difference in performance across two groups. Please consider reporting individual performance statistics for each of the questions compared (proportions in the case of binary data).

If you have more than two tests to consider and you wish to compare performance across these tests to see if there is a difference overall, then you should use Cochran’s Q test. Clear details of this test and how to perform it using SPSS are available at the link Cochran’s Q test using SPSS.
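For binary data, the sign test described above only uses the students whose two answers differ. A minimal sketch in Python follows; the answers are hypothetical (1 = correct, 0 = incorrect), the function name is my own, and the two-sided p-value is taken as twice the smaller tail, capped at 1.

```python
from math import comb

def sign_test(first, second):
    """Exact two-sided sign test p-value for paired observations."""
    plus = sum(1 for a, b in zip(first, second) if a > b)
    minus = sum(1 for a, b in zip(first, second) if a < b)
    n = plus + minus                      # ties carry no information
    if n == 0:
        return 1.0
    k = min(plus, minus)
    # Probability of k or fewer of the rarer sign if both signs were equally likely.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical answers from the same ten students to two questions.
question_1 = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
question_2 = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
p_sign = sign_test(question_1, question_2)
print(f"sign test p-value = {p_sign:.3f}")
```

If this were one of three pairwise comparisons, the p-value would then be multiplied by 3 before being compared with 0.05.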

· Q 10. I wish to compare the conclusions on aetiology of erectile dysfunction (ED) as obtained from GP diagnosis and by means of utilizing a Nocturnal Penile Tumescence Rigidity (NPTR) test. Possible conclusions are that aetiology is psychogenic or organic. Each patient will receive a GP diagnosis and an NPTR test result. It is hoped that the NPTR test will make a difference to clinicians’ approaches to ED patients and therefore I wish to test for a significant difference between the GP diagnoses and NPTR tests results overall for each patient. Which hypothesis test would you recommend?

A. Here, we need an alternative to the chi-square test that is suitable for paired data. The test you require is the McNemar test. Learn about this test and how to perform it in SPSS. Please note that the instructions in this resource for performing the McNemar test in SPSS assume that you have presented your data in summary form (see resource). However, if you have laid out your data as one row per patient, please use the instructions provided at McNemar test (SPSS instructions for data which are not summarized).

By way of complementing the recommended steps in SPSS provided in the latter resource, please note that if you have a rather small sample size (less than 50, say), it is best to click the button ‘Exact’ and choose the option ‘Exact’ in the relevant dialogue box. This will enable you to generate a corrected p-value in your output (see ‘Exact Sig’). The above test applies in a variety of other cases where data are related. For example, you may wish to compare for presence or absence of lung comets at two different altitudes in relation to volunteers who participated in an expedition to the Bolivian Andes. Here you have the same subjects at each altitude and therefore the data are again related.

N.B.

1. If you have more than two categories to compare across two related groups, proceed as above but note that the test is now referred to correctly in the output as the McNemar-Bowker test.

2. The McNemar and McNemar-Bowker tests cannot be performed unless all categories used from one variable are also used at least once by the other variable. For example, if 0-4 is the range of possible pain categories but only categories 0-2 apply at rest while categories 0-4 apply on mobilizing, the McNemar-Bowker test cannot be used. However, you may wish to consider using Cohen’s Kappa instead as a measure of agreement. To learn more, please consider Q. 1 and its solution under the StatsforMedics page STATISTICAL INDICES FOR MEASURING AGREEMENT AND CONSISTENCY BETWEEN GROUPS OF CATEGORICAL AND MEASUREMENT DATA.

To visualize your data, I recommend you use the chart builder under the menu Graphs in SPSS to construct a clustered bar chart. While the dialogue box for the McNemar test allows you to create a crude one to explore your data (do have a go), the chart builder allows much more editing power. In particular, you can opt to have percentages along the vertical axis and then insert frequencies for each bar. This provides your reader with a better understanding of your data.

· Q 11. I have been using the McNemar test in an attempt to test for evidence that an intervention is having an effect on a cohort of patients with poor physical health. While I have obtained a p-value from the above test to assess statistical significance, I would also like to quantify an estimate of the size of the effect of the intervention. Have you any suggestions?

A. Yes, three in fact: 1) Cohen’s effect size; 2) an odds ratio for paired data, together with a corresponding 95% CI; and 3) the difference in proportions of positive cases across the intervention and non-intervention groups (equivalently, across the intervention and non-intervention stages for the same group), together with a corresponding 95% CI. Let’s look at these one at a time, while noting that it is a good idea in report writing to refer the reader to the original reference for the methodology you are choosing. You should also note that the arithmetic functions in Excel are adequate in supporting you with the calculations required for the relevant formulae.

1) Cohen’s effect size

Learn how to proceed step by step with the relevant calculations (original reference included).


1. Choose what you consider to be a positive difference (e.g., difference in symptom rate after and before the intervention was implemented).

2. Calculate this difference, P, as a proportion (e.g., P = 0.72).

3. Assume that the expected difference, p, is equal to 0.5.

4. Calculate the absolute value of P - p and define this as Cohen's effect size index. (Here, this value is given by |0.72 - 0.5| = 0.22.)

5. Classify the magnitude of the effect size (representing departure from the expected difference) according to the following guidelines:

i) a value less than 0.05 is trivial,

ii) a value between 0.05 and less than 0.15 represents a small effect,

iii) a value between 0.15 and less than 0.25 represents a medium effect

and

iv) a value of at least 0.25 represents a large effect.

Thus, in the above example, where the effect size is 0.22, we can assume a medium effect.
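The five steps above are simple enough to do by hand or in Excel, but for completeness here is a minimal sketch of them in Python. The function names are my own; the cut-offs are those listed above.

```python
def cohens_effect_size(observed_proportion, expected_proportion=0.5):
    """Absolute departure of the observed proportion from the expected one."""
    return abs(observed_proportion - expected_proportion)

def classify_effect(effect_size):
    """Label the effect size using the guideline cut-offs 0.05, 0.15 and 0.25."""
    if effect_size < 0.05:
        return "trivial"
    if effect_size < 0.15:
        return "small"
    if effect_size < 0.25:
        return "medium"
    return "large"

# The worked example above: P = 0.72 against an expected value of 0.5.
effect = cohens_effect_size(0.72)
print(f"effect size = {effect:.2f} ({classify_effect(effect)})")
```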

The original source for the above classification scheme is Statistical Power Analysis for the Behavioral Sciences (2nd ed.).

If you are registered with the University of Edinburgh, you can explore the up-to-date borrowing status of this book via the University’s library discovery system, DiscoverEd. Two editions of this book are of interest in this respect.

Here are some reference details:

- Title: Statistical power analysis for the behavioral sciences

- Author: Jacob Cohen 1923-1998.

- Publisher: Hillsdale, N.J. ; Hove : Lawrence Erlbaum

- Publication Date: 1988

Edition: 2nd

2) Odds ratio and corresponding 95% CI for paired data

In a matched case-control study, this particular odds ratio is calculated as a ratio of the numbers of discordant pairs, in the form

odds ratio = (no. of pairs in which the case was exposed and the control was not) / (no. of pairs in which the case was not exposed and the control was),

where, for example, the cases could be patients who have experienced a thromboembolism, the controls matched individuals who have not, and the exposure the use of oral contraceptives, as in the example in the reference below.

This odds ratio can be adapted to other sorts of pairing scenarios not involving case-control studies, such as scenarios comparing the identification of an abnormality in a scan across two grades of observer.

Please refer to Essential Medical Statistics, p. 219, where you will obtain the formulae you require.

Here are some reference details:

- Title: Essential medical statistics

- Authors: Betty R. Kirkwood and Jonathan A. C Sterne

- Publisher: Malden, Mass. : Blackwell Science

- Publication Date: 2003

Edition: 2nd

Please note that if you find that, on account of the calculation involving division by zero, you cannot obtain one of the limits for your CI, then it is not advisable for you to provide the odds ratio alone. You, or at least your colleagues, may need to consider carrying out a larger study in the future, though.
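As a rough guide to the arithmetic, here is a minimal sketch in Python of the paired odds ratio with a 95% CI. It assumes the commonly used large-sample approximation in which the standard error of the log odds ratio is the square root of (1/first discordant count + 1/second discordant count); please check this against the formulae on p. 219 of the reference before relying on it. The counts are hypothetical and the function name is my own.

```python
from math import exp, log, sqrt

def paired_odds_ratio(case_only, control_only, z=1.96):
    """Odds ratio from discordant pairs with an approximate 95% CI."""
    if case_only == 0 or control_only == 0:
        # This is the division-by-zero situation described above.
        raise ValueError("a CI limit cannot be computed when a discordant count is zero")
    ratio = case_only / control_only
    se_log = sqrt(1 / case_only + 1 / control_only)
    lower = exp(log(ratio) - z * se_log)
    upper = exp(log(ratio) + z * se_log)
    return ratio, lower, upper

# Hypothetical discordant pairs: 25 with only the case exposed,
# 10 with only the control exposed.
ratio, lower, upper = paired_odds_ratio(25, 10)
print(f"odds ratio = {ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Pairs in which the case and the control have the same exposure status do not enter the calculation at all.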

3) Confidence interval for the difference in proportions for paired data

Please refer to Statistical Methods for Rates and Proportions. Equation 13.15 of Chapter 13, p. 378 provides the formulae for the limits of the CI, but you need to work backwards from these formulae to verify what the different components represent.

Matched pairs can arise in a variety of ways, including where one wishes to compare the proportion of patients, p₁, with a beneficial outcome before the implementation of a clinical intervention with the corresponding proportion, p₂, after the intervention has been introduced. Equation 13.15 is intended to provide an estimate of the CI for the difference p₂ – p₁.

In this equation, ‘α’ denotes the statistical significance level, which is usually 0.05. Also, it should be clear from p. 378 of the resource that the author is using a, b, c, d, n and z_α/2 as follows:

a is the frequency of patients with a positive outcome both before and after implementation of the intervention,

b is the frequency of patients with a positive outcome before but not after implementation of the intervention,

c is the frequency of patients with a positive outcome after but not before implementation of the intervention,

d is the frequency of patients with a positive outcome neither after nor before implementation of the intervention,

n is the total frequency of patients under consideration

and

z_α/2 is the z-score on the horizontal axis for a standard normal curve which has area α/2 to the right of it under the standard normal curve. Do you know what this value is when α = 0.05? If not, you will find out soon!

The Excel template below uses the above notation to provide the confidence limits for the difference between two matched proportions for the example provided on pp. 379 – 340 of the above resource. If you find the formulae a bit cumbersome to apply, you can simply replace the frequencies in this template by your corresponding frequencies to obtain the CI limits you require for your case.

Excel template for calculating CI for matched proportions

Download here!

The template provides more accurate estimates for these limits than those provided on p. 376 of the above electronic book. This is due to more careful rounding of intermediary results during the steps for the calculation of each limit.

In terms of the wording of the outcome, we would conclude that the proportion of patients with positive outcomes is 20 percentage points higher after implementation of the intervention than before (95% CI (3.9, 36.1)).

While this CI indicates an improvement, it is still rather wide, which suggests that a larger sample size is needed to gain greater precision.

· Q 12. I understand that there are two corrections available for the McNemar test, one involving the use of the Exact Method and the other involving a continuity correction. How do they differ and when should they be applied if at all?

A. The two corrections serve different purposes. The continuity correction is applied because the counts in your 2 x 2 array are discrete, whereas the p-value is approximated using a continuous distribution (the chi-square distribution). This correction to the McNemar test is not particularly popular: it is believed to be rather conservative, that is, it tends to make tests of significance rather stringent by increasing the p-value. On the other hand, you may find that the individual group sizes for your 2 x 2 array of data are rather small and that a correction to your p-value is needed to compensate for this. This is when the Exact Method is most useful. You can read about when the Exact Method should be applied for calculating p-values for the above test under The McNemar test at McNemar Tests of Marginal Homogeneity. You should find that SPSS automatically reports the two-sided p-value using this method when it is required; otherwise, the asymptotic method will be used and you will find the correct p-value under the header Asymp. Sig. It is helpful for you to be aware that the Exact Method is also more conservative in terms of detecting statistical significance (in the sense that it tends to increase the size of the p-value to correct for small group sizes on cross-tabulation of data).
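
To make the distinction between the three approaches more concrete, here is a minimal Python sketch that computes the asymptotic, continuity-corrected and exact two-sided p-values from the two discordant frequencies of a 2 x 2 table. The function name mcnemar_p_values and the counts b = 6 and c = 14 are hypothetical, and the output is not guaranteed to match SPSS to every decimal place.

```python
from math import comb, erfc, sqrt


def mcnemar_p_values(b, c):
    """Two-sided McNemar p-values from the discordant frequencies b and c."""
    m = b + c
    if m == 0:
        raise ValueError("no discordant pairs: the test is not defined")
    # Asymptotic method: chi-square statistic with 1 degree of freedom
    chi2 = (b - c) ** 2 / m
    # Continuity-corrected statistic (shrinks the statistic, raising the p-value)
    chi2_cc = max(abs(b - c) - 1, 0) ** 2 / m
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_asymptotic = erfc(sqrt(chi2 / 2))
    p_corrected = erfc(sqrt(chi2_cc / 2))
    # Exact method: under H0 the smaller discordant count is Binomial(m, 0.5)
    tail = sum(comb(m, k) for k in range(min(b, c) + 1)) / 2 ** m
    p_exact = min(1.0, 2 * tail)
    return p_asymptotic, p_corrected, p_exact


# Hypothetical discordant frequencies
p_asymptotic, p_corrected, p_exact = mcnemar_p_values(b=6, c=14)
print(f"asymptotic: {p_asymptotic:.4f}")
print(f"continuity-corrected: {p_corrected:.4f}")
print(f"exact: {p_exact:.4f}")
```

With these hypothetical counts, both corrections give a larger p-value than the uncorrected asymptotic method, in line with the description above.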

· Q 13. I wish to compare the presence or absence of lung comets at more than two different altitudes in relation to volunteers who participated in an expedition to the Bolivian Andes. How should I proceed?

A. Here, you would be best advised to opt for Cochran’s Q-test in the first instance and (as is implicit from the example in the latter resource) move on to consider pairwise comparisons across the altitudes, provided you arrive at statistical significance on applying Cochran’s Q-test. Please note, however, that a condition for using Cochran’s Q-test is that you have only two categories at any given altitude. To complement the recommended steps in SPSS provided in the latter resource, please note that if you have a rather small sample size, it is best to click the button ‘Exact’ and choose the option ‘Exact’ in the relevant dialogue box. This will enable you to generate a corrected p-value in your output (see ‘Exact Sig’).
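
For readers who would like to see what Cochran’s Q-test calculates, the following Python sketch computes the Q statistic and its asymptotic p-value for a table in which each row is a volunteer and each column an altitude (1 = lung comets present, 0 = absent). The data are hypothetical, the function names cochran_q and chi2_sf are illustrative, and the p-value shown is the asymptotic one rather than the exact one that SPSS can provide for small samples.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df."""
    h = x / 2
    # Closed-form starting values for df = 2 (even) or df = 1 (odd)
    a, q = (1.0, exp(-h)) if df % 2 == 0 else (0.5, erfc(sqrt(h)))
    # Step up via the recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / gamma(a + 1)
    while a < df / 2:
        q += h ** a * exp(-h) / gamma(a + 1)
        a += 1
    return q


def cochran_q(table):
    """Cochran's Q for a list of rows (subjects) of 0/1 values (conditions)."""
    k = len(table[0])
    col_totals = [sum(row[j] for row in table) for j in range(k)]
    row_totals = [sum(row) for row in table]
    total = sum(row_totals)
    denom = k * total - sum(r * r for r in row_totals)
    if denom == 0:
        raise ValueError("every subject has the same outcome at all conditions")
    q = (k - 1) * (k * sum(c * c for c in col_totals) - total ** 2) / denom
    return q, k - 1, chi2_sf(q, k - 1)


# Hypothetical data: 8 volunteers assessed at 3 altitudes
data = [
    [0, 0, 1], [0, 1, 1], [0, 1, 1], [0, 0, 1],
    [0, 0, 0], [1, 1, 1], [0, 1, 1], [0, 0, 1],
]
q, df, p = cochran_q(data)
print(f"Q = {q:.2f}, df = {df}, asymptotic p = {p:.4f}")
```

For these hypothetical data, Q works out to 9 on 2 degrees of freedom. A significant result would then be followed by pairwise comparisons across the altitudes, as recommended above.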

· Q 14. Can you recommend a good reference for estimating the required sample size for the McNemar test?

A. The abstract for a useful paper on this topic can be found under

Estimating sample sizes for continuous, binary, and ordinal outcomes in paired comparisons: practical hints.

The full text of this paper can be accessed at:

Estimating sample sizes for continuous, binary, and ordinal outcomes in paired comparisons: practical hints.

Note. In the above paper, section 4, entitled ‘Paired binary data’, should be of particular interest. In this title, ‘paired binary data’ refers to paired categorical data for which each variable takes one of two categories, e.g. present and absent. For example, in comparing the effects of separate interventions on the same patients, one may wish to compare outcomes (presence or absence of a prior condition) across the two treatments for the same patients. Alternatively, binary outcomes might be compared across patients and their siblings (a different form of pairing). These are a few of many possible examples! In addition to section 4, you should at least read the abstract and introduction of the paper to set the requirements for your sample size calculation(s) in context.

More advanced techniques involving multivariable analysis

· Q 15. My question is on effectively discriminating between two groups according to their characteristics. I have collected data retrospectively for women who underwent wide excision for carcinoma followed by re-excision. The data relate to two groups of individuals following examination of tissue on re-excision, namely those who had residual disease and those who didn’t. My data set contains information on lucency of breasts, type of lesion, calcification of lesion and size of margins, all of which are potential factors in deciding which of the two groups a candidate for re-excision could fall into. It would be desirable to construct a model based on the retrospective data which separates potential candidates for re-excision into the two residual disease groups to the greatest possible extent according to the values of at least some of these and of other factors. Having identified and ranked influential factors, I would like to use my model to predict which group some arbitrary individuals from a prospective study would fall into, given particular values for the influential factors. How can all of this be done?

A. The ideal procedure for you would be a stepwise discriminant analysis. However, discriminant analysis has some restrictive assumptions which you need to explore before drawing any conclusions concerning your data. Rules for exploration of data and instructions for construction of a discriminant model are covered very well (conceptually and in terms of using SPSS) in the PowerPoint presentation at: http://www.utexas.edu/courses/schwab/sw388r7/SolvingProblems/DiscriminantAnalysis_CompleteProblems.ppt. When working with the prospective data, you need to enter the new data as an extension to the columns in your current spreadsheet, leaving the cell for the value of the dependent variable blank, and run a fresh discriminant analysis in SPSS, ensuring that you click Save … followed by the radio button for Predicted Group Membership. The predicted values (indicating which group each individual should fall into) will appear as the last column of entries in your datasheet.

· Q 16. I would like to find a statistical procedure which carries fewer assumptions than discriminant analysis and which is more tolerant of categorical factors such as gender. Also, I would like to include in my analysis a measure of how the odds in favour of the event ‘has no residual disease’ is influenced by the individual factors.

A. The appropriate procedure is binary logistic regression analysis. Experience has repeatedly verified that undergraduate medical students embarking on short-term research projects are not well placed to appreciate the nuances of model building and validation associated with this type of analysis and that they should definitely not assume that a successful publication on the part of a more senior colleague using this method is evidence that they know better. It would make best sense to consult a professionally trained statistician and seek their advice on appropriate delegation of responsibilities. This does not preclude the possibility of proposing the use of this model for future work when writing up your report.

· Q 17. I am interested in deciding which factors are predictors of a patient’s choice of management of their first trimester pregnancy loss. As there are 3 possible management options (natural, surgical or medical), my categorical dependent variable is not binary. Is there an extension to binary logistic regression which covers the case where the dependent variable has more than two outcomes?

A. Yes, the method is called multinomial logistic regression. The advice on consulting a professionally trained statistician given in the solution to Q. 16 above is also relevant here.

Hypothesis Tests for Categorical Data by Margaret MacDougall is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

The WordPress site for supporting undergraduate medical student learning in statistics for short research projects