N.B. (Practice tip): The instructions provided in the solutions below for calculating correlation coefficients in SPSS can be conveniently applied to the practice data available in the spreadsheets Prac5 and Work5. There is no need to wait until you have all your data to learn the relevant techniques! It may also help to know that the Pearson correlation coefficient is sometimes referred to as the Pearson product moment correlation coefficient.
· Q. 1. I would like to learn about correlation coefficients. Where is a good place to start?
A. You should consider the Statistics at Square One chapter, Correlation and Regression (see Chapter 11) and Chapter 8 of Using SPSS to Understand Research and Data Analysis.
· Q 2. My data for blood hormone level against saliva hormone level display a linear trend and each of the corresponding variables is approximately Normally distributed. How can I use SPSS to calculate the Pearson correlation coefficient?
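A. Before calculating the coefficient, it is best to check:
a) that your two variables are observations on a random sample of individuals;
b) that your data approximate to a linear trend;
and
c) depending on your needs (see Note on Normality testing, below), using suitable tests of Normality, that blood and saliva hormone levels meet the requirements for Normality.
Once you are satisfied on these points, select the following sequence of commands in SPSS and ensure that the option Pearson is ticked:
Analyze –> Correlate –> Bivariate.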
Note on Normality testing.
Please note, however, that strictly speaking, the requirement regarding testing for Normality only applies where you wish to provide a 95% confidence interval for your correlation coefficient using the traditional method (which uses Fisher’s Z transformation). If you wish to use a test of significance to test (and typically, disprove) the null hypothesis that your correlation coefficient is equal to zero, the requirement is that at least one of your two variables is approximately Normally distributed. If you want to calculate the correlation coefficient only (one wonders why), then it is not necessary to test for Normality. Also, so that you know exactly what to do in relation to testing for Normality where needed in this context, please refer to the StatsforMedics WordPress page Tests of Normality and, in particular, note the paragraph starting ‘NB.’ in the solution to Q. 4.
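To see what the Pearson correlation coefficient actually measures, it can help to compute it once from its definition. The following is a minimal sketch in Python rather than SPSS; the function names and the blood and saliva values are hypothetical and serve for illustration only. The t-statistic shown is the one usually referred to a t distribution with n – 2 degrees of freedom when testing the null hypothesis that the correlation is zero.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient computed from its definition."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t-statistic for H0: correlation = 0, on n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical hormone levels (illustrative values, not real data)
blood = [10.2, 12.5, 9.8, 14.1, 11.3, 13.0, 10.9, 12.0]
saliva = [2.1, 2.6, 1.9, 3.0, 2.4, 2.7, 2.2, 2.5]

r = pearson_r(blood, saliva)
print(f"Pearson r = {r:.3f}, t = {t_statistic(r, len(blood)):.2f}")
```

The value of r should agree with the coefficient SPSS reports for the same two columns.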
· Q 3. I would like to test the validity of the 5-point Groningen Distress Scale (GDS) against a gold standard defined in terms of a visual analogue scale (VAS) used by a health practitioner. The VAS consists of a 10 cm line which the health practitioner will mark for each patient. However, the corresponding scores will be expressed to the nearest cm. I read in a related journal publication that the authors used the Pearson Product Moment Correlation Statistic to address the same question for a different sample of patients. Is this the best measure to use to assess the validity of the GDS?
A. First a little lingo: as you are endeavouring to measure the validity of the GDS by its correlation with a measure taken at the same time, you are measuring concurrent validity for the two scales. Regarding the appropriateness of the correlation statistic used by the authors, please note that ‘Pearson Product Moment Correlation Statistic’ refers to the same statistic as the Pearson correlation coefficient, above. Given the nature of the data in these scales, it is possible that the relevant assumptions for use of the Pearson correlation coefficient will not be met and that Spearman’s correlation coefficient (also referred to by the names Spearman’s rank correlation coefficient, Spearman’s rank order coefficient and Spearman’s rho, among others) is more appropriate for your needs. However, please refer to Q 2 and the corresponding solution, above, in the first instance, prior to making a decision.
Use of Spearman’s correlation coefficient is appropriate where there is an increasing or decreasing monotonic trend in the plot of one of the two variables of interest against the other. This sort of trend involves one variable either consistently increasing or consistently decreasing with the other, with the exception of a few blips. Monotonic trends are illustrated graphically under Spearman’s rank correlation coefficient. Also, please consider referring to What is a monotonic relationship?.
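The difference between a linear and a monotonic trend can be illustrated with a short sketch. The Python below (not SPSS) computes Spearman’s rho as the Pearson coefficient of the ranks, giving tied values the average of their ranks. The helper names and the data are hypothetical; the data are chosen to be perfectly monotonic but not linear.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson correlation coefficient computed from its definition."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's coefficient applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: y always increases with x, but along a curve
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]

print(f"Pearson r    = {pearson_r(x, y):.3f}")
print(f"Spearman rho = {spearman_rho(x, y):.3f}")
```

Because the trend is perfectly monotonic, Spearman’s rho reaches its maximum, whereas the Pearson coefficient falls short of 1 because the points do not lie on a straight line.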
You should find the guide under Spearman’s rank order coefficient using SPSS very helpful in taking you through the relevant steps for calculating this statistic using SPSS, although please skip the listed assumption that there is a linear relationship between the two variables. This property need not be present for you to use Spearman’s correlation coefficient.
To gain a sense of how you may interpret the strength of any relationship you may observe, have a look at the resource Correlation coefficients. (Scroll down to read the content immediately after the remark ‘Remember, correlation does not imply causation’ highlighted with a blue background.) The advice given here can be carried over from the interpretation of the magnitude of Pearson’s correlation coefficient to that of Spearman’s rho.
· Q 4. I have been exploring the relationship between HU measurements and bone mineral density (BMD), as I have an interest in finding out if there is an association between COPD and osteoporosis. The scatter-plot of severity of emphysema in Hounsfield Units (HU) against BMD suggests that there is a trend but I am not convinced that a straight-line fit would represent the optimal model and form the best basis for future analysis of the strength of any correlation. Can you recommend how best to proceed?
A. If your data follow a monotonic trend (see Q. 3., above), you ought to consider using Spearman’s rho to measure correlation. Alternatively, assuming a non-monotonic curve from a recognized family of curves would fit your data rather well, please consider availing yourself of the help provided under the Curve Estimation procedure in SPSS. To find this help, select the following sequence of commands:
Analyze –> Regression –> Curve Estimation.
Notice the range of models you have to choose from! There are, in fact, ten different models in addition to the listed linear model. Click the button Help and in turn the option Show me to arrive at a couple of worked examples (first two links in middle menu) in tutorial form. These will guide you in using such non-linear models responsibly.
To gain a sense of how you may use the regression coefficient to interpret the strength of any relationship you may observe, have a look at the resource Correlation coefficients. (Scroll down to read the content immediately after the remark ‘Remember, correlation does not imply causation’ highlighted with a blue background.) The advice given here can be carried over to non-linear regression.
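As a rough illustration of what comparing curve models involves, the sketch below fits a straight line and a logarithmic curve (y = b0 + b1 ln x, one of the model types offered alongside the linear model) to the same data by least squares and compares the proportion of variance explained (R squared). This is Python rather than SPSS, the function name is hypothetical and the data are invented for illustration; it is not a substitute for the SPSS tutorials.

```python
import math


def simple_least_squares(x, y):
    """Fit y = b0 + b1 * x by least squares; return (b0, b1, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    syy = sum((b - mean_y) ** 2 for b in y)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    # Residual sum of squares about the fitted line
    ss_res = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    return b0, b1, 1 - ss_res / syy


# Hypothetical data that level off as x increases (illustrative only)
x = [1, 2, 4, 8, 16, 32]
y = [2.1, 4.0, 6.2, 8.3, 10.2, 12.5]

linear = simple_least_squares(x, y)
# The logarithmic model is a straight-line fit of y against ln(x)
logarithmic = simple_least_squares([math.log(v) for v in x], y)

print(f"Linear model:      R squared = {linear[2]:.3f}")
print(f"Logarithmic model: R squared = {logarithmic[2]:.3f}")
```

A higher R squared for one model suggests a closer fit to these particular data, but it should be weighed alongside the scatter-plot and the plausibility of the curve for your subject area.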
· Q 5. I have noted that using the standard procedures for calculating Pearson’s correlation coefficient and Spearman’s rho in SPSS, I am unable to generate a confidence interval for my correlation coefficient. Can you recommend what to do?
A. There is a range of possibilities. Assuming that you do not want to calculate these CIs manually using the original formulae or use SPSS syntax, I would recommend that you consider the video and corresponding spreadsheets provided for you at the how2stats page Confidence Intervals for Correlations – Calculator.
The above resources assume that you wish to calculate a CI for Pearson’s correlation coefficient. However, it is also possible to use the same resources to calculate a CI for Spearman’s rho. Note that Spearman’s rho is equivalent to Pearson’s correlation coefficient applied to the ranks of the data. Therefore, to obtain the CI in this case, first transform the data to ranks and then follow all of the instructions in the above video. To generate ranks for your data in SPSS, you can select the option ‘Rank cases’ from the menu ‘Transform’.
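For readers curious about what such a calculator does, the traditional method based on Fisher’s Z transformation can be sketched in a few lines of Python. The function name and the example values of r and n are hypothetical. The steps are: transform r to z, attach a Normal-based interval using a standard error of 1/√(n – 3), and transform the limits back to the correlation scale.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, level=0.95):
    """Confidence interval for a correlation via Fisher's Z transformation."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = math.atanh(r)                 # Fisher's Z transformation of r
    se = 1 / math.sqrt(n - 3)         # standard error on the z scale
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # Back-transform the limits from the z scale to the correlation scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Hypothetical example: r = 0.60 observed in a sample of 30
lower, upper = fisher_ci(0.60, 30)
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

Note that the interval is not symmetric about r, which is a consequence of the back-transformation.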
· Q 6. I would like to compare the value of the Pearson Correlation Coefficient across independent study groups. How best should I proceed?
A. Given sufficient time, your best option may be to take the approach of multiple linear regression (see the StatsforMedics WordPress page LINEAR REGRESSION ANALYSIS (SINGLE AND MULTIPLE)), although in the first instance you will need to test all of the associated model assumptions for this approach. Alternatively, you may wish to consider obtaining a confidence interval and a corresponding p-value from a hypothesis test of the difference between any two of your coefficients. For the confidence interval, see Confidence Interval, Difference between Independent Correlations and for the hypothesis test see A Tutorial on Correlation Coefficients (and in particular, the sub-header Conducting Significance Tests of Correlation Coefficients). To find the p-value corresponding to the z-statistic you calculate, refer to the z- to p-value table and insert your value for the z-statistic to the right of ‘z = ‘. It is usual practice to opt for the two-tailed p-value (see the table under your selection), because the null hypothesis states that the two coefficients are equal and a refutation of this hypothesis (where appropriate) could go either way in terms of the direction of the difference between your two coefficients (see the Statistics index search term one-tailed versus two-tailed hypotheses tests).
If you are performing pairwise comparisons of correlation coefficients for any given variable (e.g. age-group) and this entails comparing more than one pair, you ought to apply a Bonferroni correction to each p-value you obtain for that variable. As explained in various sections within the StatsforMedics WordPress site, this involves multiplying each p-value for that variable by the number of comparisons you made for that variable.
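The z-test for two independent correlations and the Bonferroni correction can be sketched as follows in Python. The function names and the example figures are hypothetical, and the test shown is the standard one based on Fisher’s Z transformation, which may differ in detail from the linked tutorial. Capping the corrected p-values at 1 is an addition made here, since a probability cannot exceed 1.

```python
import math
from statistics import NormalDist


def compare_independent_correlations(r1, n1, r2, n2):
    """z-statistic and two-tailed p-value for H0: the two correlations are equal."""
    z1, z2 = math.atanh(r1), math.atanh(r2)         # Fisher's Z transformation
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))     # SE of the difference
    z = (z1 - z2) / se
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical example: r = 0.70 (n = 50) in one group, r = 0.40 (n = 60) in another
z, p = compare_independent_correlations(0.70, 50, 0.40, 60)
print(f"z = {z:.2f}, two-tailed p = {p:.3f}")

# Hypothetical p-values from three pairwise comparisons for one variable
print(bonferroni([0.01, 0.04, 0.50]))
```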
· Q 7. I have heard that Kendall’s tau-b is a statistic which is similar to Spearman’s rho. Should I use both or is one superior to the other?
A. Kendall’s tau-b provides a measure of association between two ordinal variables according to the ordering of values in the sense explained at 18.3 – Kendall Tau-b Correlation Coefficient. As such, unlike Spearman’s rho, it is not sensitive to the magnitude of the difference between corresponding values for two variables, only to whether they differ and in which direction. These properties are discussed in some detail in the video (see parts 1 and 2) Kendall’s tau vs Spearman rank correlation. Arguably, there is room for ongoing debate as to which choice is best in any one context. Likewise, it can be argued that interpreting Spearman’s rho for a pair of variables as the Pearson correlation coefficient on the ranks of the data is straightforward, so that this statistic is not notably inferior to Kendall’s tau on grounds of conceptual transparency.
Nevertheless, hopefully, the above resources will be of some assistance in helping you make an informed choice. Note also, however, that just as with Spearman’s rho, Kendall’s tau-b carries assumptions for its use and these assumptions need to be checked first! The resource Kendall’s Tau-b using SPSS Statistics should help you with this, as well as allowing you to identify the correct steps for calculating this statistic using SPSS.
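To make the idea of ordering concrete, the following Python sketch computes Kendall’s tau-b directly by classifying every pair of observations as concordant, discordant or tied. The function name and the scores are hypothetical; for real analyses the SPSS procedure in the above resource remains the recommended route.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b computed by classifying every pair of observations."""
    concordant = discordant = ties_x_only = ties_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue                  # tied on both variables: ignored
        elif dx == 0:
            ties_x_only += 1          # tied on x only
        elif dy == 0:
            ties_y_only += 1          # tied on y only
        elif dx * dy > 0:
            concordant += 1           # ordered the same way on x and y
        else:
            discordant += 1           # ordered in opposite ways
    untied = concordant + discordant
    denominator = math.sqrt((untied + ties_x_only) * (untied + ties_y_only))
    if denominator == 0:
        raise ValueError("tau-b is undefined when a variable is constant")
    return (concordant - discordant) / denominator


# Hypothetical ordinal scores with ties (illustrative only)
scale_a = [1, 2, 2, 3, 4, 5, 5]
scale_b = [2, 3, 5, 4, 6, 9, 8]
print(f"Kendall's tau-b = {kendall_tau_b(scale_a, scale_b):.3f}")
```

Only the direction of each pairwise comparison enters the calculation; how far apart the two values are plays no part.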
· Q 8. Why is there more than one type of Kendall’s tau and what is the gamma coefficient?
A. Kendall’s tau-a is the original tau, whereas Kendall’s tau-b and Kendall’s tau-c were created to deal with ties in the data for the two variables under consideration. The choice between the latter two statistics is best determined by the shape of the table obtained on cross-tabulating the categories of your two variables: if the table is square, use tau-b; if it is rectangular but not square, use tau-c. By way of consolidation, details on this choice together with a comparison of the above statistics with the more conservative gamma coefficient (‘gamma’) can be found under Gamma and Kendall’s tau.
Formulae for each of the above statistics and advice on how to calculate them using SPSS can be found under Chapter 14 Measures of Association for Ordinal Data of the resource IBM SPSS Exact Statistics.
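As an informal companion to those formulae, the sketch below computes tau-a, tau-c and gamma from the same counts of concordant and discordant pairs, using the commonly quoted definitions. It is written in Python rather than SPSS, the function names and data are hypothetical, and you should check the results against the formulae in the above resource before relying on them.

```python
from itertools import combinations


def concordance_counts(x, y):
    """Return the numbers of concordant and discordant pairs (ties ignored)."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return concordant, discordant


def kendall_tau_a(x, y):
    """tau-a: (C - D) divided by the total number of pairs."""
    n = len(x)
    c, d = concordance_counts(x, y)
    return (c - d) / (n * (n - 1) / 2)


def kendall_tau_c(x, y):
    """tau-c: adjusts for table shape via m = min(rows, columns)."""
    n = len(x)
    m = min(len(set(x)), len(set(y)))
    if m < 2:
        raise ValueError("each variable needs at least two categories")
    c, d = concordance_counts(x, y)
    return 2 * m * (c - d) / (n ** 2 * (m - 1))


def gamma_coefficient(x, y):
    """gamma: (C - D) divided by the number of untied pairs."""
    c, d = concordance_counts(x, y)
    if c + d == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (c - d) / (c + d)


# Hypothetical ordinal data with ties (illustrative only)
x = [1, 2, 2, 3]
y = [1, 2, 3, 3]
print(kendall_tau_a(x, y), kendall_tau_c(x, y), gamma_coefficient(x, y))
```

The three statistics share the numerator C – D and differ only in the denominator, which is where their different treatment of ties and table shape shows up.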
Published by the Medical Teaching Organisation and the Learning Technology Section – last updated 27/9/2012
Copyright © 2015 – College of Medicine and Veterinary Medicine, The University of Edinburgh. All rights reserved.
Correlation Coefficients – Linear and Non-Linear and Concordance by Margaret MacDougall is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.