· Q. 1. I have heard that some of my data are censored but I am not really sure what that means. Where can I find out more?
A. There are three main types of censored data: left-censored, interval-censored and right-censored. A good synopsis of what these terms mean is provided under The Basics of Survival Analysis.
· Q 2. Where can I learn about survival analysis, including Kaplan Meier analysis, the log-rank test and Cox regression analysis?
A. See Topics 44 of the electronic version of the book
Medical Statistics at a Glance.
If you are registered with the University of Edinburgh, you can consult the electronic version of this book via the University’s library discovery system, DiscoverEd.
Here are some reference details:
- Title: Medical statistics at a glance
- Authors: Aviva Petrie, Caroline Sabin
- Publisher: Hoboken : Wiley
- Publication Date: 2013
- Edition: Third edition
You should then progress to the topic Survival Analysis listed at the Statistics at Square One site, where you will find a very helpful and more thorough explanation of the above procedures. Note that survival analysis is not limited to the event death (see Basic Concepts in Survival Analysis for more on this point).
· Q. 3. I would like to progress to learning how to perform a Kaplan Meier analysis and the log-rank test using SPSS. What do I need to know?
A. Please read what follows very carefully.
1. Carrying out Kaplan Meier Analysis and the log-rank test using SPSS
2. Further help and a worked example
Please note from the outset the importance of selecting a fixed follow-up time which is clinically meaningful. Don’t just use the longest survival time across all individuals in your sample as your follow-up time, as this approach can carry fatal flaws. Whichever resources you choose to use from those below, you should carefully read and follow the advice provided under Q. 4, below.
Start here however:
To assist you in taking an interactive approach to learning, consider opting for the SPSS file HepsurvKM.sav.
Chapter 13 of the training resource IBM SPSS Advanced Statistics 22 will take you through many of the necessary steps in performing a Kaplan Meier analysis using these data.
Performing the log-rank test in SPSS (to test for a significant difference in the distribution of survival times across different cohorts (e.g. different treatment groups)) is also covered at the above site.
In the event that you encounter alternative names for tests to compare survival curves such as the Gehan-Wilcoxon method and wish to know whether these tests are equivalent to those available in SPSS and which of these tests is ideal for your purposes, you should also refer to the resource How do the three methods [for comparing] survival curves (log-rank, Mantel-Haenszel, Gehan-Breslow-Wilcoxon) differ?.
However, please make sure that you also fully cover the material under Q. 4 below before embarking on your own Kaplan-Meier analysis.
· Q. 4. What important additional advice to that above do I really need to know in performing a Kaplan-Meier analysis?
When performing a Kaplan Meier Analysis, it is advised that you choose one or more fixed follow-up times and classify patients according to whether or not they survived up to the end of that follow-up time, irrespective of what happened to them later on (that is, by the end of the actual follow-up time).
This is perhaps best illustrated by means of an example, so first open up the following SPSS spreadsheet on different follow-up times and then read the instructions below.
Explanation of columns in spreadsheet
The data in this spreadsheet have been prepared for the case study involving the event death for hepatocellular cancer patients. Clinically speaking, it makes sense to consider follow-up times of 1 year, 3 years and 5 years, separately and compare survival across patients according to their level of exercise. As survival time has been recorded in days, the choice has been made to define a year as 365 days. Thus, the corresponding follow-up times are 365 days, 1095 days and 1825 days, respectively. The column headers in the spreadsheet may be defined as follows:
- patient: a number tag for each patient in the study
- exercise: a factor in the analysis representing amount of exercise taken
- followuptime: the length of time until the patient died (if died during follow-up) or the length of time that the patient was followed up for (if alive during follow-up). This is the basic survival time column.
- status: status of patient (whether dead or alive). This is the basic status column.
- oneyearfollowuptime: the length of time until the patient died (if died within one year); otherwise 365 days
- oneyearstatus: just as for the above column status until you reach a value greater than 365 in the column followuptime, at which point you record a value of 0 (labelled ‘alive’) and continue doing so until you reach the end of the column
- threeyearfollowuptime: the length of time until the patient died (if died within three years); otherwise 1095 days
- threeyearstatus: just as for the column status until you reach a value greater than 1095 in the column followuptime, at which point you record a value of 0 (labelled ‘alive’) and continue doing so until you reach the end of the column
- fiveyearfollowuptime: the length of time until the patient died (if died within five years); otherwise 1825 days
- fiveyearstatus: just as for the column status until you reach a value greater than 1825 in the column followuptime, at which point you record a value of 0 (labelled ‘alive’) and continue doing so until you reach the end of the column
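The rule behind the fixed follow-up columns can be sketched in a few lines of Python. This is only an illustration and not part of the SPSS workflow: the function name, the example records and the status coding (1 = dead, 0 = alive) are assumptions. The sketch also assumes that a patient whose follow-up ended alive before the chosen horizon keeps their original time and status.

```python
def apply_fixed_followup(followuptime, status, horizon_days):
    """Return (time, status) truncated at a fixed follow-up horizon.

    Assumed coding: status 1 = dead, 0 = alive.
    """
    if followuptime > horizon_days:
        # Still under observation beyond the horizon, so the patient
        # counts as alive at the horizon, whatever happened later.
        return horizon_days, 0
    # Died or left the study on or before the horizon: keep as recorded.
    return followuptime, status


# Hypothetical records: (patient, followuptime in days, status)
records = [(1, 200, 1), (2, 400, 1), (3, 300, 0), (4, 2000, 0)]

one_year = [apply_fixed_followup(t, s, 365) for _, t, s in records]
print(one_year)  # [(200, 1), (365, 0), (300, 0), (365, 0)]
```

Patient 2 died at 400 days, so within a one-year follow-up they are recorded as alive at 365 days. The same function with 1095 or 1825 days would give the three-year and five-year columns.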
If you are comparing more than 2 groups, you may also wish to consider Q. 6, below.
· Q. 5. I have heard that the median survival time is a useful summary statistic to use in presenting my findings from a Kaplan Meier analysis. How can I learn more about this statistic?
A. A useful starting point is
Basics of Median Survival (MS) .
The content at the above link will help you see that there is a special case where the median will not be forthcoming for your data. To appreciate how the median relates to the Kaplan Meier distribution, you should also find it helpful to consider the BMJ paper
Survival (time to event) data: median survival times.
Note that quartiles are of interest too and these can be easily requested using SPSS via the options button in the main dialogue box for a Kaplan Meier analysis.
In the example in the above BMJ paper, the statements below could be thought of as pertaining to the third (Q3) and first (Q1) quartile survival times, respectively.
The probability of surviving 19.6 months or longer after starting chemotherapy was 0.25.
The probability of surviving 8.2 months or longer after starting chemotherapy was 0.75.
Refer to the supporting content in the solution to Q. 3, above, to view the relevant output for other examples.
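To see how the median and quartile survival times are read off a Kaplan Meier curve, the following Python sketch may help. It is a minimal illustration on invented data, not a substitute for the SPSS output. It assumes the common convention that the quantile is the smallest observed time at which the estimated survival probability falls to or below the required level, and it uses exact fractions so that comparisons at the boundary are not affected by rounding.

```python
from fractions import Fraction


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times: follow-up times; events: 1 = event observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = Fraction(1)
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all individuals who share the same time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= Fraction(at_risk - deaths, at_risk)
            curve.append((t, survival))
        at_risk -= removed
    return curve


def survival_quantile(curve, p):
    """Smallest time with S(t) <= 1 - p, or None if the curve never gets there."""
    threshold = 1 - Fraction(p)
    for t, s in curve:
        if s <= threshold:
            return t
    return None


# Hypothetical survival times in months (1 = died, 0 = censored)
times = [2, 3, 5, 7, 8, 11, 13, 14]
events = [1, 1, 0, 1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)

print(survival_quantile(curve, Fraction(1, 4)))  # 3  (Q1: survival falls to 0.75)
print(survival_quantile(curve, Fraction(1, 2)))  # 8  (median)
print(survival_quantile(curve, Fraction(3, 4)))  # 13 (Q3: survival falls to 0.25)

# Heavily censored sample: the curve never reaches 0.5, so no median exists.
print(survival_quantile(kaplan_meier([1, 2, 3, 4], [1, 0, 0, 0]), Fraction(1, 2)))  # None
```

The last call illustrates the special case in which the median is not forthcoming: too few events occur for the estimated survival probability to drop to 0.5.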
· Q. 6. I wish to compare survival across 5 different diagnosis groups and am uncertain as to how to interpret the log-rank test in this context. Can you please advise?
A. Where there are more than two groups involved, the log-rank test tests for a significant difference overall across these groups but does not allow you to identify how this difference is distributed across corresponding pairs of groups. Where you have already identified a significant difference overall across your groups, you may wish to add to your results by probing a little deeper and performing pairwise comparisons between selected pairs of groups. This will allow you to obtain a separate p-value for each pair of groups you may wish to compare.
To accomplish this, go back to the button ‘Compare Factor’ within the SPSS dialogue box for the Kaplan-Meier method and select the option ‘Pairwise over strata’ rather than ‘Pooled over strata’ when you have ticked the box for ‘Log rank’ under ‘Test statistics’. You will then obtain a table of corresponding p-values. You need only select the p-values for the groups you really wanted to compare.
Be aware that if you have overlapping curves, the Breslow test is more appropriate than the log-rank test but that the remaining ideas in this solution still apply.
Once you have identified the p-values for your pairwise comparisons, you should apply a simple adjustment to them known as the Bonferroni correction. This involves multiplying each p-value by the number of comparisons you planned to carry out (any adjusted value above 1 is reported as 1). For example, if you want to obtain p-values for comparing 3 pairs of diagnosis groups, just multiply the 3 corresponding p-values you obtained by 3.
When you finally report your results, it would be a good idea to create a table containing a column for the original p-values and to include alongside this a further column with the adjusted (or, corrected) p-values.
The rationale behind the Bonferroni correction is to correct for the increased likelihood of making a Type 1 error (obtaining a significant difference by chance) when carrying out multiple significance tests.
In the final analysis, you should check whether the adjusted p-values are less than or equal to 0.05. However, it is good practice to let the reader see whether the conclusions of your hypothesis tests were influenced by the Bonferroni correction. This is why the more complete table recommended above is useful.
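The arithmetic of the Bonferroni correction is simple enough to check by hand, but a short Python sketch shows the kind of two-column table suggested above. The p-values are invented for illustration, and capping adjusted values at 1 is a standard convention assumed here.

```python
def bonferroni(p_values, n_comparisons=None):
    """Multiply each p-value by the number of planned comparisons, capped at 1."""
    m = len(p_values) if n_comparisons is None else n_comparisons
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values from three planned pairwise log-rank tests
raw = [0.004, 0.030, 0.200]
adjusted = bonferroni(raw)

print("raw p    adjusted p    significant at 0.05")
for p, q in zip(raw, adjusted):
    print(f"{p:.3f}    {q:.3f}         {q <= 0.05}")
```

In this invented example the second comparison is significant before the correction (0.030) but not after it (0.090), which is exactly the kind of change a reader should be able to see in the table.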
· Q 7. I have too small a dataset and too limited time and background training in statistics to embark on a full-blown multivariate analysis at this stage. However, for my survival analysis comparing survival in hepatocellular cancer patients across an intervention and control group, I would like to involve gender as an additional factor to that of intervention. I would therefore value tips on how to perform the relevant Kaplan Meier analysis.
A. The secret lies in the presentation of your data, including the codes you use for your categories. By way of illustration, refer to the column ‘Values’ within the window ‘Variable View’ of the spreadsheet HepsurvKMstrat.sav. You should especially explore the codes that have been assigned to the column ‘gender’. These have been chosen so as to ensure that when you combine the two columns ‘intervention’ and ‘gender’ to obtain the column ‘group_by_gender’, each possible combination of intervention and gender has a unique numerical code. You can then use the above ‘Values’ column to assign labels (reflecting combined categories) to these numerical codes (such as ‘Control and male’) so as to assist with the interpretation of your SPSS output. The new column ‘intervention by gender’ now represents the new factor for inclusion in your Kaplan-Meier analysis.
As for the question of how to combine columns in SPSS, you would do well to consider the handy guide How to Combine Variables in SPSS.
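The idea of choosing codes so that every combination is unique can be sketched in Python as follows. The codes used here (0 and 1 for intervention, 1 and 2 for gender, combined as ten times the intervention code plus the gender code) are assumptions made for illustration; the codes actually used in HepsurvKMstrat.sav may differ.

```python
# Hypothetical category codes and labels
INTERVENTION = {0: "Control", 1: "Intervention"}
GENDER = {1: "male", 2: "female"}


def combine(intervention, gender):
    """Give each (intervention, gender) pair its own numerical code."""
    return 10 * intervention + gender


# Value labels for the combined factor, one per possible combination
VALUE_LABELS = {
    combine(i, g): f"{i_label} and {g_label}"
    for i, i_label in INTERVENTION.items()
    for g, g_label in GENDER.items()
}

for code, label in sorted(VALUE_LABELS.items()):
    print(code, label)
```

Because the four combined codes are all different, the combined column can be used as a single factor with four levels.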
In terms of comparing survival curves, Q. 6 should be of direct relevance to your needs.
· Q 8. I wish to present some summary statistics for my survival data in terms of proportions surviving or dying at a series of stages of the follow-up period. How can I access a worked example for this purpose?
A. In association with the statistical package SPSS, there is a useful worked example under Life tables which should help you greatly.
The example offered uses a file telco.sav. If you have access to SPSS, please refer to the resource Sample files for advice on how to access this and other SPSS files included in the SPSS installation package. Please also take into consideration the advice provided on fixed follow-up times and the relevant lay-out of data provided in the solution to Q. 4, above.
Have a careful read through the example in order to identify what each of the columns in the life table represents. If there are very few events (e.g. death) in your sample, you may wish to choose the first 5 columns only when presenting your findings in your report. It is quite easy to delete the remaining columns from your SPSS table. Just follow the steps below in the order shown:
- double click the table to activate the Pivot Table window,
- click on the column header of any column you wish to delete,
- choose ‘Select–>Data Cells’ from the menu ‘Edit’ and
- choose ‘Clear’ from the menu ‘Edit’.
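To help with identifying what the leading life table columns represent, here is a minimal Python sketch of the standard actuarial calculation on invented data. It assumes that withdrawals (censored cases) count as half a person at risk within their interval, and the function and column names are illustrative only; the exact layout and any further columns in the SPSS output may differ.

```python
def life_table(times, events, width, n_intervals):
    """Actuarial life table; events: 1 = terminal event, 0 = withdrawn (censored)."""
    rows = []
    entering = len(times)
    cumulative = 1.0
    for k in range(n_intervals):
        start = k * width
        end = start + width
        in_interval = [e for t, e in zip(times, events) if start <= t < end]
        terminal = sum(in_interval)
        withdrawn = len(in_interval) - terminal
        # Withdrawals are assumed to be at risk for half the interval.
        exposed = entering - withdrawn / 2
        prop_terminating = terminal / exposed if exposed > 0 else 0.0
        cumulative *= 1 - prop_terminating
        rows.append({
            "start": start,
            "entering": entering,
            "withdrawn": withdrawn,
            "exposed": exposed,
            "terminal": terminal,
            "cum_surviving": cumulative,
        })
        entering -= len(in_interval)
    return rows


# Hypothetical follow-up times in days, tabulated in one-year intervals
times = [100, 200, 400, 500, 800, 900, 1000]
events = [1, 0, 1, 1, 0, 1, 0]
rows = life_table(times, events, width=365, n_intervals=3)
for row in rows:
    print(row)
```

The first five entries of each row correspond to the interval start time, the number entering the interval, the number withdrawing, the number exposed to risk and the number of terminal events.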
· Q 9. I wish to perform a Kaplan Meier survival analysis to compare the time to death for patients with osteosarcoma before and after the introduction of chemotherapy. What sample size and how many uncensored cases would I require?
A. Have a look at Chapter 9 of the book “Modelling survival data in medical research”, 2nd ed., Boca Raton ; London : Chapman & Hall/CRC, c2003. The 1st edition would also be acceptable.
If you are registered with the University of Edinburgh, you can consult the electronic version of this book via the University’s library discovery system, DiscoverEd. Here are some reference details from the above source:
- Title: Modelling survival data in medical research
- Author: D. Collett 1952-
- Statement of responsibility: D. Collett.
- Subjects: Survival analysis (Biometry); Medicine — Research — Mathematical models; Clinical trials; Linear models; Prognosis; Research; Statistics
- Publisher: Boca Raton ; London : Chapman & Hall/CRC
- Publication Date: [2003], ©2003
- Edition: Second edition
or (original edition) as above, with the following changes:
- Publisher: London : Chapman & Hall
- Publication Date: 1994
· Q 10. I wish to perform a Cox regression analysis to identify the effects of my individual study variables on time to progression from HIV to AIDS. How many uncensored cases do I need?
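A. A good rule of thumb is the one in ten rule: aim for at least ten uncensored cases (events) for each explanatory variable included in the model.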
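Reading the one in ten rule as requiring at least ten uncensored cases (events) per explanatory variable, the arithmetic can be sketched in Python. The function names and the numbers are illustrative assumptions, and the rule itself is only a rough guide.

```python
def max_covariates(n_events, events_per_variable=10):
    """Largest number of covariates supported by the observed events."""
    return n_events // events_per_variable


def events_needed(n_covariates, events_per_variable=10):
    """Minimum number of uncensored cases for a given number of covariates."""
    return n_covariates * events_per_variable


print(max_covariates(47))  # 4
print(events_needed(6))    # 60
```

So, for example, a study in which 47 patients progressed to AIDS would support about four covariates under this rule, however many censored patients were also followed up.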
· Q 11. I am currently involved in a study designed to investigate whether high-dose intravenous Vitamin C improves outcomes for mechanically ventilated patients with severe pneumonia in ICU compared to standard care. I am including some covariates (Demographics, Medical History, Concomitant Medications, Sequential organ failure assessment (SOFA), and Acute physiologic and chronic health evaluation (APACHE II) – first day of ICU admission, among others). I’d like to use a Cox regression analysis to estimate the relative rate of successful extubations at 28 days while accounting for the fact that death can prevent the event extubation from occurring. Can you suggest any modifications to the Cox regression model that would suit my study objective?
A. A subdistribution hazard ratio (SHR) for a ventilation-free event, with mortality as a competing event, measures the relative rate of successful liberation from a ventilator (e.g., extubation) while accounting for the fact that death prevents the event from occurring. It is estimated using the Fine and Gray model and describes the effect of a covariate directly on the cumulative incidence of extubation.
It is a ratio of the subdistribution hazards between groups (e.g., treatment vs. control) for successful 28-day extubation.
Competing Event: Mortality is considered a “competing risk” because a patient who dies before extubation cannot be extubated.
Interpretation: An SHR > 1 indicates a higher probability of earlier successful ventilation-free status in the treatment group compared to the control group, even when accounting for patients who died.
Advantage: Unlike standard Kaplan-Meier analysis, which might treat deaths as standard censoring (potentially overestimating success), the SHR correctly handles deaths by keeping them in the risk set (as “subdistribution” suggests).
· Q 12. I’ve noted that in my current version of SPSS (version 29), the Fine and Gray (“Fine-Gray”) model is not mentioned under the Analysis menu. Does this mean that SPSS cannot address this type of analysis?
A. For some more sophisticated analyses, SPSS requires installation of an add-in.
1. If you are using a Windows-based machine, proceed to Step 2.
If you are using a Mac, you will first need to install XQuartz. Here are some details:
Because the Competing Risks extension uses R ‘under the hood,’ it needs a system called XQuartz to help display its components.
- Download it for free from XQuartz.org.
- Install the package and restart your Mac (which is essential for SPSS to recognise this package).
- Once you’re back in SPSS 29, go to the Extension Hub and follow the instructions in Step 2.
The video at the link below provides a straightforward, visual walkthrough of downloading and installing XQuartz on macOS, which are the main steps you need to complete before the instructions in Step 2 will work.
How to Install xquartz on a Mac. Install me second. – YouTube
2. With the SPSS package open:
i. Go to the Extensions menu at the top.
ii. Select Extension Hub.
iii. With the “Explore” tab selected, use the search box to search for: COMPRISK.
iv. Look for the extension named Competing Risks Regression (or something similar).
v. Check the Get extension box and click OK.
vi. Once the installation has finished, re-check the Analyze menu.
3. Statistical analysis
How to Run the Analysis in Version 29
Use the menu pathway Analyze > Survival > Competing Risks Regression
Once the dialogue box opens, map the project variables like this:
- Time: The “time to event” variable.
- Status: The variable that indicates what happened (e.g., 0=Alive/Censored, 1=Extubation, 2=Death).
- Event Code: Set this to 1 (or whatever number represents extubation).
- Competing Event Code: Set this to 2 (or whatever number represents death).
- Covariates: This is where to put the “prognostic factors” (e.g., a).
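Before mapping variables, the Time and Status columns have to exist in the dataset. The Python sketch below shows one way they might be derived from a day of extubation and a day of death, using the example coding above (0 = censored, 1 = extubation, 2 = death). The function name, the input layout, the assumption that every patient is observed up to day 28 and the choice to let extubation take precedence when both events fall on the same day are all illustrative assumptions.

```python
def extubation_outcome(extubation_day, death_day, horizon=28):
    """Return (time, status): 0 = censored, 1 = extubated, 2 = died.

    Use None for an event that did not occur.
    """
    candidates = []
    if extubation_day is not None:
        candidates.append((extubation_day, 1))
    if death_day is not None:
        candidates.append((death_day, 2))
    if candidates:
        # The earliest event decides the outcome; on a tie, code 1 sorts first.
        day, status = min(candidates)
        if day <= horizon:
            return day, status
    # No event by the end of follow-up: censored at the horizon.
    return horizon, 0


print(extubation_outcome(5, None))     # (5, 1)
print(extubation_outcome(None, 10))    # (10, 2)
print(extubation_outcome(12, 20))      # (12, 1)
print(extubation_outcome(None, None))  # (28, 0)
print(extubation_outcome(30, None))    # (28, 0)
```

The third patient was extubated on day 12 and died on day 20, so only the first event (extubation) is recorded for this analysis.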
4. Interpreting the Output (The “Subdistribution Hazard”)
When you run this model, SPSS will give you a Subdistribution Hazard Ratio (sHR).
- Standard Cox HR: Tells you the effect of a factor on the rate of 28-day extubation among people who haven’t had one yet, treating deaths as censored observations (and so ignoring that some people can’t get one because they’ve died).
- Fine-Gray sHR: Tells you the effect of a factor on the actual cumulative incidence of extubation in the real world, where death is a permanent barrier.
5. Reflecting on the Fine-Gray procedure
In a clinical setting, based on the results of a standard Cox model, a doctor might tell the ICU team for specific categories of patient that, “There is a 90% chance of 28-day extubation.” In contrast, subsequent analyses using the Fine-Gray procedure may provide a more accurate figure, closer to 75%.
6. Visualizing the Overestimation
As we’ve noted, an analysis that treats deaths as ordinary censoring is likely to overestimate the probability of extubation. You can explore this by generating a Cumulative Incidence Function (CIF) plot within that same SPSS menu.
The CIF plot will show the “real” probability of 28-day extubation. If you compare it to your old Kaplan-Meier curve, you will likely see the Kaplan-Meier curve sits higher on the graph—this visual gap represents the “overestimation”.
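The gap between the two curves can be reproduced on a small invented dataset. The Python sketch below is not the output of the SPSS extension and does not fit a Fine-Gray regression; it is a non-parametric illustration that compares one minus the Kaplan-Meier estimate (with deaths treated as censoring) against the cumulative incidence function computed with the Aalen-Johansen estimator. The data and the status coding (0 = censored, 1 = extubation, 2 = death) are assumptions.

```python
def extubation_curves(times, status, event=1):
    """Return [(t, 1 - KM with competing events censored, CIF)] at each event time."""
    order = sorted(zip(times, status))
    n_at_risk = len(order)
    km_naive = 1.0  # event-free probability when deaths are censored
    overall = 1.0   # probability of no event of any type so far
    cif = 0.0       # cumulative incidence of the event of interest
    rows = []
    i = 0
    while i < len(order):
        t = order[i][0]
        d_event = d_any = n_here = 0
        while i < len(order) and order[i][0] == t:
            s = order[i][1]
            d_event += s == event
            d_any += s != 0
            n_here += 1
            i += 1
        if d_any:
            # Aalen-Johansen step: must be event-free just before t.
            cif += overall * d_event / n_at_risk
            km_naive *= 1 - d_event / n_at_risk
            overall *= 1 - d_any / n_at_risk
            rows.append((t, 1 - km_naive, cif))
        n_at_risk -= n_here
    return rows


# Hypothetical ICU data: day of first event and status at that day
times = [3, 5, 6, 8, 10, 12, 15, 28, 28, 28]
status = [1, 2, 1, 2, 1, 1, 2, 0, 0, 0]

for t, naive, cif in extubation_curves(times, status):
    print(f"day {t:2d}: 1 - KM = {naive:.3f}   CIF = {cif:.3f}")
```

In this invented sample 4 of 10 patients were extubated by day 28, and the CIF ends at 0.400 accordingly, whereas the Kaplan-Meier based figure ends at 0.475 because patients who died are treated as if they could still have been extubated later.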
7. Writing the Rationale for Choice of Methodology in the Methods section of a Report
Here is a sample statement that could be included in the methods section of a report or manuscript:
To account for the competing risk of death, which violates the censoring assumption of the standard Cox model, a Fine and Gray competing risks regression was performed using the SPSS STATS_COMPRISK extension. This allowed for the estimation of subdistribution hazard ratios to more accurately estimate the influence of prognostic factors (represented by covariates in the model) on 28-day extubation.
Survival Analysis by Margaret MacDougall is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.