Simple Descriptive Statistics

· Q 1. Where can I find an overview of measures of central tendency (mean, median, mode, trimmed mean) and measures of spread (range, semi-inter-quartile range, variance and standard deviation) and how to decide which measure from each of these two groups is the most appropriate for my data?

A. Have a look at topics 1 and 2 of the contents list for Describing Univariate Data.

You can then progress to the following very lucid guide on the interpretation of measures of central tendency and spread, where you will also find advice on how to obtain these values using SPSS and more detail on the inter-quartile range: Summary Statistics.

It is also helpful, and perhaps surprising, to note that selecting the path Descriptive Statistics –> Frequencies from the menu Analyze in SPSS will enable you to access a versatile dialogue box for calculating summary statistics. If you have measurement data for which you wish to calculate summary statistics, make sure that the box labelled ‘Display frequency tables’ is unchecked. Then, having selected your chosen variable(s), use the button Statistics to display a wide range of functions.
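
If you would like to see what lies behind these summary statistics, the following minimal sketch in Python (standard library only) computes each of the measures listed in Q 1. The data values and the helper function trimmed_mean are hypothetical and are included purely for illustration. Please also bear in mind that statistical packages differ in how they define quartiles, so the quartile-based values need not agree exactly with those from SPSS.

```python
import statistics


def trimmed_mean(values, proportion=0.1):
    """Mean after dropping the given proportion of values from each tail."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number of values dropped per tail
    trimmed = ordered[k:len(ordered) - k] if k > 0 else ordered
    return statistics.mean(trimmed)


# Hypothetical measurement data containing one extreme value
data = [2, 3, 3, 4, 5, 6, 7, 8, 9, 40]

# Measures of central tendency
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
tmean = trimmed_mean(data, 0.1)

# Measures of spread
value_range = max(data) - min(data)
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartile definition is an assumption
semi_iqr = (q3 - q1) / 2
variance = statistics.variance(data)  # sample variance (n - 1 denominator)
sd = statistics.stdev(data)           # sample standard deviation

print(f"mean={mean}, median={median}, mode={mode}, trimmed mean={tmean}")
print(f"range={value_range}, semi-IQR={semi_iqr}, variance={variance:.2f}, SD={sd:.2f}")
```

For these data the mean is 8.7 whereas the median is 5.5 and the 10% trimmed mean is 5.625, which illustrates how a single extreme value can pull the mean away from the bulk of the data.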

· Q 2. I would value some specific information on the standard deviation, the standard error of the mean and confidence intervals and in particular when to use them. Could you direct me to a suitable resource?

A. The following book chapter should prove helpful:

Maintaining Standards: Differences between the Standard Deviation and Standard Error, and When to use Each

You may need to log in via your institution to gain access.

The book from which this chapter is taken is

A Guide for the Statistically Perplexed: Selected Readings for Clinical Researchers.
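
To make the distinction concrete, here is a minimal sketch in Python of how the three quantities relate to one another. The sample values are hypothetical, and the confidence interval uses the normal quantile (approximately 1.96) as a simplifying assumption; for a small sample such as this one, a quantile from the t distribution with n - 1 degrees of freedom would give a wider interval and is usually preferred.

```python
import math
import statistics

# Hypothetical sample (illustration only)
sample = [4.1, 5.0, 5.6, 4.8, 5.3, 4.9, 5.1, 5.2]

n = len(sample)
mean = statistics.mean(sample)
sd = statistics.stdev(sample)   # describes the spread of individual values
sem = sd / math.sqrt(n)         # describes the precision of the sample mean

# Approximate 95% confidence interval for the population mean
# (normal approximation; an assumption, see the note above)
z = statistics.NormalDist().inv_cdf(0.975)
ci_lower = mean - z * sem
ci_upper = mean + z * sem

print(f"mean={mean:.3f}, SD={sd:.3f}, SEM={sem:.3f}")
print(f"approximate 95% CI: ({ci_lower:.3f}, {ci_upper:.3f})")
```

Notice that the standard error shrinks as the sample size grows, whereas the standard deviation does not systematically do so; this is one reason why the two should not be used interchangeably.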

· Q 3. I would like to generate summary statistics such as means and standard deviations a) for different groups or b) for different groups and variables simultaneously. Is there an efficient way I can perform either of these procedures in SPSS?

A. Yes indeed there is; have a look at the material in the resource Introduction to SPSS for Windows Version 15.0.

You should also be aware of the Split File facility in SPSS which allows you to tell SPSS in advance that you want to split future output according to groups which you have already specified in a separate column. This facility, which is also covered in the above resource, is particularly useful when performing the same analyses for separate groups. You can very easily cancel the request to split output once you are finished.

Also, if you are specifically interested in creating tidy tables of output in SPSS for simple summary statistics, you ought to find the resource Creating Basic Tables helpful. You should regard the tutorial as a primer, since as you will see there are additional menu and dialogue box options close by which you may wish to avail yourself of to tailor the output to your specific needs. This tutorial is also helpful in illustrating how you can control the precision in terms of number of decimal places for the different statistics displayed in your table. To help you interact with the methodology in the tutorial, please feel free to use the data for constructing basic table.
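
For readers who also work outside SPSS, the idea of summarising several variables separately for each group can be sketched in Python as follows. This is not SPSS syntax and is not taken from the resources above; the records, the variable names (group, sbp, dbp) and the function summarise_by_group are all hypothetical.

```python
import statistics
from collections import defaultdict

# Hypothetical records, one row per patient
rows = [
    {"group": "control", "sbp": 120, "dbp": 80},
    {"group": "control", "sbp": 130, "dbp": 85},
    {"group": "control", "sbp": 125, "dbp": 78},
    {"group": "treated", "sbp": 115, "dbp": 75},
    {"group": "treated", "sbp": 118, "dbp": 79},
    {"group": "treated", "sbp": 121, "dbp": 74},
]


def summarise_by_group(rows, group_key, variables):
    """Return n, mean and sample SD of each variable within each group."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for var in variables:
            grouped[row[group_key]][var].append(row[var])
    return {
        group: {
            var: {
                "n": len(vals),
                "mean": statistics.mean(vals),
                "sd": statistics.stdev(vals),
            }
            for var, vals in columns.items()
        }
        for group, columns in grouped.items()
    }


summary = summarise_by_group(rows, "group", ["sbp", "dbp"])
for group, stats in summary.items():
    for var, s in stats.items():
        print(f"{group:8s} {var}: n={s['n']} mean={s['mean']:.2f} SD={s['sd']:.2f}")
```

The grouping column plays the same role here as the column you nominate when splitting output by group in SPSS.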

· Q 4. I would like to learn more about the geometric mean, including when and how to calculate it. Can you please direct me to a useful resource for this purpose?

A. Yes, indeed. You will find all of the above, including calculating a confidence interval for the geometric mean, covered in the resources What is a Geometric Mean and How to Calculate a Geometric Mean. Have a look at the Wikipedia page Geometric mean first, however, to ensure that you can see why the geometric mean really just involves applying the exponential function to an arithmetic mean of logarithms.
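
The relationship between the geometric mean and the arithmetic mean of logarithms can be checked with a few lines of Python. In this sketch the data are hypothetical, and the confidence interval is obtained by forming an interval on the log scale and back-transforming it, using the normal quantile as a simplifying assumption (a t quantile would be more appropriate for a sample this small). The resources named above may present the calculation differently.

```python
import math
import statistics

# Hypothetical positive values (the geometric mean requires values > 0)
values = [2, 4, 8, 16, 32]

# Geometric mean as the exponential of the arithmetic mean of the logarithms
log_values = [math.log(v) for v in values]
mean_log = statistics.mean(log_values)
gm_manual = math.exp(mean_log)

# Built-in equivalent (Python 3.8 or later)
gm_builtin = statistics.geometric_mean(values)

# Approximate 95% CI: interval on the log scale, then back-transform
n = len(values)
sem_log = statistics.stdev(log_values) / math.sqrt(n)
z = statistics.NormalDist().inv_cdf(0.975)  # normal approximation (assumption)
ci = (math.exp(mean_log - z * sem_log), math.exp(mean_log + z * sem_log))

print(f"geometric mean: {gm_manual:.4f} (built-in: {gm_builtin:.4f})")
print(f"approximate 95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```

Because the interval is symmetric on the log scale, it is not symmetric about the geometric mean once it has been back-transformed.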

· Q 5. I am interested in detecting neurocognitive deficits in patients attending a general adult psychiatry clinic with a view to obtaining information on the functional problems of patients with depression. A number of cognitive tests have been performed on these patients and the corresponding scores recorded. I have been advised to consider the use of z-scores. Can you point me to a resource which explains how these scores are calculated and what they can tell me?

A. Certainly; have a look at the resource z-scores, from which you will learn (among other things) that a z-score gives the number of standard deviations by which any of your sample values lies from the population mean. You will also find it useful to obtain histograms for your data, as these will give you an idea of whether your z-scores are evenly distributed about the population mean (which is transformed to the value of zero). If they are not, you should ask yourself questions such as: where do most of the values lie?

You might also wish to consider what would constitute an acceptable degree of departure from the population mean in terms of number of standard deviations, e.g. within 2 or within 3 standard deviations of the population mean. What is acceptable will depend on the clinical context, including existing conventions, and may need to be discussed with clinical colleagues. You could then indicate your choices by annotating your histograms with cut-off lines on the x-axis. This can be done for an SPSS graph after copying and pasting it into MS Word.

To add more meaning to your data, you could in turn consider what percentage of values lie within or outside these boundaries. In order to calculate these percentages conveniently, it can help in the first instance (particularly with large datasets) to form a new column of data representing your z-scores according to categories, where these categories are defined by your boundaries. To learn how to form a categorical variable from a continuous variable very efficiently in SPSS, refer to the resource recoding data in SPSS.

Also, you can do quite a lot with your non-transformed data. For example, after plotting histograms for these data, you could investigate the degree of skewness of these data and indeed (scroll down within the same resource) whether these data and the transformed data are unimodal or bimodal.
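
The steps described in this answer (calculating z-scores against a population mean, categorising them by cut-off and finding the percentage in each category) can be sketched in Python as follows. The normative mean and standard deviation, the cut-off of 2 standard deviations and the patient scores are all hypothetical; in practice these choices should come from published norms for your test and from discussion with clinical colleagues.

```python
from collections import Counter

# Hypothetical normative values for a cognitive test (assumptions)
POP_MEAN = 50.0
POP_SD = 10.0
CUTOFF = 2.0  # boundary expressed in standard deviations

# Hypothetical patient scores
scores = [55, 42, 28, 61, 50, 33, 47, 25, 72, 49]


def z_score(x, mean, sd):
    """Number of standard deviations by which x lies from the mean."""
    return (x - mean) / sd


def categorise(z, cutoff):
    """Place a z-score in one of three categories defined by the cut-off."""
    if z < -cutoff:
        return "below"
    if z > cutoff:
        return "above"
    return "within"


z_scores = [z_score(x, POP_MEAN, POP_SD) for x in scores]
counts = Counter(categorise(z, CUTOFF) for z in z_scores)
percentages = {
    category: 100 * counts[category] / len(z_scores)
    for category in ("below", "within", "above")
}

for category, pct in percentages.items():
    print(f"{category:6s}: {pct:.1f}%")
```

With these hypothetical scores, 20% of values fall more than 2 standard deviations below the population mean, 70% fall within the boundaries and 10% fall above.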

· Q 6. Is it possible to calculate z-scores and statistics for skewness in SPSS to help me apply the techniques recommended in the solution to Q 5?

A. Yes, very quickly in fact. Go to the menu ‘Analyze’ and select the path ‘Descriptive Statistics –>Descriptives’. Pop the variable(s) of interest into the ‘Variable(s)’ box and, if you wish to obtain z-scores, tick the box ‘Save standardized values as variables’. To obtain data on skewness, select the button ‘Options’ and tick the box labelled ‘Skewness’.
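
If you are curious about the arithmetic behind these two options, here is a minimal sketch in Python. It standardises values using the sample mean and sample standard deviation, and computes skewness using the bias-adjusted (adjusted Fisher-Pearson) estimator. I am assuming, rather than asserting, that this is the estimator your package reports, so please treat any comparison with SPSS output with that caveat in mind. The function names and data are hypothetical.

```python
import statistics


def standardise(values):
    """z-scores based on the sample mean and sample SD (n - 1 denominator)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def sample_skewness(values):
    """Bias-adjusted sample skewness: n / ((n-1)(n-2)) * sum of cubed z-scores."""
    n = len(values)
    if n < 3:
        raise ValueError("at least three values are needed")
    return n / ((n - 1) * (n - 2)) * sum(z ** 3 for z in standardise(values))


# Hypothetical data sets
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]

print(f"skewness (symmetric data):    {sample_skewness(symmetric):.3f}")
print(f"skewness (right-skewed data): {sample_skewness(right_skewed):.3f}")
```

A value close to zero suggests approximate symmetry, whereas a positive value indicates a longer tail to the right. Note that the z-scores here are relative to the sample mean, not to a population mean as in the answer to Q 5.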

· Q 7. I am studying the pathogenesis of acute mountain sickness, have obtained multiple measurements per individual for vascular endothelial growth factor and would like to calculate the coefficient of variation to assess whether these measurements are consistent. How is this coefficient defined and can I calculate it using Excel?

A. A suitable definition and the rationale for using this measure can be found under Coefficient of Standard Deviation and Variation. The instructions for calculating this measure using Excel can be found under Calculate the coefficient of variation in Excel. N.B. You are free to arrange your data in rows for this calculation. The key point is that you use the correct formula for your chosen range of cells. For your interest here is a published article from the journal Respiratory Medicine in which the above application of the coefficient of variation was used:

Change in plasma vascular endothelial growth factor during onset and recovery from acute mountain sickness.

In the above paper, “results were excluded as unreliable where an upper limit of 20% in the coefficient of variation was exceeded.”
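
As a complement to the Excel instructions, the calculation (sample standard deviation divided by the mean, for each individual's repeated measurements) can be sketched in Python as follows. The subject labels and measurement values are hypothetical; the 20% limit is the one quoted from the paper above.

```python
import statistics

CV_LIMIT = 0.20  # upper limit quoted in the paper above


def coefficient_of_variation(values):
    """Sample standard deviation expressed as a fraction of the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined for a mean of zero")
    return statistics.stdev(values) / mean


# Hypothetical repeated measurements for two individuals
measurements = {
    "subject_01": [100, 102, 98],
    "subject_02": [60, 100, 140],
}

results = {}
for subject, values in measurements.items():
    cv = coefficient_of_variation(values)
    results[subject] = cv
    verdict = "exceeds limit" if cv > CV_LIMIT else "within limit"
    print(f"{subject}: CV = {100 * cv:.1f}% ({verdict})")
```

Here the first individual has a coefficient of variation of 2% and the second of 40%, so only the second would be flagged under a 20% limit.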

· Q 8. What is an acceptable cut-off value for the coefficient of variation?

A. When looking for consistency across repeated values of the same measure, ideally, the coefficient of variation should not exceed 0.20; otherwise, the measurements are likely to be unreliable. In a different context, where consistency is not expected and you are simply looking for an indication of whether the standard deviation is high, you may find it useful to compare the coefficient of variation with the value 1, as explained in the Statology resource on this topic.

· Q 9. I understand that when providing summary data for ages, it is common practice to quote the mean and standard deviation or the median and range or inter-quartile range. However, unfortunately I did not collect the raw data for age and only have age in categorical form. Are there similar summary statistics which I can obtain for my categorical age data or have I blown my chances?!

A. Have no fear; we live in the real world! Even where the data are not optimal it can be possible to find a suitable procedure. The Excel spreadsheet below illustrates by means of an example how data grouped according to age can be used to calculate a median and an inter-quartile range.

Excel spreadsheet: Calculating the median and inter-quartile range for data in classes

If you are puzzled by the notion of lower class boundary, please have a look under the sub-header ‘Class limits for Table (a)’ at Frequency Distribution. Notice that the data in the spreadsheet are laid out in the form of a table, so you will need to summarize your data before doing your calculations. By forming this sort of table, you will be able to identify which class each of the actual sample median and quartiles ought to lie in, even though you cannot provide their values. This information is needed for your calculations!

The cumulative frequencies form a running total and a special formula has been inserted to make this happen. A formula has also been inserted for each of the median, lower quartile, upper quartile and inter-quartile range. More generally, you can find out where a formula has been implemented by clicking on individual occupied cells and observing the formula bar ‘f_x’ towards the top of the spreadsheet. Also, by hovering over the cells in column E with red flags, you can read the comment boxes which explain what the different parameters represent. If you sense you need a primer on Excel formulae, I *strongly recommend* that you consider the resource Excel 2010: Absolute Beginners. The latter resource covers Excel functions very nicely and once you have mastered this skill, your confidence should increase.
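
For those who prefer to see the logic outside a spreadsheet, here is a minimal sketch in Python of the usual linear interpolation approach for grouped data: locate the class containing the required quantile from the cumulative frequencies, then estimate the quantile as the lower class boundary plus a proportion of the class width. I am assuming that this corresponds to the approach taken in the spreadsheet; the class boundaries, frequencies and function name are hypothetical, and the appropriate boundaries will depend on how ages were recorded in your study.

```python
def grouped_quantile(classes, p):
    """Estimate the p-th quantile (0 <= p <= 1) from grouped data.

    classes: list of (lower_boundary, upper_boundary, frequency) tuples
             in ascending order of boundary.
    """
    if not 0 <= p <= 1:
        raise ValueError("p must lie between 0 and 1")
    n = sum(freq for _, _, freq in classes)
    target = p * n   # position of the quantile within the running total
    cumulative = 0   # cumulative frequency below the current class
    for lower, upper, freq in classes:
        if freq > 0 and cumulative + freq >= target:
            # Interpolate linearly within the class containing the quantile
            return lower + (target - cumulative) / freq * (upper - lower)
        cumulative += freq
    raise ValueError("no observations in the supplied classes")


# Hypothetical age classes: (lower boundary, upper boundary, frequency)
age_classes = [
    (20, 30, 5),
    (30, 40, 10),
    (40, 50, 15),
    (50, 60, 10),
]

lower_quartile = grouped_quantile(age_classes, 0.25)
median = grouped_quantile(age_classes, 0.50)
upper_quartile = grouped_quantile(age_classes, 0.75)
iqr = upper_quartile - lower_quartile

print(f"median={median:.2f}, Q1={lower_quartile:.2f}, Q3={upper_quartile:.2f}, IQR={iqr:.2f}")
```

For these hypothetical classes the estimated lower quartile is 35, the upper quartile is 50 and the inter-quartile range is therefore 15. These are estimates, since the interpolation assumes that ages are spread evenly within each class.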

· Q 10. How can I conveniently obtain frequency tables for age-groups in SPSS if my data are laid out in the form of one row per patient?

A. Please consider using the Data Preparation Tutorial to prepare your data for analysis in SPSS. Next, by choosing the path ‘Descriptive Statistics–>Frequencies’ from the menu ‘Analyze’ in SPSS and then the option ‘Display Frequency Tables’ (if it is not already selected), you can very rapidly obtain the frequencies corresponding to each age range. Note also that the frequency tables forthcoming from this approach include separate columns labelled Percent and Valid Percent. The second of these takes into account the fact that you may have missing data. The calculations for this column assume as a denominator the total frequency of individuals for whom data (in this case age-groups) are forthcoming. This is handy to know where you have a spreadsheet with missing data for individuals in one or more columns and you want to express your percentages in terms of the total number of individuals for whom data are available.
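
The difference between the two percentage columns can be illustrated with a short sketch in Python. The age-group labels and the function frequency_table are hypothetical, with None standing for a missing value; the layout is intended only to mirror the idea of the Percent and Valid Percent columns, not to reproduce SPSS output exactly.

```python
from collections import Counter

# Hypothetical data, one entry per patient; None marks a missing age-group
age_groups = ["20-29", "30-39", None, "30-39", "40-49",
              "30-39", None, "20-29", "40-49", "30-39"]


def frequency_table(values):
    """Return (rows, number_missing), where each row holds a category, its
    frequency, its percentage of all cases and its percentage of valid cases."""
    total = len(values)
    valid = [v for v in values if v is not None]
    if not valid:
        raise ValueError("no valid (non-missing) values supplied")
    counts = Counter(valid)
    rows = [
        {
            "category": category,
            "frequency": counts[category],
            "percent": 100 * counts[category] / total,             # all cases
            "valid_percent": 100 * counts[category] / len(valid),  # valid cases only
        }
        for category in sorted(counts)
    ]
    return rows, total - len(valid)


table, n_missing = frequency_table(age_groups)
for row in table:
    print(f"{row['category']}: n={row['frequency']} "
          f"percent={row['percent']:.1f} valid percent={row['valid_percent']:.1f}")
print(f"missing: {n_missing}")
```

In this example two of the ten patients have no recorded age-group, so the 30-39 group accounts for 40% of all patients but 50% of those with valid data.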

Simple Descriptive Statistics by Margaret MacDougall is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

StatsforMedics

The WordPress site for supporting undergraduate medical student learning in statistics for short research projects
