Excel – handy guides, tutorials and tips

Please take on board the information provided in the paragraph below before progressing further.

Despite its limitations for use in statistics, Excel has many commendable facilities which can assist you greatly in the manipulation of data, in creating summary tables for cross-tabulating frequencies and in creating simple charts. A good first step in familiarizing yourself with the graphing features of Excel 2010 is to view the scatter-plot with trend-line movie. This presenter goes into considerable detail regarding the wealth of features available for editing a chart and this advice can be conveniently carried over to other chart-creating activities within the same package. Notice also that on accessing the movie you will be presented with a range of further movies to the right of the movie screen to allow you to explore how to create alternative types of charts in Excel 2010. If you have any queries in terms of finding what you are looking for, please feel free to contact me (Dr Margaret MacDougall) via the medstats.help@ed.ac.uk and I shall aim to advise you accordingly.

Adding error bars to charts

If you are specifically interested in adding error bars to an existing chart in Excel, please refer to Q. 10 and the accompanying solution on the StatsforMedics page EPIDEMIOLOGY FOR THE RUSTY AND COMPARING A STUDY COHORT WITH A GENERAL POPULATION.

Barcharts

While the tutorial Custom charts in Excel serves as a good primer in laying out data for creating column charts (including stacked barcharts) in Excel, it is good practice to be more thoughful in the final presentation of your data. In particular, it is recommendable to include frequencies (often the “values” from your Excel table) inside the bars and percentages along the vertical axis. This can be achieved through good use of the menu options provided during the construction of your chart. (To apply some cosmetic surgery, try right-clicking and double-left-clicking on various regions of your chart. Also, with the above changes in place, it makes sense with stacked barcharts to ensure that all columns are of the same height. The range of column charts to choose from in Excel allows for this possibility (choose ‘100% stacked column’).

Plotting Multiple Data Series in Excel

You may wish to plot multiple data series on the one graph. This can extend to including different types of plot on the same graph. For example, you may wish to plot a curve for a nomogram corresponding to actual patient data with the existing patient data presented as points in the form of a scatter-plot alongside this curve. The tutorial Microsoft Excel: Plotting Multiple Data Series in Excel is a very good training resource in this area. Be aware, though, that if you have arranged your data in a particular way for future analysis in SPSS, there is no need for you rearrange these data into separate tables as in the tutorial. The secret is to select the appropriate portion of your data (including when your data are arranged as multiple columns in parallel), not to move the data around.

Often with a plot like the one in the case-study like the one described above, the curve is a standard one which you wish to reproduce  for comparative purposes, and you are really creating the correspondingscatter-plot for these data in the first instance so that you can fit the curve. You don’t want the reader to see that you have manipulated Excel for your purposes and would like to hid the data points for the curve and just  display the curve.  This is easy (see Excel | How to change the dots on the line graph?).

Related idea – plotting two sets of data with a secondary axis
Sometimes, you may wish to see if the behaviour of a particular variable (e.g., referral rate for liver transplants) is consistent with that of another potential explanatory variable (e.g., mean age), where both variables are defined at the same time points (e.g., for the same years). However, you may be puzzled about how to graph these variables on the sample plot. In Excel this can be achieved rather easily using a secondary axis. To learn about this, you can have a look at the video How to plot two data sets of different scales in same graph in excel or Adding a Second Y axis.

Using  Excel functions

There are many built-in functions within Excel which allow you to perform operations on columns of data within your spreadsheet.  An example is the function AVERAGE which is used in the tutorial Microsoft Excel: Plotting Multiple Data Series in Excel, above.  Also, a large variety of functions is presented under the Excel tab Formulas.

If you sense you need a primer on Excel formulae, I strongly recommend that you consider the resource Excel 2010: Absolute Beginners. The latter resource covers Excel functions very nicely and once you have mastered this skill, your confidence should increase. As this resource is not currently accessible and I am waiting to hear back from the relevant division of the University of Edinburgh about this, here is an alternative one: 12 Excel Functions Made Easy.

Using the function ‘IF’ in Excel

One particularly handy function for allowing you to combine data for multiple columns of patients according to one formula while treating the data for remaining patients in these columns differently, is the function ‘IF’. For example, you may wish to subtract two columns of times and dates for patients who have overdosed to determine time from overdose to presentation, but there may be some patients for whom this does not make sense because their overdoes has been cumulative rather than tied to a discrete time and date.  For the latter patients, you may wish to create a blank for time to presentation while calculating this time for the remaining patients. Have a look at the resource on the function IF under Performing Calculations with the IF Function, including those under Related Articles on the IF Function, to see how this function can best suit your purposes.

Q.1. I am in the process of preparing questionnaire response data for analysis in SPSS. The data are currently in Excel.  One of my survey questions required clincians to specify the nature of their employment through choosing from a list of options all categories which were applicable.  This meant that some respondents opted for multiple responses. Currently, I have indivudual ‘yes/no’ columns for each of the listed categories. However, I would like to create a single summary column which codes each respondent according to the combination of responses they chose from the list.  As there was a considerable number of respondents, I would like to find an efficient way of applying the coding process in Excel. Have you any recommendations? 

A. Yes, indeed. You should create a logical function in Excel that assigns a different code to each permutation of choices.To help you on your way, I recommend you consider the resource at http://www.techonthenet.com/excel/formulas/if_nested.php and note that the example commencing “If column B = Ross” and column C = 8″ should be really helpful for your needs. In particular, see if you can create a similar  function to that in the example which will allow you to create a new column as described below. Now create a logical function in Excel that assigns a different code to each permutation of choices. For example, a respondent who chose only ‘Clinical Practice’ might be assigned the code ‘1’ while a  different respondent who chose only ”Clinical Practice’ and ‘Academic Teaching’ might be assigned the code ‘7’ and so on.

You may want to use this technique from time to time with response data, so it is a handy one to be aware of.

However, be sure to read the preceding parts of the resource before progressing to the above example!

Formatting in Excel

It is good  to be aware that you can control how a column of data are formatted within Excel. Usually, Excel will decide on the formatting of data within each column for you and get it right. However, when previously collected data are being read into Excel from a pre-existing spreadsheet, inconsistencies in format can even arise across cells of a single column.  (For example, Excel may be interpreting the data in a few cells by means of one of the formats for date, whereas this is not what you wanted. Formatting errors can lead to problems further down the line if you try to read your data into SPSS for statistical analysis.

A further formatting issue to be aware of is that for Excel, the format category ‘Number’ only makes sense with real numbers. Therefore, the only entry commencing with ‘0’ which Excel will recognize under this category is ‘0’ itself.

If this has been causing you problems because you have several multi-digit entries in a column which ought to commence with a ‘0’, have a think about what these data are meant to signify. Most likely, they represents identifiers, e.g. for patients.  In that case, they are not real numbers, they are really just strings of digits. Excel is relying on you to manually format these data as ‘Text’, as by default Excel assumes that all strings of numbers are meant to be real numbers.  Similarly, if Excel has assumed the format category ‘General’ for your data, it will need some further help to understand what you are trying to do with these strings, so the same solution applies!

If you need some help in finding the formatting tool in Excel, please refer to Displaying a zero at the beginning of a string in Excel. I would particularly recommend the solution listed under item 1. within this resource.

Filtering data in Excel

There is wide range of possible filtering activities that you may wish to engage in as part of the process of data management in Excel.  One such activity is selecting a subset of your data based on specific values of a given variable, where these values can be stored in a separate sheet of your Excel book. For this purpose, you should find the tutorial How To Filter Rows Based On A List Selection In Another Sheet? most useful.

CC BY-NC-ND 4.0 Excel – handy guides, tutorials and tips by Margaret MacDougall is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

The WordPress site for supporting undergraduate medical student learning in statistics for short research projects

  • If you are visiting StatsforMedics for the first time, welcome!
  • Please take time to visit the page SCOPE OF SITE (see menu bar, below) for advice on how to make best use of the site and how to contact me.
  • University of Edinburgh undergraduate medical students: feel free to contact me if you need further assistance with your *curricular* activities.