· Q 1. I have downloaded data from a patient database on foetal scans into an Excel spreadsheet. I would like to select only those patients who satisfy both of the following criteria: scanned between 01/01/2006 and 30/11/2006 and scan result was normal. Is there an efficient way to carry out this procedure in Excel?
A. Yes; indeed there is! Please refer to the tutorial Filtering Data to see how the Custom option from the Autofilter drop-down box can be used to deal with the dates. Once you have filtered according to date, you can copy-paste the filtered data into a new Excel sheet and proceed to the next stage of filtering according to scan result. For this step, apply the Autofilter to your new spreadsheet and use the drop-down menu for the column of scan results to choose the category for normal scan result. Now copy-paste your filtered data into a new Excel spreadsheet and you should have exactly the data you require!
. Q 2. I have a dataset from which I wish to obtain a simple random sample. Can this be done conveniently by means of a statistical package?
A. Yes, if you wish to carry out the procedure in Excel, please refer to the tutorial Getting a Random Sample (steps 1 to 15), while choosing ‘Decline’ in response to any Cookie invitations. I suggest that you also save the sampled data under a new filename.
The package SPSS can also be used very easily to select a simple random sample (see the video Random select cases from the dataset).
Advice when using SPSS
1. Should you choose the approach of selecting a certain frequency rather than a % of cases, I would advise against you doing so from a select number of cases other than the sample as a whole, unless you have a well justified reason for doing so. Unknown to you, some bias may be hidden in your sample relating to the ordering of values, so that for example, if you choose a random sample from the first 100 values and your sample size is greater than 100, you may be selecting a random sample from a non-representative sample of the population as a whole. In short, select the fixed number of individuals from the sample as a whole unless you have good reason not to.
2. The If condition is satisfied option in the dialogue for random sample can be used in the first instance for excluding specific categories of entries in a spreadsheet (e.g. missing data) from the random sampling process.
. Q 3. What is the difference between simple and systematic random sampling?
A. Helpful advice can be obtained under The Difference Between Simple and Systematic Random Sampling.
. Q 4. I want to use a more sophisticated approach to random sampling whereby I am able to select pharmacies randomly from different regions of Lothian. Is there a nifty way of doing this by means of a software package?
A. The procedure to which you are referring is stratified random sampling, where the regions are the strata from which you wish to sample. The guide Survey Tips – Random Samples and Statistical Accuracy covers a lot of extremely useful ground, including what is involved in stratified random sampling. For the example in question, this covers estimating how many pharmacies to select from each stratum and how to perform the sampling process in Excel. Please consider reading this guide carefully and note that its use can be extended to cases where, on account of limited time, you are forced to create a stratified random sample to conduct a case-control study.
Note that the size of strata can be pre-determined by means of a fixed count or proportion or by varied counts or proportions (e.g. so as to reflect the distribution of the original population). For this purpose you should find the SPSS complex samples sampling wizard particularly helpful. This wizard will also enable you to conveniently use frequencies from the online calculator referred to in the above guide in order to arrive at your desired sample.
Tips on use of Complex Sample Preparation Wizard
1. To access instructions through a worked example on how to access and use the Wizard in SPSS, open up SPSS and select Case Studies from the menu Help.From the left-hand margin, follow the sequence of steps Case Studies > Complex Samples Option > Complex Samples Sampling Wizard > Obtaining a Sample from a Full Sampling Frame. The use the command ‘Next’ on the middle page to work your way through the tutorial.
2. Note that this tutorial is used for illustrative purposes only. For example, unless you are clear that you are also intending to use cluster random sampling, there is no need for you to complete the box entitled Cluster.
3. If you intended to create a new complex analysis sampling plan don’t be put off by the button Browse, as this button is simply inviting you to select the folder for saving your new plan. Once you have done so, you can give the plan a file name. Also the wizard will take you through the steps needed for creating the plan.
4. As you proceed through the wizard, you will see that it is possible to stratify according to a first variable (e.g.diabetic status) and then in turn, according to a second variable (e.g.year of clinic), or vice versa. If you only wish to stratify according to one variable, then you will not need to proceed to step 2 of stratification in the wizard. This is one of several options you can therefore leave as default when whizzing through the wizard.
5. It is likely that you will want to save your new sample in a separate SPSS file. Therefore, be sure to say when working your way through the Wizard. See Image from Complex Samples Sampling Wizard to keep you right!
You may also find it useful to consider the advantages and disadvantages of stratified random sampling by referring to Statistics Tutorial: Stratified Random Sampling.
· Q 5. I have far too many variables in my SPSS file and I only want to select a subset of them in order to carry out my analysis. Is there a way of deleting all of the unwanted columns (containing the values of these variables) simultaneously.
A. Yes, this is explained in the movie beside the title Managing data in the list of SPSS Class Notes provided by the UCLA Institute for Digital Research and Education.
· Q 6. How can I delete or ignore patients with specific variable values in my SPSS data file so as to form a subset of the original dataset for analysis.
A. There is a very useful menu option ‘Select Cases’ within SPSS which can serve you well. You can learn how to use this tool and lots more by using the movie
NB. However, I recommend that you decide whether you wish to delete the unwanted data and create a new spreadsheet with these data or temporarily filter out the data you wish to work with. (If you intend only to filter, then make sure you select the option ‘Filter’ rather than ‘Delete’ as illustrated in the movie.)
. Q 7. What if I want to select multiple categories of the one variable from a large list of categories in SPSS and ignore or delete the others?
A. The movie Select Cases – SPSS (Part 1) ought to be of particular interest if you are in a position where you wish to filter your data to consider only those cases where a given variable assumes any one of two or more values. For example, you may have a categorical variable for patient compliance with more than two categories but intend to look at only two non-consecutive categories (e.g. 1 and 3) at a time for the purpose of calculating odds ratios. This is where the or operator, ‘|’ can come in handy. In the video example, the intention is to consider questionnaire responses from undergraduate and PhD students only in a context where there are responses for more than two educational levels. Please consult the movie to learn how this task can be very conveniently managed. Don’t forget to select the option ‘Select all cases’ to undo your filtering once all relevant analyses have been performed on flltered data.
. Q 8. I need to perform a lot of data processing in Excel before entering SPSS. This includes filtering the data to ensure I include only those patients which have an all growth plate fracture. To achieve this, I have several yes/no columns for classification of a growth plate fractions where I have used the entries ‘1’ and ‘0’ to denotes presence or absence of a specified criterion for this type of fracture. Presence of at least one of the criteria is adequate for the patient to have a growth plate fracture. What are the next steps that I should take to obtain the subset of data I require in Excel?
A. Please consult the
brief video on filtering data and saving exclusively the filtered data in Excel. To make best use of the video, I would recommend that for the example in the above question, you form a column of totals by adding the different classification columns for fractions. This can be achieved using the Excel function SUM. You can then select all non-zero entries for this final column and ignore zero entries when filtering the data.
If you are not confident about using the Excel function SUM, go to the menu ‘Help’ in Excel and type in ‘SUM’ as your search term. The material at the first few links in the list of search returns should prove particularly helpful.
Working With Subsets of the Original Dataset – Filtering Data by Margaret MacDougall is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.