Project Planning: Sampling

A sample is a subgroup of faculty, staff, students, alumni, or other stakeholders, ideally selected at random, that the researcher studies in order to generalize the findings to a larger population at the University of Minnesota.

It is inappropriate to send multiple survey invitations and reminders to a large number of recipients when a smaller sample would do. Doing so leads to low response rates and institutional survey fatigue, which negatively impact data quality across the University. This section provides information for you to consider when pulling or requesting a sample.

Representative Sampling 
A representative sample is an accurate, proportional depiction of the population under study. Two techniques are commonly used to obtain representative samples: randomization and stratification. An example of each follows.

Randomization 
If you want to study the attitudes of U of M students regarding student services, it would not be enough to interview every 100th person who walked into Coffman Memorial Union. That technique would only measure the attitudes of U of M students who go to the union, and would miss those who do not. In addition, it would only measure the attitudes of U of M students who happened to go to the union during the time you were collecting data. Therefore, the sample would not be very representative of U of M students in general. For a sample to be truly representative, every student at the U of M must have an equal chance of being chosen to participate in the survey. This is randomization.
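As a rough illustration of randomization, the sketch below draws a simple random sample from a complete list of students, so that every person on the list has the same chance of selection. The list of IDs, the sample size of 400, and the function name `simple_random_sample` are all hypothetical; in practice the list would come from an official student record extract.

```python
import random


def simple_random_sample(population, sample_size, seed=None):
    """Draw sample_size distinct members, each with an equal chance of selection."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(population, sample_size)


# Hypothetical sampling frame: placeholder IDs, not real student records.
student_ids = [f"student_{i:05d}" for i in range(1, 50001)]

sample = simple_random_sample(student_ids, 400, seed=2024)
print(len(sample), "students selected")
```

Recording the seed alongside the sample is one way to let someone else reproduce the same draw later.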

Stratification 
If you took a list of U of M students, loaded it onto a computer, and then instructed the computer to randomly generate a list of students, your sample still might not be representative. What if, purely by chance, the computer did not include the correct proportion of seniors or graduate students? If these groups are of interest to you and you want the sample to be more representative, you might want to use a sampling technique called stratification. 

In order to stratify a population, you need to decide which sub-categories of the population are likely to differ in ways that matter for your study. For instance, graduate students as a group probably have different opinions than undergraduates regarding student services, so they should be recognized as separate strata of the population. Once you have a list of the different strata, along with their respective percentages of the population, you would draw the sample so that the same percentage of it is graduate students, the same percentage is seniors, and so on. You would then come up with a more truly representative sample.
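One way to picture proportional stratification is the sketch below, which groups a list into strata and then draws a random sample from each stratum in proportion to its share of the population. The population counts (7,000 other undergraduates, 1,000 seniors, 2,000 graduate students), the stratum labels, and the function name `stratified_sample` are made up for illustration and are not University figures.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(records, stratum_of, sample_size, seed=None):
    """Draw a random sample from each stratum, proportional to its population share."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[stratum_of(record)].append(record)

    total = len(records)
    sample = []
    for members in strata.values():
        # Rounding each quota can make the overall total differ slightly
        # from sample_size when the shares do not divide evenly.
        quota = min(round(sample_size * len(members) / total), len(members))
        sample.extend(rng.sample(members, quota))
    return sample


# Hypothetical sampling frame of (id, level) pairs.
frame = (
    [(f"u{i}", "other undergraduate") for i in range(7000)]
    + [(f"s{i}", "senior") for i in range(1000)]
    + [(f"g{i}", "graduate") for i in range(2000)]
)

sample = stratified_sample(frame, lambda record: record[1], 500, seed=7)
counts = Counter(level for _, level in sample)
print(counts)
```

With these invented counts, the sample keeps the same 70/10/20 split as the population, which a purely random draw would only approximate.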

Controlling Margin of Error 
Anytime you survey a portion of a population there will be some margin of error in the results. However, you can control your level of error mathematically by choosing a specific confidence level and sample size.

To do this, you must:

  1. Define the size of the target population.
  2. Determine your desired level of error.
  3. Determine your desired level of confidence.
  4. Calculate the sample size.

The level of error is measured as a percentage, as is the level of confidence. The level of confidence represents how often results from repeated samples would be expected to fall within your error level. For example, if you have a 95% confidence level with an error level of 4%, you are saying that if you were to conduct the same survey 100 times, the results would be expected to fall within +/- 4% of the true population value about 95 times out of 100.

The tables below give examples of different sample sizes at a 95% confidence level. 

Population          Infinite   10,000   5,000   1,000
Error level         3%         3%       3%      3%
Confidence level    95%        95%      95%     95%
Sample size         1067       964      879     516

Population          Infinite   10,000   5,000   1,000
Error level         4%         4%       4%      4%
Confidence level    95%        95%      95%     95%
Sample size         600        566      536     375

Population          Infinite   10,000   5,000   1,000
Error level         5%         5%       5%      5%
Confidence level    95%        95%      95%     95%
Sample size         384        370      357     278
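The sample sizes in the tables above can be reproduced with a short calculation. The sketch below is one common approach, not necessarily the method used to build the tables: it assumes a z-score of 1.96 for 95% confidence, assumes the most conservative response split of 50/50 (p = 0.5), and applies a simplified finite population correction, n = n0 / (1 + n0 / N). The function name `sample_size` and its parameters are illustrative.

```python
def sample_size(error_level, population=None, z=1.96, p=0.5):
    """Estimate the number of completed responses needed.

    error_level: desired margin of error as a fraction (0.05 for 5%)
    population:  size of the target population, or None for "infinite"
    z:           z-score for the confidence level (1.96 assumed for 95%)
    p:           assumed proportion; 0.5 gives the largest sample size
    """
    # Sample size for an effectively infinite population.
    n0 = (z ** 2) * p * (1 - p) / error_level ** 2
    if population is None:
        return round(n0)
    # Simplified finite population correction (an assumption of this sketch).
    return round(n0 / (1 + n0 / population))


for error in (0.03, 0.04, 0.05):
    row = [sample_size(error, n) for n in (None, 10_000, 5_000, 1_000)]
    print(f"{error:.0%}: {row}")
```

This sketch rounds to the nearest whole number because that matches the tables; rounding up instead is a slightly more cautious choice. Note that the result is the number of completed responses needed, so the number of invitations must be larger to allow for nonresponse.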

Custom Insight also offers an online Survey Random Sample Calculator that calculates how many respondents are needed for a survey, how many people to send a survey to, and the accuracy of survey results.

Duplication 
The most common sampling error is duplication: instances where an element of the target population appears more than once in the sample and is therefore overrepresented. This can produce biased survey results.

There are three ways to reduce or eliminate duplication as a source of sampling error.

  1. Eliminate duplicates from the sample. This can be done by sorting the sample list and removing any duplicate entries, for example in Microsoft Excel.
  2. Establish a rule for handling duplicates. This occurs during selection and/or data collection. One example rule could be to use only the first entry on the list, then identify and remove any later duplicates of it.
  3. Weight responses known to be duplicates. When analyzing a sample, a researcher can weight the duplicates to ensure that these elements are not overrepresented in the data without eliminating responses. For example, if one person is known to appear four times, each of those four responses could be weighted by 0.25.
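The approaches above can also be scripted. The sketch below shows two of them under simple assumptions: keeping only the first entry for each person (the example rule in item 2) and weighting each known duplicate by one over the number of times it appears (item 3). The email addresses are invented, and matching on a lowercased email address is only one possible way to decide that two entries are duplicates.

```python
from collections import Counter


def drop_duplicates(entries, key):
    """Keep only the first entry for each key, preserving list order."""
    seen = set()
    unique = []
    for entry in entries:
        k = key(entry)
        if k not in seen:
            seen.add(k)
            unique.append(entry)
    return unique


def duplicate_weights(entries, key):
    """Weight each entry by 1 / (number of entries sharing its key)."""
    counts = Counter(key(entry) for entry in entries)
    return [1 / counts[key(entry)] for entry in entries]


# Hypothetical sample list in which one person appears four times.
emails = [
    "ann@example.edu", "bob@example.edu", "Ann@example.edu",
    "ann@example.edu", "cy@example.edu", "ann@example.edu",
]

unique_emails = drop_duplicates(emails, key=str.lower)
weights = duplicate_weights(emails, key=str.lower)
print(unique_emails)
print(weights)
```

In this invented list, the four entries for the same person each receive a weight of 0.25, so together they count as a single response.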