Collecting Data
In the sciences we collect data by sampling individuals from a population, then using that sample to make inferences about the population as a whole. The picture to the right (from: http://simon.cs.vt.edu/SoSci/converted/Sampling/) correctly illustrates this process; note that many websites illustrate it incorrectly.
The ideal way to sample a population so that your data are statistically valid is to take a random sample, in which every individual in the population has an equal probability of being selected. The best way to take a proper random sample is to number every individual in the population and then use a random number table to pick which individuals to sample. In practice this is usually impossible, so we often use other methods to take a random sample, as follows:
- select every nth individual encountered, where n is determined from a random number generator (or table)
- randomly select locations to sample within the population distribution
- divide the population into equal groups or equal ranges, then select individuals at random within each group, using either random locations or some other random sampling method
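The numbered-population method and the three methods above can be sketched with Python's standard `random` module. This is a minimal illustration under stated assumptions, not a prescribed procedure: the population is a hypothetical list of 100 labeled plants, each individual's position in the list stands in for the number assigned to it, and the function names, the step-size bound `max_step`, the study-area dimensions and the two equal groups are all invented for the example. In the systematic sample, drawing a random starting point as well as a random step is our addition, not something the text specifies.

```python
import random

def simple_random_sample(population, k, rng):
    """Number every individual (by list position) and draw k distinct numbers at random."""
    numbers = rng.sample(range(len(population)), k)  # without replacement
    return [population[i] for i in numbers]

def systematic_sample(population, max_step, rng):
    """Take every nth individual encountered; n is drawn at random.
    The random starting point is an added assumption."""
    n = rng.randint(2, max_step)   # random step size (hypothetical bounds)
    start = rng.randrange(n)       # random start among the first n individuals
    return population[start::n]

def random_locations(k, width, height, rng):
    """Draw k random (x, y) sampling points in a hypothetical width-by-height study area."""
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(k)]

def stratified_sample(groups, k_per_group, rng):
    """Pick k_per_group random individuals from within each group."""
    return {name: rng.sample(members, k_per_group) for name, members in groups.items()}

rng = random.Random(42)  # seeded so the example is reproducible
population = [f"plant_{i}" for i in range(1, 101)]  # hypothetical population of 100
print(simple_random_sample(population, 5, rng))
print(systematic_sample(population, 10, rng))
print(random_locations(3, 50.0, 20.0, rng))
groups = {"low": population[:50], "high": population[50:]}  # hypothetical equal groups
print(stratified_sample(groups, 3, rng))
```

Passing a seeded `random.Random` object into each function keeps the draws reproducible; in real fieldwork you would leave it unseeded or record the seed with your data.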
Many other kinds of samples are sometimes (often?) called random, but they are not random and should be called by their correct names:
- haphazard sample: you simply pick individuals by whim or intuition, without any random procedure; this method is often called random, but it is not
- convenience sample: you sample the individuals that are readily available to you
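To see why a convenience sample is not random, the sketch below simulates a hypothetical population of 200 individuals in which body size increases with distance from a road (an invented relationship, used only for illustration). Sampling the 10 individuals nearest the road, the most readily available ones, gives a mean far from the population mean, while a random sample of the same size typically lands much closer.

```python
import random
import statistics

rng = random.Random(1)  # seeded so the example is reproducible

# Hypothetical population: individuals at distances 0-199 m from a road where,
# by assumption, body size increases with distance from the road.
population = [{"distance": d, "size": 10 + 0.5 * d + rng.gauss(0, 2)} for d in range(200)]

def mean_size(sample):
    """Average body size of a list of individuals."""
    return statistics.mean(ind["size"] for ind in sample)

# Convenience sample: the 10 individuals closest to the road (easiest to reach).
convenience = sorted(population, key=lambda ind: ind["distance"])[:10]

# Random sample: 10 individuals, each with an equal chance, without replacement.
random_sample = rng.sample(population, 10)

print(f"population mean:         {mean_size(population):.1f}")
print(f"convenience sample mean: {mean_size(convenience):.1f}")
print(f"random sample mean:      {mean_size(random_sample):.1f}")
```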
To get random numbers for your sampling, go to http://www.random.org/sequences/, where you can generate a randomly ordered sequence of the integers between any two numbers. Note that this method lists each number only once (sampling without replacement). Another random number generator on the same site samples with replacement, so you might get the same number more than once. The site as a whole is an excellent source of information about random numbers and random number generators.
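If you are working offline, Python's standard `random` module can stand in for these generators. This is an analogue under stated assumptions, not a description of how random.org works: Python's generator is pseudorandom, and the function names below are hypothetical. `random_sequence` mimics a randomized sequence in which each integer appears once, `draw_without_replacement` picks a few distinct numbers, and `draw_with_replacement` allows repeats.

```python
import random

def random_sequence(low, high, rng):
    """All integers from low to high inclusive, in random order; each appears once."""
    numbers = list(range(low, high + 1))
    rng.shuffle(numbers)
    return numbers

def draw_without_replacement(low, high, k, rng):
    """k distinct integers from low..high; no number can repeat."""
    return rng.sample(range(low, high + 1), k)

def draw_with_replacement(low, high, k, rng):
    """k integers from low..high; the same number may come up more than once."""
    return [rng.randint(low, high) for _ in range(k)]

rng = random.Random(2024)  # seeded so the example is reproducible
print(random_sequence(1, 10, rng))
print(draw_without_replacement(1, 100, 5, rng))
print(draw_with_replacement(1, 6, 10, rng))
```

Drawing with replacement from a small range, such as 1 to 6, makes repeated numbers easy to see.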