Sample size refers to the number of observations or data points selected from a population for a study, directly impacting the accuracy, reliability, and generalizability of research findings. A well-calculated sample size ensures statistical validity, reducing bias while maximizing precision.
Selecting the right sample size is crucial—too small, and the results may lack significance; too large, and resources may be wasted on unnecessary data collection. Researchers use specific formulas, incorporating variables like confidence level, margin of error, and population variance, to determine the optimal sample size for reliable conclusions.
Sample Size Definition
“Sample size” is a term used in market research to define the number of subjects included in a sample for a study. This sample is selected from the general population and is considered representative of the population for that specific study.
For example, if we want to predict how the population in a specific age group will react to a new product, we can first test it on a sample that is representative of the targeted population.
In this case, the sample size refers to the number of individuals from that age group who will be surveyed.
How to Calculate a Sample Size
Determining the appropriate sample size involves statistical formulas, and the first step is choosing a meaningful benchmark based on the expected outcomes of the research. Researchers typically have two main approaches to choose from:
- Variable-Based Sampling: Researchers can monitor the measurement of variables and determine specific indicators that express their evolution. For example, to understand consumer behavior, a researcher might track the frequency of visits to a retail store and use the weekly average frequency of visits as an indicator. In specialized literature, this method is referred to as sampling in relation to the variables investigated.
- Attribute-Based Sampling: This method may be aimed at evaluating specific attributes of the investigated marketing phenomenon. For instance, a researcher might examine consumer preferences regarding the interior layout of a retail space, assessing various attributes significant to interior design. This approach is known in the specialized literature as sampling with the investigated characteristics.
Sample Size Formula
To apply these concepts practically, researchers use formulas that consider the expected effect size, the desired power of the test (commonly set at 80% or higher), and the acceptable level of significance (usually 5%). These calculations help ensure that the sample size is neither too small to detect a real effect nor so large that it becomes inefficient.
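One commonly used formula is:
n = (z^2 * p * (1 - p) / e^2) / (1 + z^2 * p * (1 - p) / (e^2 * N))
Where,
- N = population size
- e = margin of error (percentage in decimal form)
- z = z-score
- p = expected sample proportion (in decimal form)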
Another sample size formula is:
n = N*X / (X + N – 1),
Where,
- X = (Zα/2)^2 * p * (1 - p) / MOE^2,
- Zα/2 is the critical value of the Normal distribution at α/2 (for a confidence level of 95%, α is 0.05, and the critical value is 1.96).
- MOE is the margin of error.
- p is the sample proportion.
- N is the population size.
Note that a Finite Population Correction has been applied to the sample size formula.
Sample Size Calculation Example:
Suppose a researcher wants to conduct a survey to estimate the proportion of smartphone users in a city with a population size of 500,000. The researcher aims for a 95% confidence level and a margin of error of 5%. With no prior estimate available, the researcher assumes a sample proportion of p = 0.5. Using the formula:
X = ((1.96)^2 * 0.5(1-0.5)) / (0.05)^2
X = (3.8416 * 0.25) / 0.0025
X = 0.9604 / 0.0025
X = 384.16
After calculating X, the researcher then uses it in the sample size formula to determine the required sample size:
n = (500,000 * 384.16) / (384.16 + 500,000 - 1)
n ≈ 192,080,000 / 500,383.16
n ≈ 383.87
Thus, after rounding up, the researcher would need a sample size of approximately 384 respondents to achieve the desired level of confidence with a margin of error of 5%.
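The calculation above can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive tool: the function name `sample_size` and its parameter names are assumptions made for this example, and the critical value comes from Python's standard `statistics.NormalDist` instead of the rounded 1.96, so X differs from the hand calculation in the second decimal place.

```python
import math
from statistics import NormalDist


def sample_size(population, confidence=0.95, margin_of_error=0.05, p=0.5):
    """Sample size for a proportion, with the finite population correction."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)           # critical value Z(alpha/2)
    x = z ** 2 * p * (1 - p) / margin_of_error ** 2   # X in the formula above
    n = population * x / (x + population - 1)         # finite population correction
    return x, n


x, n = sample_size(500_000)
# Round up, because a fraction of a respondent cannot be surveyed.
print(f"X = {x:.2f}, n = {n:.2f}, respondents needed = {math.ceil(n)}")
```

Because the correction can only reduce the result, n should never come out larger than X; this is a quick sanity check on any hand calculation.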
Sample Size Process
The sampling process involves several specific activities, namely:
- Define the population that is the object of the research: Defining the target population must be done with great care to avoid choosing either an unjustifiably large population or an unjustifiably narrow one. For example, for companies that produce cars, the total population can be represented by the people of the whole country, including children of different ages. However, the relevant population, which will be the subject of the research, will be made up only of people over 18 years old.
- Choose the sampling frame: The sampling frame, often derived from existing databases or lists, represents the population from which the sample will be drawn. It’s important to recognize that the sampling frame may differ from the target population due to practical constraints or limitations in data availability. For instance, while the target population for car buyers may include adults aged 18 and above, the sampling frame may consist of registered vehicle owners or individuals with active driver’s licenses.
- Choose the sampling method: Selecting an appropriate sampling method involves considering the research objectives and characteristics of the target population. Random sampling, for example, ensures each member of the population has an equal chance of being selected, but the sampling frame may not always perfectly align with the target population. Thus, researchers must be mindful of potential biases introduced by the sampling method and take steps to mitigate them.
- Establish the procedure for selecting the sample units: Defining the criteria and procedures for selecting sample units helps ensure the sample is representative of the target population. For instance, if conducting a survey on car preferences, eligibility criteria might include individuals who have purchased a car within the past five years. This helps ensure that the sample reflects current consumer trends and preferences within the target population.
- Determine the parent population of the sample (the “mother of the sample”): Understanding the broader context from which the sample is drawn is essential for interpreting research findings accurately. For example, while the target population for car buyers may include individuals aged 18 and above, the “mother of the sample” encompasses all potential car buyers within the specified demographic, including those not captured in the sampling frame.
- Choose the actual units of the sample: Selecting specific individuals or groups for inclusion in the study involves applying the sampling method and criteria defined earlier. Researchers must ensure that the selected sample is representative of the target population and free from bias introduced by the sampling process. For example, if using random sampling, researchers might randomly select participants from the sampling frame to participate in the study.
- Conduct field activity: Implementing data collection procedures according to the research design is the final step in the sampling process. This may involve conducting surveys, interviews, or observations to gather relevant data from selected participants. Researchers must ensure that data collection activities adhere to ethical standards and minimize potential sources of bias or error.
The establishment of the sample implies the establishment of the sampling unit. The sampling unit is represented by a distinct element or a group of different elements within the investigated population, which can be selected to form the sample. The sampling unit may be a person, a family, a household, a company, a locality, etc. It is necessary to specify that the sampling unit is not always identical to the unit of analysis. For example, in the study of family expenses, the sampling unit may be the home or the household, and the unit of analysis may be a person or a family.
Important Definitions in Research
Margin of Error
The margin of error is the level of precision you require. It is the plus or minus figure that is often reported with an estimated percentage, and it corresponds to half the width of the confidence interval. It describes the range in which the true population proportion is estimated to lie and is frequently expressed in percentage points (e.g., ±2 percent). After you collect your data, the actual margin of error may differ from the planned value, because it depends on the proportion actually observed in the sample rather than the one assumed beforehand.
Confidence Level
The confidence level is the probability that the range defined by the margin of error contains the true proportion. If the study were repeated and the range calculated each time, you would expect the true value to lie inside these ranges on 95 percent of occasions (for a 95% confidence level). The higher the confidence level, the more certain you can be that the interval includes the true proportion.
Population Size
This is the total number of individuals in your population. In this formula, we use a finite population correction to account for sampling from populations that are small. However, if your population is large and you do not know exactly how large, you can use 100,000, since the sample size doesn’t change significantly for populations larger than this.
Sample Proportion Definition
The sample proportion is what you expect the results to be. This can often be set using the results from a previous survey, or by running a small pilot study. If uncertain, it’s common to use 50%, which provides the largest sample size and is considered conservative. Notice that this sample size calculation uses the Normal approximation to the Binomial distribution. However, if the sample proportion is close to 1 or 0, this approximation may not be valid, and an alternative sample size calculation method should be considered.
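To see why 50% is the conservative choice, the short sketch below evaluates X = z^2 * p * (1 - p) / MOE^2 for several assumed proportions. The function name `base_sample_size` is hypothetical, and the defaults (z = 1.96, a 5% margin of error) are simply the values used elsewhere in this article.

```python
def base_sample_size(p, z=1.96, margin_of_error=0.05):
    """X = z^2 * p * (1 - p) / MOE^2, before any finite population correction."""
    return z ** 2 * p * (1 - p) / margin_of_error ** 2


# The product p * (1 - p) peaks at p = 0.5, so X is largest there.
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"p = {p:.1f} -> X = {base_sample_size(p):.1f}")
```

The values are symmetric around 0.5, so assuming 30% or 70% leads to the same requirement.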
Sample Size
This represents the minimum sample size you need to estimate the true population proportion with the required margin of error and confidence level. It’s important to consider non-response rates; if there’s a chance of non-response and those individuals cannot be included in your sample, you may need to increase your sample size. Generally, a higher response rate improves the accuracy of the estimate, while low response rates can introduce biases into your results.
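One common way to allow for non-response is to divide the required number of completed responses by the response rate you expect. The sketch below assumes that simple adjustment; the function name `invitations_needed` and the 50% response rate are illustrative assumptions, not figures from this article. Note that contacting more people addresses only the count of responses, not any bias caused by who chooses to respond.

```python
import math


def invitations_needed(required_responses, expected_response_rate):
    """Number of people to contact so that enough of them are expected to respond."""
    if not 0 < expected_response_rate <= 1:
        raise ValueError("expected_response_rate must be in (0, 1]")
    return math.ceil(required_responses / expected_response_rate)


# Hypothetical case: 384 completed responses needed, half of those contacted reply.
print(invitations_needed(384, 0.5))
```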
What Is Standard Deviation?
The standard deviation is a statistic that measures the dispersion of a dataset relative to its mean. It is calculated as the square root of the variance, which is the average of the squared differences between each data point and the mean. The further the data points are from the mean, the greater the deviation within the dataset and, consequently, the greater the standard deviation.
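As a small worked illustration, the sketch below computes the standard deviation step by step for a made-up dataset and compares it with `statistics.pstdev` from the Python standard library. It treats the data as a whole population (dividing by n); for a sample, `statistics.stdev` divides by n - 1 instead.

```python
import math
from statistics import pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values for illustration

mean = sum(data) / len(data)
variance = sum((x - mean) ** 2 for x in data) / len(data)  # population variance
std_dev = math.sqrt(variance)

print(mean, variance, std_dev)  # 5.0 4.0 2.0
print(pstdev(data))             # the library function gives the same value
```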
How to Determine the Sample Size?
We cannot test the entire population, so we measure a sample in order to estimate a population parameter. The sample size is based on confidence intervals: we establish the confidence interval we need, so that the true population value can be expected to lie inside that range.
Sampling addresses fundamental questions like “how?” and “how many?” The population encompasses all members of a specific community sharing a common characteristic, such as age group (e.g., youth aged 18-25) or student status.
Selecting an appropriate sample size involves balancing representativeness with practical constraints. While the population is theoretically infinite, we must work with finite samples in practice. The behaviors and scores observed within the sample are used to infer or estimate the behaviors and scores that would be observed if the entire population were studied.
The number of participants required for a representative sample depends on the research type:
- For correlational studies, around 30 participants are typically sufficient to approximate a normal distribution of data.
- Experimental and quasi-experimental research may require varying sample sizes depending on the study design.
- Descriptive research, such as surveys or studies on aviators, often requires a sample size equivalent to around 20% of the population. However, this percentage may decrease as the population size increases.
- For small populations (fewer than 100 individuals), the entire population may serve as the sample. For larger populations, a sample of between 1% and 20% of the population is common, depending on the research objectives and context.
Sampling Algorithms
Random Sample Size
(1) Identification and definition of the population.
Ex. The population is made up of all 5000 school directors in a given country.
(2) Determining the sample size (descriptive research).
Ex. The sample will consist of 10% of the 5000 directors, that is, 500 people.
If the research is correlational or experimental, the minimum is n = 30.
(3) Make a list of all the members of the population.
Ex. All school directors are on the list.
(4) Assign a number to each person on the list. With up to a thousand people, the numbers start at 000 and the last person on the list gets 999; with 100 people, the numbers run from 00 to 99.
Ex. On the list of directors, the first gets 0000 and the last 4999.
(5) Select a number at random from a table of random numbers.
Ex. The number 53634 was chosen from the table.
(6) From the selected number, keep only as many digits as the population size requires.
Ex. We have only 5000 people, so four digits are enough: we ignore the leading 5 and keep 3634.
(7) If a member of the population has that number, add that person to the sample list.
Ex. Because there is a director with the number 3634, that director goes into the sample.
(8) Go to the next number in the column and repeat until the sample is complete.
Variant: If we prefer not to use a random number table, we can use the ballot box method: all the order numbers of the participants, or their names, are placed in a ballot box, and we draw as many as the sample requires.
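In software, a pseudo-random number generator plays the role of the random number table or the ballot box. The sketch below is one possible way to draw the 500 directors with Python's standard `random` module; the identifiers and the fixed seed are assumptions made so the example is repeatable.

```python
import random

POPULATION_SIZE = 5000
SAMPLE_SIZE = 500  # 10% of the population

# Directors are numbered 0000..4999, as in step (4).
director_ids = [f"{i:04d}" for i in range(POPULATION_SIZE)]

rng = random.Random(42)  # fixed seed only so the example is repeatable
sample = rng.sample(director_ids, SAMPLE_SIZE)  # draws without replacement

print(sample[:5])
```

Because `sample` draws without replacement, no director can be selected twice, which matches drawing names from a ballot box without putting them back.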
Systematic Sample Size
The sample size is established according to the type of research: descriptive or correlational.
(1) Identification and definition of the population.
Ex. The population is made up of all 5000 teachers from a given region of a country.
(2) Determining the sample size (descriptive research).
Ex. Suppose the research is descriptive; the sample is then 10% of the population = 500 people.
(3) Make a list of all the members of the population.
Ex. The 5000 teachers are arranged in alphabetical order; the list is therefore not in random order, but the procedure is still valid.
(4) Determine the step K = population size / sample size.
Ex. K = 5000/500 = 10
(5) Choose a starting position near the beginning of the list.
Ex. Suppose I put my finger on the 3rd name (using the list directly).
(6) Starting with the chosen position, every K-th name is chosen.
Ex. In our sample: 3-13-23-33-etc.
(7) If the sample is not complete by the end of the list, continue from the beginning of the list.
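The steps above can be sketched as a short function. The name `systematic_sample` and the generated teacher labels are hypothetical; in practice the starting position would usually be chosen at random between 1 and K rather than fixed.

```python
def systematic_sample(items, sample_size, start_position):
    """Pick every K-th item from a 1-based start position, wrapping around if needed."""
    k = len(items) // sample_size   # the step K from step (4)
    start = start_position - 1      # convert to a 0-based index
    return [items[(start + i * k) % len(items)] for i in range(sample_size)]


teachers = [f"teacher_{i}" for i in range(1, 5001)]  # hypothetical alphabetical list
sample = systematic_sample(teachers, 500, start_position=3)

print(sample[:4])  # ['teacher_3', 'teacher_13', 'teacher_23', 'teacher_33']
```

The modulo operation implements step (7): if the end of the list is reached before the sample is complete, selection continues from the beginning.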
Stratified Sample Size
(1) Identification and definition of the population.
Ex. To compare the efficiency of two methods of training psychosocial competence in management according to the level of self-esteem, the population consists of 300 top managers from a given city.
(2) Determining the sample size.
Ex. The sample size will be 45 managers for each of methods A and B.
(3) Establish the stratification variable and the subgroups (strata) needed for representativeness (an equal number or a proportional number in each subgroup).
Ex. The desired subgroups are established based on three levels of self-esteem: high, medium, and low (other possible variables: age, level of training, gender).
(4) Each member of the population is assigned to one of the established subgroups.
Ex. The 300 managers are classified according to their level of self-esteem: 45 with high self-esteem, 225 with average self-esteem, and 40 with low self-esteem.
(5) The number of participants from each subgroup is established (equal or proportional), and they are selected by simple random sampling (using a random number table or drawing lots).
Ex. We decide to draw 30 from each stratum. Using a random number table or drawing lots, we select 30 managers with high self-esteem, 30 with average self-esteem, and 30 with low self-esteem. The 30 participants from each stratum are then randomly assigned, half to method A and half to method B.
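A minimal sketch of this procedure with equal allocation is shown below. The function name `stratified_assignment`, the generated manager labels, and the fixed seed are assumptions for illustration; the stratum sizes are the ones quoted in the example.

```python
import random


def stratified_assignment(strata, per_stratum, rng):
    """Draw per_stratum members from each stratum, then split each draw between A and B."""
    groups = {"A": [], "B": []}
    for members in strata.values():
        chosen = rng.sample(members, per_stratum)  # simple random sample within the stratum
        half = per_stratum // 2
        groups["A"].extend(chosen[:half])          # rng.sample returns items in random order
        groups["B"].extend(chosen[half:])
    return groups


# Stratum sizes as quoted in the example; the labels are hypothetical.
strata = {
    level: [f"{level}_{i}" for i in range(size)]
    for level, size in {"high": 45, "average": 225, "low": 40}.items()
}

groups = stratified_assignment(strata, per_stratum=30, rng=random.Random(7))
print(len(groups["A"]), len(groups["B"]))  # 45 45
```

With equal allocation, each method receives 15 managers from every self-esteem level, so the two groups are balanced on the stratification variable.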
Multistage sample size
The participants who make up the sample are selected indirectly, through the selection of the groups to which they belong.
(1) Identification and definition of the population.
Ex. The population is made up of all 5000 teachers from the schools of a given region of a country.
(2) Determining the sample size (descriptive research).
Ex. Sample size = 10% = 500.
(3) Establish the logical grouping (cluster).
Ex. The cluster is the school.
(4) Make the list of the groups that make up the population.
Ex. The list is made up of the 100 schools in the region.
(5) Estimate the number of population members in each group (cluster).
Ex. Although the schools differ in the number of teachers, we estimate an average of 50 teachers per school.
(6) Determine the number of groups by dividing the sample size by the estimated size of the groups.
Ex. 500 / 50 = 10.
(7) Randomly select that number of groups using a random number table or the ballot box.
Ex. We select 10 schools from the 100 schools in the region.
(8) All members of the selected groups are part of the sample.
Ex. All teachers in the 10 schools are part of the sample.
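The cluster procedure can be sketched as follows. The school data here is entirely hypothetical (100 schools with randomly generated staff sizes), and the seed is fixed only for repeatability.

```python
import random

rng = random.Random(3)  # fixed seed only so the example is repeatable

# Hypothetical data: 100 schools with between 30 and 70 teachers each.
schools = {
    f"school_{s}": [f"school_{s}_teacher_{t}" for t in range(rng.randint(30, 70))]
    for s in range(100)
}

target_sample_size = 500
estimated_cluster_size = 50
n_clusters = target_sample_size // estimated_cluster_size  # 500 / 50 = 10

# Select whole schools at random, then take every teacher in the selected schools.
chosen_schools = rng.sample(sorted(schools), n_clusters)
sample = [teacher for school in chosen_schools for teacher in schools[school]]

print(n_clusters, len(sample))
```

Because whole schools are selected and their sizes vary, the realized sample will usually be close to, but not exactly, the 500 planned.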
Let us conclude.
The best way to obtain a representative sample is random sampling.
Sample size and type of sampling:
Both depend on the kind of research.
For correlational and experimental research, about 30 subjects are generally considered sufficient. For descriptive research, the sample size may vary depending on the population size, typically ranging from 1% to 10%.
Regardless of the specific technique used, the main sampling steps are:
- identification of the population
- determining the required sample size
- selection of participants
- data collection
Simple random sampling is the best way to obtain a representative sample, or a stratified one if we have an existing stratification variable (ex: self-esteem).
The primary source of bias in sampling is the use of nonprobabilistic methods.
With nonprobabilistic techniques, it is often difficult, if not impossible, to accurately describe the population from which the sample was drawn and to generalize the results from the sample to the entire population.
Dangers of Small Sample Size
We might be tempted to believe that a larger sample always produces more accurate results than a smaller one. However, this is not always the case.
In reality, a larger sample is more likely to be accurate than a smaller one, but it is still possible for an average obtained from a larger sample to deviate more from the true average than one obtained from a smaller sample. This is the less likely outcome, and how unlikely it is depends on the difference in size between the two samples.
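A small simulation can make this concrete. The sketch below repeatedly draws a small and a large sample from the same hypothetical normal population (mean 100, standard deviation 15, both assumed for illustration) and counts how often the larger sample's average ends up further from the true mean.

```python
import random
import statistics


def share_of_larger_sample_losing(small_n, large_n, trials, rng):
    """Fraction of trials in which the larger sample's mean is further from the true mean."""
    true_mean, true_sd = 100.0, 15.0  # hypothetical population parameters
    losses = 0
    for _ in range(trials):
        small = [rng.gauss(true_mean, true_sd) for _ in range(small_n)]
        large = [rng.gauss(true_mean, true_sd) for _ in range(large_n)]
        small_error = abs(statistics.fmean(small) - true_mean)
        large_error = abs(statistics.fmean(large) - true_mean)
        if large_error > small_error:
            losses += 1
    return losses / trials


rng = random.Random(0)
for large_n in (50, 100, 400):
    share = share_of_larger_sample_losing(25, large_n, trials=2000, rng=rng)
    print(f"n = 25 vs n = {large_n}: larger sample further off in {share:.0%} of trials")
```

The share stays below one half and shrinks as the gap between the two sample sizes widens, but it does not reach zero.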
Taken to its extremes, the relationship shows that statistical significance can be achieved with a small sample and a large effect size, or with a sufficiently large sample and a small effect size. This raises questions about the practical relevance of research conclusions that rest on significance alone.
Systematic error results from factors that are not related to the sample size. The factors that generate systematic error are related to imperfections of the sampling process, such as errors in the selection of the sample units, errors in the sampling frame, measurement errors, non-response, answers that do not correspond to reality, refusal to participate in the investigation, etc.
Customer Satisfaction Survey and Market Research
Customer satisfaction surveys do not depend on a statistically significant sample size. What matters is that the answers are accurate and precise. It is vital to carefully analyze every response a customer has given in a customer satisfaction survey. All feedback, positive or negative, is important.
When it comes to market research, a statistically significant sample size helps a lot. Market surveys help you discover new information about customers and the market you want to enter. With such a survey, you will receive up-to-date information about the target market and about the customers who would buy your services or products.
Calculating Sample Size for an A/B Test
Any experiment that involves statistical inference requires a sample size calculation done before the experiment begins. A/B tests (split testing) are no exception. Calculating the minimum number of visitors required for an A/B test before it begins prevents us from running the test with too small a sample, and thus from ending up with an “underpowered” test.
We establish three criteria before we start running the experiment:
- The significance level for your experiment: A 5% significance level means that, if there is in fact no difference between the control and the variant, there is only a 5% chance that the test will nevertheless declare a winner (a false positive). This is often described as detecting a difference between the control and the variant with 95% “confidence.” This threshold is admittedly arbitrary, and it is chosen when designing the experiment.
- Minimum detectable effect: The smallest difference between the conversion rates that you consider important and would like to detect.
- The test power: The probability of detecting a difference between the original and the variant conversion rates when such a difference really exists.
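These three inputs can be combined in a standard normal-approximation formula for comparing two proportions. The sketch below is one common textbook version, not necessarily the method used by any particular testing tool; the function name `ab_sample_size`, the 10% baseline rate, and the treatment of the minimum detectable effect as an absolute difference are all assumptions made for illustration.

```python
import math
from statistics import NormalDist


def ab_sample_size(baseline_rate, minimum_detectable_effect, alpha=0.05, power=0.80):
    """Visitors needed per variant for a two-sided test of two proportions."""
    p1 = baseline_rate
    p2 = baseline_rate + minimum_detectable_effect  # absolute lift (an assumption)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # significance level, two-sided
    z_beta = NormalDist().inv_cdf(power)            # test power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)


# Hypothetical scenario: 10% baseline conversion, detect a rise to 12%.
print(ab_sample_size(baseline_rate=0.10, minimum_detectable_effect=0.02))
```

Halving the minimum detectable effect roughly quadruples the required visitors, which is why this input deserves the most care when planning a test.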
FAQs
The sample size is important in research because it determines the accuracy and reliability of the results. A larger sample size tends to produce more accurate and representative results, while smaller sample sizes may be subject to greater variability and uncertainty. The appropriate sample size is determined by the research question, the population being studied, and the desired level of statistical power and confidence.