# Statistics Review: Mean, Median, Mode, and Range

## Mean (Average)

The mean is calculated by adding up all the numbers in a set and then dividing by the count of those numbers.

- **Formula:** Mean = (Sum of all values) / (Number of values)
- **Example:** For the set [2, 3, 5, 7, 11], Mean = (2+3+5+7+11)/5 = 28/5 = 5.6

## Median

The median is the middle number in a list of numbers sorted in ascending or descending order. If the list has an even number of values, the median is the average of the two middle numbers.

- **Steps:**
  1. Sort the numbers in ascending order.
  2. If the count of numbers is odd, the median is the middle number.
  3. If the count is even, the median is the average of the two middle numbers.
- **Example:** For [2, 3, 5, 7, 11], Median = 5. For [3, 5, 7, 9], Median = (5+7)/2 = 6.

## Mode

The mode is the number that appears most frequently in a data set.

- **Note:** A set of numbers may have one mode, more than one mode, or no mode at all.
- **Example:** In [1, 2, 2, 3, 4], the mode is 2.

## Range

The range is the difference between the highest and lowest values in a set.

- **Formula:** Range = Highest value - Lowest value
- **Example:** For [2, 3, 5, 7, 11], Range = 11 - 2 = 9

Remember, these statistical measures give different insights into the distribution of a data set, and it's important to know when to use each one for analysis.

# Calculating the 5-Number Summary for a Dataset

The 5-number summary provides a quick overview of the distribution of a dataset and includes the following values: minimum, first quartile (Q1), median, third quartile (Q3), and maximum.

## Steps to Calculate When Averaging is Required

Given a dataset where Q1, the median, and Q3 all require averaging two numbers, follow these steps:

1. **Sort the Data**: Arrange the data in ascending order.
2. **Find the Median**: Since the dataset has an even number of observations, the median will be the average of the two middle numbers.
3. **Find the First Quartile (Q1)**:
   - Divide the sorted dataset into two halves at the median.
   - For the lower half (excluding the median if the dataset had an odd number of observations), find Q1 as the average of the two middle numbers.
4. **Find the Third Quartile (Q3)**:
   - Using the upper half of the dataset (excluding the median if the dataset had an odd number of observations), find Q3 as the average of the two middle numbers.
5. **Find the Minimum and Maximum Values**: These are the smallest and largest numbers in the dataset, respectively.

## Example

Consider the dataset: [3, 7, 8, 12, 13, 14, 18, 21]

- **Minimum**: 3
- **Q1**: The lower half is [3, 7, 8, 12], so Q1 is the average of 7 and 8 = (7+8)/2 = 7.5
- **Median**: Average of 12 and 13 = (12+13)/2 = 12.5
- **Q3**: The upper half is [13, 14, 18, 21], so Q3 is the average of 14 and 18 = (14+18)/2 = 16
- **Maximum**: 21

## Summary

The 5-number summary for this dataset is: Minimum = 3, Q1 = 7.5, Median = 12.5, Q3 = 16, Maximum = 21.

This summary provides a concise description of the dataset, highlighting its central tendency and dispersion.

# Calculating the Mean of a Dataset

The mean, often referred to as the average, is a measure of central tendency that summarizes the central value of a dataset.

## Steps to Calculate the Mean

1. **Sum the Values**: Add together all the numbers in the dataset.
2. **Count the Values**: Determine the total number of values in the dataset.
3. **Divide**: Divide the sum of the values by the count of the values. This gives you the mean.

- **Formula**: Mean = (Sum of all values) / (Number of values)

## Example

Consider the dataset: [4, 8, 15, 16, 23, 42]

- **Step 1**: Sum the Values = 4 + 8 + 15 + 16 + 23 + 42 = 108
- **Step 2**: Count the Values = There are 6 values in the dataset.
- **Step 3**: Divide = 108 / 6 = 18

## Summary

The mean of the dataset [4, 8, 15, 16, 23, 42] is 18.

The mean provides a useful measure of the overall trend of the dataset but can be influenced by outliers, that is, by extremely high or low values.
It's important to consider this, especially when the dataset is not symmetrically distributed.

# Outlier Detection in a Dataset Using the IQR Method

Detecting outliers is crucial for accurate statistical analysis. Outliers can significantly affect the mean and standard deviation of a dataset, leading to misleading conclusions. One common method for identifying outliers is the IQR method. Here's how you can use it:

## Problem Statement

Given a dataset, determine if it has any outliers using the $1.5 \times (Q3 - Q1)$ rule.

## Steps to Solve

1. **Sort the Data**: Arrange the data in ascending order.
2. **Calculate Q1 and Q3**:
   - **Q1 (First Quartile)**: The median of the first half of the dataset.
   - **Q3 (Third Quartile)**: The median of the second half of the dataset.
3. **Calculate the IQR**:
   - **IQR** = $Q3 - Q1$
4. **Determine the Outlier Range**:
   - **Lower Bound** = $Q1 - 1.5 \times IQR$
   - **Upper Bound** = $Q3 + 1.5 \times IQR$
5. **Identify Outliers**:
   - Any data point below the Lower Bound or above the Upper Bound is considered an outlier.

## Example

Consider the dataset: $[5, 7, 9, 12, 15, 18, 19, 21, 25, 29]$

- **Step 1**: Data is already sorted.
- **Step 2**: Calculate Q1 and Q3
  - Q1 is the median of $[5, 7, 9, 12, 15]$, which is 9.
  - Q3 is the median of $[18, 19, 21, 25, 29]$, which is 21.
- **Step 3**: IQR = $21 - 9 = 12$
- **Step 4**: Calculate outlier range
  - Lower Bound = $9 - 1.5 \times 12 = -9$
  - Upper Bound = $21 + 1.5 \times 12 = 39$
- **Step 5**: Identify outliers
  - All data points are within the range $[-9, 39]$, so there are no outliers in this dataset.

## Conclusion

Using the $1.5 \times (Q3 - Q1)$ rule, you can easily identify outliers in a dataset. This method is robust and widely used in statistical analyses to ensure the reliability of the conclusions drawn from the data.

# Outlier Detection in a Dataset Using the IQR Method

Outliers can significantly affect the analysis and interpretation of data.
The $1.5 \times (Q3 - Q1)$ rule, known as the Interquartile Range (IQR) method, is a popular approach to identifying outliers. Here's how to apply this method step by step.

## Problem Statement

Identify any outliers in a given dataset using the $1.5 \times (Q3 - Q1)$ rule.

## Steps to Solve

1. **Sort the Data**: Organize the data in ascending order.
2. **Calculate Q1 and Q3**:
   - **Q1 (First Quartile)**: The median of the lower half of the dataset.
   - **Q3 (Third Quartile)**: The median of the upper half of the dataset.
3. **Calculate the IQR**:
   - **IQR** = $Q3 - Q1$
4. **Determine the Outlier Range**:
   - **Lower Bound** = $Q1 - 1.5 \times IQR$
   - **Upper Bound** = $Q3 + 1.5 \times IQR$
5. **Identify Outliers**:
   - Data points below the Lower Bound or above the Upper Bound are considered outliers.

## Example

Consider the dataset: $[1, 5, 7, 9, 12, 15, 18, 19, 21, 25, 29, 100]$

- **Step 1**: The data is already sorted.
- **Step 2**: Calculate Q1 and Q3
  - Q1 is the median of $[1, 5, 7, 9, 12, 15]$, which is $(7 + 9)/2 = 8$.
  - Q3 is the median of $[18, 19, 21, 25, 29, 100]$, which is $(21 + 25)/2 = 23$.
- **Step 3**: IQR = $23 - 8 = 15$
- **Step 4**: Calculate outlier range
  - Lower Bound = $8 - 1.5 \times 15 = -14.5$
  - Upper Bound = $23 + 1.5 \times 15 = 45.5$
- **Step 5**: Identify outliers
  - The data point 100 is above the Upper Bound of 45.5, identifying it as an outlier. No data point falls below the Lower Bound.

## Conclusion

Using the $1.5 \times (Q3 - Q1)$ rule, we detected 100 as an outlier in the dataset. This method helps identify data points that differ significantly from the rest, so that their effect can be taken into account in the analysis.

# Two way tables

### Problem: Analyzing Student Preferences

A class survey was conducted to find out the students' preferences for pizza and their choice between Coke and Pepsi.
The results were compiled into a two-way table as follows:

| | **Coke** | **Pepsi** |
|----------------|:--------:|:---------:|
| **Likes Pizza** | 15 | 10 |
| **Doesn't Like Pizza** | 5 | 2 |

Based on the table above, answer the following questions:

1. How many students in total participated in the survey?

   15 + 10 + 5 + 2 = 32 students participated in the survey.

2. What is the total number of students who like pizza?

   15 + 10 = 25

3. How many students prefer Coke over Pepsi?

   15 + 5 = 20

4. Among the students who like pizza, what percentage prefers Coke?

   15 / (15 + 10) × 100 = 60%

5. Which is less popular among the students who don't like pizza, Coke or Pepsi?

   Pepsi (2 students, compared with 5 for Coke).

6. If a student is chosen at random, what is the probability that this student likes pizza and prefers Pepsi?

   10/32 = 5/16 ≈ 0.31

7. Compare the preferences for Coke and Pepsi among all students. Which beverage is more popular?

   Coke is more popular overall (20 students, compared with 12 for Pepsi), and it is preferred whether the students like pizza or not.

These questions are designed to test your ability to interpret data from a two-way table and apply basic principles of probability and percentage calculation.
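The answers above can be checked with a short script. This is only a minimal sketch: the nested dictionary `table` and the helper names `row_total` and `column_total` are illustrative choices that do not come from the original problem, and the counts are copied from the survey table.

```python
from fractions import Fraction

# Counts copied from the survey table (rows: pizza preference, columns: drink).
# The dictionary layout is an assumption made for this illustration.
table = {
    "Likes Pizza": {"Coke": 15, "Pepsi": 10},
    "Doesn't Like Pizza": {"Coke": 5, "Pepsi": 2},
}


def row_total(row):
    """Total number of students in one row of the table."""
    return sum(table[row].values())


def column_total(column):
    """Total number of students in one column of the table."""
    return sum(counts[column] for counts in table.values())


# Question 1: everyone surveyed.
grand_total = sum(row_total(row) for row in table)

# Question 4: conditional percentage, restricted to the "Likes Pizza" row.
pct_coke_given_pizza = 100 * table["Likes Pizza"]["Coke"] / row_total("Likes Pizza")

# Question 6: joint probability, one cell divided by the grand total.
p_pizza_and_pepsi = Fraction(table["Likes Pizza"]["Pepsi"], grand_total)

print(grand_total)               # 32
print(row_total("Likes Pizza"))  # 25
print(column_total("Coke"))      # 20
print(pct_coke_given_pizza)      # 60.0
print(p_pizza_and_pepsi)         # 5/16
```

Note the different denominators: the percentage in question 4 divides by the row total (25) because it is restricted to students who like pizza, while the probability in question 6 divides by the grand total (32) because the student is chosen from the whole class.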
Read more from chillmathtutor
Published on HackMD