Lesson 4: Variability
Lesson 4: Variability
- Quanify spread == work out "spreadoutness" of distribution
- Outliers increase the variability of data
- Dealing with outliers:
- Cut off tail of data - lower 25% and upper 25%
- Called the 'Interquartile Range' or IQR
- Cut off tail of data - lower 25% and upper 25%
- IQR
- 50% of data falls within IQR
- It's not affected by every value in dataset like outliers
- Outlier formula:
outlier < (q1 - 1.5 * iqr)
outlier > (q3 + 1.5 * iqr)
- Boxplots (aka box-and-whisper plots)
- Variance:
- Mean of squared deviations
sum(each deviation_from_the_mean**2) / sample_count
- Standard deviation (lower-case sigma)
sqrt(variance)
- Properties of std dev
- ~68% of data falls within 1 std devs of the mean in either direction
- ~95% of data falls within 2 std devs of mean in either direction
- Bessel's correction
- Samples tend to be values in the middle of population
- Variability in sample will be less than in population
- Instead of dividing by n, divide by n-1 when calculating variance and std dev of a sample
- Called 'sample standard deviation'