## Lesson 4: Variability

- Quantify spread: measure how spread out a distribution is
- Outliers increase the variability of data
- Dealing with outliers:
- Cut off the tails of the data: the lower 25% and the upper 25%
- The width of what remains (Q3 - Q1) is called the 'Interquartile Range' or IQR

- IQR
- 50% of data falls within IQR
- It's not affected by every value in the dataset, so extreme values like outliers don't distort it

- Outlier formula (a value is an outlier if either condition holds):
`outlier < (q1 - 1.5 * iqr)`

`outlier > (q3 + 1.5 * iqr)`
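
A minimal sketch of the IQR and the outlier formula above. The data is made up, and quartile conventions differ between tools, so the `"inclusive"` method used here is an assumption, not the only valid choice:

```python
import statistics

# Hypothetical data: nine ordinary values and one extreme value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]

# quantiles() with n=4 returns the three cut points Q1, median, Q3.
q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Fences from the outlier formula.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, q3, iqr)  # 3.25 7.75 4.5
print(outliers)     # [100]
```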

- Boxplots (aka box-and-whisker plots)
- Variance:
- Mean of squared deviations

```
sum((x - mean) ** 2 for x in data) / n  # n = number of values in data
```

- Standard deviation (lower-case sigma)
`sqrt(variance)`
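
A small worked example of variance and standard deviation, dividing by the number of values as in the formula above. The data is hypothetical; the standard library's `statistics.pvariance` and `statistics.pstdev` are used only as a cross-check:

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values

mean = sum(data) / len(data)

# Variance: mean of the squared deviations from the mean.
squared_deviations = [(x - mean) ** 2 for x in data]
variance = sum(squared_deviations) / len(data)

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

print(variance, std_dev)  # 4.0 2.0

# The library's population versions should agree with the manual result.
assert math.isclose(variance, statistics.pvariance(data))
assert math.isclose(std_dev, statistics.pstdev(data))
```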

- Properties of std dev (for normally distributed data)
- ~68% of data falls within 1 std dev of the mean in either direction
- ~95% of data falls within 2 std devs of the mean in either direction
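
These percentages can be checked against a normal distribution, which is the assumption under which they hold. A short sketch using `statistics.NormalDist` (Python 3.8+); the standard normal with mean 0 and std dev 1 is chosen only for convenience:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Probability of landing within k std devs of the mean: cdf(k) - cdf(-k).
within_1 = standard_normal.cdf(1) - standard_normal.cdf(-1)
within_2 = standard_normal.cdf(2) - standard_normal.cdf(-2)

print(round(within_1, 4))  # 0.6827
print(round(within_2, 4))  # 0.9545
```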

- Bessel's correction
- Samples tend to contain values from the middle of the population
- So variability in a sample tends to be less than in the population
- Instead of dividing by n, divide by n-1 when calculating the variance and std dev of a sample
- The result is called the 'sample standard deviation'
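
A sketch comparing the two divisors on the same hypothetical sample. Dividing by n-1 always gives the slightly larger value; `statistics.variance` and `statistics.stdev` are the standard library's n-1 versions and are used here as a cross-check:

```python
import math
import statistics

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
n = len(sample)
mean = sum(sample) / n
sum_of_squares = sum((x - mean) ** 2 for x in sample)

variance_n = sum_of_squares / n                  # no correction
variance_n_minus_1 = sum_of_squares / (n - 1)    # Bessel's correction
sample_std_dev = math.sqrt(variance_n_minus_1)

print(variance_n)  # 4.0
print(variance_n_minus_1 > variance_n)  # True

# The library's sample versions divide by n-1 as well.
assert math.isclose(variance_n_minus_1, statistics.variance(sample))
assert math.isclose(sample_std_dev, statistics.stdev(sample))
```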