The statistical mean is used to describe the central tendency of a data set. When working with a large data set, it can be useful to represent the entire set with a single value that describes its “middle” value. This kind of calculation also yields a more accurate result than one obtained from a single measurement.
Calculate the mean
In mathematics and statistics, the term arithmetic mean is preferred over simply “mean” because it distinguishes this mean from others, such as the geometric mean and the harmonic mean. The arithmetic mean is calculated as follows:
mean = (sum of all values in the data set) / (number of values in the data set)
Hence, to calculate the mean, add up the values in the data set and then divide by the total count of values. For instance, to find the mean of the following set of numbers: 21, 23, 24, 26, 28, 29, 30, 31, 33
- First add them all together:
21 + 23 + 24 + 26 + 28 + 29 + 30 + 31 + 33 = 245
- Then divide by the number of items in the set:
245 / 9 = 27.222
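The two steps above can be sketched in a few lines of Python:

```python
# Arithmetic mean of the example data set: sum the values, then divide by the count.
values = [21, 23, 24, 26, 28, 29, 30, 31, 33]

total = sum(values)          # 245
mean = total / len(values)   # 245 / 9

print(total)           # 245
print(round(mean, 3))  # 27.222
```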
The statistical mean can be strongly affected by extreme values (outliers) in the data set and may therefore misrepresent its center. For instance, let’s suppose we have the values:
10, 10, 20, 40, 70.
If we add 1500 to the set, the mean jumps from (10 + 10 + 20 + 40 + 70) / 5 = 30 to:
(10 + 10 + 20 + 40 + 70 + 1500) / 6 = 275, which is a poor reflection of the center of the set.
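The effect of the outlier can be verified directly:

```python
# One extreme value can pull the mean far from the bulk of the data.
data = [10, 10, 20, 40, 70]
mean_before = sum(data) / len(data)
print(mean_before)  # 30.0

data.append(1500)   # introduce an outlier
mean_after = sum(data) / len(data)
print(mean_after)   # 275.0
```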
“Means” commonly used in Statistics
- Mean of the sampling distribution: the mean of the distribution of a sample statistic (such as the sample mean) over many repeated samples. It is used with probability distributions, especially in connection with the Central Limit Theorem.
- Sample mean: the average value in a sample.
- Population mean: the average value in a population.
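To illustrate the difference between a population mean and a sample mean, here is a small Python sketch; the population values below are randomly generated purely for illustration:

```python
import random

random.seed(0)

# Hypothetical population of 1000 values between 0 and 100.
population = [random.uniform(0, 100) for _ in range(1000)]
population_mean = sum(population) / len(population)

# A sample mean is computed the same way, but from a subset of the population.
sample = random.sample(population, 50)
sample_mean = sum(sample) / len(sample)

print(population_mean)  # close to 50 for this population
print(sample_mean)      # an estimate of the population mean
```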
Mean from a Frequency Table
For large data sets in which values are often repeated, the data is usually reported in a frequency table that shows how often each value occurs. For instance, let’s consider the following frequency table.
The sum of the frequencies is 100, which is the total number of data values. If we computed the mean in the classical way, we would add up all 100 separate values and then divide by 100.
A quicker method is to consider the frequencies as weights for each value. For instance, the data value 10 has a weight of 3, because it occurred 3 times.
By extending the frequency table to include the weighted values and the totals, we compute the mean using only the table, instead of having to add all 100 separate values.
The mean for this data set is:
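Since the frequency table itself is not reproduced here, the weighted-mean method can be sketched with a hypothetical table whose frequencies sum to 100; the values and frequencies below are made up for illustration:

```python
# Hypothetical frequency table: value -> frequency (frequencies sum to 100).
freq_table = {10: 3, 20: 15, 30: 40, 40: 30, 50: 12}

total_count = sum(freq_table.values())                    # 100 data values in total
weighted_sum = sum(v * f for v, f in freq_table.items())  # each value weighted by its frequency

mean = weighted_sum / total_count
print(mean)
```

This computes the same result as summing all 100 individual values, but uses only the rows of the table.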
Markov’s Inequality
If X is a nonnegative random variable and a > 0, then the probability that X is at least a is at most the expectation of X divided by a:
P(X ≥ a) ≤ E(X) / a, where a ranges over all possible positive values.
Since any probability lies between 0 and 1, the values 0 and E(X) / a serve as lower and upper bounds on P(X ≥ a), respectively, for which the inequality is still true.
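As a concrete check of the bound, consider a fair six-sided die, for which E(X) = 3.5:

```python
# Fair six-sided die: X takes the values 1..6 with equal probability.
values = [1, 2, 3, 4, 5, 6]
expectation = sum(values) / len(values)  # 3.5

a = 5
# Exact probability that X >= 5 (outcomes 5 and 6 out of 6).
p_at_least_a = sum(1 for v in values if v >= a) / len(values)

markov_bound = expectation / a  # 3.5 / 5 = 0.7
print(p_at_least_a <= markov_bound)  # True
```

The true probability (1/3) is well below the Markov bound (0.7), as the inequality guarantees.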
There is a practical motivation behind Markov’s inequality, and it can be posed in the form of a simple question: how often is the random variable X “far” away from its “central value”?
Intuitively, the “central value” of X is the value that is most frequently observed. Thus, as X deviates farther and farther from its “central value”, we would expect those distant values to be observed less and less frequently.
The expected value, E(X), is a measure of the “central value” of X. Thus, we would expect the probability of X being very far away from E(X) to be very low. Indeed, Markov’s inequality rigorously confirms this intuition:
P(X ≥ c) ≤ E(X) / c
As c gets farther away from E(X), the event X ≥ c becomes less probable.
We can see this by substituting several key values of c.
- If c = E(X), then P(X ≥ E(X)) ≤ E(X) / E(X) = 1. This is the highest upper bound the inequality can give, so X may well be frequently observed near its own expected value.
- If c → ∞, then P(X ≥ c) ≤ E(X) / c → 0. By Kolmogorov’s axioms of probability, any probability must lie inclusively between 0 and 1, so P(X ≥ c) → 0. Thus, there is no way that X can be bigger than positive infinity.
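The two cases above can be tabulated numerically; the expectation E(X) = 3.5 below is just an illustrative value:

```python
# Markov bound E(X) / c for increasing c, assuming E(X) = 3.5 for illustration.
expectation = 3.5

bounds = {}
for c in [3.5, 10, 100, 1000]:
    bounds[c] = min(expectation / c, 1.0)  # a probability can never exceed 1
    print(c, bounds[c])
```

At c = E(X) the bound is the vacuous value 1, and it shrinks toward 0 as c grows.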