In my previous column, I mentioned the z-score method for valuing players. In this column, I explain the concepts underlying z-scores and show why they are useful. This is part one of a two-part series.
The formula to calculate the z-score of a value $x$ is

$$z = \frac{x - \mu}{\sigma}$$

where $x$ is the raw score (e.g., a score on a test), $\mu$ is the population mean (or true mean), and $\sigma$ is the population standard deviation (or true standard deviation)*.
*Technical Note: In statistical theory, there is an important distinction between true values and sample values. In short, we almost never know the true mean or true standard deviation; instead, we approximate them with the sample mean and the sample standard deviation (defined in section 2.4.2 of the referenced source).
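A minimal Python sketch of the technical note, assuming a small, made-up list of test scores (the data and the helper name `sample_z_scores` are hypothetical): the standard library's `statistics.mean` and `statistics.stdev` (which divides by n - 1) stand in for the unknown true mean and true standard deviation.

```python
import statistics


def sample_z_scores(scores):
    """Estimate the mean and standard deviation from the sample, then z-score each value.

    statistics.stdev uses the n - 1 denominator (the sample standard deviation),
    which stands in for the unknown population standard deviation.
    """
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return [(x - mean) / sd for x in scores]


# Hypothetical scores, for illustration only.
scores = [55, 60, 65, 70, 75]
print("sample mean:", statistics.mean(scores))
print("sample standard deviation:", statistics.stdev(scores))
print("z-scores:", sample_z_scores(scores))
```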
The z-score of $x$ takes $x$’s distance from the mean (the $x - \mu$ part) and divides it by the standard deviation. Thus, the z-score of $x$ is the number of standard deviations $x$ lies from the mean: positive if $x$ is above the mean, negative if it is below.
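To make the formula concrete, here is a short Python sketch; the function name `z_score` and the sample numbers (mean 60, standard deviation 15) are illustrative assumptions, not part of any particular library.

```python
def z_score(x, mu, sigma):
    """Return the number of standard deviations x lies from the mean mu.

    The result is positive when x is above the mean and negative when below.
    """
    if sigma <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mu) / sigma


print(z_score(75, 60, 15))  # 1.0: one standard deviation above the mean
print(z_score(45, 60, 15))  # -1.0: one standard deviation below the mean
```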
To understand the implications of the z-score, it helps to consider the normal distribution. For data following a normal distribution, approximately 68% of all data points fall within one standard deviation of the mean (i.e., between $\mu - \sigma$ and $\mu + \sigma$). This means that 100% - 68% = 32% of all data points fall outside that range (see the 68-95-99.7 rule).
The data points outside of that range are either more than one standard deviation below the mean or more than one standard deviation above the mean. Since the normal distribution is symmetric, half of those data points (16%) are more than one standard deviation above the mean (i.e. have a z-score > 1).
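The 68% / 32% / 16% figures can be checked numerically. This sketch uses Python's `statistics.NormalDist` (Python 3.8+) for the standard normal distribution (mean 0, standard deviation 1), so the values it computes are floating-point shares rather than the rounded rule-of-thumb numbers.

```python
from statistics import NormalDist

# Standard normal distribution: z-scores have mean 0 and standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)

within_1 = std_normal.cdf(1) - std_normal.cdf(-1)  # share within 1 sd of the mean
outside_1 = 1 - within_1                            # share outside that range
above_1 = 1 - std_normal.cdf(1)                     # share with a z-score > 1

print(f"within 1 sd: {within_1:.2%}")
print(f"outside 1 sd: {outside_1:.2%}")
print(f"above 1 sd: {above_1:.2%}")
```

The printed values come out close to 68%, 32%, and 16%, and by symmetry the share above one standard deviation is exactly half of the share outside.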
A similar argument (using the fact that about 95% of data points fall within two standard deviations of the mean and about 99.7% fall within three) shows that roughly 2.5% of all data points are more than 2 standard deviations above the mean (i.e., have a z-score > 2), and roughly 0.15% are more than 3 standard deviations above the mean (i.e., have a z-score > 3). Hence, the z-score of $x$ gives a sense of the percentage of data points below $x$.
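The same idea works for any z-score: under a normal distribution, the share of data points below $x$ is the standard normal CDF evaluated at $x$'s z-score. The sketch below assumes normality and uses `statistics.NormalDist`; the helper names `share_above` and `share_below` are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def share_above(z):
    """Fraction of a normal population with a z-score greater than z."""
    return 1 - STD_NORMAL.cdf(z)


def share_below(z):
    """Fraction of a normal population with a z-score below z (its percentile)."""
    return STD_NORMAL.cdf(z)


for z in (1, 2, 3):
    print(f"z > {z}: {share_above(z):.3%}   z < {z}: {share_below(z):.3%}")
```

The exact tail shares are about 2.3% above z = 2 and about 0.13% above z = 3; the 2.5% and 0.15% figures come from rounding the within-range shares to 95% and 99.7%.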
Consider the following situation. A student takes two tests (tests A and B), both of which are scored from 0 to 100. The scores for both tests follow a normal distribution with mean 60. However, test A has a standard deviation of 15, while test B has a standard deviation of 20. If the student scores an 80 on both tests, which test did he do better on?
If we define better in terms of how his score compares with other people’s scores, then we can answer the question using z-scores. The student’s z-score for test A is $(80 - 60)/15 \approx 1.33$, while his z-score for test B is $(80 - 60)/20 = 1.00$. Thus, he did better on test A. As this example illustrates, z-scores are a way of turning raw performance into relative performance.
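To reproduce the test A versus test B comparison, here is a sketch that computes each z-score and, because both tests are assumed normal in the example, the corresponding percentile via `statistics.NormalDist`. The names `tests`, `results`, and `better` are illustrative.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the mean mu."""
    return (x - mu) / sigma


# Parameters from the example: both means are 60; the SDs are 15 and 20.
tests = {"A": (60, 15), "B": (60, 20)}
raw_score = 80

# For each test, store (z-score, share of test-takers scoring below 80).
results = {}
for name, (mu, sigma) in tests.items():
    results[name] = (z_score(raw_score, mu, sigma), NormalDist(mu, sigma).cdf(raw_score))

for name, (z, pct) in results.items():
    print(f"Test {name}: z = {z:.2f}, scored above about {pct:.0%} of test-takers")

# The higher z-score marks the better relative performance.
better = max(results, key=lambda name: results[name][0])
print("Better relative performance on test", better)
```

The z-score comparison alone settles the question; the percentile column just translates each z-score into the share of test-takers who scored lower, which relies on the normality assumption.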
In my next column, I will describe how z-scores can be used for fantasy baseball, followed by a walk-through for calculating a player’s value.