There are several measures of centre but here we consider three:
i. Median
ii. Mean
iii. Mode
What is the median?
The median is the middle value of a ranked data set, so that half
of the data falls above it and half below it.
This is easy to find when there is an odd number of data values. When
there is an even number of data values, you need to find the two central
values, add them together and then divide the sum by two to obtain the median.
Once again, let’s look at the student grades from the beginning
of this module.
To restate the process: in this example, the total number of marks
is 33. The middle value of the ranked set is the 17th mark and so
the median mark is 62. Note that there are 16 data
values below 62 and 16 data values above 62 (16 + 1 + 16 = 33).
Consider a smaller data set of 8 values:
$32
$30
$40
$25
$22
$18
$31
$37
Rank the data (smallest to largest):
$18
$22
$25
$30
$31
$32
$37
$40
When there is an even number of data values, the data set splits evenly
and the median is usually not a member of the data set.
In this case, the median is at position 4.5, halfway between
the value in the 4th position ($30) and the value in the 5th position
($31). Therefore, the value (size) of the median is
($30 + $31) / 2 = $30.50.
Note that there are 4 values lower than $30.50 and 4 values higher
than $30.50.
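As a quick check of this arithmetic, here is a minimal Python sketch that ranks the eight values and averages the two central ones. Python and the variable names are our own choices for illustration; they are not part of the module.

```python
# The eight dollar amounts from the example, in their original order.
values = [32, 30, 40, 25, 22, 18, 31, 37]

ranked = sorted(values)   # rank smallest to largest
n = len(ranked)           # n = 8, an even number

# For even n the two central values sit at positions n/2 and n/2 + 1
# (counting from 1), which are indices n//2 - 1 and n//2 (counting from 0).
lower_middle = ranked[n // 2 - 1]   # value in the 4th position
upper_middle = ranked[n // 2]       # value in the 5th position

median = (lower_middle + upper_middle) / 2

print(ranked)   # [18, 22, 25, 30, 31, 32, 37, 40]
print(median)   # 30.5
```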
1. Identify the size of the data set (n).
2. Rank the values of the data set (usually lowest to highest).
3. Locate the position of the median: it is found at position (n + 1) / 2.
4. Last, identify the size (value) of the data value at that position
and quote it as the median. If the position falls halfway between two
values, average those two values.
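The four steps can be sketched as a small Python function. This is an illustration only: it assumes the position rule (n + 1) / 2, which agrees with both worked examples (position 17 for n = 33 and position 4.5 for n = 8), and the function name is our own.

```python
def median(values):
    """Return the median of a list of numbers using the position rule (n + 1) / 2."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")

    n = len(values)               # step 1: size of the data set
    ranked = sorted(values)       # step 2: rank lowest to highest
    position = (n + 1) / 2        # step 3: position of the median, counting from 1

    # Step 4: read off the value at that position (list indices start at 0).
    lower = ranked[int(position) - 1]
    if position == int(position):
        return lower              # odd n: the median is a member of the data set
    upper = ranked[int(position)]
    return (lower + upper) / 2    # even n: average the two central values


print(median([32, 30, 40, 25, 22, 18, 31, 37]))  # 30.5
print(median([3, 1, 2]))                         # 2
```

Python's standard library also provides `statistics.median`, which gives the same results for these inputs.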
Stem-and-Leaf Plot
Return to the student marks data set.
The structure drawn here is a table, but it also works as a
graph. In this type of frequency histogram
(frequency is just another name for counts), data have been
collected into cells. This allows you to get an idea of the shape
of the distribution of the data. A stem-and-leaf plot [15]
is a shorthand way of doing the same thing without sacrificing information.
With a stem-and-leaf plot you must always include a key that states
the size of the data. In the example above, the stems are tens,
as shown in the key, and the leaves are units (values of one). This
means the size of the ‘5’ in the stem is actually ‘50’,
and that stem includes all marks from 50 to 59 inclusive.
But a ‘5’ in the leaf position is really just ‘5’.
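To see how stems and leaves are split, here is a minimal Python sketch. The marks below are hypothetical values made up for illustration; they are not the 33 student marks from this module. The sketch assumes non-negative whole-number marks and stems of ten.

```python
from collections import defaultdict


def stem_and_leaf(values, stem_unit=10):
    """Return a dict mapping each stem to its sorted list of leaves."""
    plot = defaultdict(list)
    for value in sorted(values):
        # For 52 with stems of ten: the stem is 5 and the leaf is 2.
        stem, leaf = divmod(value, stem_unit)
        plot[stem].append(leaf)
    return dict(plot)


# Hypothetical marks, for illustration only (NOT the module's data set).
marks = [16, 45, 52, 55, 58, 61, 62, 62, 68, 68, 68, 71, 76]

plot = stem_and_leaf(marks)

# Print every stem between the smallest and largest, including empty ones,
# so that gaps in the data stay visible.
for stem in range(min(plot), max(plot) + 1):
    leaves = "".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem} | {leaves}")
print("Key: 5 | 2 means 52")
```

Because every leaf is kept, the original data values can be read back from the plot, which is what "without sacrificing information" means.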
Creating a Histogram from a Stem-and-Leaf Plot
If you were to convert each leaf beside a stem (also called a class)
into a rectangle, it would look something like this:
If you then rotated the histogram and removed the horizontal lines
separating the rectangles in each class, you would end up with a
classical graphical display of a histogram.
The height of the rectangles above each class is proportional to
the number of data values that fall into that class.
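The same idea can be sketched in Python by counting how many values fall into each class and drawing one symbol per value. The marks are hypothetical (made up for illustration, not the module's data), and the class width of 10 is an assumption that matches stems of ten.

```python
from collections import Counter

# Hypothetical marks, for illustration only (NOT the module's data set).
marks = [16, 45, 52, 55, 58, 61, 62, 62, 68, 68, 68, 71, 76]
class_width = 10  # assumed: one class per stem of ten

# Label each class by its lower boundary, e.g. 50 for the class 50-59.
counts = Counter((mark // class_width) * class_width for mark in marks)

# One '#' per data value: the bar length is proportional to the class count.
for lower in range(min(counts), max(counts) + class_width, class_width):
    upper = lower + class_width - 1
    print(f"{lower}-{upper} {'#' * counts[lower]} ({counts[lower]})")
```

The bars here run sideways, like the leaves beside each stem; rotating the display gives the usual upright histogram.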
ii. Finding the mean
The mean can be described as the arithmetic average.
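Statisticians use symbols and equations to show how the mean can
be calculated:
\[ \bar{x} = \frac{\sum x}{n} \]
where \(\bar{x}\) is the mean, \(\sum x\) is the sum of all the data
values and \(n\) is the number of data values.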
Don’t be put off by this equation. Calculating the mean simply means
calculating the arithmetic average: you add together all the
measurements and then divide that
total by the number of measurements. For this set of student marks
the total number of measurements is 33. The sum of these 33 measurement
values is 1964. Dividing 1964 by 33 gives a mean of approximately
59.52, which rounds to 60. It is often
useful to round statistics, especially summary statistics such as
the mean, for presentation purposes.
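The calculation can be sketched in a few lines of Python. The function name is our own, and the second example reuses the eight dollar amounts from the median section purely as an illustration.

```python
def mean(values):
    """Return the arithmetic average: the sum of the values divided by their count."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Totals quoted in the text for the student marks: a sum of 1964 over 33 marks.
total, n = 1964, 33
class_mean = total / n
print(round(class_mean))   # 60, rounded to a whole mark for presentation

# Illustration with the eight dollar amounts used in the median example.
amounts = [18, 22, 25, 30, 31, 32, 37, 40]
print(mean(amounts))       # 29.375
```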
Question:
If your mark for the subject was 76, are you above or below the
mean for the class?
Answer:
You are above the mean of approximately 60.
NOTE: In this case the mean of 60 is slightly
smaller than the median of 62. This is because the mean is affected by
the numerical value of every measurement, so a very low score like
16 drags the mean downwards. Likewise, a very large data value will drag
the mean upwards. The median is affected only by the relative position of
measurements, and so 16 has the same effect on the median as any
other number below 62. The median is not affected by the size of
extreme data values; it is affected only by the number of data values in
the data set and their rank order.
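A small experiment makes this concrete. The sketch below uses Python's standard `statistics` module and the eight dollar amounts from the median example; the second list is hypothetical, with the largest value ($40) replaced by an extreme value ($400).

```python
from statistics import mean, median

amounts = [18, 22, 25, 30, 31, 32, 37, 40]

# Hypothetical variation: the largest value, 40, is replaced by 400.
with_extreme = [18, 22, 25, 30, 31, 32, 37, 400]

print(mean(amounts), median(amounts))            # 29.375 30.5
print(mean(with_extreme), median(with_extreme))  # 74.375 30.5
```

The mean more than doubles, while the median does not move, because the extreme value still occupies the same rank position.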
What is the Mode?
The mode is the most common value in a data list.
It is the value with the highest frequency. In the example of student
marks, the mode is 68 because it occurs three times (i.e. three
students obtained 68), more often than any other mark. The mode can be
useful with categorical or
discrete variables. For example, if you managed a shoe shop you
might find the mode a useful concept because it could tell you which
men's and women's shoe sizes are the most common among your customers.
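Finding the mode amounts to counting how often each value occurs. Here is a minimal Python sketch; the shoe sizes are hypothetical values made up for illustration, and the function name is our own. It returns every value that shares the highest frequency, since a data set can have more than one mode.

```python
from collections import Counter


def modes(values):
    """Return (sorted list of the most frequent values, their frequency)."""
    if not values:
        raise ValueError("the mode of an empty data set is undefined")
    counts = Counter(values)
    highest = max(counts.values())
    most_common = sorted(v for v, count in counts.items() if count == highest)
    return most_common, highest


# Hypothetical shoe sizes sold in one day, for illustration only.
sizes = [8, 9, 9, 10, 9, 11, 8, 10, 9]
print(modes(sizes))                       # ([9], 4)

# The same function works for categorical data.
print(modes(["red", "blue", "red"]))      # (['red'], 2)
```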