Describing, Clarifying and Presenting Data
4. Summarising data
4.3. Identifying an outlier

Let’s look again at the stem-and-leaf plot and the histogram
that were developed for the student marks data set.


Notice how the measurement of 16 falls outside the pattern of the
frequency histogram and therefore deviates from the overall shape
of the histogram. This is an indication that 16 is an outlier
in this set of marks.
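
The original plots are not reproduced here, so the following is only a minimal sketch of how such a gap shows up in a stem-and-leaf plot. The marks below are hypothetical and are not the student marks data set from this section. Only the value 16 is taken from the text, and the helper name `stem_and_leaf` is an assumption made for illustration.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return stem-and-leaf rows (stem = tens digit, leaf = units digit)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    rows = []
    # Include empty stems so that gaps in the data stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        row = f"{stem} | " + " ".join(str(d) for d in leaves.get(stem, []))
        rows.append(row.rstrip())
    return rows


# Hypothetical marks (NOT the data set from the text); only 16 comes from it.
marks = [16, 52, 55, 58, 61, 63, 64, 67, 68, 70, 72, 75, 77, 81, 84]
plot = stem_and_leaf(marks)
print("\n".join(plot))
```

In the printed plot, the stem 1 holds a single leaf and the stems 2 to 4 are empty, so the mark of 16 sits well apart from the main body of the data.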
Why did this one student score so poorly in this subject in comparison
with his/her peers? Can this deviation be explained? Several explanations
come to mind:
- The student missed the final examination due to illness.
- The student did not attempt any assessment tasks throughout
the session.
The student and the lecturer/tutor would be in a position to
assess whether one of these explanations is appropriate, but anyone
else would not have the relevant information.

Outliers can be significant or they can be a mismeasurement
An outlier can be an unusual, important observation. Alternatively,
it can be a mismeasurement. Understanding the context and checking
the data might resolve questions associated with the outlier, but
often there is a dilemma about how outliers should be treated. They
could be rubbish or the most important information, and often you
do not know which.
Outliers can distort the mean of a set of data, so data involving
income or prices are often summarised using the median instead. For
example, in the real estate section of the newspaper, the median house
price for a suburb is often used rather than the mean price, because an
outlier such as a very high-priced mansion will have much less effect
on the median price than it would on the mean house price. The
highest and lowest prices are often reported as well, so that
potential buyers or sellers have some idea of the range of prices
paid for a house in that suburb.
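
The effect of an outlier on the mean and the median can be checked with a small calculation. This is a sketch only: the sale prices below are hypothetical figures chosen for illustration, not real data for any suburb.

```python
from statistics import mean, median

# Hypothetical sale prices in dollars for one suburb (illustrative only).
typical = [420_000, 450_000, 465_000, 480_000, 510_000]

# The same sales plus one very high-priced mansion (the outlier).
with_mansion = typical + [3_500_000]

mean_before, median_before = mean(typical), median(typical)
mean_after, median_after = mean(with_mansion), median(with_mansion)

print(f"Without mansion: mean = {mean_before:,.0f}, median = {median_before:,.0f}")
print(f"With mansion:    mean = {mean_after:,.0f}, median = {median_after:,.0f}")

# Lowest and highest prices give an idea of the range.
print(f"Range: {min(with_mansion):,} to {max(with_mansion):,}")
```

With these figures the single mansion roughly doubles the mean, while the median moves only from 465,000 to 472,500.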