5.4 Scatterplots: How can you interpret a scatter plot?
To read a scatter plot, look for the overall pattern. This tells you about the direction, form and strength of the relationship.
Positive gradient: When the larger values of the horizontal (explanatory) variable are associated with larger values of the vertical (response) variable. As the explanatory variable increases, so does the response variable. Can you see how the data, as we move from left to right, are gradually rising?
Negative gradient: When the larger values of the explanatory variable are associated with smaller values of the response variable. As the explanatory variable increases, the response variable decreases. Can you see how the data, as we move from left to right, are gradually decreasing?
(In both cases we always use a consistent method - "explanatory variable increases" means that we move from left to right - what mathematicians call 'moving in the positive direction'.)
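One way to check direction without relying on the eye is to look at the sign of the summed products of deviations from the means, as sketched below. The function name `direction` and the sample values are illustrative assumptions, not data from this section.

```python
def direction(xs, ys):
    """Return 'positive', 'negative' or 'none' for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Each product is positive when a point is above (or below) both means,
    # and negative when it is above one mean and below the other.
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "none"


# Made-up example: explanatory variable on the left, response on the right.
hours_studied = [1, 2, 3, 4, 5]
test_score = [52, 55, 61, 64, 70]
print(direction(hours_studied, test_score))
```

A positive total corresponds to points rising from left to right, a negative total to points falling from left to right.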
The strength of the pattern is related to how tightly clustered the points are around the underlying form. We often use phrases like those following to describe the strength of the relationship, whether negative or positive. These phrases are, of course, subjective.
(near) zero correlation
"moderate" positive correlation
"strong" positive correlation
"moderate" negative correlation
"strong" negative correlation
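The phrases above can be tied to a number by computing Pearson's correlation coefficient r, which measures how tightly the points cluster around a straight line. The sketch below is only one possible reading: the cut-offs of 0.3 and 0.7 are assumptions chosen for illustration, since the labels themselves are subjective, and r captures linear form only.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def describe(r, weak=0.3, strong=0.7):
    """Map r to one of the phrases above (the cut-offs are assumptions)."""
    if abs(r) < weak:
        return "(near) zero correlation"
    sign = "positive" if r > 0 else "negative"
    label = "strong" if abs(r) >= strong else "moderate"
    return f'"{label}" {sign} correlation'


# Made-up data that falls as the explanatory variable rises.
xs = [1, 2, 3, 4, 5, 6]
ys = [9, 8, 8, 5, 4, 2]
print(describe(pearson_r(xs, ys)))
```

Different textbooks draw the boundaries between "moderate" and "strong" in different places, so the cut-offs should be stated whenever such labels are reported.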
(d) Outliers and influential points
You can also look for individual points that fall outside the overall pattern of the scatter plot. Outliers can have a big influence on correlation. These should be examined (as far as possible) to determine whether they are real data values, or some kind of data error. It is quite common for a researcher to perform two analyses - the first analysis with outliers remaining in the data set, the second with them removed.
The implications of removing/retaining the outlier must be clearly stated (it is unethical to simply erase a data point because it is not in the mainstream pattern!). Reasons and justification for any action must be clearly enunciated.
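The two-analysis approach described above can be sketched as follows: compute the correlation once with every point and once with the suspected outlier set aside, then report both. The data and the helper name `r_with_and_without` are hypothetical, and Pearson's r is used here as one common measure of association.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_with_and_without(xs, ys, index):
    """Return (r using all points, r with the point at `index` left out)."""
    r_all = pearson_r(xs, ys)
    xs_kept = [x for i, x in enumerate(xs) if i != index]
    ys_kept = [y for i, y in enumerate(ys) if i != index]
    return r_all, pearson_r(xs_kept, ys_kept)


# Made-up data: five points on a straight line plus one point far below it.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8, 10, 1]
r_all, r_trimmed = r_with_and_without(xs, ys, index=5)
print(f"with outlier:    r = {r_all:.2f}")
print(f"without outlier: r = {r_trimmed:.2f}")
```

The original lists are never modified, so both results can be reported side by side together with the reason for questioning the point.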
If the blue outlier were to be removed, we would have a data set with a high level of association. As it is, the outlier substantially lowers the level of association.
In this graph, influential points lie in the same direction as the major part of the data set, but are a long way removed.
For the graph at left, questions would have to be asked as to why there is a gap, and whether special characteristics are causing the two clusters to arise.