UOW
Excellence - Innovation - Diversity
University of Wollongong

Interpreting Data

3. Questioning the data

Many important and costly policies can flow from the results of a study. So it is important to be sure that a study is conducted in such a way that it produces data that are, as far as possible, unbiased.

Eliminating bias can be harder than you think.

3.1 Was the variable well defined?

Consider the following scenario.

4102.0 - Australian Social Trends, 1995

Released at 11:30 AM (Canberra time) 20/06/1995


SCENARIO

The extract below is from the Australian Bureau of Statistics (ABS) site and describes the collection of data on household crime. The ABS collected its data from a survey of 10,000 people.

Housing Stock: Safe as houses?
In 1993, over half a million households were victims of household crime. 62% of these households were not members of neighborhood or rural watch.

Since World War II most western countries have experienced increases in recorded crime as measured by police statistics. However, increases in official crime rates may merely indicate an increase in the number of offences recorded by the police rather than an actual increase in the number of offences committed. There is a public perception that crime, particularly violent crime, has increased in Australia in recent years. But 1993 data indicate that there has been only a marginal change in the level of violent crime compared to 1983. While the victimisation rate for robbery has doubled from 0.6% to 1.2%, the rate for sexual assault has remained virtually unchanged at 0.6% and the rate for other types of assault has decreased from 3.4% to 2.5%. It should be noted that these 1983 and 1993 data are 'snapshots' of the incidence of violent crime and that criminal activity may have fluctuated at times between these two years.


(Australian Bureau of Statistics 1995) [2]

Comments on the ABS crime surveys

While ABS Crime and Safety Surveys can be used to measure changes in patterns of crime, care must be used in using their results because of methodological and definitional differences between the surveys. Important differences between the 1983 and 1993 surveys include:

  • in 1983 data were collected using face-to-face interviewing while in 1993 self-completed questionnaires were used;
  • although both surveys had a 12-month reference period, the 1993 survey was conducted in April while the sample for the 1983 survey was spread over 12 months, from February 1983 to January 1984.

The victimisation rate is the number of people or households in a particular category who reported being victims of crime, expressed as a percentage of all people or households in that category. Victims were counted only once for each type of offence, regardless of the number of incidents of that type.

Household crime consists of break and enter, attempted break and enter and motor vehicle theft. The latter includes the theft of a motor vehicle, owned or used exclusively by a household member, which may have occurred away from the home.

Comparisons with police statistics
Responses obtained in ABS Crime and Safety Surveys are based on the respondents' perceptions that they have been the victim of an offence. Data on crimes not reported to police are collected. The terms used summarise the wording of questions asked of respondents and may not correspond with legal or police definitions.

This scenario shows the complexities involved in defining and using a variable, especially when there are two or more agencies using the same concept.

Problems in discussing data on crime include:

  1. Different collection methodologies were used within the ABS: face-to-face interviews in 1983, then unsupervised self-completed questionnaires in 1993.
  2. The two ABS collections covered different time frames.
  3. Police data are based only on reported crime (the ABS tries to identify this discrepancy by also collecting data on crimes not reported to police).
  4. Police definitions may differ from ABS definitions.
  5. Some household crime may not occur in the home (e.g., car theft).
  6. Victimisation rates may be lower than the real rate of incidents because each victim is counted only once for each type of offence.
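The last point can be made concrete with a small sketch. The household counts below are invented purely for illustration (they are not ABS data), and the function names are hypothetical. The sketch applies the definition of the victimisation rate quoted above, where a victim is counted once regardless of the number of incidents, and compares it with a count of every incident.

```python
# Hypothetical responses: number of break-ins each of 10 households
# reported for the 12-month reference period (invented, not ABS data).
incidents_per_household = [0, 0, 3, 0, 1, 0, 0, 2, 0, 0]


def victimisation_rate(counts):
    """Percentage of households reporting at least one incident.

    Each victim household is counted once, however many incidents it reported.
    """
    victims = sum(1 for c in counts if c > 0)
    return 100 * victims / len(counts)


def incidents_per_100(counts):
    """Total incidents per 100 households, counting every incident."""
    return 100 * sum(counts) / len(counts)


rate = victimisation_rate(incidents_per_household)
incident_rate = incidents_per_100(incidents_per_household)
print(f"Victimisation rate: {rate:.1f}%")
print(f"Incidents per 100 households: {incident_rate:.1f}")
```

In this made-up sample three households are victims but six incidents occurred, so the two measures differ by a factor of two. Which measure is appropriate depends on the question being asked, and a report should say which one it is using.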

The above scenario also shows why it is important to keep in mind that what you read in the print media, on TV or on the internet may not be just the researchers' report but other people's ideas of how the data were produced, presented and interpreted.

In reading about the results of a study the first thing that you must think about is how the data were produced. This relates to identifying the population of interest for a study.


3.2 How were the data produced?

Consider the following case study done on a sensitive political issue:

Public schools versus non-public schools

Coleman, Hoffer and Kilgore (1982) [2] analysed data from the study, High School and Beyond. The study examined the gains in scores on Reading, Vocabulary and Mathematics tests between Years 10 and 12 of students attending Public, Catholic and Other-Private high schools. Their question was "whether private schools bring about - for comparable students - higher achievement in basic cognitive skills". The students involved in the study were required to complete a battery of tests. Table 3.2.1 provides a simplified version of their results.

table

Table 3.2.1. Estimated Year 10 to 12 Gains in Test Scores and Learning Rates, with Corrections for Dropouts Missing from Senior Distribution

¹ Estimated learning rate was based on the average number of items learned and the number remaining to be learned.

Do the data support this interpretation?

In looking at these data you need to ask, "Are the data good enough to support this interpretation?" Coleman et al. were criticised as being pro-private school (Goldberger and Cain 1982). Evidence for this claim came from an analysis of the way that the data were produced. The major criticisms included:

  • Although the authors of this paper implied that the same numbers of schools from each sector were involved in the study, the sample sizes for each sector varied widely. The numbers of schools involved in the study were as follows:
    [table not reproduced]
    This might lead you to question the representativeness of the private-sector sample.
  • Also, in the private schools, only 79% of Year 10 students participated compared with a participation rate of 90% for public schools and 95% for Catholic schools. Such variation might lead you to wonder whether some selection of students for the survey had occurred at the participating private schools so that the "better" students were selected for the study. Such an approach would increase the likelihood of an unrepresentative sample for the private school sector.
  • The study was designed to support the argument that certain types of schools improved the learning rates for students more than other schools did. However, the questions on the test were elementary. Did they really assess learning at high school? Was the study measuring what it claimed to measure?
  • When students were stratified according to curriculum stream (i.e., academic, general, vocational), scores for academic public school students were about the same as those of students in the private and Catholic sectors, where the emphasis is on an academic curriculum. Thus, socioeconomic differences could also be a contributing factor which needs to be taken into account. (This is due to the perception that upper class or middle class homes will be more likely to put children into academic studies - an assumption which needs testing!)
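The concern about participation rates can be illustrated with a deliberately extreme sketch. The scores below are invented (100 students scoring 1 to 100), the function name is hypothetical, and the assumption that only the highest-scoring students take part is a worst case chosen for illustration, not something the study is known to have done. Only the participation rates (79%, 90%, 95%) come from the text above.

```python
from statistics import mean

# Hypothetical scores for 100 students at one school: 1, 2, ..., 100.
all_scores = list(range(1, 101))


def mean_of_top(scores, participation):
    """Mean score if only the top `participation` fraction of students take part.

    Worst-case assumption: participants are exactly the highest scorers.
    """
    n = round(len(scores) * participation)
    return mean(sorted(scores, reverse=True)[:n])


full_mean = mean(all_scores)
print(f"All students: mean = {full_mean}")

# Participation rates reported for each sector in the study.
for sector, participation in [("Private", 0.79), ("Public", 0.90), ("Catholic", 0.95)]:
    observed = mean_of_top(all_scores, participation)
    print(f"{sector}: participation {participation:.0%}, mean of participants = {observed}")
```

Every sector here has identical students, yet the sector with the lowest participation shows the highest mean. Under this assumption a difference between sectors could appear even if the schools themselves had no effect, which is why unequal participation rates deserve scrutiny.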

How valid would the inferences made from the data be? The questions that have been raised above are typical of the approach we need to take when examining data.

How can you decide whether or not the data are good data? Here are some guidelines:

  1. Ask yourself if there are other variables that have not been considered but which could affect the data.
  2. Is a context provided for the data? i.e., is the source of the data clearly described? This would allow you to decide if the variables were well defined and if the measurements were valid, reliable and accurate.
  3. How was the sample used in the study selected? Was there any bias in the way that the sample was selected?
  4. Were the data produced by a recognised and respected agency such as the Australian Bureau of Statistics? (Although this might not always guarantee that the data are good, in general you would have greater confidence in the reliability of the data when compared with data produced by an 'unknown' organisation or by a body with 'vested interests'.)
  5. Are the variables well defined and do they actually measure the property that is the focus of the study?
