Qualitative Data Analysis Notes

This page is where I'm putting up notes about my experience with the qualitative data analysis aspects of my PhD project.  It contains suggestions for other qualitative researchers, and some of my opinions about existing qualitative data analysis software and techniques.

Voice Recognition Software is a Critical
Productivity Tool for the Serious Qualitative Researcher

The first thing I have to say is that the number one tool for a serious qualitative data analyst should be voice recognition software (VRS).  As a researcher, it's much better to transcribe your own interviews: because you conducted the interview, you have far more contextual information available than a third party transcriber has, and transcribing gives you better recall of your experience in the interview.  However, transcribing is tedious and time consuming.  Having said this, shadowing a recording (i.e. speaking back what you hear in the headphones) with VRS (Dragon NaturallySpeaking and its Mac counterpart MacSpeech Dictate are the only choices here as far as I can see) is an extremely efficient way of transcribing.  The minimum handling time for manual transcription is about 4 times the length of the recording, and is frequently up to 8 times, depending on the quality of the recording, the speed at which people talk, and the typing speed of the transcriber. Shadowing with VRS brings handling time down to around 3 times the length of the recording. This may not seem like much, but 4 times is a minimum, and at $120 (Australian) or more per transcribed hour of speech done manually, costs quickly add up to more than the licence fee for the VRS.
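To make the arithmetic concrete, here's a rough break-even sketch.  The project size and the licence price are placeholder assumptions of mine; the ratios and the outsourcing rate come from the figures above.

```python
# Rough break-even sketch for the figures above.  The project size and
# licence cost are placeholder assumptions; the rest comes from the text.
recording_hours = 10       # hypothetical project: 10 hours of interviews
manual_ratio = 4           # manual transcription: at least 4x recording length
shadow_ratio = 3           # shadowing with VRS: around 3x recording length
outsource_rate = 120       # AUD per transcribed hour of speech, at minimum
licence_cost = 300         # AUD, placeholder price for a VRS licence

hours_saved = recording_hours * (manual_ratio - shadow_ratio)
outsource_cost = recording_hours * outsource_rate

print(f"Shadowing saves at least {hours_saved} hours of handling time")
print(f"Outsourcing costs ${outsource_cost} vs a ${licence_cost} licence")
```

Even on a modest project, outsourcing at the minimum rate costs several times the price of a licence.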

Of course, time constraints mean that you may want to use a third party transcriber even with these time savings.  However, the increase in speed and transcript quality means that you should train your transcribers to shadow and give them the software too.  It pays for itself very quickly indeed, in time, money and transcript quality.

I use the programmer's text editor Emacs to do the transcripts, together with the Emacs extension transcript.el, which I found on the web and modified a bit.  Using this means I get plain text transcripts in the following format:

Fred: [00:00:00]
Hi Barney, how are you?
Barney: [00:00:00]
I'm pretty good Fred, have you been to work today at all?

I'm pretty happy with this.  The timestamp is put in by the software automatically, and I can control playback straight from the keyboard (control-return to toggle stop and start, control-left and control-right to move the recording back and forward by short steps, and control-up and control-down for long steps).  Once I've entered the two speakers, the software remembers who should be speaking next and inserts their name and timestamp automatically when I press return (or say "newline" with the VRS).
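A nice side effect of this format is that it's trivial to parse back out of the plain text.  Here's a minimal sketch in Python (transcript.el itself is Emacs Lisp; the function names and regex below are purely my own illustration):

```python
import re

# Sketch of a parser for the "Name: [HH:MM:SS]" transcript format.
TURN_RE = re.compile(r"^(\w+): \[(\d{2}):(\d{2}):(\d{2})\]$")

def parse_transcript(text):
    """Split a plain-text transcript into (speaker, seconds, utterance) turns."""
    turns = []
    speaker, seconds = None, None
    for line in text.splitlines():
        m = TURN_RE.match(line)
        if m:
            # A speaker line: remember who is talking and when.
            speaker = m.group(1)
            h, mnt, s = (int(g) for g in m.groups()[1:])
            seconds = h * 3600 + mnt * 60 + s
        elif speaker is not None and line.strip():
            # An utterance line belonging to the current speaker.
            turns.append((speaker, seconds, line.strip()))
    return turns

sample = """Fred: [00:00:00]
Hi Barney, how are you?
Barney: [00:00:00]
I'm pretty good Fred, have you been to work today at all?"""

for who, when, what in parse_transcript(sample):
    print(who, when, what)
```

From a structure like this, per-speaker word counts or time-aligned coding reports are a few lines away.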

VRS is also useful for producing field notes.  While I don't use the software at all for normal written communication (like this article), it's extremely useful for field notes.  I think this is because field notes tend to be a bit stream of consciousness, and being able to get words onto the screen as fast as you can talk is very useful in that kind of situation.  However, if you are going to make field notes in the field with VRS you will need a quiet place: aside from making you look like a loony talking to your computer, too much background noise will impair the software's accuracy.

Qualitative Data Analysis Software,
or How I learned to Stop Worrying and Love Plain Text

Finally, my thoughts on qualitative data analysis software.  Well, this is a can of worms.  Qualitative analysis software is a tool, not an end.  As such it should save you time, increase accuracy and generally not get in the way of your raw data.  Wollongong's preferred qualitative data analysis package certainly doesn't fit this description.

The only game in town at Wollongong University is NVIVO, although Atlas.ti looks like an excellent competitor (and with a more open file format, which is important - see later on).  Now, I've always been very suspicious of NVIVO, because once your data is in the software it's impossible to export the raw data for use by other software.  While you can export things as Microsoft Word or plain text documents, it's clunky, and it makes secondary analysis outside of NVIVO very time consuming, because you still have to reconcile your secondary data (coded text) back to the primary data (raw transcript).  The other thing that really bugs me is that the only way you can import properly structured text (for auto-coding) into NVIVO is via a Word document.  Even version 8, which has better HTML support, doesn't allow importing properly structured documents from anything other than Word format.  Inputting properly structured text into NVIVO is important, as it allows you to auto-code by heading.  I used it to auto-code the interview speakers, and then import the nodes into case nodes.  HTML import really matters, as it's just about the easiest way to produce a properly structured document that a very wide range of software can understand.

These were my concerns before starting to use NVIVO.  More appeared with greater use.

An NVIVO advocate may say I'm using the software wrong.  If that's the case, it's too hard to use.  I followed the tutorial, and referred back to the documentation whenever I had a problem.  I can see how NVIVO is a good tool for people who aren't completely comfortable with computers and software, but for people who are highly computer literate, it gets in the way more than it helps.

There are probably other problems as well.  So I'm going back to basics.  All of my data is stored in plain text, closely associated with its corresponding audio file.  I can code text with a pseudo-XML format, like this (note the curly brackets rather than the angle brackets; this will probably be useful one day):

Fred: [00:00:00]
{q:how}Hi Barney, how are you?
Barney: [00:00:00]
I'm pretty good Fred{/q:how}, {t:activity}have you been to work today at all? {/t:activity}

And from there it's a pretty straightforward task to write Perl code to extract the tags, or groups of tags, that I'm interested in.  My preliminary tests tell me that hand typing these codes is faster than using NVIVO, and while there is scope for errors (e.g. typos, miscoding etc.) I can identify them much more easily and close up in the raw data.  I can do all this in the same text editor I use for transcribing as well.  Even though it means more typing (until I program some automatic detection of the text selection and prompting for the preferred code), I think the benefits outweigh the cost for me.  I also note that I don't have to worry about auto-coding by heading any more, as the Name: [00:00:00] format is distinctive enough to pick out with fairly simple code.
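The extraction I actually do is in Perl, but to show the idea, here's an equivalent sketch in Python.  A single regex with a backreference picks out each {code}...{/code} span, including spans that cross speaker turns:

```python
import re

# Sketch of extracting {code}...{/code} spans from a coded transcript.
# The backreference \1 makes sure an opening code matches its own close,
# and DOTALL lets a span run across line breaks (and speaker turns).
CODE_RE = re.compile(r"\{([\w:]+)\}(.*?)\{/\1\}", re.DOTALL)

def extract_codes(text):
    """Return a list of (code, coded_text) pairs from a coded transcript."""
    return [(m.group(1), m.group(2).strip()) for m in CODE_RE.finditer(text)]

coded = ("Fred: [00:00:00]\n"
         "{q:how}Hi Barney, how are you?\n"
         "Barney: [00:00:00]\n"
         "I'm pretty good Fred{/q:how}, "
         "{t:activity}have you been to work today at all? {/t:activity}")

for code, span in extract_codes(coded):
    print(code, "->", span)
```

Grouping the pairs by code gives you a report of everything coded under a given tag, which covers most of what I was using NVIVO's retrieval for.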

For version tracking, I've found that the git version control software is very good for my needs.  Once you're used to it, reading text diffs is simple and flexible. Text based diffs bring great clarity to understanding what you've been working on, which is why they're the stock in trade of every good computer programmer.
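The workflow is minimal.  As an illustration (the filenames, identity settings and commit messages below are hypothetical):

```shell
# A minimal sketch of tracking coded transcripts with git.
mkdir transcripts && cd transcripts
git init
git config user.name "Example User"
git config user.email "user@example.com"

printf 'Fred: [00:00:00]\nHi Barney, how are you?\n' > interview01.txt
git add interview01.txt
git commit -m "Raw transcript of interview 1"

# After a coding pass, the diff shows exactly which lines changed:
printf 'Fred: [00:00:00]\n{q:how}Hi Barney, how are you?{/q:how}\n' > interview01.txt
git diff interview01.txt
git commit -am "First coding pass: q: codes"
git log --oneline interview01.txt
```

One commit per transcription or coding session gives you a dated, diffable history of the analysis for free.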

More to come on qualitative analysis using text based tools.  Watch my RSS feed for updates.


Original content is
© Kieren Diment 2009