CSCI213/ITCS907/MCS9213
Autumn Session, 2007

Assignment 1: Beginning Java and learning to use the NetBeans Development environment

You should complete "Laboratory Exercise 1" before starting this assignment.


Aims

This assignment aims to establish a basic familiarity with the NetBeans development environment and the Java class documentation. It also introduces the use of some simple Java library (package) classes including I/O reader/printer classes, strings, collections.

You will soon learn that the biggest difference from C++ is that you write very little of your Java programs and you almost never start from scratch. Instead, you construct your programs mainly through the use of library classes. Typically, you start by taking an existing Java program that is similar to what you want, you strip the bits that you don't need and then build up your new program. You gradually acquire a collection of reusable code fragments for things like simple standard graphical user interfaces.

The assignment involves several versions of essentially the same program; these versions introduce increasingly object-based, Java-oriented styles of solution to a problem. All versions of the program count toward your assignment mark.


Objectives

On completion of this assignment you should be able to:


Overview of assignment

The programs for this assignment involve processing a collection of simple data records. The data are read from a file, manipulated in various ways, sorted, and listed. Several programs have to be implemented; they are increasingly sophisticated in the way in which they use Java to manipulate the data.

The actual data records supposedly describe medical data for patients. The data records (lines in the text data file) contain information like the patient's name and initials, insurance number, and results from a number of medical tests (blood-pressure, cholesterol-level, etc, etc). The programs produce reports such as listings of patients with the highest risks (obesity, high blood-pressure, etc).

Four versions of the program are implemented. The versions start with procedural code manipulating simple data records and move toward more object-based styles that make increasing use of features of the Java language and its libraries.

The assignment involves the following parts:

  1. Writing some simple procedural style code that reads the contents of a file; storing the data in simple arrays, and sorting data. This part introduces String, PrintWriter, BufferedReader and similar classes.
    The program works entirely with instances of classes from the standard libraries. Your own code is procedural, rather like the code for your C/C++ programs; so the overall program is hybrid - a procedural main-line making some use of instances of simple classes.
    You have to implement a sorting function. You are to implement an "insertion sort" (the data sets are small so there is no need for sophisticated algorithms like quicksort). The lecture notes for CSCI213 illustrate a "selection sort"; you are to implement the slightly more efficient "insertion sort" algorithm. (On typical data sets, insertion sort is reputed to be about 40% faster than selection sort.)
  2. For the second stage, you define an application specific class. This PatientRecord class defines a type of object that can hold the data for a single patient record. The program is modified to use an array of these PatientRecord objects rather than separate arrays with patient name, insurance number etc.
  3. In the third stage, you make better use of Java. The Java system already comes with sophisticated sorting algorithms implemented in its libraries. You must modify your program to utilize these features. More details of sorting in Java are contained in a supplementary document.
    You will also take advantage of Collection classes. Instead of defining arrays to store the data, you will use a dynamically resizable collection (you can choose Vector, ArrayList, or LinkedList).
  4. In the fourth and final stage, you change the code to a more clearly object-based style, and make other improvements to the code.
    In this stage, you cease to have the procedural style with its static main calling static functions that work on quasi-global (static) data. Instead, your main program will create an instance of a "PatientRecordAnalyzer" class and invoke operations of this object.
    Your analyzer class will have methods to read data, and to produce a variety of reports. It will store PatientRecord data in a collection class object that is an instance data member.

Resources

Two datafiles are provided. The main datafile should be used for most of the test runs of your program. The second, smaller datafile is used in the fourth part of the assignment, where one of the operations involves merging two sets of data.
Each line in these files contains the following data elements:

  1. Insurance number
  2. Name
  3. Initials
  4. Gender (M/F)
  5. Age in years.
  6. "Body Mass Index" (a measure of whether a person is under-weight < 19, over-weight > 25, or obese > 30)
  7. A blood-sugar reading
  8. Systolic blood pressure
  9. Diastolic blood pressure
  10. A blood cholesterol reading
  11. Results from three tests on blood hormone levels (if a test was not performed, the text record contains "-" in the corresponding field).
Example data are:

308970906	Ebbers	WE	F	61.1	41	5.7	145	74	3.7	-	46	14.7
102153708	Edmonds	SAV	F	56.7	32	7.1	130	92	6.5	-	49	12.5
108190991	Edash	BJ	F	61.3	30	4	147	83	7.3	-	41	21.6
218409054	Ederm	LO	M	56.9	35	5.8	135	79	5.2	3.2	-	-

(W.E. Ebbers, a female patient, has insurance number 308970906. She is just over 61 years old. She is significantly obese, having a body mass index of 41. Surprisingly, her sugar and cholesterol levels are both quite good. The systolic blood pressure is slightly elevated. Apart from her obesity, she looks in reasonable condition for her age. Two of the three blood hormone levels were measured with results as shown.)

There are several hundred such records in the file.

The following code fragment illustrates how such data may be read and processed. It produces an output listing showing each patient's insurance number, name, initials, gender, and age. Warning tags are appended to those patient records where there are clearly problems with obesity, blood-pressure (hypertension), blood-sugar (hyperglycemia), or blood-cholesterol (hypercholesterolemia). (This code fragment is the starting point from which you develop your solutions to the assignment - as noted earlier, you very rarely start a Java program from scratch, you almost always modify existing code.)

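import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

public class DemoCode {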
    
    /**
     * Demonstration program reads the data file, splitting
     * input lines into component fields, printing patient ids, names, initials
     * and flagging those most at risk.
     * @param args the command line arguments, args[0] is name of input file
     */
    public static void main(String[] args) {
        if(args.length<1){
            System.out.println("Need name of data file");
            System.exit(1);
        }
        BufferedReader input = null;
        try {
            input = new BufferedReader(new FileReader(args[0]));
        }
        catch(FileNotFoundException fnfe) {
            System.out.println("Didn't find the file");
            System.exit(1);
        }
        /*
         *Read input file line by line until get either empty line
         *or null (end-of-file indicator); split line into components
         *at whitespace, print id, name, initials, gender and age
         *Flag "at risk" persons - (check for any one of following conditions)
         *  body mass index > 35,
         *  systolic blood pressure > 175
         *  fasting blood sugar > 9
         *  total cholesterol > 8
         */
        System.out.println("Insurance #\tName\t\tInitials\t(M/F)\tAge\tAlerts");
        for(;;) 
        {
            String line = null;
            try {
                line = input.readLine();
            }
            catch(IOException readfail) {
                System.out.println("Read failed on input file");
                System.exit(1);
            }
            if(line==null) break;
            if(line.equals(""))break;
            String[] items = line.split("\\s+");
            System.out.print(items[0]+"\t"+items[1]+"\t\t"+items[2]+"\t\t"+items[3]+"\t" + items[4]+"\t");
            String bmiStr = items[5];
            String systolicStr = items[7];
            String sugarStr = items[6];
            String cholesterolStr = items[9];
            try {
                int bmi = Integer.parseInt(bmiStr);
                int systolic = Integer.parseInt(systolicStr);
                double sugar = Double.parseDouble(sugarStr);
                double cholesterol = Double.parseDouble(cholesterolStr);
                if(bmi>35) System.out.print("Obese, ");
                if(systolic>175) System.out.print("Hypertension, ");
                if(sugar>9.0) System.out.print("Hyperglycemia, ");
                if(cholesterol>8.0) System.out.print("Hypercholesterolemia");
            }
            catch(NumberFormatException nfe) {
                System.out.println("Invalid numerical data for patient " + items[1]);
            }
            System.out.println();
        }
    }

}

You should be able to cut and paste the code from this page into a Demo project that you create in NetBeans.

You should get the code to run in NetBeans, configuring your project so that its Properties/Run settings identifies the data file from which input is read.

The following is a fragment from the output that you should get if you run this program:

109642375	Denby		CR		M	57.9	Obese, Hypertension, Hypercholesterolemia
502263058	Eagham		GH		M	55.2	
128529487	Easlee		JA		F	52.6	
308970906	Ebbers		WE		F	61.1	Obese, 

The main program creates a FileReader object that can be used to extract character data from a file. The FileReader object is wrapped inside a BufferedReader object ("input"). A BufferedReader is more useful in that it allows the program to read input data line by line. The program then has a loop to read all the lines in the file; the readLine operation of a BufferedReader returns the next line. If the physical end of file is encountered, readLine returns null. (The program also checks for an empty line, ""; there may be some blank lines at the end of the data before the physical end of file.) The loop terminates when all input data have been processed.

The data in a line of the file is obtained as a java.lang.String object. String objects cannot be changed after creation. The String class defines methods for obtaining substrings, finding characters, and for splitting up a line. Here, the split function is used. The somewhat strange argument for split "\\s+" is a "regular expression" that means "a sequence of one or more white space characters". So the input line should be split into substrings at the white space gaps.
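As a small illustration of split with this regular expression, the sketch below splits the first example record shown earlier. The class name SplitDemo and the function name splitFields are hypothetical, chosen only for this illustration.

```java
import java.util.Arrays;

class SplitDemo {
    /** Splits one data line into fields at runs of white space (tabs or spaces). */
    static String[] splitFields(String line) {
        return line.split("\\s+");
    }

    public static void main(String[] args) {
        String line = "308970906\tEbbers\tWE\tF\t61.1\t41\t5.7\t145\t74\t3.7\t-\t46\t14.7";
        String[] items = splitFields(line);
        // items[0] is the insurance number, items[1] the name, and so on.
        System.out.println(items.length + " fields: " + Arrays.toString(items));
    }
}
```

Note that a line beginning with white space yields an empty first element, so it may be worth calling trim() on the line before splitting.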

In this demo code, the strings containing insurance number, name, initials, etc are simply listed. Strings that encode numeric values, like the blood sugar level, are converted into actual numbers for further processing.
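The conversion of a field to a number might be sketched as follows. The demo code does not read the hormone fields, so the treatment of the "-" marker here (mapping it to NaN) is an assumption made for this sketch, and the names NumericFields and parseReading are hypothetical.

```java
class NumericFields {
    /**
     * Converts a field to a double. The "-" marker, which the data file uses
     * for a test that was not performed, is mapped to NaN (an assumption of
     * this sketch; other representations are equally possible).
     */
    static double parseReading(String field) {
        if (field.equals("-")) {
            return Double.NaN;
        }
        // Throws NumberFormatException if the field is not a valid number.
        return Double.parseDouble(field);
    }

    public static void main(String[] args) {
        String[] hormoneFields = {"-", "46", "14.7"};
        for (String field : hormoneFields) {
            double value = parseReading(field);
            System.out.println(Double.isNaN(value) ? "not performed" : String.valueOf(value));
        }
    }
}
```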

The demo code consists of a single main function. Your programs will need to use private auxiliary functions.


Tasks

  1. Part1 Procedural code using basic Java classes.
  2. Part2 Defining your own class.
  3. Part3 Defining a Comparable class and a Comparator for using Java sorting, and using a collection class.
  4. Part4 Defining a "Patient Analyzer" class.

Listings of each program are required in your final report so keep the different versions in separate projects. The code that you write to complete one part of the assignment represents a starting point for the code for the next part. NetBeans allows you to have several projects open simultaneously and provides a mechanism for copying a class definition file from one project to another. (There is also a mechanism for sharing class files between projects. Don't use this. It would result in any changes you make in later stages corrupting the code for earlier projects.)


Procedural code using basic Java classes

Write a program, derived from the demonstration code shown above, that reads the data from the file, storing each patient's insurance-number, name, initials, gender, and age in separate arrays (long[], String[], String[], String[], double[]). (The other data elements in each record are ignored in this part of the assignment.) When all data have been read, the contents of the arrays are to be sorted in parallel, using an insertion sort algorithm, and listed in order of increasing insurance-number. (Your insertion-sort code should check the insurance-numbers of compared records to determine whether records should be swapped; if the records must be swapped, you must make appropriate changes in each of the data arrays.) Output from your program should appear something like:

...
107438863	TP Ash	F	60.3
107482636	IK Holly	M	78.3
107576725	P Hartly	F	76.9
107761288	B Lim	M	53.1
107864861	TT Bailes	M	69.9
...

Your program will:

Your program will have a single public class that contains the declarations of all static data members and static functions.

(When editing this class, you will begin to see the use of the Navigator pane in the NetBeans window. It will show a listing of the class members with little icons that distinguish public from private members, and static from instance members - in this case all members are static).
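The idea of sorting several arrays in parallel can be sketched as below. This is only an illustration under simplifying assumptions: it uses two arrays rather than the five required, it shifts entries rather than swapping them pairwise, and the names ParallelInsertionSort and sort are hypothetical.

```java
class ParallelInsertionSort {
    /**
     * Sorts the first count entries of ids into increasing order and applies
     * the same moves to names, so that ids[i] and names[i] stay paired.
     */
    static void sort(long[] ids, String[] names, int count) {
        for (int i = 1; i < count; i++) {
            long id = ids[i];
            String name = names[i];
            int j = i - 1;
            // Shift larger entries one place to the right in both arrays.
            while (j >= 0 && ids[j] > id) {
                ids[j + 1] = ids[j];
                names[j + 1] = names[j];
                j--;
            }
            // Drop the saved entry into the gap that was opened up.
            ids[j + 1] = id;
            names[j + 1] = name;
        }
    }

    public static void main(String[] args) {
        long[] ids = {107761288L, 107438863L, 107576725L};
        String[] names = {"Lim", "Ash", "Hartly"};
        sort(ids, names, ids.length);
        for (int i = 0; i < ids.length; i++) {
            System.out.println(ids[i] + "\t" + names[i]);
        }
    }
}
```

The count parameter allows for arrays that are only partly filled, which is the usual situation when fixed-size arrays are loaded from a file.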


Defining your own class

Implement a new version of the program that has a main class (that has the main() function, input function, sort function, and report function) and a simple "Patient record" class.

Part2

Your "Patient record" class will:

All data and function members are to be public. At this stage, a "Patient record" object is really being used as a simple "struct" just to hold data needed by other parts of the program. Normally, data members are private.

Your main class is similar to that implemented in part1. The main difference is that it owns a single array of "Patient record" objects instead of arrays for separate String and double variables. This should result in substantial simplification of your insertion sort code.
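A struct-like record class of the kind described might look like the following sketch. The field names and the exact toString format are assumptions made for illustration; only the use of public members, a constructor, and a toString method comes from the assignment text.

```java
class PatientRecord {
    // Public data members, as this part of the assignment specifies.
    public long insuranceNumber;
    public String name;
    public String initials;
    public String gender;
    public double age;

    public PatientRecord(long insuranceNumber, String name, String initials,
                         String gender, double age) {
        this.insuranceNumber = insuranceNumber;
        this.name = name;
        this.initials = initials;
        this.gender = gender;
        this.age = age;
    }

    /** Formats the record as: number, initials and name, gender, age. */
    @Override
    public String toString() {
        return insuranceNumber + "\t" + initials + " " + name + "\t" + gender + "\t" + age;
    }

    public static void main(String[] args) {
        PatientRecord record = new PatientRecord(107438863L, "Ash", "TP", "F", 60.3);
        System.out.println(record);
    }
}
```

With such a class, the sort only has to move one object reference per step instead of updating five arrays.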


Defining a Comparable class and a Comparator for using Java sorting, and using a collection class.

A new version of the program should be created as a separate project. This version will have three classes. A main class with the driver code; a revised version of your "Patient record" class, and a Comparator class.

Part3

Your "patient record" class will now "implement Comparable<PatientRecord>". The definition of the compareTo function will define the natural ordering of PatientRecords. They are to be ordered in increasing order by insurance number.

You will also define a class that implements Comparator<PatientRecord>. This will define a compare function that will arrange that PatientRecords are ordered so that the oldest patients are listed first (if several patients are the same age they are to be ordered in increasing order by insurance number).

You will remove your insertion sort code and eliminate the use of fixed size arrays. Instead, your main program will use a collection class (one of those implementing the List interface) and make use of the Collections.sort functions.
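The two orderings and the two forms of Collections.sort can be sketched as follows. The record class is cut down to two fields for brevity, and the names OldestFirst and SortDemo are hypothetical; Long.compare and Double.compare are standard library functions in current Java releases.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class PatientRecord implements Comparable<PatientRecord> {
    public long insuranceNumber;
    public double age;

    public PatientRecord(long insuranceNumber, double age) {
        this.insuranceNumber = insuranceNumber;
        this.age = age;
    }

    /** Natural ordering: increasing insurance number. */
    @Override
    public int compareTo(PatientRecord other) {
        return Long.compare(insuranceNumber, other.insuranceNumber);
    }
}

/** Orders the oldest patients first; equal ages fall back to the natural ordering. */
class OldestFirst implements Comparator<PatientRecord> {
    @Override
    public int compare(PatientRecord a, PatientRecord b) {
        int byAge = Double.compare(b.age, a.age); // arguments reversed: descending age
        return byAge != 0 ? byAge : a.compareTo(b);
    }
}

class SortDemo {
    public static void main(String[] args) {
        List<PatientRecord> records = new ArrayList<>();
        records.add(new PatientRecord(118740547L, 86.3));
        records.add(new PatientRecord(100434625L, 87.5));
        records.add(new PatientRecord(107338692L, 88.1));

        Collections.sort(records);                    // natural order (insurance number)
        System.out.println(records.get(0).insuranceNumber);

        Collections.sort(records, new OldestFirst()); // oldest patient first
        System.out.println(records.get(0).insuranceNumber);
    }
}
```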

Your program is now to generate two report listings. Once all the data records have been read and stored in the "list", the main program is to use the Collections.sort() function to sort them into their natural order and then call the first report generating function.

The first report function prints details of those patients where an alert is needed (blood pressure too high etc - as defined in the demonstration code given above). A fragment from this report is as follows:

...
108267661	BG Ramses M 64.6 Obese, 
108278713	JP MacMann F 79.2 Obese, 
108429007	C Ng F 55.3 Hypertension, 
109055281	EP Foz F 76.1 Obese, 
109219564	H MacTavish M 65.4 Hypercholesterolemia
109494855	P Jagdish F 73.1 Hypercholesterolemia
...

After invoking the first reporting function, the main program is to re-sort the collection so that the patients will now be listed with the oldest patient first. A second reporting function, that takes an argument defining the number of records required, will list the first few records in the sorted collection. (The main program will use some fixed number, ~15, in its call to this second reporting function). A fragment from this second report is as follows:
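Selecting the first few records of a sorted list, while coping with a list shorter than the number requested, might be sketched as below. The helper is written generically so the sketch stays self-contained; the names FirstFewReport and firstFew are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FirstFewReport {
    /** Returns the first howMany elements of sorted, or all of them if the list is shorter. */
    static <T> List<T> firstFew(List<T> sorted, int howMany) {
        int limit = Math.max(0, Math.min(howMany, sorted.size()));
        // Copy the sublist so that the result is independent of the original list.
        return new ArrayList<>(sorted.subList(0, limit));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "107338692\tPI Geharn\t88.1",
            "100434625\tX Xu\t87.5",
            "429990610\tSD Lafna\t87.1");
        for (String line : firstFew(lines, 2)) {
            System.out.println(line);
        }
    }
}
```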

...
107338692	PI Geharn	88.1
100434625	X Xu	87.5
429990610	SD Lafna	87.1
118740547	CD Lott	86.3
419265801	FG Kay	84.3
...

(You will use the PatientRecord class and its comparators in later assignments. Keep a copy of your code.)


Defining a "Patient Analyzer" class.

For the final part of the assignment you will define a "Patient Analyzer" class and another "Comparator".

Part 4.

Instances of the PatientRecordAnalyzer class will:

The main() driver function (in a separate driver class distinct from class PatientRecordAnalyzer) will:

  1. Read two filenames from the command line;
  2. Create two PatientRecordAnalyzer objects;
  3. Use these to load the data from the named files;
  4. Get the first PatientRecordAnalyzer to list its first ten entries ordered by insurance number;
  5. Get the first PatientRecordAnalyzer to list its first eight entries ordered by name;
  6. Get the first PatientRecordAnalyzer to list its first five entries ordered by age;
  7. Get the second PatientRecordAnalyzer to list its first five entries ordered by age;
  8. Get the first PatientRecordAnalyzer to print the statistics characterizing its contents;
  9. Get the second PatientRecordAnalyzer to print the statistics characterizing its contents;
  10. Create another PatientRecordAnalyzer by merging the two created earlier;
  11. Get this new PatientRecordAnalyzer to print the statistics characterizing its contents.
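The object-based structure behind steps such as merging and printing statistics can be sketched as follows. This is a rough outline only: file reading and sorted listings are omitted, the record class is cut down to three fields, and every member name (add, size, merge, averageAge, maximumAge) is a hypothetical choice rather than a required interface. How duplicate records should be handled when merging is not stated here, so the sketch simply keeps all records.

```java
import java.util.ArrayList;
import java.util.List;

class PatientRecord {
    public long insuranceNumber;
    public double age;
    public int bmi;

    public PatientRecord(long insuranceNumber, double age, int bmi) {
        this.insuranceNumber = insuranceNumber;
        this.age = age;
        this.bmi = bmi;
    }
}

class PatientRecordAnalyzer {
    // The collection is an instance data member, not quasi-global static data.
    private final List<PatientRecord> records = new ArrayList<>();

    public void add(PatientRecord record) {
        records.add(record);
    }

    public int size() {
        return records.size();
    }

    /** Creates a new analyzer holding the records of both arguments, which are left unchanged. */
    public static PatientRecordAnalyzer merge(PatientRecordAnalyzer first,
                                              PatientRecordAnalyzer second) {
        PatientRecordAnalyzer merged = new PatientRecordAnalyzer();
        merged.records.addAll(first.records);
        merged.records.addAll(second.records);
        return merged;
    }

    /** Mean age of the stored patients; 0.0 for an empty collection (a choice made for this sketch). */
    public double averageAge() {
        if (records.isEmpty()) {
            return 0.0;
        }
        double total = 0.0;
        for (PatientRecord record : records) {
            total += record.age;
        }
        return total / records.size();
    }

    /** Largest age among the stored patients; 0.0 for an empty collection. */
    public double maximumAge() {
        double maximum = 0.0;
        for (PatientRecord record : records) {
            maximum = Math.max(maximum, record.age);
        }
        return maximum;
    }
}

class AnalyzerDemo {
    public static void main(String[] args) {
        PatientRecordAnalyzer first = new PatientRecordAnalyzer();
        first.add(new PatientRecord(308970906L, 61.1, 41));
        PatientRecordAnalyzer second = new PatientRecordAnalyzer();
        second.add(new PatientRecord(102153708L, 56.7, 32));

        PatientRecordAnalyzer merged = PatientRecordAnalyzer.merge(first, second);
        System.out.println("Number of patients " + merged.size());
        System.out.println("Maximum age " + merged.maximumAge());
    }
}
```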

A fragment from the output that should be obtained is:

...
Second collection, 5 oldest
863802608	K Iverson,	86.7
333829360	J Zhu,	83.0
447864861	T Cao,	79.9
590736520	LT Prince,	79.9
...
First collection, statistics
Statistical report for dataset data2
Number of patients 245
Minimum age 38.9
Maximum age 95.2
Average age 66.3
Average bmi 30
...

Submission

The due date for submission will be announced in lectures; the date will probably be around the end of week 4 of session (currently set at Friday March 23rd). For this assignment, and most of the other assignments in CSCI213, you will be writing a report summarizing your work and submitting this report for assessment. The report must be prepared in a word processor, converted to PDF, transferred to your Unix account by ftp, and then submitted.

For CSCI213, assignments are submitted electronically via the turnin system. The turnin command only works if you are logged in on banshee (it has to access files that are only available on the banshee machine). Check that you are logged in on banshee (the Unix command uname -n will tell you what machine you are connected to); if you are not logged in on banshee, you must use ssh to remotely login on banshee before using turnin. (If you try to use turnin when not on banshee, you will get some enigmatic reply - most typically turnin will reply that it doesn't know anything about the assignment that you are trying to submit.) The turnin system is not chatty. You won't receive any happy little emails thanking you for your work. It just takes your file and saves it. You can use the turnout command to check that your submission was received. The turnin and turnout programs have Unix man pages describing their use.

You have four projects whose code is to be included in the report. You will use a word processor to create a report document and convert this document to a PDF file prior to submission. Your report will have an index and a titled section for each part. In each part, you will have your code, copied and pasted from your development editor. (Some people try capturing screen shots of the code formatted in the integrated development environment. This isn't a good idea. It is laborious and somewhat error prone - easy to miss out part of the code etc. However, you might find screenshots of things like the NetBeans "project view" or "navigator view" to be quite useful as outlines at the start of each section documenting a program or a class.) After pasting code into a word processor, you should fix up any formatting problems (inserts of newlines etc where a long source line was badly split). With each part, you should provide some evidence for correct operation - you can capture screen shots showing your application, and you can capture output (which should be edited down to a few lines) and paste these into your report.

The pdf report (named A1.pdf) should be submitted using the following command on banshee:

turnin -c csci213 -a 1  A1.pdf

Once again, just to emphasise,

  1. turnin isn't chatty; you don't get email back
  2. you can check that your file was safely submitted via the turnout program.
  3. turnin only works if you are logged in to banshee - use ssh if you are on a different machine or workstation. (If turnin complains that it doesn't know about the assignment, you aren't logged in on banshee!)
  4. Medicals submitted through official university SOLS channels cancel late penalty marks

Late submissions do not have to be requested. Late submissions will be allowed for three days after close of scheduled submission. Late submissions attract a mark penalty; this penalty may be waived if an appropriate request for special consideration (for medical or similar problem) is made via the university SOLS system before the close of the late submission time. No work can be submitted after the late submission time. If submitting late, use the command:

turnin -c csci213 -a 1late  A1.pdf

(That is 1late, 1 - l - a - t - e, all as one word.)


There is an example PDF report in the /share/cs-pub/csci213 directory - it illustrates how a report can be formatted with code, commentary and evidence for correct operation of a program. (It is a rather long report; it was produced by a student for an assignment worth 25% of a 400-level subject).

The /share/cs-pub/csci213 directory has a couple of PDF "driver" programs. These can be installed on a Windows PC and will allow "printing to PDF" file from any application.

(It is possible to create PDF files on the Unix system. You must first create a postscript file - use a "print/save to file" option - that should give you postscript. There is a ps2pdf program that converts that to PDF. This approach is clumsy and may not work completely.)

The /share/cs-pub/213 directory has Windows/Linux copies of the JDK and the Java documentation for those wishing to work on their own machines.


One mark of the ten marks for the assignment is for overall presentation of the report, inclusion of evidence of operation, well formatted code etc. You can only get this mark if your report presents solutions to at least two parts of the overall assignment.

Another mark is for general Java style - you can only get this mark if you complete all four parts, but may then lose it if you are making incorrect use of basic Java constructs.

Part1 is worth 2 marks - if it works, and the data elements are defined appropriately, and overall structure is correct with the separate functions as were specified, and the implementation of the insertion sort was done correctly.

Part2 is worth only 1 mark - and to get that mark you must have code that works and that properly defines the "patient record" class with its data members, constructor, and toString method, and have modified the other code to use your patient record objects appropriately.

Part3 is worth 2 marks - if it works, and you have defined the comparator and "implements comparable" aspects appropriately and made correct use of Collections.sort functions.

Part4 has 3 marks - again, you only get the marks if it works and does all that it is supposed to do.