CSCI213/ITCS907/MCS9213
Autumn Session, 2007
Sorting in Java
Sorting in Java is quite sophisticated.
There are sort functions provided (the algorithm is
some form of merge sort). These are defined in the java.util.Arrays
class; and also in the java.util.Collections class. Arrays
and Collections are not classes from which you instantiate Arrays
or Collections objects;
their role is to define a kind of namespace where Sun could put useful
functions like those for sorting arrays/collections or for doing binary searches
on sorted arrays.
The java.util package includes many useful
"collection classes" (Vector, LinkedList, Hashtable, ... ) including
some like TreeSet that automatically sort data elements as they
are added to the collection. The interface Collection defines
some functions that all kinds of collection classes can perform, e.g. add(element),
remove(element), isEmpty() etc. There are specializations like the
interface java.util.List; this adds the concept of an
ordering so that there are methods to get the element at a particular
point in a list-collection. Actual concrete classes - like Vector, LinkedList etc -
implement the List interface or one of the other specialized interfaces
that extends Collections.
(From Java 1.5, collections are defined as "templates"; these are much
less complicated than C++ templates. A Java template collection class simply
requires that you define the class of data elements that will be stored in that
collection. So, if you want use a LinkedList to store String
objects, you must declare a "linked list of Strings" - LinkedList<String>
mylist = new LinkedList<String>();. If you look at older text
books that use dialects of Java before 1.5, they will use "type unsafe" collections -
you simply declared that you needed a LinkedList and then used it to store
references to objects of whatever kinds you chose.)
Arrays.sort(), Comparable, and Comparators
If you look at the Arrays class (or the Collections class) in the Java documentation you will see functions like:
static int binarySearch(double[] a, double key)
Searches the specified array of doubles for the specified value using the binary search algorithm.
static int binarySearch(float[] a, float key)
Searches the specified array of floats for the specified value using the binary search algorithm.
static int binarySearch(int[] a, int key)
Searches the specified array of ints for the specified value using the binary search algorithm.
static void sort(double[] a)
Sorts the specified array of doubles into ascending numerical order.
static void sort(double[] a, int fromIndex, int toIndex)
Sorts the specified range of the specified array of doubles into ascending numerical order.
static void sort(int[] a)
Sorts the specified array of ints into ascending numerical order.
static void sort(int[] a, int fromIndex, int toIndex)
Sorts the specified range of the specified array of ints into ascending numerical order.
Of course, Java knows how to sort primitive type data, numbers like int and double. But what about objects that are instances of classes? The following functions are defined:
static void sort(Object[] a)
Sorts the specified array of objects into ascending order, according to the natural ordering of its elements.
static void sort(Object[] a, int fromIndex, int toIndex)
Sorts the specified range of the specified array of objects into ascending order, according to the natural ordering of its elements.
static void
sort(T[] a, Comparator super T> c)
Sorts the specified array of objects according to the order induced by the specified comparator.
static void
sort(T[] a, int fromIndex, int toIndex, Comparator super T> c)
Sorts the specified range of the specified array of objects according to the order induced by the specified comparator.
What is "the natural ordering" of a set of objects? What is a "comparator"?
There has to be a mechanism for specifying how objects are compared.
Essentially, a comparison function has to be supplied
that will be used by the sort function every time it
compares two elements (if you ever met
the qsort function in C/C++ stdlib you will remember providing
a function that compared elements of an array). The compare function
used for the java.lang.String class obviously compares
two strings so that they are sorted alphabetically. The
compare function for the java.util.Date class will
arrange dates with the oldest date first. These comparison
functions are built in with the classes that Sun provides.
If you are defining a class, you must define the comparison function. There are two ways of doing this.
The first mechanism is for the case where you are designing a new class. In this case, you arrange that your new class implements the "Comparable" interface - which really means that it defines a compareTo() function.
The following example code illustrates this approach. The main program (main() in public class OrderDemo) creates a number of "Boxes" and stores them in an array. The array is then sorted using the Array sort function.
A box is something with width, breadth, height, and color. Boxes
are sorted first by volume, if their volumes are equal they are sorted
by color name, if their color names are the
same they are sorted by unique box identifier numbers.
This box class was designed for use with
a sorted collection so it implements the Comparable interface
and defines its compareTo method. (To work properly, a compareTo
function should only return "equal" if the "two" objects being
compared are actually the same object.)
Here are the class definitions:
import java.util.*;
class Box implements Comparable<Box> {
static int idCount = 0;
int id;
int width;
int length;
int height;
String color;
public Box(int w, int l, int h, String c) {
width = w;
length = l;
height = h;
color = c;
id = ++idCount;
}
public long getVolume() { return width*length*height; }
public int compareTo(Box other) {
int val = (int) (other.getVolume() - getVolume());
if(val != 0) return val;
val = color.compareTo(other.color);
if(val != 0) return val;
return id - other.id;
}
public String toString() {
StringBuffer sb = new StringBuffer();
sb.append("Box ");
sb.append(id);
sb.append(", Volume ");
sb.append(getVolume());
sb.append(", Color ");
sb.append(color);
return sb.toString();
}
}
public class OrderDemo {
public static void main(String[] args) {
Box[] theBoxes = new Box[10];
theBoxes[0] = new Box(7, 8, 9, "red");
theBoxes[1] = new Box(3, 13, 12, "yellow");
theBoxes[2] = new Box(11, 14, 22, "blue");
theBoxes[3] = new Box(14, 13, 33, "black");
theBoxes[4] = new Box( 28, 1, 18, "green");
theBoxes[5] = new Box(14, 13, 33, "black");
// Invoke Arrays.sort
// The boxes are sorted "by their natural ordering"
// as defined by their compareTo function.
Arrays.sort(theBoxes, 0, 6);
for(int i= 0;i<6;i++)
System.out.println(theBoxes[i]);
}
}
A compareTo() function for a class that implement Comparable<T> must have the signature:
public int compareTo(<T> o)
where T is the type of the object, here Box
The compareTo code compares by volume (and if the volumes are equal, compares by color, then by box identifier).
Apart from comparing itself with another Box, a Box object can report its volume and turn itself into a String so that it can easily be printed. It is possible to build a String by concatenating shorter Strings; it is slightly more efficient to use the helper StringBuffer class to build a complex String. Here, the toString() method makes use of a StringBuffer while building its String representation. (When you call System.out.println(obj), the obj object executes its toString method to produce a String that the println function can print.)
The other mechanism used with sorted collections is for cases where you have already implemented a class, and it doesn't "implement Comparable", or when you want to sort by some criteria that is different from the "natural ordering".
In this situation, you explicitly pass a comparison function
as an extra argument when you call the sort function
(just as you did when using qsort in C/C++).
However, there is a problem. Java doesn't allow the definition
of "free functions". The comparison function that you define
must belong to a class. A little trickery is used to get
around this.
The trick is to use a "Comparator". This is a fake class (it isn't a real class in the sense of defining a type of data object that owns resources and provides functions related to those resources). Such classes exist simply to provide a collection of functions (just one function here).
The compare function in a Comparator compares two objects (of the same kind). You define different comparators with different compare functions for comparing different kinds of data object.
The Boxes code has been reworked to use Comparator as well as a
"natural ordering" defined by implementing Comparable. The
"natural ordering" has been redefined to place the boxes in
order of their identifier numbers. The Comparator class has
been defined to provide a compare function
that does the comparison based on volume, colour etc.
import java.util.*;
class BoxComparator implements Comparator<Box> {
public int compare(Box b1, Box b2) {
int val = (int) (b2.getVolume() - b1.getVolume());
if(val != 0) return val;
val = b1.getColor().compareTo(b2.getColor());
if(val != 0) return val;
return b1.getId() - b2.getId();
} }
}
class Box implements Comparable<Box> {
private static int idCount = 0;
private int id;
private int width;
private int length;
private int height;
private String color;
public Box(int w, int l, int h, String c) {
width = w;
length = l;
height = h;
color = c;
id = ++idCount;
}
public int compareTo(Box other)
{
return id - other.id;
}
public int getId() { return id; }
public long getVolume() { return width*length*height; }
public String getColor() { return color; }
public String toString() {
StringBuffer sb = new StringBuffer();
sb.append("Box ");
sb.append(id);
sb.append(", Volume ");
sb.append(getVolume());
sb.append(", Color ");
sb.append(color);
return sb.toString();
}
}
public class OrderDemo2 {
public static void main(String[] args) {
Box[] theBoxes = new Box[10];
theBoxes[0] = new Box(7, 8, 9, "red");
theBoxes[1] = new Box(3, 13, 12, "yellow");
theBoxes[2] = new Box(11, 14, 22, "blue");
theBoxes[3] = new Box(14, 13, 33, "black");
theBoxes[4] = new Box( 28, 1, 18, "green");
theBoxes[5] = new Box(14, 13, 33, "black");
System.out.println("Boxes sorted by the volume/colour comparator");
Arrays.sort(theBoxes, 0, 6, new BoxComparator());
for(int i= 0;i<6;i++)
System.out.println(theBoxes[i]);
System.out.println("Boxes sorted by the 'natural ordering' i.e. ids");
Arrays.sort(theBoxes,0,6);
for(int i= 0;i<6;i++)
System.out.println(theBoxes[i]);
}
}
If you run this demo, you should get output like the following:
Boxes sorted by the volume/colour comparator Box 4, Volume 6006, Color black Box 6, Volume 6006, Color black Box 3, Volume 3388, Color blue Box 5, Volume 504, Color green Box 1, Volume 504, Color red Box 2, Volume 468, Color yellow Boxes sorted by the 'natural ordering' i.e. ids Box 1, Volume 504, Color red Box 2, Volume 468, Color yellow Box 3, Volume 3388, Color blue Box 4, Volume 6006, Color black Box 5, Volume 504, Color green Box 6, Volume 6006, Color black
Sorted collections: TreeSet
Sometimes arrays of data elements are useful, but often the collection classes are more convenient. In order to create an array, you must either know or guess the number of data elements that you will have. If you guess wrongly, your program either dies with an "array subscript out of bounds" exception or you waste memory. Instances of collection classes "grow" to hold the data that you add. (You can find more details about the collection classes here.)
The most common collection class is java.util.Vector;
this is basically a dynamic array that grows as you add
elements. You can in effect use "array subscripting"
to place elements at particular points in a Vector. Once you
have filled your Vector with data elements, you can
apply the Collections.sort() function to that
Vector. If you use the simple sort() function, the elements
are sorted in their "natural order"; you can use a
version of sort() that takes a Comparator as an extra
argument and sort your elements by whatever criteria
you wish.
Clearly, sorting a LinkedList could be costly! (It
would probably be N*N*lgN - you did some of this
algorithm complexity stuff in CSCI103, you will do
more in CSCI203.) In practice, the Collections.sort()
function copies the data in a List into
a temporary array; sorts the array using Arrays.sort() in
N*lgN time; copies the sorted data back into the collection.
A TreeSet object sorts data as they
are added. It is based on a standard data structure
called a "red-black" tree (those who have taken an
algorithms course will probably have encountered
red-black trees).
In your program, you simply create an instance of class TreeSet (with Java 1.5 style type security you must specify the type of element that will be stored in the TreeSet). The TreeSet object that you get will then arrange elements in their "natural ordering" as they are added.
If you want to use some different ordering, you can create another TreeSet providing it with a Comparator. This TreeSet will then order data elements using the Comparator.
You can copy data from one collection to another, and get the data re-ordered as you wish. The following fragment of code illustrates the approach:
public static void main(String[] args) {
TreeSet<Box> data = new TreeSet<Box>();
data.add( new Box(7, 8, 9, "red"));
data.add( new Box(3, 13, 12, "yellow"));
data.add( new Box(11, 14, 22, "blue"));
data.add( new Box(14, 13, 33, "black"));
data.add( new Box( 28, 1, 18, "green"));
data.add( new Box(14, 13, 33, "black"));
System.out.println("The boxes in their natural order");
for(Box b: data)
System.out.println(b);
TreeSet<Box> data2 = new TreeSet<Box>(new BoxComparator());
data2.addAll(data);
System.out.println("The boxes in their volume/colour order");
for(Box b: data2)
System.out.println(b);
}