Sorting in Java – Letters First
In my last post there was some Java code for bundling up PDF files. In this post, it’s back!
This time the focus is on how the filenames are compared. The problem with the default sort is that digits always sort ahead of letters, because it compares character codes and the digits '0' to '9' have lower codes than the letters A to Z and a to z.
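A quick way to see this for yourself is a minimal sketch like the one below (the class name DefaultOrderDemo is just an illustrative name). It prints the character codes involved and the result of String.compareTo, which returns the difference between the first pair of characters that don't match:

```java
// Shows why the default String ordering puts digits first:
// String.compareTo compares UTF-16 character values, and '0'..'9' (48..57)
// are lower than 'A'..'Z' (65..90) and 'a'..'z' (97..122).
public class DefaultOrderDemo {
    public static void main(String[] args) {
        System.out.println((int) '0'); // prints 48
        System.out.println((int) 'L'); // prints 76
        // The names first differ at index 9: '0' - 'L' = 48 - 76,
        // so this prints -28 and "Page 1.1.0" sorts before "Page 1.1.Lemon"
        System.out.println("Page 1.1.0".compareTo("Page 1.1.Lemon"));
    }
}
```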
The files I’m trying to join are named hierarchically. With the default string sort, they come out in this order:
- Page 0
- Page 0.0
- Page 0.0.Llama
- Page 0.1
- Page 0.1.Snake
- Page 0.2
- Page 0.2.Fish
- Page 1
- Page 1.0
- Page 1.0.Carrot
- Page 1.0.Parsnip
- Page 1.1
- Page 1.1.0
- Page 1.1.0.Honeydew
- Page 1.1.0.Watermelon
- Page 1.1.1
- Page 1.1.1.Bacon
- Page 1.1.1.Sausage
- Page 1.1.Lemon
- Page 1.2
- Page 1.2.Lamp
- Page 2
- Page 2.0
- Page 2.0.Golf
- Page 2.0.Rugby
- Page 2.0.Skiing
There are title pages for categories and subcategories, and information pages for items within those categories. The problem with the default ordering shows up in the position of ‘Page 1.1.Lemon’. It is a direct child of Page 1.1, yet it ends up after all of the 1.1.x subcategories, because at the point where the names first differ it has a letter where theirs have a digit.
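To reproduce the problem, here is a minimal sketch (DefaultSortDemo and defaultSorted are hypothetical names) that runs a few of these names through Arrays.sort with no comparator, i.e. the natural String ordering:

```java
import java.util.Arrays;

// Sorts some of the example filenames with the default (natural) String
// ordering to show where "Page 1.1.Lemon" ends up.
public class DefaultSortDemo {
    static String[] defaultSorted(String[] names) {
        String[] copy = names.clone();
        Arrays.sort(copy); // natural order: compares character values
        return copy;
    }

    public static void main(String[] args) {
        String[] names = {
            "Page 1.2", "Page 1.1.Lemon", "Page 1.1.1.Bacon",
            "Page 1.1", "Page 1.1.0", "Page 1.1.1"
        };
        // Prints, one per line: Page 1.1, Page 1.1.0, Page 1.1.1,
        // Page 1.1.1.Bacon, Page 1.1.Lemon, Page 1.2
        for (String name : defaultSorted(names)) {
            System.out.println(name);
        }
    }
}
```

Lemon only beats Page 1.2 because the two names already differ at the second number ('1' against '2').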
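This is the updated Merge.java file. Its comparator makes letters (in fact, any non-digit character, including '.' and spaces) sort before digits, while still keeping digits in order among themselves and letters in order among themselves.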
import java.io.File;
import java.util.Arrays;
import java.util.Comparator;
import org.apache.pdfbox.util.PDFMergerUtility; // PDFBox 1.x package; 2.x moved it to org.apache.pdfbox.multipdf

public class Merge {
    public static void main(String[] args) {
        // both arguments are used, so require two of them
        if (args.length >= 2) {
            new Merge(args[0], args[1]);
        } else {
            System.out.println("Usage: ");
            System.out.println("Merge inputFolderName outputFileNamePrefix");
        }
    }

    public Merge(String from, String to) {
        // load util
        PDFMergerUtility ut = new PDFMergerUtility();
        // get files
        File dir = new File(from);
        String[] pdfs = dir.list();
        if (pdfs == null) {
            // list() returns null if "from" is not a readable directory
            System.out.println("Not a directory: " + from);
            return;
        }
        // sort with letters ranked ahead of digits
        Arrays.sort(pdfs, new AlphaComparator());
        for (int i = 0; i < pdfs.length; i++) {
            // add to pdf
            ut.addSource(from + File.separator + pdfs[i]);
        }
        // save
        ut.setDestinationFileName(to + "_out.pdf");
        try {
            ut.mergeDocuments();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Compares names character by character, ranking any non-digit
    // (letters, '.', spaces) ahead of a digit.
    static class AlphaComparator implements Comparator<String> {
        public int compare(String s1, String s2) {
            char[] c1 = s1.toCharArray();
            char[] c2 = s2.toCharArray();
            int shortest = Math.min(c1.length, c2.length);
            for (int i = 0; i < shortest; i++) {
                if (c1[i] == c2[i]) {
                    // both same, skip
                    continue;
                }
                if (Character.isDigit(c1[i])) {
                    if (Character.isDigit(c2[i])) {
                        // both numeric
                        return (c1[i] > c2[i]) ? 1 : -1;
                    } else {
                        // number vs a letter, promote letter
                        return 1;
                    }
                } else if (Character.isDigit(c2[i])) {
                    // letter vs number, promote letter
                    return -1;
                } else {
                    // both letters
                    return (c1[i] > c2[i]) ? 1 : -1;
                }
            }
            // matched to length of short string, shortest wins;
            // equal names return 0, as the Comparator contract requires
            return Integer.compare(s1.length(), s2.length());
        }
    }
}
This sorts the files into the following order:
- Page 0
- Page 0.0
- Page 0.0.Llama
- Page 0.1
- Page 0.1.Snake
- Page 0.2
- Page 0.2.Fish
- Page 1
- Page 1.0
- Page 1.0.Carrot
- Page 1.0.Parsnip
- Page 1.1
- Page 1.1.Lemon
- Page 1.1.0
- Page 1.1.0.Honeydew
- Page 1.1.0.Watermelon
- Page 1.1.1
- Page 1.1.1.Bacon
- Page 1.1.1.Sausage
- Page 1.2
- Page 1.2.Lamp
- Page 2
- Page 2.0
- Page 2.0.Golf
- Page 2.0.Rugby
- Page 2.0.Skiing
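To check the ordering without PDFBox on the classpath, here is a minimal sketch that re-expresses the same letters-first rule as a Java 8 lambda Comparator&lt;String&gt; and sorts a handful of the names. The class name LettersFirstDemo and the constant LETTERS_FIRST are hypothetical; the rule itself follows AlphaComparator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Same rule as AlphaComparator: any non-digit character (letters, '.',
// spaces) ranks ahead of a digit; otherwise characters compare normally.
public class LettersFirstDemo {
    static final Comparator<String> LETTERS_FIRST = (s1, s2) -> {
        int n = Math.min(s1.length(), s2.length());
        for (int i = 0; i < n; i++) {
            char a = s1.charAt(i);
            char b = s2.charAt(i);
            if (a == b) {
                continue; // same character, look further along
            }
            boolean aDigit = Character.isDigit(a);
            boolean bDigit = Character.isDigit(b);
            if (aDigit != bDigit) {
                return aDigit ? 1 : -1; // the non-digit comes first
            }
            return Character.compare(a, b); // same kind: plain char order
        }
        // one name is a prefix of the other: the shorter one comes first
        return Integer.compare(s1.length(), s2.length());
    };

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(Arrays.asList(
            "Page 1.2", "Page 1.1.0", "Page 1.1.Lemon",
            "Page 1.1", "Page 1.1.1.Bacon", "Page 1.1.1"));
        names.sort(LETTERS_FIRST);
        // Prints, one per line: Page 1.1, Page 1.1.Lemon, Page 1.1.0,
        // Page 1.1.1, Page 1.1.1.Bacon, Page 1.2
        names.forEach(System.out::println);
    }
}
```

Because digits are compared one character at a time, "Page 10" sorts before "Page 2" under this rule, just as it does under the default ordering. That makes no difference for single-digit names like the ones above, but a level with ten or more entries would need the digit runs compared as whole numbers.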