Sorting in Java – Letters First

In my last post there was some Java code for bundling up PDF files. In this post, it’s back!

This time the focus is on how the filenames are compared. The problem with the default sort is that a digit always sorts ahead of a letter.

The files I’m trying to join are named hierarchically, like this:

  • Page 0
    • Page 0.0
      • Page 0.0.Llama
    • Page 0.1
      • Page 0.1.Snake
    • Page 0.2
      • Page 0.2.Fish
  • Page 1
    • Page 1.0
      • Page 1.0.Carrot
      • Page 1.0.Parsnip
    • Page 1.1
      • Page 1.1.0
        • Page 1.1.0.Honeydew
        • Page 1.1.0.Watermelon
      • Page 1.1.1
        • Page 1.1.1.Bacon
        • Page 1.1.1.Sausage
      • Page 1.1.Lemon
    • Page 1.2
      • Page 1.2.Lamp
  • Page 2
    • Page 2.0
      • Page 2.0.Golf
      • Page 2.0.Rugby
      • Page 2.0.Skiing

There are title pages for categories and subcategories, and information pages for items within those categories. The difference between the standard sort order and the order these files need can be seen in the position of ‘Page 1.1.Lemon’. It is a direct descendant of Page 1.1, yet it ends up positioned behind the subcategories because its last component is not numeric.
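To see the default behaviour in isolation, here is a minimal sketch. The class name DefaultSortDemo and the helper naturalOrder are made up for illustration, and the names are used without a file extension, as in the list above.

```java
import java.util.Arrays;

public class DefaultSortDemo {
    // Sorts a copy of the names using String's natural ordering,
    // which compares character values, so '0'..'9' come before 'A'..'Z'.
    static String[] naturalOrder(String[] names) {
        String[] copy = names.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        String[] names = {"Page 1.1.Lemon", "Page 1.1.1", "Page 1.1", "Page 1.1.0"};
        for (String name : naturalOrder(names)) {
            System.out.println(name);
        }
    }
}
```

Running it prints Page 1.1, Page 1.1.0, Page 1.1.1 and only then Page 1.1.Lemon, which is exactly the ordering I want to avoid.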

This is the updated Merge.java file, which forces letters to come before numbers while still keeping each group sorted correctly within itself.

import java.io.File;
import java.util.Arrays;
import java.util.Comparator;
import org.apache.pdfbox.util.PDFMergerUtility;

public class Merge {
   public static void main(String[] args) {
      // both the input folder and the output prefix are required
      if (args.length >= 2) {
         new Merge(args[0], args[1]);
      } else {
         System.out.println("Usage: ");
         System.out.println("Merge inputFolderName outputFileNamePrefix");
      }
   }
        
   public Merge(String from, String to) {
      
      // load util
      PDFMergerUtility ut = new PDFMergerUtility();
   
      // get files
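      File dir = new File(from);
      String[] pdfs = dir.list();
      if (pdfs == null) {
         // list() returns null if the path is not a readable directory
         System.out.println("Cannot list folder: " + from);
         return;
      }

      // sort with letters ahead of digits
      Arrays.sort(pdfs, new AlphaComparator());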
                
      for (int i=0; i<pdfs.length; i++) {
         // add to pdf
         ut.addSource(from + File.separator + pdfs[i]);
      }
      
      // save
      ut.setDestinationFileName(to + "_out.pdf");
      
      try {
         ut.mergeDocuments();
      } catch (Exception e) {
         e.printStackTrace();
      }
   }
   
   class AlphaComparator implements Comparator<String> {
      public int compare(String s1, String s2) {
         
         char[] c1 = s1.toCharArray();
         char[] c2 = s2.toCharArray();
         
         for(int i=0; i<((c1.length < c2.length) ? c1.length : c2.length); i++) {
            if(c1[i] == c2[i]) {
               // both same, skip
               continue;
            } else {
               if(Character.isDigit(c1[i])) {
                  if(Character.isDigit(c2[i])) {
                     // both numeric
                     return (c1[i] > c2[i]) ? 1 : -1;
                  } else {
                     // number vs a letter, promote letter
                     return 1;
                  }
               } else if(Character.isDigit(c2[i])) {
                  // letter vs number, promote letter
                  return -1;
               } else {
                  // both letters
                  return (c1[i] > c2[i]) ? 1 : -1;
               }
            }
         }
         // matched to length of short string, shortest wins;
         // identical strings compare as equal (0)
         return s1.length() - s2.length();
      }
   }
}

This sorts the files so that I get a list like this:

  • Page 0
    • Page 0.0
      • Page 0.0.Llama
    • Page 0.1
      • Page 0.1.Snake
    • Page 0.2
      • Page 0.2.Fish
  • Page 1
    • Page 1.0
      • Page 1.0.Carrot
      • Page 1.0.Parsnip
    • Page 1.1
      • Page 1.1.Lemon
      • Page 1.1.0
        • Page 1.1.0.Honeydew
        • Page 1.1.0.Watermelon
      • Page 1.1.1
        • Page 1.1.1.Bacon
        • Page 1.1.1.Sausage
    • Page 1.2
      • Page 1.2.Lamp
  • Page 2
    • Page 2.0
      • Page 2.0.Golf
      • Page 2.0.Rugby
      • Page 2.0.Skiing
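For anyone who wants to try the rule without PDFBox on the classpath, here is a rough standalone sketch of the same letters-first comparison. The class name LettersFirstDemo, the constant LETTERS_FIRST and the helper sorted are made up for illustration and are not part of Merge.java; the lambda needs Java 8 or later.

```java
import java.util.Arrays;
import java.util.Comparator;

public class LettersFirstDemo {
    // Same rule as AlphaComparator: at the first differing character,
    // a non-digit sorts ahead of a digit; otherwise compare char values.
    static final Comparator<String> LETTERS_FIRST = (s1, s2) -> {
        int n = Math.min(s1.length(), s2.length());
        for (int i = 0; i < n; i++) {
            char a = s1.charAt(i);
            char b = s2.charAt(i);
            if (a == b) {
                continue;
            }
            boolean aDigit = Character.isDigit(a);
            boolean bDigit = Character.isDigit(b);
            if (aDigit != bDigit) {
                return aDigit ? 1 : -1; // promote the non-digit
            }
            return Character.compare(a, b);
        }
        // one string is a prefix of the other: the shorter one wins
        return Integer.compare(s1.length(), s2.length());
    };

    // Returns a sorted copy so the input array is left untouched.
    static String[] sorted(String[] names) {
        String[] copy = names.clone();
        Arrays.sort(copy, LETTERS_FIRST);
        return copy;
    }

    public static void main(String[] args) {
        String[] names = {"Page 1.1.1", "Page 1.1.Lemon", "Page 1.2", "Page 1.1.0", "Page 1.1"};
        System.out.println(String.join("\n", sorted(names)));
    }
}
```

This prints Page 1.1, Page 1.1.Lemon, Page 1.1.0, Page 1.1.1, Page 1.2. One limitation worth knowing: digits are still compared one character at a time, so a name like Page 10 would sort ahead of Page 2.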