jjathman - 1 month ago
Java Question

Fastest way to tell if a string is a valid date

I am supporting a common library at work that performs many checks of a given string to see if it is a valid date. The Java API, the commons-lang library, and JodaTime all have methods that can parse a string and turn it into a date, which tells you whether it is actually a valid date or not. I was hoping there would be a way of doing the validation without actually creating a date object (or a DateTime, as is the case with the JodaTime library). For example, here is a simple piece of example code:

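import java.text.ParseException;
import java.text.SimpleDateFormat;

public boolean isValidDate(String dateString) {
    SimpleDateFormat df = new SimpleDateFormat("yyyyMMdd");
    // Without this, lenient parsing accepts impossible dates such as 20120231.
    df.setLenient(false);
    try {
        df.parse(dateString);
        return true;
    } catch (ParseException e) {
        return false;
    }
}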


This just seems wasteful to me, because we are throwing away the resulting object. From my benchmarks, about 5% of our time in this common library is spent validating dates. I'm hoping I'm just missing an obvious API. Any suggestions would be great!

UPDATE

Assume that we can always use the same date format (likely yyyyMMdd). I did think about using a regex as well, but then it would need to be aware of the number of days in each month, leap years, etc.




Results

Parsed a date 10 million times

Using Java's SimpleDateFormat: ~32 seconds
Using commons-lang DateUtils.parseDate: ~32 seconds
Using JodaTime's DateTimeFormatter: ~3.5 seconds
Using the pure code/math solution by Slanec: ~0.8 seconds
Using precomputed results by Slanec and dfb (minus filling cache): ~0.2 seconds
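
The "pure code/math" entry refers to a solution by Slanec that is not shown here. As a rough illustration of that kind of approach (not Slanec's actual code), a validator for the fixed yyyyMMdd format can check the digits and the calendar rules with plain integer arithmetic, without creating a date object. The class name FastDateCheck, the rejection of year 0000, and the use of Gregorian leap-year rules for every year are assumptions of this sketch.

```java
// Hypothetical sketch of an arithmetic-only yyyyMMdd check; not the code from Slanec's answer.
final class FastDateCheck {
    // Days per month for a non-leap year; index 0 is unused.
    private static final int[] DAYS = {0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};

    private FastDateCheck() {}

    /** Returns true if s is exactly eight ASCII digits that form a real yyyyMMdd date. */
    static boolean isValidDate(String s) {
        if (s == null || s.length() != 8) {
            return false;
        }
        int value = 0;
        for (int i = 0; i < 8; i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return false; // reject signs, spaces, letters, non-ASCII digits
            }
            value = value * 10 + (c - '0');
        }
        int year = value / 10000;
        int month = (value / 100) % 100;
        int day = value % 100;
        // Assumption: year 0000 is treated as invalid.
        if (year < 1 || month < 1 || month > 12 || day < 1) {
            return false;
        }
        int max = (month == 2 && isLeapYear(year)) ? 29 : DAYS[month];
        return day <= max;
    }

    /** Gregorian rule: divisible by 4, except centuries not divisible by 400. */
    static boolean isLeapYear(int year) {
        return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    }
}
```

Calling FastDateCheck.isValidDate("20120229") returns true, while "20110229" and "20120431" are rejected. Unlike the cached approach, this has no start-up cost and no fixed year range.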


There were some very creative answers, and I appreciate them! I guess now I just need to decide how much flexibility I need and what I want the code to look like. I'm going to accept dfb's answer because it was simply the fastest, which was my original question. Thanks!

dfb
Answer

If you're really concerned about performance and your date format is really that simple, just pre-compute all the valid strings and hash them in memory. The format you have above has fewer than a million valid combinations up to 2050 (roughly 365 per year).


EDIT by Slanec - reference implementation

This implementation depends on your specific date format. It could be adapted to any specific date format out there (just like my first answer, but a bit better).

It builds a set of all dates from 1900 through 2049 (stored as Strings; there are 54787 of them) and then checks whether a given string is in that set.

Once the dates set is created, it's fast as hell. A quick microbenchmark showed an improvement by a factor of 10 over my first solution.

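import java.util.HashSet;
import java.util.Set;

private static final Set<String> dates = new HashSet<String>();
static {
    // Pre-compute every valid yyyyMMdd string for the years 1900 through 2049.
    for (int year = 1900; year < 2050; year++) {
        for (int month = 1; month <= 12; month++) {
            for (int day = 1; day <= daysInMonth(year, month); day++) {
                dates.add(String.format("%04d%02d%02d", year, month, day));
            }
        }
    }
}

// daysInMonth is not shown in the original answer; this is a minimal Gregorian version.
private static int daysInMonth(int year, int month) {
    boolean leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    switch (month) {
        case 2: return leap ? 29 : 28;
        case 4: case 6: case 9: case 11: return 30;
        default: return 31;
    }
}

public static boolean isValidDate2(String dateString) {
    // A single hash lookup; no parsing and no date object is created.
    return dates.contains(dateString);
}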
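
The loop above stops before 2050, so the set covers 1900-01-01 through 2049-12-31. A quick way to sanity-check the 54787 figure is to count the days in that range. This sketch uses java.time (Java 8 or later, which is newer than the APIs discussed in the question); the class name DateCount and the method daysInRange are made up for illustration.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

// Hypothetical helper for checking the size of the pre-computed set.
final class DateCount {
    private DateCount() {}

    /** Counts the calendar days from Jan 1 of firstYear up to, but not including, Jan 1 of endYearExclusive. */
    static long daysInRange(int firstYear, int endYearExclusive) {
        LocalDate start = LocalDate.of(firstYear, 1, 1);
        LocalDate end = LocalDate.of(endYearExclusive, 1, 1);
        return ChronoUnit.DAYS.between(start, end);
    }
}
```

DateCount.daysInRange(1900, 2050) returns 54787: 150 years of 365 days plus 37 leap days (1904 through 2048, with 1900 excluded because it is not a leap year).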

P.S. It can be modified to use Set<Integer> or even Trove's TIntHashSet, which reduces memory usage a lot (and therefore allows a much larger timespan). The performance then drops to a level just below my original solution.
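
A minimal sketch of the Set<Integer> variant, assuming the same yyyyMMdd format and the same 1900 through 2049 range. Each date is stored as the int yyyyMMdd, and the input is converted to an int by hand so that malformed strings are rejected without an exception. The class name IntDateCache and its helper methods are hypothetical; Trove's TIntHashSet is not shown because it needs an external library.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: pre-computed dates stored as ints instead of Strings.
final class IntDateCache {
    // Days per month for a non-leap year; index 0 is unused.
    private static final int[] DAYS = {0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    private static final Set<Integer> DATES = new HashSet<>();

    static {
        for (int year = 1900; year < 2050; year++) {
            boolean leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
            for (int month = 1; month <= 12; month++) {
                int max = (month == 2 && leap) ? 29 : DAYS[month];
                for (int day = 1; day <= max; day++) {
                    DATES.add(year * 10000 + month * 100 + day);
                }
            }
        }
    }

    private IntDateCache() {}

    /** Returns true if s is eight ASCII digits naming a date in the cached range. */
    static boolean isValidDate(String s) {
        if (s == null || s.length() != 8) {
            return false;
        }
        int key = 0;
        for (int i = 0; i < 8; i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
            key = key * 10 + (c - '0');
        }
        return DATES.contains(key); // autoboxes key to Integer for the lookup
    }

    /** Number of cached dates, useful for a sanity check. */
    static int size() {
        return DATES.size();
    }
}
```

The digit loop and the boxing on lookup are extra work compared with a plain String lookup, which may be part of why this variant is reported as slower than the String set.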