Chris - 4 months ago
Java Question

Parsing "1.5 hours" from Stanford Core NLP

Core NLP parses strings like

1.5 hours

as a one-hour duration when I use the following code:

def getPeriods(text: String): Seq[Period] = {
  parse(text).filter(timexAnn => {
    val timeExpr: TimeExpression = timexAnn.get(classOf[TimeExpression.Annotation])
    timeExpr.getValue.getType == duration
  }).map(timexAnn => {
    val timeExpr: TimeExpression = timexAnn.get(classOf[TimeExpression.Annotation])
    val period = timeExpr.getTemporal.getDuration.getJodaTimePeriod
    log.debug("Parsed period: " + TimeUtils.getHourMinutePeriodFormatter.print(period))
    period
  })
}

I take the first (and only) element of the resulting Seq[Period]. I've been playing around with the online demo, and this behavior seems to be expected. Have I missed something? If not, is there a better alternative?

Answer

It appears that Core NLP and SUTime do not parse decimal hours. I wrote a simple Scala function that converts a string like "1.5 hours" into one that SUTime understands, such as "1 hour and 30 minutes". I then pass the converted string to the parser, and everyone is happy.

def getReadableDurationString(durationString: String): String = {
  // Whole hours plus a fractional part, e.g. "1.5 hours" or "12.25 hours".
  val hoursAndMins = "([1-9][0-9]*)(\\.[0-9]+) hours?".r
  // Fractional part only, e.g. ".5 hours" or "0.5 hours".
  val minsOnly = "0?(\\.[0-9]+) hours?".r
  durationString match {
    case hoursAndMins(hours, mins) =>
      // mins includes the leading dot, so ".5" becomes 0.5 * 60 = 30 minutes.
      s"$hours hours and ${Math.round(mins.toDouble * 60)} minutes"
    case minsOnly(mins) =>
      s"${Math.round(mins.toDouble * 60)} minutes"
    case _ => durationString
  }
}
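
A possible variant of the same idea, as a rough sketch rather than anything provided by Core NLP: it rounds once on the total number of minutes, so a value like "1.999 hours" becomes "2 hours" instead of "1 hours and 60 minutes", and it picks singular or plural unit names. The object name DurationText and the method toHoursAndMinutes are made up for this example, and I have not checked which of these phrasings SUTime accepts beyond the "1 hour and 30 minutes" form mentioned above.

```scala
object DurationText {
  // Matches a decimal number of hours such as "1.5 hours", "0.25 hour" or ".5 hours".
  private val DecimalHours = """(\d*\.\d+) hours?""".r

  // Hypothetical helper (not part of Core NLP or SUTime):
  // rewrites decimal hours as whole hours and minutes.
  def toHoursAndMinutes(text: String): String = text match {
    case DecimalHours(value) =>
      // Round once on the total so that the minutes part never reaches 60.
      val totalMinutes: Long = math.round(value.toDouble * 60)
      val hours = totalMinutes / 60
      val minutes = totalMinutes % 60

      // Use the singular unit name only when the count is exactly 1.
      def unit(n: Long, name: String): String =
        if (n == 1) s"$n $name" else s"$n ${name}s"

      (hours, minutes) match {
        case (0L, m) => unit(m, "minute")
        case (h, 0L) => unit(h, "hour")
        case (h, m)  => s"${unit(h, "hour")} and ${unit(m, "minute")}"
      }
    // Anything that is not a decimal-hours string is returned unchanged.
    case other => other
  }
}
```

The result would then be handed to the parser in place of the raw input, for example getPeriods(DurationText.toHoursAndMinutes("1.5 hours")), using the getPeriods function from the question.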