Darth.Vader - 1 month ago

Extract a date from a string in Scala efficiently

I want to extract the date (e.g. 2015-01-01) from multiple strings of the following form in Scala:

val s = "basedir/somedir/tmp/BLAH/2015-01-01.txt"


I know I can do basic string split/trim/strip operations to achieve this, but is there a cleaner way to do it in Scala? Are there any regex "hidden features" in Scala that would help here?

I tried this without much success:

val s = "basedir/somedir/tmp/BLAH/2015-01-01.txt"
val regex = "(\\d+)-(\\d+)-(\\d+).txt"
val regex(year, month, date) = s

Answer

Use pattern matching with a regex extractor:

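val str = "basedir/somedir/tmp/BLAH/2015-01-01.txt"
val regex = ".*/(\\d{4}-\\d{2}-\\d{2})\\.txt".r // drop the "/" after ".*" if the date can appear without a preceding directory

// Some(date) on a match, None otherwise, so a non-matching string never throws
val maybeDate: Option[String] = str match {
  case regex(date) => Some(date)
  case _ => None
}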

Prefer the match above to the one-liner below: if the string does not match, the one-liner throws a scala.MatchError at runtime.

val regex(a) = "basedir/somedir/tmp/BLAH/2015-01-01.txt"
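A minimal sketch of that difference, assuming a hypothetical non-matching path such as "basedir/README.md". The helper names safeExtract and unsafeExtract are made up for illustration, and scala.util.Try is used only to observe the exception, not as a recommended style.

```scala
import scala.util.{Failure, Try}

val regex = ".*/(\\d{4}-\\d{2}-\\d{2})\\.txt".r

// Hypothetical helper: the wildcard case turns a non-match into None
def safeExtract(s: String): Option[String] = s match {
  case regex(date) => Some(date)
  case _           => None
}

// Hypothetical helper: a val pattern has no fallback case, so a non-match throws scala.MatchError
def unsafeExtract(s: String): String = {
  val regex(date) = s
  date
}

val good = "basedir/somedir/tmp/BLAH/2015-01-01.txt"
val bad  = "basedir/README.md" // hypothetical non-matching input

println(safeExtract(good)) // Some(2015-01-01)
println(safeExtract(bad))  // None

// Try is used here only to show that the val extraction throws
Try(unsafeExtract(bad)) match {
  case Failure(_: MatchError) => println("MatchError thrown")
  case other                  => println(s"unexpected: $other")
}
```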

Instead of prefixing the regex with .*, you can make it unanchored, so that it may match anywhere in the string:

val regex = "(\\d{4}-\\d{2}-\\d{2})\\.txt".r.unanchored
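Since the question is about many strings, here is a sketch that applies the unanchored extractor inside collect, which simply skips inputs that do not match. The list paths and its contents are made up for illustration. findFirstMatchIn, a standard method on scala.util.matching.Regex, gives the same result without extractor syntax.

```scala
// Unanchored: the pattern may match anywhere in the input, so no leading ".*" is needed
val datePattern = "(\\d{4}-\\d{2}-\\d{2})\\.txt".r.unanchored

// Hypothetical inputs; only the first two contain a dated .txt file name
val paths = List(
  "basedir/somedir/tmp/BLAH/2015-01-01.txt",
  "basedir/other/2016-02-29.txt",
  "basedir/somedir/notes.md"
)

// collect keeps only the elements for which the extractor matches
val dates: List[String] = paths.collect { case datePattern(d) => d }

// Equivalent without an extractor: findFirstMatchIn returns Option[Regex.Match]
val dates2: List[String] = paths.flatMap(p => datePattern.findFirstMatchIn(p).map(_.group(1)))

println(dates) // List(2015-01-01, 2016-02-29)
```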

Scala REPL

scala>  val regex = "(\\d{4}-\\d{2}-\\d{2}).txt".r.unanchored
regex: scala.util.matching.UnanchoredRegex = (\d{4}-\d{2}-\d{2}).txt

scala> val regex(a) = "basedir/somedir/tmp/BLAH/2015-01-01.txt"
a: String = 2015-01-01

Scala REPL

scala> val regex = ".*/(\\d{4}-\\d{2}-\\d{2}).txt".r
regex: scala.util.matching.Regex = .*/(\d{4}-\d{2}-\d{2}).txt

scala> val regex(a) = "basedir/somedir/tmp/BLAH/2015-01-01.txt"
a: String = 2015-01-01

Scala REPL

scala> val str = "basedir/somedir/tmp/BLAH/2015-01-01.txt"
str: String = basedir/somedir/tmp/BLAH/2015-01-01.txt

scala>  val regex = ".*/(\\d{4}-\\d{2}-\\d{2}).txt".r
regex: scala.util.matching.Regex = .*/(\d{4}-\d{2}-\d{2}).txt

scala>
     |     str match {
     |       case regex(date) => Some(date)
     |       case _ => None
     |     }
res21: Option[String] = Some(2015-01-01)

If you also want to capture the directory name:

scala> val regex = ".*/(.*)/(\\d{4}-\\d{2}-\\d{2}).txt".r
regex: scala.util.matching.Regex = .*/(.*)/(\d{4}-\d{2}-\d{2}).txt

scala> val s = "basedir/somedir/tmp/BLAH/2015-01-01.txt"
s: String = basedir/somedir/tmp/BLAH/2015-01-01.txt

scala> val regex(dir, date) = s
dir: String = BLAH
date: String = 2015-01-01
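Note that \d{4}-\d{2}-\d{2} only checks the digit layout, so a name like 2015-13-45.txt would also match. One possible follow-up, sketched under the assumption that the dates are ISO-8601 (yyyy-MM-dd), is to parse the captured text with java.time.LocalDate.parse and treat a parse failure as no match. The case class DatedFile and the function parseDatedFile are hypothetical names for illustration.

```scala
import java.time.LocalDate
import scala.util.Try

// Same idea as the directory-capturing regex above, with the dot before "txt" escaped
val dirAndDate = ".*/(.*)/(\\d{4}-\\d{2}-\\d{2})\\.txt".r

// Hypothetical result type for illustration
final case class DatedFile(dir: String, date: LocalDate)

// The regex checks the shape; LocalDate.parse (ISO_LOCAL_DATE by default) rejects impossible dates
def parseDatedFile(path: String): Option[DatedFile] = path match {
  case dirAndDate(dir, d) => Try(LocalDate.parse(d)).toOption.map(DatedFile(dir, _))
  case _                  => None
}

println(parseDatedFile("basedir/somedir/tmp/BLAH/2015-01-01.txt")) // Some(DatedFile(BLAH,2015-01-01))
println(parseDatedFile("basedir/somedir/tmp/BLAH/2015-13-45.txt")) // None
```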