Scala Question

NLP Email Validation

I have a set of email addresses that I retrieve from a Hive table called users with the Spark code below:

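import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveContext

val sparkConf = new SparkConf().setAppName("YOUR_APP_NAME").setMaster("local[10]")
val sc = new SparkContext(sparkConf)
val sqlContext = new SQLContext(sc)
val hiveContext = new HiveContext(sc)

// Point the Hive context at the metastore that holds the users table
hiveContext.setConf("hive.metastore.uris", "METASTORE_URI_NAME_HERE")

val df = hiveContext.sql("SELECT email FROM USERS")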


Now df is a DataFrame with a single column containing all the email addresses. Is there a way in Scala to validate them, something like https://pypi.python.org/pypi/validate_email? That library is for Python, and I need an equivalent in Scala. Or would NLP be a good fit for this?

I am stuck on how to validate these email addresses, and I need more than a regex: I need a way to check whether the domain of each email address has an SMTP server.

Something like this (except in Scala):

is_valid = validate_email('example@example.com',check_smtp_connection = True)

Answer

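You definitely don't need natural language processing to validate email addresses. Use JavaMail for that: it can check address syntax (InternetAddress.validate()) and it speaks SMTP, so you can try connecting to the domain's mail server.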
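
If you only want to know whether a domain advertises a mail server, a DNS lookup may already be enough. The following is a minimal sketch, not the JavaMail approach above: it swaps in the JDK's built-in JNDI DNS provider so that it needs no extra dependency. The object and method names (EmailCheck, domainOf, hasMxRecord, looksDeliverable) are made up for illustration, and the syntax check is deliberately loose.

```scala
import java.util.Hashtable
import javax.naming.Context
import javax.naming.directory.InitialDirContext
import scala.util.Try

object EmailCheck {
  // Loose shape check (assumption: good enough for a first filter):
  // local@domain.tld, with no whitespace and exactly one '@'.
  private val Shape = """^[^@\s]+@([^@\s]+\.[^@\s]+)$""".r

  /** Returns the lower-cased domain if the address has a plausible shape. */
  def domainOf(email: String): Option[String] = email.trim match {
    case Shape(domain) => Some(domain.toLowerCase)
    case _             => None
  }

  /** True if DNS returns at least one MX record for the domain.
    * Needs network access; any lookup failure is treated as "no record". */
  def hasMxRecord(domain: String): Boolean = Try {
    val env = new Hashtable[String, String]()
    env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.dns.DnsContextFactory")
    val ctx = new InitialDirContext(env)
    try {
      val mx = ctx.getAttributes(domain, Array("MX")).get("MX")
      mx != null && mx.size() > 0
    } finally ctx.close()
  }.getOrElse(false)

  /** Shape check first, then the MX lookup on the extracted domain. */
  def looksDeliverable(email: String): Boolean =
    domainOf(email).exists(hasMxRecord)
}
```

With the df from the question, one option would be to wrap EmailCheck.looksDeliverable in a Spark UDF and add the result as a column. Each lookup is a network call, so it is probably worth checking each distinct domain only once. Also keep in mind that a missing MX record is not conclusive, because mail servers may fall back to the domain's address record.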

Also note that the only reliable way to check whether an email address really exists is to send the user a unique link and ask them to follow it.
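
A rough sketch of the "unique link" part, assuming you store the token next to the address and mark the address as confirmed when the link is visited. ConfirmationLink, newToken, linkFor and the base URL are hypothetical names, not part of any library, and sending the mail itself is left out.

```scala
import java.security.SecureRandom
import java.util.Base64

object ConfirmationLink {
  private val random = new SecureRandom()

  /** 32 random bytes encoded as URL-safe Base64 without padding. */
  def newToken(): String = {
    val bytes = new Array[Byte](32)
    random.nextBytes(bytes)
    Base64.getUrlEncoder().withoutPadding().encodeToString(bytes)
  }

  /** baseUrl is a placeholder for whatever endpoint handles confirmations. */
  def linkFor(baseUrl: String, token: String): String =
    s"$baseUrl?token=$token"
}
```

The token is URL-safe, so it can go into the query string without further escaping.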