stepwise_refinement - 3 months ago
JavaScript Question

Ignoring carriage returns in regular expressions

I am currently attempting to parse a conversation file in JavaScript. Here is an example of such a conversation:

09/05/2016, 13:11 - Joe Bloggs: Hey Jane how're you doing? 

Answer

Try

(\d{2}\/\d{2}\/\d{4}),\s(\d{1,2}:\d{2})\s-\s([^:]*):\s+(.*(?:\n+(?!\n|\d{2}\/).*)*)

(https://regex101.com/r/sA3sB8/2) This scans to the end of the line, then uses a repeated group to check that the next line doesn't start with \d\d/ (the start of a date, which marks a new message). If it doesn't, that entire line is captured as part of the current message as well.
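As a rough sketch of how the pattern could be used from JavaScript: the helper name parseConversation, the sample text, and the line-ending normalisation step are assumptions added for illustration, not part of the original answer. The normalisation is there because `.` does not match \r in a JavaScript regex and the pattern only looks for \n, so a file with Windows line endings would otherwise cut each message short.

```javascript
// The pattern from the answer, with the global flag so every message is found.
const MESSAGE_RE =
  /(\d{2}\/\d{2}\/\d{4}),\s(\d{1,2}:\d{2})\s-\s([^:]*):\s+(.*(?:\n+(?!\n|\d{2}\/).*)*)/g;

// parseConversation is a hypothetical helper name used for this sketch.
function parseConversation(text) {
  // Assumption: convert \r\n and lone \r to \n before matching.
  const normalised = text.replace(/\r\n?/g, "\n");
  const messages = [];
  for (const m of normalised.matchAll(MESSAGE_RE)) {
    // Groups: 1 = date, 2 = time, 3 = sender, 4 = message body (may span lines).
    messages.push({ date: m[1], time: m[2], sender: m[3], body: m[4].trim() });
  }
  return messages;
}

// Made-up sample with Windows line endings and one multi-line message.
const sample =
  "09/05/2016, 13:11 - Joe Bloggs: Hey Jane how're you doing?\r\n" +
  "Long time no see.\r\n" +
  "09/05/2016, 13:12 - Jane Doe: Great!\r\n";

const parsed = parseConversation(sample);
console.log(parsed);
```

The body is trimmed because the repeated group can pick up a trailing newline at the very end of the input.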

You can make the negative look-ahead more specific if you are worried that two digits followed by a forward slash could hit edge cases. It increases the number of steps, but makes the match safer.

If a user actually entered a newline followed by a date in that syntax, you might have problems, as the match would stop at that point. I doubt they would also include a comma and a 24-hour time, though, so adding those to the look-ahead is one way to handle that scenario.

Example:

09/05/2016, 23:36 - Jane Doe: Great! Let me give you my travel details:

10/01/2016 @ 6am - Arrive at the station
10/01/2016 @ 7am - Get run over by a drunk horse carriage (the driver and the horse were both sober; the carriage stayed up a bit late to drink)
10/01/2016 @ 7:15am - Pull myself out from under the carriage and kick at its wheels vehemently.

09/05/2016, 23:40 - Joe Bloggs: Haha, sounds great.

This is just an example of how a user might add text that breaks this particular revision of the regex. The corresponding fix is to add more specifics to the look-ahead.
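One possible way to tighten the look-ahead is sketched below. It assumes every real message header has the form "dd/mm/yyyy, h:mm - ", so the look-ahead requires the comma, the time, and the dash as well as the date. The names LOOSE_RE and STRICT_RE and the shortened chat text are made up for this comparison.

```javascript
// Original pattern: the look-ahead stops at any line starting with two digits and a slash.
const LOOSE_RE =
  /(\d{2}\/\d{2}\/\d{4}),\s(\d{1,2}:\d{2})\s-\s([^:]*):\s+(.*(?:\n+(?!\n|\d{2}\/).*)*)/g;

// Stricter variant (an assumption, not the answer's exact fix): the look-ahead
// only stops at a full "dd/mm/yyyy, h:mm - " message header.
const STRICT_RE =
  /(\d{2}\/\d{2}\/\d{4}),\s(\d{1,2}:\d{2})\s-\s([^:]*):\s+(.*(?:\n+(?!\n|\d{2}\/\d{2}\/\d{4},\s\d{1,2}:\d{2}\s-\s).*)*)/g;

// Shortened version of the example conversation above.
const chat = [
  "09/05/2016, 23:36 - Jane Doe: Great! Let me give you my travel details:",
  "",
  "10/01/2016 @ 6am - Arrive at the station",
  "10/01/2016 @ 7am - Get run over by a drunk horse carriage",
  "",
  "09/05/2016, 23:40 - Joe Bloggs: Haha, sounds great.",
].join("\n");

const looseMatches = [...chat.matchAll(LOOSE_RE)];
const strictMatches = [...chat.matchAll(STRICT_RE)];

// Group 4 is the message body. The loose pattern drops Jane's travel details,
// while the strict pattern keeps them inside her message.
console.log(looseMatches[0][4]);
console.log(strictMatches[0][4]);
```

Both patterns find two messages here. The difference is that the loose one ends Jane's message at the first line that looks like a date, so her travel details are not captured by any match.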