stepwise_refinement - 2 months ago
JavaScript Question

Ignoring carriage returns in regular expressions

I am currently attempting to parse a conversation file in JavaScript. Here is an example of such a conversation:

09/05/2016, 13:11 - Joe Bloggs: Hey Jane how're you doing? 




The pattern scans to the end of the line, then uses a repeated group that first checks that the next line doesn't start with \d\d/ (the start of a date, which marks the beginning of a new message), and, if it doesn't, captures that entire line as well.
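As a rough sketch of that idea (not the exact regex from the original answer), something like the following could work. The names loosePattern and parseMessages are made up for illustration, the second line of Joe's message is invented to show a multi-line capture, and the header layout "dd/mm/yyyy, hh:mm - Name: " is assumed from the sample above. The \r?\n in the repeated group is what lets Windows-style line endings pass through.

```javascript
// Hypothetical pattern: header, then the rest of the line, then any number of
// following lines that do NOT start with two digits and a slash.
const loosePattern =
  /^(\d\d\/\d\d\/\d{4}), (\d\d:\d\d) - ([^:\r\n]+): (.*(?:\r?\n(?!\d\d\/).*)*)/gm;

function parseMessages(log) {
  // matchAll works on a copy of the global regex, so lastIndex is not shared.
  return Array.from(log.matchAll(loosePattern), (m) => ({
    date: m[1],
    time: m[2],
    author: m[3],
    // Normalise CRLF to LF so carriage returns don't leak into the text.
    text: m[4].replace(/\r\n/g, "\n").trim(),
  }));
}

// Made-up input with CRLF line endings; the second line is a continuation.
const log = [
  "09/05/2016, 13:11 - Joe Bloggs: Hey Jane how're you doing?",
  "Are you still coming over in October?",
  "09/05/2016, 23:36 - Jane Doe: Great!",
].join("\r\n");

const messages = parseMessages(log);
console.log(messages.length);   // 2
console.log(messages[0].text);  // both lines of Joe's message, joined by "\n"
```

Because . never matches \r or \n in JavaScript, each .* stops at the end of a line, and the group only continues onto the next line when the look-ahead allows it.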

You can make the negative look-ahead a little more specific if you fear that two digits followed by a forward slash could hit an edge case. This increases the number of steps the regex engine takes, but makes the match slightly safer.
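To illustrate the difference, here is a small sketch comparing the two conditions on individual lines. Both patterns are assumptions for illustration (looseHeader and strictHeader are hypothetical names); the stricter one requires the full "dd/mm/yyyy, hh:mm - " prefix rather than just two digits and a slash.

```javascript
// Illustrative patterns only; neither is taken verbatim from the original answer.
const looseHeader = /^\d\d\//;                            // two digits and a slash
const strictHeader = /^\d\d\/\d\d\/\d{4}, \d\d:\d\d - /;  // full date, comma, 24-hour time, dash

const lines = [
  "09/05/2016, 23:40 - Joe Bloggs: Haha, sounds great.", // a real message header
  "10/01/2016 @ 6am - Arrive at the station",            // user text that merely starts with a date
];

const verdicts = lines.map((line) => ({
  line,
  loose: looseHeader.test(line),
  strict: strictHeader.test(line),
}));

console.log(verdicts);
// The first line is a header under both checks.
// The second line is a header only under the loose check.
```

Inside the full regex, the stricter pattern would go in the negative look-ahead, i.e. (?!...) around the same prefix.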

If a user actually entered a newline followed by a date in that syntax, you might have problems, as the regex would stop matching at that point. I doubt they would also include a comma and a 24-hour time, though, so requiring those in the look-ahead is one way to handle that scenario.


09/05/2016, 23:36 - Jane Doe: Great! Let me give you my travel details:

10/01/2016 @ 6am - Arrive at the station
10/01/2016 @ 7am - Get run over by a drunk horse carriage (the driver and the horse were both sober; the carriage stayed up a bit late to drink)
10/01/2016 @ 7:15am - Pull myself out from under the carriage and kick at its wheels vehemently.

09/05/2016, 23:40 - Joe Bloggs: Haha, sounds great.

This is only an example, together with the corresponding fix of making the look-ahead more specific, to show how a user might add text that could break the earlier revision of the regex.
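For what it's worth, here is a minimal sketch of how the more specific look-ahead might handle the conversation above. The names strictPattern and parseConversation are hypothetical, the header format is assumed from the samples, and this is one possible way to write it rather than the exact regex from the answer.

```javascript
// Hypothetical pattern: a following line only ends the message when it starts
// with the full "dd/mm/yyyy, hh:mm - " header prefix.
const strictPattern =
  /^(\d\d\/\d\d\/\d{4}), (\d\d:\d\d) - ([^:\r\n]+): (.*(?:\r?\n(?!\d\d\/\d\d\/\d{4}, \d\d:\d\d - ).*)*)/gm;

function parseConversation(log) {
  return Array.from(log.matchAll(strictPattern), (m) => ({
    date: m[1],
    time: m[2],
    author: m[3],
    text: m[4].replace(/\r\n/g, "\n").trim(), // drop carriage returns and edge whitespace
  }));
}

// The sample conversation from above, joined with CRLF line endings.
const sample = [
  "09/05/2016, 23:36 - Jane Doe: Great! Let me give you my travel details:",
  "",
  "10/01/2016 @ 6am - Arrive at the station",
  "10/01/2016 @ 7am - Get run over by a drunk horse carriage (the driver and the horse were both sober; the carriage stayed up a bit late to drink)",
  "10/01/2016 @ 7:15am - Pull myself out from under the carriage and kick at its wheels vehemently.",
  "",
  "09/05/2016, 23:40 - Joe Bloggs: Haha, sounds great.",
].join("\r\n");

const conversation = parseConversation(sample);
console.log(conversation.length);     // 2
console.log(conversation[0].author);  // "Jane Doe"
console.log(conversation[1].text);    // "Haha, sounds great."
```

The three itinerary lines start with a date but lack the comma and 24-hour time, so they stay inside Jane's message instead of cutting it short.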