Dan Smith - 1 year ago
PHP Question

PHP preg_match match consecutive newline chars

I am trying to block a certain kind of spam post on my site: posts dressed up to look like they contain some content. Specifically, each one is a few random words, some newline characters, and then a random character.

I know some legitimate users may want two consecutive newline characters (to create a blank line between paragraphs), but I figure three or more can be marked as spam.

I tested this regex on regex101 and it works fine, but it is never triggered when I test on my site. Any ideas why? When I uncomment the echo line, it prints 4 for my test data, so I know it sees the newlines. Is my regex formed incorrectly?

Test data:

This is a potential

spam post


//echo substr_count($lowercaseBody, "\n");
if (preg_match('/\n{3,}./', $lowercaseBody)) {
    error("Stop Spamming my chan you .");
}

Answer Source

The data likely contains CRLFs, not just LFs.

The substr_count test ignores the CRs interleaved between the LFs, but your regex pattern does not: \n{3,} only matches three or more LFs that sit directly next to each other.
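
The mismatch can be reproduced in isolation. This is a minimal sketch, assuming the post body arrives with CRLF line endings (browsers normally submit textarea line breaks that way); the sample string and variable names are made up for illustration.

```php
<?php
// Hypothetical sample: a post body with four CRLF line breaks in a row.
$body = "this is a potential\r\n\r\n\r\n\r\nspam post";

// substr_count() only looks for "\n", so the "\r" characters are ignored.
$newlines = substr_count($body, "\n");

// \n{3,} needs three LFs with nothing between them. Here every LF is
// followed by a CR, so the pattern never matches.
$lfOnly = preg_match('/\n{3,}./', $body);

var_dump($newlines); // int(4)
var_dump($lfOnly);   // int(0)

// json_encode() prints control characters as escapes, which makes any
// "\r" in the submitted data visible while debugging.
echo json_encode($body), PHP_EOL;
```

If the last line shows \r\n pairs for real submissions on the site, that would confirm the diagnosis.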

Use (\r?\n) instead of \n to allow both CRLFs and LFs:

if (preg_match('/(\r?\n){3,}./', $lowercaseBody)) {
    error("Stop Spamming my chan you .");
}
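
To check the corrected pattern against both kinds of line ending, it could be wrapped in a small function. This is only a sketch: the function name hasExcessiveLineBreaks and the $limit parameter are hypothetical and not part of the original code.

```php
<?php
// Hypothetical helper: true when $text contains at least $limit line
// breaks in a row (CRLF or bare LF) followed by another character.
function hasExcessiveLineBreaks(string $text, int $limit = 3): bool
{
    // (?:\r?\n) matches one line break with an optional leading CR.
    // The trailing dot keeps the original requirement that some
    // character follows the run of line breaks.
    $pattern = '/(?:\r?\n){' . $limit . ',}./';
    return preg_match($pattern, $text) === 1;
}

var_dump(hasExcessiveLineBreaks("a few words\r\n\r\n\r\nx")); // bool(true)
var_dump(hasExcessiveLineBreaks("a few words\n\n\nx"));       // bool(true)
var_dump(hasExcessiveLineBreaks("para one\r\n\r\npara two")); // bool(false)
```

The non-capturing group (?:...) behaves the same as the plain group here and just avoids storing a capture. PCRE also offers \R, which matches any line-break sequence and treats CRLF as a single unit, so \R{3,} would be another option.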