Minding Minding - 7 months ago 23
PHP Question

PHP Strange Whitespace is created while writing files. (blocks CSS)

I'm trying to save code via PHP in a *.php file to run it later in an iframe.

Problem:

For some reason the CSS isn't working (HTML + JS are working!). After some experimenting I noticed that the spaces in the file aren't really separating the CSS code (replacing them with regular keyboard spaces fixed the problem), so I tried to find the difference, but their code points appeared to be the same (hex 20).
Even PHP's strict comparison operator (===) says they're the same, and ord() returns 32 for both.
As far as I can tell, it is created while writing the file.

Example: >JSFiddle<

HTML:

<!--head--><meta http-equiv="content-type" content="text/html; charset=UTF-8">
<input type="text" name="name" />
<textarea name="HTML" [...] spellcheck="false">[HTML in the JSFiddle]</textarea>
<textarea name="CSS" [...] spellcheck="false">[CSS in the JSFiddle]</textarea>
<textarea name="JS" [...] spellcheck="false">[JS in the JSFiddle]</textarea>


PHP:

//mb_internal_encoding() ---> "UTF-8"
$php = fopen($myPath.'/'.$_POST['name'].'.php', 'w');
fwrite($php, '<html><head><style type="text/css">'.$_POST['CSS'].'</style><script type="text/javascript">'.$_POST['JS'].'</script></head><body style="margin: 0;">'.$_POST['HTML'].'</body></html>');
//adding "\xEF\xBB\xBF" doesn't help (all files are in UTF-8)
fclose($php);


Questions:


  • Why are those spaces different?

  • How can I fix it?

  • Does anyone have a regex for replacing whitespace with a normal space (/\s+/g generates lots of "Â" and spaces)?



UPDATE 1:

All data is in UTF-8:

echo mb_detect_encoding($_POST['HTML'], 'UTF-8, ASCII', true); //UTF-8
echo mb_detect_encoding($_POST['CSS'], 'UTF-8, ASCII', true); //UTF-8
echo mb_detect_encoding($_POST['JS'], 'UTF-8, ASCII', true); //UTF-8


Thank you! - Minding

Answer

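Regex for the whitespace part: $str = preg_replace('/[\s\pZ]+/u', ' ', $str); Here \s matches the usual whitespace characters (space, tab, newline and so on), \pZ matches any character in the Unicode separator category (which includes non-breaking spaces), and the /u modifier makes PCRE treat both pattern and subject as UTF-8.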
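A minimal sketch of that replacement in use. The function name normalize_whitespace and the sample string are made up for illustration; the null check is there because preg_replace returns null when the subject is not valid UTF-8 under /u.

```php
<?php
// Hypothetical helper: collapse every run of ASCII or Unicode whitespace
// into one plain space (0x20).
function normalize_whitespace(string $s): string
{
    $out = preg_replace('/[\s\pZ]+/u', ' ', $s);
    // With /u, invalid UTF-8 input makes preg_replace() return null;
    // in that case hand back the input unchanged.
    return $out === null ? $s : $out;
}

// Hypothetical input: "a", NBSP (bytes C2 A0), "b", tab, "c".
$str = "a\xC2\xA0b\tc";
$clean = normalize_whitespace($str);

echo bin2hex($clean), "\n"; // 6120622063, i.e. "a b c"
```

Running it over $_POST['CSS'] before the fwrite() call should be enough if a Unicode space is the culprit, though it also collapses newlines and indentation into single spaces.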

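The fact that replacing the spaces by hand fixed it means the actual data must have contained some whitespace character other than a plain space. Maybe it is the infamous NBSP (Unicode U+00A0, UTF-8 0xC2 0xA0). That would also fit the "Â" you got from /\s+/g: the byte 0xC2 on its own displays as "Â" in Latin-1, which is what remains if only the 0xA0 byte gets replaced. But I would need to see a hexdump of the original data, not the JSFiddle snippet, to be sure.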
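One way to produce such a hexdump from PHP itself, as a rough sketch. The sample string stands in for $_POST['CSS'] and count_nbsp is a hypothetical helper; it assumes the data is UTF-8, where NBSP is the byte pair C2 A0.

```php
<?php
// Hypothetical sample standing in for $_POST['CSS']: "body", NBSP, "{}".
$css = "body\xC2\xA0{}";

// Dump the raw bytes; a plain space shows up as 20, an NBSP as c2a0.
echo bin2hex($css), "\n"; // 626f6479c2a07b7d

// Hypothetical helper: count UTF-8 encoded NBSPs in a string.
function count_nbsp(string $s): int
{
    return substr_count($s, "\xC2\xA0");
}

echo count_nbsp($css), "\n"; // 1

// If NBSP turns out to be the only offender, a byte-level replace is enough.
$fixed = str_replace("\xC2\xA0", ' ', $css);
echo bin2hex($fixed), "\n"; // 626f6479207b7d
```

If the dump shows some other sequence where a space should be, the same approach works with that sequence in place of "\xC2\xA0".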