Find Chinese text in HTML using preg_match

I'm trying to extract the text from a string of HTML.
I would like to capture only the text between tags and skip over any empty tags.

My current attempt can be found here:

  • I can't use \w since I need to capture Chinese characters

  • I would like only text and not a lot of empty results

I have tried:


// This fails on nested tags: it captures only the first nested tag

And this:


// Finds all the strings, but also includes lots of empty strings
// for adjacent tags (><)

Is there any way to exclude < from \X? Or is there a better way to write this so it returns only the text parts?

Answer Source

Try a regex like


This simply matches all text between > and < that isn't all whitespace. See
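
A minimal sketch of this idea in PHP, in case it helps. The function name `extractTextNodes` and the sample HTML are assumptions made for illustration, and the pattern is just one way to express "text between > and < that isn't all whitespace", not necessarily the exact regex intended here:

```php
<?php
// Hypothetical helper; the name is an assumption for this sketch.
function extractTextNodes(string $html): array
{
    // >        closing bracket of the preceding tag
    // [^<]*    any run of characters other than "<"
    // [^<\s]   at least one character that is neither "<" nor whitespace
    // [^<]*    the rest of the text
    // (?=<)    stop just before the next tag without consuming "<"
    // u        treat pattern and subject as UTF-8, so Chinese text is handled per character
    preg_match_all('/>([^<]*[^<\s][^<]*)(?=<)/u', $html, $matches);

    // Group 1 holds the text between the tags; trim surrounding whitespace.
    return array_map('trim', $matches[1]);
}

$html = '<div><p>你好</p><span> </span><b>世界 hello</b></div>';
print_r(extractTextNodes($html));
```

Because the character class requires at least one non-whitespace character, adjacent tags (`><`) and whitespace-only gaps produce no match, so there are no empty results to filter out. A regex like this can still be confused by comments, scripts, or a literal `>` inside an attribute value; for arbitrary HTML a parser such as `DOMDocument` is probably the safer choice.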

