DomingoSL - 20 days ago
PHP Question

Counting words on an HTML web page using PHP

I need a PHP script which takes the URL of a web page and then echoes how many times each word is mentioned.

Example



This is a generic HTML page:

<html>
<body>
<h1> This is the title </h1>
<p> some description text here, <b>this</b> is a word. </p>
</body>
</html>


This will be the PHP script:

<?php
$htmlurl = "generichtml.com";
// the script here
echo $result;
?>


So the output will be a table like this:

Word         Mentions
This         2
is           2
the          1
title        1
some         1
description  1
text         1
here         1
a            1
word         1


This is similar to what search bots do when they crawl the web. Any idea how to begin or, even better, do you have a PHP script that already does this?

Answer

The one-liner below does a case-insensitive word count after stripping all HTML tags from your string.

print_r(array_count_values(str_word_count(strip_tags(strtolower($str)), 1)));

To grab the source code of a page, you can use cURL or file_get_contents():

$str = file_get_contents('http://www.example.com/');
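file_get_contents() returns false when the request fails, so it is probably worth checking the result before counting anything. Here is a minimal sketch, assuming allow_url_fopen is enabled on your server. The helper name fetch_html() and the 5-second timeout are illustrative choices of mine, not something the approach requires.

```php
<?php
// Hypothetical helper: returns the page source, or null when the fetch fails.
function fetch_html(string $url): ?string
{
    // Assumed 5-second timeout so a dead host does not stall the script.
    $context = stream_context_create(['http' => ['timeout' => 5]]);

    // @ suppresses the PHP warning; failure is signalled by the false return value.
    $html = @file_get_contents($url, false, $context);

    return $html === false ? null : $html;
}

$str = fetch_html('http://www.example.com/');
if ($str === null) {
    echo "Could not fetch the page\n";
}
```

If $str is not null, it can be passed straight into the one-liner above.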

From inside out:

  1. Use strtolower() to make everything lower case.
  2. Strip the HTML tags using strip_tags().
  3. Build an array of the words using str_word_count(). Passing 1 as the second argument makes it return an array containing every word found in the string.
  4. Use array_count_values() to count how many times each word occurs in that array.
  5. Use print_r() to display the results.
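
The steps above can be sketched as a small function, run here against the sample page from the question. The name count_words() and the printf() table layout are my own choices for illustration; the four calls inside are the same ones as in the one-liner.

```php
<?php
// Hypothetical helper wrapping the same four calls as the one-liner.
function count_words(string $html): array
{
    $lower = strtolower($html);         // 1. lower-case everything
    $text  = strip_tags($lower);        // 2. drop the HTML tags
    $words = str_word_count($text, 1);  // 3. array of the words found
    return array_count_values($words);  // 4. word => number of mentions
}

// Sample page taken from the question.
$html = '<html><body><h1> This is the title </h1>'
      . '<p> some description text here, <b>this</b> is a word. </p></body></html>';

$counts = count_words($html);

// Print a simple two-column table instead of the raw print_r() dump.
printf("%-12s %s\n", 'WORDS', 'Mentions');
foreach ($counts as $word => $count) {
    printf("%-12s %d\n", $word, $count);
}
```

Keep in mind that strip_tags() removes tags without inserting any whitespace, so markup like `<p>one</p><p>two</p>` would be counted as the single word "onetwo". If your pages are written that way, you may want to replace tags with spaces before counting.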