Fred J. - 1 month ago
HTML Question

Extracting text from page td element with Cheerio

This Meteor server code uses Cheerio/jQuery to get the value "44 years" from the sixth td element in a web page which contains the following HTML.

It gives undefined. Any idea how to do it? Thanks



<tr>
<td class="label" style="white-space:nowrap">Nmae:</td>
<td>&nbsp;</td>
<td colspan="2" class="bodyText">male</td>
<td colspan="2" class="label">Age:</td>
<td class="bodyText" width="1%">&nbsp;</td>
<td colspan="2" class="bodyText">44 years</td> <--------------
</tr>




$('td[class=label]').each((i, elem) => { //<------ $ is cheerio object
let str = elem.innerHTML;
console.log(str); //<---------- undefined
if (str === '44 years') {
console.log('found it');
let age = elem.nextSibling.nextSibling.innerHTML;
console.log(age);
return false;
}
});

Leo
Answer

The selector in this line: $('td[class=label]').each((i, elem) => {

is actually saying "loop over every td element that has the class label", and in your HTML the only cells it will loop over are "Nmae:" and "Age:":

<td class="label" style="white-space:nowrap">Nmae:</td>
<td colspan="2" class="label">Age:</td>

So when you run this code:

let str = elem.innerHTML;
if (str === '44 years') {

It never enters the if statement, because none of the cells being looped over contains '44 years'; they are only "Nmae:" and "Age:".

Also, I noticed that the class attribute comes first on the first element but after the "colspan" attribute on the second one. Attribute order makes no difference to the selector, but it can be confusing when you are writing your code.

So the solution is to change the selector so it loops over every cell, and to read each cell with $(elem).text(). Cheerio passes plain parsed nodes to the callback, not browser DOM elements, so elem.innerHTML is always undefined:

//Select all "td" within "tr"
// vvv 
$('tr td').each((i, elem) => { //<------ $ is cheerio object
  let str = $(elem).text();
  console.log(str);    //<---------- text of each cell
  if (str === '44 years') {
    console.log('found it');
    let age = elem.nextSibling.nextSibling.innerHTML;
    console.log(age);
    return false;
  }
});

If you leave it like that it will find the years, but it will also throw an error, because it looks for siblings after the last "td" element and there is no element after it.

So, once you have found it, you only need to show the text you already read, like this:

//Select all "td" within "tr"
// vvv 
$('tr td').each((i, elem) => { 
  let str = $(elem).text();  // Cheerio nodes have no innerHTML
  console.log(str);    //<---------- String for each column
  if (str === '44 years') {   
    console.log('found it');
    let age = str;    
    console.log(age);
    return false;
  }
});

Hope it helps.

Leo.