Montenegrodr - 8 months ago
Java Question

How to determine the coverage of my Nutch crawl?

I've been collecting/crawling a website over the last two weeks. I've used the

command setting
iterations. The process has just finished. How can I estimate the coverage of the crawled data? I don't expect an exact number, but I'd like to know approximately how much of the website remains uncrawled.

Answer Source

Thanks, @Jorge. Based on what you've said:

Nutch has no idea how big or small the website(s) you're crawling are

So, there's no way to calculate that unless you know the size of the website in advance.
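To make that concrete: if you do get an independent figure for the site's size, coverage is just a ratio of two counts. Here is a minimal sketch in plain Java, not a Nutch API. The class and method names are hypothetical, the fetched count is assumed to come from your crawl's own statistics, and the total is assumed to come from somewhere outside Nutch (for example a sitemap or the site owner). The numbers in `main` are made up for illustration.

```java
public final class CoverageEstimate {

    private CoverageEstimate() {}

    /**
     * Fraction of the site already fetched, clamped to [0, 1].
     *
     * @param fetchedPages    pages fetched so far (assumed: read from the crawl statistics)
     * @param knownTotalPages site size known in advance (assumed: from a sitemap or site owner)
     */
    public static double coverage(long fetchedPages, long knownTotalPages) {
        if (knownTotalPages <= 0) {
            throw new IllegalArgumentException("total size must be known and positive");
        }
        if (fetchedPages < 0) {
            throw new IllegalArgumentException("fetched count cannot be negative");
        }
        // Clamp to 1.0 in case the external size estimate is too low.
        return Math.min(1.0, (double) fetchedPages / knownTotalPages);
    }

    /** Pages still uncrawled under the same assumptions, never negative. */
    public static long remaining(long fetchedPages, long knownTotalPages) {
        return Math.max(0L, knownTotalPages - fetchedPages);
    }

    public static void main(String[] args) {
        long fetched = 42_000L; // hypothetical value
        long total = 60_000L;   // hypothetical value
        System.out.println("coverage fraction: " + coverage(fetched, total));
        System.out.println("pages remaining:   " + remaining(fetched, total));
    }
}
```

Without the second number the ratio cannot be computed, which is exactly the limitation described above.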

Thanks, again.