Montenegrodr - 15 days ago
Java Question

How to define the coverage of my nutch crawl?

I've been collecting/crawling a website over the last two weeks. I've used the crawl command, setting 100 iterations. The process has just finished. How can I know the coverage of the data crawled? I really don't expect an exact number, but I'd really like to know approximately how much information remains un-crawled in the website.

Answer

Thanks, @Jorge. Based on what you've said:

"Nutch has no idea of how big/small is the website(s) you're crawling"

So there's no way to calculate the coverage unless you know the size of the website in advance.
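For anyone who does have a page count from somewhere else, here is a rough sketch of the arithmetic in Java. It assumes you can get the number of fetched pages from the crawldb statistics (for example the db_fetched count printed by bin/nutch readdb <crawldb> -stats) and that you have an independent estimate of the site's total page count, say from its sitemap. The class name, method names, and the numbers in main are all made up for illustration; none of this is part of Nutch.

```java
public class CrawlCoverage {

    /**
     * Estimated fraction of the site that has been fetched, in [0.0, 1.0].
     * fetchedPages: pages the crawl has fetched (assumed to come from crawldb stats).
     * knownTotalPages: externally known or estimated site size (an assumption; Nutch cannot supply it).
     */
    static double coverage(long fetchedPages, long knownTotalPages) {
        if (knownTotalPages <= 0) {
            throw new IllegalArgumentException("knownTotalPages must be positive");
        }
        if (fetchedPages < 0) {
            throw new IllegalArgumentException("fetchedPages must not be negative");
        }
        // If the crawl found more pages than the estimate, the estimate is stale; cap at 100%.
        return Math.min(1.0, (double) fetchedPages / knownTotalPages);
    }

    /** Estimated number of pages still un-crawled, never below zero. */
    static long remaining(long fetchedPages, long knownTotalPages) {
        return Math.max(0L, knownTotalPages - fetchedPages);
    }

    public static void main(String[] args) {
        long fetched = 7_500L;     // hypothetical db_fetched count
        long siteSize = 10_000L;   // hypothetical total from a sitemap
        System.out.printf("coverage: %.1f%%%n", 100.0 * coverage(fetched, siteSize));
        System.out.println("remaining pages: " + remaining(fetched, siteSize));
    }
}
```

Without that external total, the most the crawldb can tell you is how many URLs it has discovered but not yet fetched, which is only a lower bound on what remains.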

Thanks, again.