link checker

June 8th, 2006 matthias

I was looking around for an easy-to-use, no-fuss command line tool to check the links on a web site. First I tried wget:

wget -o wget.log -nv -r -p <site>

The resulting wget.log contains all the links that were followed. It’s easy to spot the errors, but there is no obvious way to get hold of the referrer.
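A rough sketch of how one might pull the failures out of wget.log. It assumes that wget -nv writes the failing URL on one line and an ERROR line with the HTTP status code on the next; that layout may differ between wget versions. The function name wget_errors is made up for illustration.

```shell
# Hypothetical helper: print every wget error line plus the line before it.
# Assumption: in -nv mode the failed URL is logged on the line directly
# above the "ERROR <status code>" line.
wget_errors() {
    grep -B1 -E 'ERROR [0-9]{3}' "$1"
}

# Usage, assuming wget.log came from the wget command above:
#   wget_errors wget.log
```

This shows which URLs failed, but still not which page linked to them.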

Next was linkchecker:

linkchecker -t3 --no-warnings -Fblacklist/blacklist.out http://<site> > linkchecker.log

This produces a list of broken links in blacklist.out. That file carries no referrer information, but one can recover it by cross-referencing the full log in linkchecker.log. That is not entirely trivial though; it’s certainly beyond grep. More significantly, linkchecker seemed to run forever, checking the same links over and over again. I gave up after it had spent an hour and checked 100,000 links on a site that contains no more than a few hundred actual links.
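For what it’s worth, here is a sketch of what that cross-referencing could look like with awk. It assumes the text log consists of blank-line-separated blocks, each with "Parent URL", "Real URL" and "Result" lines, and that failures are marked "Result  Error". That matches the linkchecker text output I have seen, but the format may vary by version, so treat the field names as assumptions. The function name broken_with_parent is hypothetical.

```shell
# Hypothetical helper: list each broken URL next to the page that links to it.
# Assumption: linkchecker.log is made of blank-line-separated blocks with
# "Parent URL ...", "Real URL ..." and "Result ..." lines.
broken_with_parent() {
    awk '
        BEGIN { RS = ""; FS = "\n" }    # paragraph mode: one block per record, one line per field
        /Result[ \t]+Error/ {
            parent = "(unknown)"; real = "(unknown)"
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^[ \t]*Parent URL/) { parent = $i; sub(/^[ \t]*Parent URL[ \t]+/, "", parent) }
                if ($i ~ /^[ \t]*Real URL/)   { real = $i;   sub(/^[ \t]*Real URL[ \t]+/, "", real) }
            }
            print real "\t" parent      # broken URL, then its referrer
        }
    ' "$1"
}

# Usage, assuming linkchecker.log came from the command above:
#   broken_with_parent linkchecker.log
```

Each output line is a broken URL followed by a tab and whatever linkchecker recorded as the parent, which is the referrer information missing from blacklist.out.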

Finally, I tried linklint:

linklint -error -warn -xref -forward -out linklint.out -net -http -host <site> /@

This completed in a few minutes and produced a nice report in linklint.out. The report contains a summary of the kinds of links, files and errors found, a per-referrer break-down of all broken links, and a list of all moved URLs referenced by the site. This is pretty much exactly what I was after!

All three tools are available as Debian packages. linklint development seems to have stopped a few years ago, yet it was the best of the bunch for what I was trying to achieve. YMMV.

Entry Filed under: Technology, Tools

1 Comment

  • 1. nosebreaker.com  |  September 28th, 2006 at 9:53 pm

    Linklint is available at www.linklint.org
