Mark Fowler mark at
Tue Jan 7 11:05:45 GMT 2003

On Tue, 7 Jan 2003, James wrote:

> It's possible to programmatically parse for URLs with regexes, but
> unless this is a bulk exercise...

Parsing text for URLs is really easy with the URI::Find module.  Rather
than explaining it all again here, people wanting to see how this is done
might want to check out my mini-tutorial on URI::Find as part of
this year's Perl Advent Calendar:

Of course you're looking at the text in a rather dumb manner when you do
this.  It won't help you if someone is programmatically creating URLs with
JavaScript, only if the whole URL appears somewhere in the text.  And if
the URLs are being generated that way, almost nothing will help you: it's
a Hard problem (in general it's undecidable, since you're asking what an
arbitrary program will output) and the only way to find out what the
URLs will actually be is to execute (or simulate executing) the
JavaScript itself.  That said, partial solutions like using
URI::Find often work well enough.
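
For the impatient, here is a minimal sketch of the sort of thing I mean.
The sample text and the variable names are made up for illustration; the
only real API used is URI::Find's new() (which takes a callback) and
find() (which takes a reference to the text and returns the number of
URLs found).

```perl
use strict;
use warnings;
use URI::Find;

# Made-up sample text: one literal URL, and one that only exists
# once the JavaScript string concatenation has actually been run.
my $text = <<'END';
See http://www.example.com/docs/ for details.
var link = proto + "//" + host + "/hidden/";
END

my @found;

# The callback gets a URI object and the original matched text.
# Whatever it returns replaces the match, so hand back the original
# text to leave $text unchanged.
my $finder = URI::Find->new(sub {
    my ($uri, $orig_text) = @_;
    push @found, "$uri";
    return $orig_text;
});

my $count = $finder->find(\$text);

print "Found $count URL(s)\n";
print "  $_\n" for @found;
```

The address built up in the second line of the sample never appears whole
in the text, so URI::Find has nothing to match there, which is exactly
the limitation described above.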


