Meeting/Flooding/Lists

Nik Clayton nik at freebsd.org
Fri Aug 9 21:18:15 BST 2002


On Fri, Aug 09, 2002 at 06:58:38PM +0100, Nik Clayton wrote:
> On Fri, Aug 09, 2002 at 09:49:38AM +0100, Jonathan Belson wrote:
> > On Thursday 08 August 2002 9:47 pm, Nik Clayton wrote:
> >
> > > Here's the one that draws package graphs.  It's definitely not rocket
> > > science.
> >
> > Nice, ta.
> >
> > Another thing: what was the name of the perl package that spidered
> > web pages and gave you a list of referenced urls?
>
> WWW::Sitemap;

Here's some scarier, and definitely unfinished, code.  I originally wrote
it targeted at the FreeBSD site -- if you point it there, you probably
want to rename %deflinks2 back to %deflinks.

The general problem it tried to solve is that most pages link to a
set of 'core' pages -- the home page, the search page, the copyright
page, and so on.  If you include all those default links then your graph
rapidly becomes useless, because of all the default edges flying
around the place.  So the deflinks stuff lists the pages that are the
targets of these default links, and ignores links to them unless the
same page links to them at least twice.
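
The filtering described above can be sketched on its own, without the
spidering.  This is only an illustration, not the code in the attachment:
filter_default_links() is a hypothetical helper, the URLs are made up, and
it counts every link on the page before deciding, whereas the
commented-out logic in w2d.pl skips the first occurrence as it goes.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of w2d.pl: given a page's outgoing links
# (in page order, duplicates kept) and a hash of "default" link targets,
# return the links worth drawing as edges.
sub filter_default_links {
    my ($links, $deflinks) = @_;

    # Count how many times this page links to each target.
    my %count;
    $count{$_}++ for @$links;

    my (%seen, @keep);
    foreach my $link (@$links) {
        next if $seen{$link}++;    # draw each edge at most once
        # A default target linked only once is assumed to be navigation.
        next if exists $deflinks->{$link} && $count{$link} < 2;
        push @keep, $link;
    }
    return @keep;
}

# Made-up example data.
my $root     = "http://www.example.org/";
my %deflinks = map { $root . $_ => 1 } qw(index.html search.html);
my @links    = ($root . "index.html", $root . "search.html",
                $root . "news.html",  $root . "search.html");

print "$_\n" for filter_default_links(\@links, \%deflinks);
```

With this data the single link to index.html is dropped, while
search.html (linked twice) and news.html (not a default target) are kept.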

This needs considerably more work before it's generally useful.

Other useful graph-related things I've found:

http://cpan.valueclick.com/modules/by-module/Graph/Graph-ReadWrite-1.07.readme

    A generic Perl interface for reading/writing graph config files.

http://www.chaosreigns.com/code/sig2dot/

    PGP/GPG keyring 'web of trust' graph generator

http://www.chaosreigns.com/code/cinclude2dot/

    Graphs C #include file dependencies

http://www.chaosreigns.com/code/apache2dot/

    Graphs Apache log files, and shows the routes that visitors are
    taking through the site.  More popular routes are shown thicker,
    and in different colours.

    I love this community.  When I spoke about this on Wednesday I
    suggested it as a useful application.  I had no idea someone had
    already written it :-)
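
The "more popular routes are shown thicker" idea can be sketched in a few
lines.  This is not apache2dot's code, just a guess at the general shape:
routes_to_dot() is a hypothetical helper, the page transitions are made up
(a real version would take them from the referrer and request fields of
an access log), and it assumes a Graphviz that understands the penwidth
edge attribute.

```perl
use strict;
use warnings;

# Hypothetical helper: turn (from, to) page transitions into DOT text,
# with edge thickness growing with the number of times a route was taken.
sub routes_to_dot {
    my @hops = @_;    # each element is [ $from, $to ]

    # Count how often each from -> to transition occurs.
    my %count;
    $count{ $_->[0] }{ $_->[1] }++ for @hops;

    my $dot = "digraph routes {\n";
    foreach my $from (sort keys %count) {
        foreach my $to (sort keys %{ $count{$from} }) {
            my $n = $count{$from}{$to};
            $dot .= qq{    "$from" -> "$to" [label="$n", penwidth=$n];\n};
        }
    }
    return $dot . "}\n";
}

# Made-up transitions.
my @hops = (["/", "/docs.html"], ["/", "/docs.html"], ["/", "/ports/"]);
print routes_to_dot(@hops);
```

Feeding the output to dot draws the route from / to /docs.html twice as
thick as the one from / to /ports/.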

N
-- 
FreeBSD: The Power to Serve      http://www.freebsd.org/               (__)
FreeBSD Documentation Project    http://www.freebsd.org/docproj/    \\\'',)
                                                                      \/  \ ^
   --- 15B8 3FFC DDB4 34B0 AA5F  94B7 93A8 0764 2C37 E375 ---         .\._/_)

[Attachment: w2d.pl]

#!/usr/bin/perl -w
#  Copyright (c) 2002 Nik Clayton
#  All rights reserved.
#
#  Redistribution and use in source and binary forms, with or without
#  modification, are permitted provided that the following conditions
#  are met:
#  1. Redistributions of source code must retain the above copyright
#     notice, this list of conditions and the following disclaimer.
#  2. Redistributions in binary form must reproduce the above copyright
#     notice, this list of conditions and the following disclaimer in the
#     documentation and/or other materials provided with the distribution.
#
#  THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
#  ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
#  IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
#  ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
#  FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
#  DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
#  OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
#  HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
#  LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
#  OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
#  SUCH DAMAGE.
#
#  $Id$

use strict;

use WWW::Sitemap;
use LWP::UserAgent;
use GraphViz;

my $root = "http://ukug.uk.freebsd.org/";
my $ua   = LWP::UserAgent->new;

my $sitemap = WWW::Sitemap->new(EMAIL     => 'nik at FreeBSD.org',
                                USERAGENT => $ua,
                                ROOT      => $root,
                                DEPTH     => 2);

# Log each page to STDERR as the spider visits it.
$sitemap->url_callback(
    sub {
        my ( $url, $depth, $title, $summary ) = @_;
        print STDERR "URL: $url\n";
        print STDERR "DEPTH: $depth\n";
        print STDERR "TITLE: $title\n";
        print STDERR "SUMMARY: $summary\n";
        print STDERR "\n";
    }
);

$sitemap->generate();

my $graph = GraphViz->new(node        => {shape => 'box'},
                          rankdir     => 0,
                          concentrate => 1,
                         );

my %titles = ();

# These are URLs that almost every page links to.  Later on, we only
# include a link in the graph if it's the second link on the page, to
# ignore the default links.
#
# %deflinks is empty for the current $root; %deflinks2 holds the list
# for the FreeBSD site.
my %deflinks = ();
my %deflinks2 = ($root . "index.html"                  => 1,
                 $root . "ports/index.html"            => 1,
                 $root . "support.html"                => 1,
                 $root . "docs.html"                   => 1,
                 $root . "commercial/commercial.html"  => 1,
                 $root . "search/search.html"          => 1,
                 $root . "search/index-site.html"      => 1,
                 $root . "mailto.html"                 => 1,
                 $root . "copyright/index.html"        => 1,
                );

# Pages left out of the graph entirely.
my %ignore = ($root . "search/index-site.html" => 1,
             );

foreach my $url ($sitemap->urls()) {
    if($sitemap->is_internal_url($url)) {

        next if exists $ignore{$url};

        my @links = $sitemap->links($url);

        my %linkcount = ();

        if(! exists $titles{$url}) {
            $titles{$url} = $sitemap->title($url);
        }

        $graph->add_node($url);
#       $graph->add_node($titles{$url});

        # Show these pages as nodes, but don't follow their links.
        next if $url =~ /org\/ports/;
        next if $url =~ /org\/doc\//;

        foreach my $link (@links) {
            # Skip a few default links
#           next if $link eq $root;
#           next if $link eq $root . "index.html";

#           if($url ne $root) {
#               $linkcount{$link}++;
#               if((exists $deflinks{$link}) && ($linkcount{$link} == 1)) {
#                   next;
#               }
#           }

            if(! exists $titles{$link}) {
                $titles{$link} = $sitemap->title($link);
            }

            # Add the page's title, or its URL?
            $graph->add_node($link);
            $graph->add_edge($url, $link);

#           $graph->add_node($titles{$link});
#           $graph->add_edge($titles{$url} => $titles{$link});
        }
    }
}

print STDERR "Generating graph...\n";

# Write the graph as PostScript, PNG and canonical dot.
my %output = ('graph.ps'  => 'as_ps',
              'graph.png' => 'as_png',
              'graph.dot' => 'as_canon');

foreach my $file (sort keys %output) {
    my $method = $output{$file};
    open(my $fh, '>', $file) or die "Cannot write $file: $!\n";
    binmode($fh);
    print $fh $graph->$method();
    close($fh) or die "Cannot close $file: $!\n";
}




