The script assumes that its data files are in the directory /home/scripts. Change these references as required. The file existing.pages holds the full list of pages from the last time the script ran; to generate your own, run "touch existing.pages" and then run the script. On the first run every page is new, so all page names are appended to existing.pages, together with the date that the script ran; subsequent runs append only the topics that are new since the previous run.
New pages are listed on STDOUT so I can see that the script did something; the actual usable output goes to the file called new.topics, in a form suitable for a quick cut and paste to the New Topics page.
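The bookkeeping described above can be sketched as two small functions: one that turns the stored lines of existing.pages into a lookup hash while skipping the "####" date lines, and one that picks out the names not yet in that hash. This is only a minimal sketch; the function names load_existing and find_new_topics and the sample data are illustrative assumptions, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a lookup hash from the lines of existing.pages,
# skipping the "####" date-stamp lines and blank lines.
sub load_existing {
    my @lines = @_;
    my %existing;
    for my $line (@lines) {
        chomp $line;
        next if $line =~ /^####/;   # date stamp, not a page name
        next if $line eq '';
        $existing{$line} = 1;
    }
    return \%existing;
}

# Return the names in @current that are not in the lookup hash,
# keeping their order and dropping repeated names.
sub find_new_topics {
    my ($existing, @current) = @_;
    my %seen;
    return grep { !$existing->{$_} && !$seen{$_}++ } @current;
}

# Hypothetical sample data, for illustration only.
my @stored  = ("#### Mon Jan  1 00:00:00 2001\n", "Alpha\n", "Beta\n");
my @current = ("Alpha", "Beta", "Gamma", "Delta");

my $existing = load_existing(@stored);
my @new      = find_new_topics($existing, @current);

# Same "- [[Name]] -" form that the script writes to new.topics.
print "- [[$_]] -\n" for @new;
```

With the sample data, only Gamma and Delta are reported, since Alpha and Beta are already stored and the date line is ignored.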
Alan Millar sent me his script for autoposting pages, so my next step is to post the results automatically as a cron job.
#!/usr/bin/perl
use LWP;

# first we read yesterday's list of all pages
open(INFILE, "/home/scripts/existing.pages")
    or die "Can't read existing.pages: $!";
while (<INFILE>) {
    $temp = $_;
    chomp($temp);
    if ($temp =~ /^####/) {}        # ignore lines with the date
    else { $existing{$temp}++; }    # make a hash of other lines
}
close(INFILE);

# now retrieve today's list of all pages
# NOTE: the lines that create $browser and set $url are missing from this
# copy of the script; the two lines below are stand-ins.
$browser = LWP::UserAgent->new;
$url = "";    # address of the wiki's list of all pages (not given here)

# tell anyone browsing the log files what you're up to
$queryname = "running a script to find New Topics. Queries to: farmermj\@XXX.XX.XXXX.xxxx";
$browser->agent($queryname);

$webdoc = $browser->request(HTTP::Request->new(GET => $url));
if ($webdoc->is_success) {          # ...then it's loaded the page OK
    print STDOUT "Page loaded OK\n";
    open(OUTFILE, ">/home/scripts/new.topics")
        or die "Can't write new.topics: $!";
    open(NEWTOPIC, ">>/home/scripts/existing.pages")
        or die "Can't append to existing.pages: $!";
    $now = "#### " . `date`;
    print NEWTOPIC $now;            # log the date

    # The split pattern and the line endings in the print statements were
    # lost in this copy; a newline is assumed in each case.
    @listing = split(/\n/, $webdoc->content);
    foreach $line (@listing) {
        if ($line =~ m{/wiki/}) {   # find a page record
            # extract the pagename
            ($dummy, $pagename) = split(/">/, $line);
            if ($existing{$pagename}) {}    # ignore if we've already got it
            else {
                print $pagename, "\n";
                print OUTFILE "- [[", $pagename, "]] -\n";
                print NEWTOPIC $pagename, "\n";
            }
        }
    }
    close(OUTFILE);
    close(NEWTOPIC);
}
else {
    print STDOUT "Couldn't get it\n";
}
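The script picks out each page name by splitting a line at `">`, which depends on the exact layout of the listing page. A regular expression that captures the link text is one possible alternative. The sketch below assumes, hypothetically, that every page is linked as `<a href="/wiki/Name">Name</a>`; the real markup of the listing page is not shown in this document, and the function name extract_page_names is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Extract page names from HTML text, assuming (hypothetically) that each
# page is linked as <a href="/wiki/Name">Name</a>.
sub extract_page_names {
    my ($html) = @_;
    my @names;
    # Capture the link text of every anchor whose href starts with /wiki/.
    while ($html =~ m{<a\s+href="/wiki/[^"]*">([^<]+)</a>}gi) {
        push @names, $1;
    }
    return @names;
}

# Hypothetical sample listing, for illustration only.
my $sample = join "\n",
    '<a href="/wiki/Perl">Perl</a><br>',
    '<a href="/wiki/Larry_Wall">Larry Wall</a><br>',
    '<a href="/about.html">About</a>';

print "$_\n" for extract_page_names($sample);
```

Links that do not point under /wiki/, such as the last one in the sample, are skipped, which plays the same role as the script's test for "/wiki/" in each line.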
Another approach would be to fetch the listing with wget and run diff against the result from the previous day, but this script should be a bit more portable to non-Unix systems.
The above script is pretty trivial, but I hereby make the standard declaration that it is released under the GPL.