That which was old is new again

Well, the cable modem has been working for the past few days. It's hard to decide whether it's because Adam is in the United Kingdom for the week (on bidness), and cable modems don't like Adam, or whether it's because the DSL modem I ordered last week is going to arrive shortly, and cable modems don't like me.

One way or another, I've had access, and I've decided that the biggest bang for my blogging buck would be effected by finally importing all my old Blosxom posts into Blogspot. Blogspot uses the Atom API (Application Programming Interface; a set of commands that you can use to talk to a system from a programming language) for doing stuff with their blogs, and while there are some Perl modules on CPAN providing wrappers around Atom, I found it much easier just to construct the XML using string manipulation and talk to Blogspot's servers using LWP.
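
For the curious, here is roughly what "construct the XML using string manipulation" amounts to. This is a minimal sketch rather than the script itself: the names xml_escape and build_entry are made up for illustration, and it uses POSIX's strftime instead of Date::Format so it runs with core modules only. It also assumes the post body is already well-formed XHTML.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

# Escape the characters that are special in XML text.
# The ampersand goes first so the other entities aren't escaped twice.
sub xml_escape
{
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Build an Atom entry by plain string interpolation.
# $body is assumed to be well-formed XHTML already; $epoch is seconds since 1970.
sub build_entry
{
    my ($title, $body, $epoch) = @_;
    my $issued  = strftime("%Y-%m-%dT%H:%M:%SZ", gmtime($epoch));
    my $escaped = xml_escape($title);

    return qq(<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<entry xmlns="http://purl.org/atom/ns#">
<title mode="escaped" type="text/plain">$escaped</title>
<issued>$issued</issued>
<content type="application/xhtml+xml">
<div xmlns="http://www.w3.org/1999/xhtml">$body</div>
</content>
</entry>);
}

# Print a sample entry without going anywhere near the network.
print build_entry("Cats & <dogs>", "<p>Hello again.</p>", 0), "\n";
```

Printing the entry before POSTing it is a cheap way to check the escaping: the title above should come out as Cats &amp;amp; &amp;lt;dogs&amp;gt;, and the body should pass through untouched.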

I've uploaded fifty or so posts so far. Blogspot seemed to start choking after those initial uploads, and I have a feeling that they might think I'm a spambot doing a bulk upload, so I'm going to push out the rest later. (Speaking of spambots, I've turned on the word verification feature for comments, as the comment spam has gotten a little ridiculous.) I was able to maintain the original dates for the posts, so they'll show up as everything before March 2005 in the Archives section along the right-hand side. Unfortunately, Blogspot doesn't let me divide posts up by tagged section, so you won't know if you're reading something from News News or Dining In or whatever other sections I stole from the Times. And I couldn't transfer over the comments, though I have them saved if anyone is interested in what they used to think.

Anyway, what you've all been waiting for: the Perl script that does the importing. I've even commented it so everyone could play along.

#!/usr/bin/perl


use strict;
use warnings;


use HTTP::Request;
use LWP::UserAgent;
use Getopt::Long;
use File::Find;
use Readonly;
use Date::Format;
use File::stat;
use Encode;


#CONSTANTS
Readonly my $FILE_EXTENSION => '.txt';


#Globals
my $PostURL;
my $UserName;
my $Password;


#main
{
    my ($sourceDir);

    GetOptions(
        "username=s" => \$UserName,
        "password=s" => \$Password,
        "posturl=s" => \$PostURL,
        "sourcedir=s" => \$sourceDir,
    ) or die("Couldn't read command line options for some reason.");

    die(usage() . "\n") unless($UserName and $Password and $PostURL and $sourceDir);
    die("'$sourceDir' isn't a valid directory.") unless(-d $sourceDir);

    # no_chdir keeps find() from changing directory as it walks the tree, so
    # $File::Find::name stays usable even when --sourcedir is a relative path.
    find({ wanted => \&processFile, no_chdir => 1 }, $sourceDir);
}


sub processFile
{
    # \Q...\E quotes the extension so its dot matches a literal dot, not any character.
    if(-f $File::Find::name and $File::Find::name =~ /\Q$FILE_EXTENSION\E$/o)
    {
        postFile($File::Find::name);
    }
}


sub postFile
{
    my ($file) = @_;

    # Three-argument open with a lexical handle, so odd characters in the
    # file name can't be mistaken for an open mode.
    my $fh;
    unless(open($fh, '<', $file))
    {
        warn("Couldn't open file '$file' for reading: $!");
        return;
    }

    # Blosxom convention: the first line is the title, the rest is the body.
    my $title = <$fh>;
    unless(defined($title))
    {
        warn("Skipping empty file '$file'.\n");
        close($fh);
        return;
    }
    chomp($title);
    $title = escapeTitle($title);

    my $content = join("", <$fh>);
    close($fh);
    $content = escapeContent($content);

    # The file's modification time stands in for the original post date.
    # The trailing Z claims UTC, so format the time in GMT rather than local time.
    my $statter = stat($file) or die("Couldn't File::stat '$file': $!");
    my $dateString = time2str("%Y-%m-%dT%H:%M:%SZ", $statter->mtime(), 'GMT');

    my $xml = qq(<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<entry xmlns="http://purl.org/atom/ns#">
<title mode="escaped" type="text/plain">$title</title>
<issued>$dateString</issued>
<generator url="http://thirtyfour.blogspot.com/">$0</generator>
<content type="application/xhtml+xml">
<div xmlns="http://www.w3.org/1999/xhtml">$content</div>
</content>
</entry>);

    my $userAgent = LWP::UserAgent->new();

    my $request = HTTP::Request->new(POST => $PostURL);
    $request->content_type('application/xml');
    $request->authorization_basic($UserName, $Password);
    $request->content($xml);

    my $response = $userAgent->request($request);

    if($response->is_success())
    {
        print("Created entry '$title'.\n");
    }
    else
    {
        warn("Couldn't create entry '$title'. POST to $PostURL failed: " . $response->status_line() . "\n");
    }
}


sub escapeTitle
{
    my ($title) = @_;

    $title = escapeGeneral($title);

    $title =~ s!<!&lt;!g;
    $title =~ s!>!&gt;!g;

    return $title;
}


sub escapeContent
{
    my ($content) = @_;

    return escapeGeneral($content);
}


sub escapeGeneral
{
    my ($text) = @_;

    $text =~ s/&/&amp;/g;

    return encode("utf8", $text);
}


sub usage
{
    return qq(usage: $0 --username [name] --password [password] --posturl [url] --sourcedir [blosxomsrcdir]);
}
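
Since Blogspot seemed to choke after the first fifty or so posts, one way to push out the rest would be to space the uploads out and retry the ones that fail. The following is only a sketch of that idea, not something the script above does: postWithThrottle is a hypothetical helper, the delay and retry counts are guesses, and it assumes it is handed a callback that returns true on success (postFile would need a small change to do that).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: call $poster->($file) for each file, pausing between
# posts and retrying failures with a longer wait each time.
# $poster must return true on success. Returns the files that never succeeded.
sub postWithThrottle
{
    my ($files, $poster, %opt) = @_;
    my $delay   = defined($opt{delay})   ? $opt{delay}   : 5;   # seconds between posts (a guess)
    my $retries = defined($opt{retries}) ? $opt{retries} : 3;   # attempts per file (a guess)
    my @failed;

    POST: foreach my $file (@$files)
    {
        foreach my $attempt (1 .. $retries)
        {
            if($poster->($file))
            {
                sleep($delay);
                next POST;
            }
            sleep($delay * $attempt);   # back off a little more each time
        }
        push(@failed, $file);
    }

    return @failed;
}

# Demo with a fake poster: a.txt always works, b.txt fails once and then works.
my %attempts;
my @failed = postWithThrottle(
    ["a.txt", "b.txt"],
    sub { my ($f) = @_; return $f eq "a.txt" || ++$attempts{$f} > 1; },
    delay => 0,
);
print("Posts that never went through: ", scalar(@failed), "\n");
```

Collecting the file names first (say, by having the find callback push onto an array) and handing that list to a helper like this would also make it easy to pick up where a failed run left off.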
