Reven’s blog

Fighting hotlinking -the bad and mean way-

A lot has been written on hotlinking. One would think that by now those leechers would have got the idea and that they would have learned enough netiquette to stop SHOUTING IN FORUMS and to not steal your images (and your bandwidth). But, hey, they haven’t. No fuss, we’re smarter. Let’s fight back.

One of the best articles on how to prevent hotlinking was published by A List Apart (“Smarter Image Hotlinking Prevention”). It’s a really professional, complete and accessible solution based on a little PHP coding and some .htaccess hacking. I really recommend this approach. But I wanted something meaner, because we want to fight hotlinking and we also want to get this idea through: “Hey! Stop doing that!”. And besides, we want to have fun ;) This can also be done with .htaccess rules, but we are not *that* bad, so we are going to tolerate a little bit of hotlinking if it’s fair use (i.e. the person mentions our site or links back to us).

So here’s my approach: we’ll write a script that checks our logs for hotlinks, and we’ll set a cron job that runs that script just before our logs are rotated and mails us the results. After we check the hotlinks we’ll decide if it’s fair use, and if it isn’t we’ll give the thief a little surprise (“goatse” is the keyword here). Let’s go.

This is the dirty Perl script I use. It’s not clean and it’s not nice. There is plenty of room for improvement:

#!/usr/bin/perl
######################################################################
#      Hotlink Reporter 0.1       $LastChangedRevision: 0.1 $        #
#                 (C) 2005 Reven Sanchez                             #
#         Released under a Creative Commons sa-by-nc license         #
######################################################################

use strict;
use diagnostics; # just for debugging. Should not be in final
use Getopt::Std;
use Time::HiRes qw( gettimeofday tv_interval );

# define some vars, mostly for future use
my (%option,$t0,$elapsed,$archive,@matches,@buffer);
# getopts needs a reference to the hash it fills in
getopts("H", \%option);

# we're going to time this script
$t0 = [gettimeofday];

# kernel
print "Hotlink Finder v0.1\n";

# open log file
open (LOGFILE, '<', "/var/log/httpd/access.log")
    or die "Error opening log file: $!"; # adjust to your path
while(<LOGFILE>){
    my $line = $_ ;
    # this mad regex gets all the vars we need and even some we don't
    if (/(.*)\[\d{2}\/\D{3}\/\d{4}:\d{1,2}:\d{1,2}:\d{1,2}\s.\d{4}\]\s\"\S*\s(\S*)\s\S*\"\s\d{1,3}\s\S*\s\"(.*)\"\s\"(.*)\"/i){
        my $host = $1;
        my $get = $2;
        my $referer = $3;
        my $agent = $4;
        # if $get ends in an image extension or any other extension
        # we want to track (ignoring any query string), we catch it here
        if ($get =~ /\.(?:png|gif|jpg|zip|tar\.gz)(?:\?.*)?$/i){
            # it does, so we check the referrer: we skip the hit if the
            # referrer is our site or is blank (direct access or some
            # proxies that block referrers)
            unless (($referer =~ /^http:\/\/www\.theorangeduck\.com/i)
                or $referer eq "-"){
                print "hotlink caught\n";
                print "$host\n$get\n$referer\n$agent\n\n";
            }
        }
    }else{
        print "Uncaught Regex $line";
        # Some weird error here. More error handling yet to be done
    }
}
close LOGFILE;

# some functions we'll use
if ($option{H}) { print "\n\nHotlink reporter help goes here\n\n"; }

# the time elapsed
$elapsed = tv_interval ($t0, [gettimeofday]);
print ("report generated in $elapsed seconds\n");
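
The mad regex is the part most likely to bite you, so here is a more readable way to pick a log line apart. This is only a sketch and not what the script above does: it assumes your logs are in Apache's "combined" format and that your Perl is 5.10 or newer (for named captures). The name parse_log_line is my own invention, and the date and byte count in the sample line are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse one line of Apache "combined" log format (assumed) into a
# hash reference. Returns undef when the line does not match.
sub parse_log_line {
    my ($line) = @_;
    return unless $line =~ m{
        ^(?<host>\S+)\s+\S+\s+\S+\s+              # client host, identd, user
        \[(?<time>[^\]]+)\]\s+                    # timestamp in brackets
        "(?<method>\S+)\s+(?<path>\S+)[^"]*"\s+   # request line
        (?<status>\d{3})\s+(?<bytes>\S+)\s+       # status code, bytes sent
        "(?<referer>[^"]*)"\s+                    # referrer
        "(?<agent>[^"]*)"                         # user agent
    }x;
    my %field = %+;    # copy the named captures before they are overwritten
    return \%field;
}

# Made-up sample line, modelled on the report further down.
my $sample = '192.168.0.126 - - [10/Oct/2005:03:00:01 +0200] '
           . '"GET /images/my_stolen_image.jpg HTTP/1.1" 200 51234 '
           . '"http://www.example.com/some_nasty_bandwidth_leeching_page.html" '
           . '"Mozilla/5.0 (Windows) Gecko/20050908 Firefox/1.4"';

my $hit = parse_log_line($sample);
print "$hit->{path} <- $hit->{referer}\n" if $hit;
```

With the /x flag the pattern can be spread over several lines and commented, so a stray line break no longer breaks it.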

OK. Now we have a script that does the dirty work, but we need a cron job to run it for us. So copy this to a file and drop it in /etc/cron.d/ (on Debian; other systems may vary):

# /etc/cron.d/hotlink: crontab fragment for hotlink reporter
# By reven

# every day at 3 check for hotlinks. Adjust this time to about
# 30 minutes before your logs are rotated

0 3  * * *   root /path/to/your/script/hotlinkfind.pl

On a normal Debian setup all cron output is mailed to the user the job runs as, in this case root (I’m root on this box; you may have to adjust something if you’re not).

If all goes well we’ll get something like this in our mail:

Hotlink Finder v0.1

hotlink caught
192.168.0.126 - -
/images/my_stolen_image.jpg
http://www.example.com/some_nasty_bandwidth_leeching_page.html
Mozilla/5.0 (Windows) Gecko/20050908 Firefox/1.4

report generated in 0.534186 seconds
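
On a busy site that mail gets long, because the same block repeats for every single request. One possible refinement, which is not in my script, is to count hits per referrer and file and only print the totals. The sketch below assumes each hit is stored as a hash reference with 'referer' and 'get' keys; that structure and the names summarize_hits and print_summary are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count hotlink hits per (referrer, file) pair.
# Each hit is a hash reference with 'referer' and 'get' keys
# (a hypothetical structure; the script above only prints the values).
sub summarize_hits {
    my (@hits) = @_;
    my %count;
    $count{ $_->{referer} }{ $_->{get} }++ for @hits;
    return \%count;
}

# Print the summary, busiest referrers first.
sub print_summary {
    my ($count) = @_;
    my %total;
    for my $ref (keys %$count) {
        $total{$ref} += $_ for values %{ $count->{$ref} };
    }
    for my $ref (sort { $total{$b} <=> $total{$a} || $a cmp $b } keys %total) {
        print "$total{$ref} hit(s) from $ref\n";
        print "    $count->{$ref}{$_}  $_\n" for sort keys %{ $count->{$ref} };
    }
}

# Example with made-up hits.
my @hits = (
    { referer => 'http://www.example.com/a.html', get => '/images/one.jpg' },
    { referer => 'http://www.example.com/a.html', get => '/images/one.jpg' },
    { referer => 'http://www.example.org/b.html', get => '/images/two.png' },
);
print_summary(summarize_hits(@hits));
```

To use something like this you would push each caught hit onto a list inside the loop instead of printing it straight away.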

So now we have a presumably nasty page. We go visit it and see if the author has linked back to our site or has mentioned the source of the image. If he hasn’t, we look for his contact info and send him a friendly and polite mail asking him to please not hotlink our images, and we explain why hotlinking isn’t a good thing.
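
If the list of suspects gets long, the "does he link back?" part of that visit can be roughed out in code. This is a sketch under my own assumptions: links_back is a made-up name, it works on HTML you have already fetched (HTTP::Tiny, a core module since Perl 5.14, would be one way to get it), and a regex is only a crude stand-in for a real HTML parser. It can't tell you whether the source is mentioned in plain text, so you still have to look at the page yourself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return 1 if the HTML text contains an <a href=...> pointing at
# $our_host (with or without a leading "www."), 0 otherwise.
sub links_back {
    my ($html, $our_host) = @_;
    while ($html =~ /<a\b[^>]*\bhref\s*=\s*["']?([^"'\s>]+)/gi) {
        my $url = $1;
        return 1 if $url =~ m{^https?://(?:www\.)?\Q$our_host\E(?:[/:?#]|$)}i;
    }
    return 0;
}

# Made-up page snippet for illustration.
my $page = '<p>Photo from <a href="http://www.theorangeduck.com/">the source</a></p>';
print links_back($page, 'theorangeduck.com') ? "fair use\n" : "no link back\n";
```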

If we get a friendly reply and the author changes the image or links to our site, then all’s well that ends well. We have helped make the web a better place. Feel good about it. Write about it on your blog. Give yourself a treat.

If we get a nasty reply or no reply at all, then we go to the Wikipedia page on Goatse and look for an incredibly shocking image (Explore and experiment. Goatse is only a starting point; you can use other sources, like your favourite zoophilic porn site, but in my experience there is nothing as shocking and horrible as goatse) and download it to our server (Don’t hotlink it! Haven’t you heard a word I said?).

Then we change its name to the name the hotlinked file had. Remember to back up your original image under another name and to change the references to that image on your site (I didn’t say we didn’t have to do any work at all, but it’s worth it).

Then the fun begins.

You don’t even have to mail him again. Just check his site. The link to your image will be changed pretty fast.

Just imagine his face.

[Mood: evil :twisted: ]

1 Comment so far

  1. Reven October 31st, 2006 20:42

    Due to a problem in WordPress (Bug 2059), the code in this post doesn’t always show up correctly. I’ve patched my actual version of WordPress to fix this problem, but this problem might reappear if I update WordPress (unless, of course, it is addressed).
