
Twitter introduces sane URL wrapping

Up to now, every time I decided to look into using Twitter I was quickly annoyed by all those people posting shortened URLs that give no hint about which server you will be redirected to after clicking the link. Being a person who looks at every URL before clicking it, I got tired of having to install tools to verify those URLs, and came to the conclusion that Twitter just sucks. I always thought: "Is it so hard to extend the 140 character limit just for URLs?"

Seems like Twitter finally read my thoughts. From their newsletter:

In the coming weeks, we will be expanding the roll-out of our link wrapping service t.co, which wraps links in Tweets with a new, simplified link. Wrapped links are displayed in a way that is easier to read, with the actual domain and part of the URL showing, so that you know what you are clicking on. When you click on a wrapped link, your request will pass through the Twitter service to check if the destination site is known to contain malware, and we then will forward you on to the destination URL. All of that should happen in an instant.

The day they activate this change will be the day I start twittering. What a pity that it is Twitter who is implementing this and not Status.net, but hey, at least someone had the brainwave.

She rules my life


Completely!

ExiFlow - Designing a workflow for digital photographers on Linux

This is one of the things I have been working on for the past few months. Now that we have released a public alpha version, you are invited to test the whole thing:

ExiFlow

ExiFlow is a set of tools (command line and GUI) to provide a complete digital photo workflow for Unixes. EXIF headers are used as the central information repository, so users may change their software at any time without losing any data.

Read on to learn more about ExiFlow

Gallery RSS feed reborn

You may have wondered why my gallery's RSS feed wasn't updating for quite some time now. Instead of boring you with explanations for that, I just want to announce that the feed is now working correctly, providing links to new and updated albums.

My extended del.icio.us post bookmarklet for Firefox

Don't panic, I won't join the masses in praising del.icio.us for its greatness. I have just used it for quite some time now and have become slightly addicted to it. Of course I don't use it without any alterations to the standard workflow; that would be dishonourable for any programmer. One thing that I use almost daily is an extended post bookmarklet. It differs from the default posting bookmarklet mainly in these aspects:

  • Selected text gets quoted and inserted into the extended description. Thus, if you select two words on a page and click the post bookmarklet, the extended description will be "two words", including the quotes.
  • If no text is selected, the bookmarklet tries to find a page description in the HTML meta information. If something useful is found, it gets used as the extended description.
  • Exceptions are caught and displayed as an alert. Sometimes the JavaScript functions used to get text properties are blocked by security features of your browser. If this happens, the results are displayed and the bookmarklet continues to do its job.

Feel free to use this bookmarklet. Updates will be announced on this page. To use this bookmarklet you will need to create a bookmark with the following link. One of several ways to do that is to simply drag it to your bookmarks toolbar:

del.icio.us post Version 1.1

Update: Fixed the quotation. Now you should get the complete bookmarklet when copying it.

Additional note: Of course, if you want to use the bookmarklet, you will have to replace my del.icio.us username with yours after copying it. To do that, drag and drop the link to your bookmarks toolbar, right click the newly created bookmark and choose "Properties". In the properties dialog, find the name "rompe" in the bookmark address (it's near the end) and replace it with your username. Press "OK". That's it!

Rescuing data from defective flash media using the Sleuthkit under Linux

Do you have corrupted flash media and want to rescue as much of your data as possible? Or do you want to be prepared, just in case this happens to you one day? Do you use Linux? Then read on.

Yesterday I was given a defective flash card. It was said to contain several photos, but every attempt to see more than the first DCIM directory resulted in unrecoverable errors on Windows. So I inserted the card into a Linux box, created an image of it using

dd if=/dev/sdb1 of=/tmp/sdb1.img bs=8k

and gave the card back to its owner. That's important: if you want to experiment with defective or suspect media, dump it and remove it as soon as possible. Otherwise chances are high that you will only worsen the state of the media.
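On a card with unreadable sectors, a plain dd may stop at the first read error. Here is a minimal sketch of a more tolerant dump, assuming a dd that supports conv=noerror,sync (GNU dd does). The function name dump_device is made up for illustration, and this is not the command I actually ran:

```shell
#!/bin/sh
# dump_device DEVICE IMAGE  (hypothetical helper, not part of any toolkit)
# conv=noerror keeps going after read errors; conv=sync pads every short or
# failed block with zeros so that later data stays at its original offset.
dump_device() {
    dd if="$1" of="$2" bs=8k conv=noerror,sync
}

# Usage (the device name is only an example):
#   dump_device /dev/sdb1 /tmp/sdb1.img
```

Because of conv=sync the last block is padded to a full 8k, so the image can end up slightly larger than the device. That is harmless for the steps that follow.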

Once I got rid of the original, I created a copy of the dump. It's always a good idea to work on a copy, so:

cp /tmp/sdb1.img /tmp/sdb1.work
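If you want to be sure the working copy really matches the dump, the copy step can be wrapped as in the sketch below. The helper name make_work_copy is an assumption of mine, and write-protecting the dump is an extra precaution I am adding here, not something the procedure requires:

```shell
#!/bin/sh
# make_work_copy IMAGE WORK  (hypothetical helper)
# Copies the dump, checks that the copy is byte-identical, and then
# removes write permission from the original dump.
make_work_copy() {
    image=$1
    work=$2
    cp -- "$image" "$work" || return 1
    # cmp -s prints nothing and returns non-zero if the files differ.
    cmp -s -- "$image" "$work" || return 1
    chmod a-w -- "$image"
}

# Usage:
#   make_work_copy /tmp/sdb1.img /tmp/sdb1.work
```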

That done, I first tried to mount the copy as a loop device:

mount -t vfat -o loop /tmp/sdb1.work /mnt

But the image was so broken that I wasn't able to get mount to accept it as a FAT partition of any type. I also gave the mtools collection a try, but to no avail.

Thinking about other tools to try, I remembered the Sleuthkit that I once used to examine a compromised system. If it is good enough for the big job, it should handle this one with ease, I thought. Sleuthkit is a collection of file system and media management forensic analysis tools. And this great toolkit didn't let me down. Here's what to do:

  • Install Sleuthkit (Doh!). Many distributions already contain packages for it, at least Debian Sid does.
  • Sneak a peek at the file system with fls. In this case it looks like this:
    $ fls -f fat16 -r /tmp/sdb1.work
    d/d 3: DCIM
    + d/d 517: 100MLT19
    ++ r/r 1029: PICT0001.JPG
    ++ r/r 1030: PICT0002.JPG
    ++ r/r 1031: PICT0003.JPG
    ++ r/r 1032: PICT0004.JPG
    ++ r/r 1033: PICT0005.JPG
    ++ r/r 1034: PICT0006.JPG [...]
  • Now we want to extract everything that looks like a picture. We can use the icat tool to do that, and because I like one-liners, I did it with a pipe:
    fls -f fat16 -r /tmp/sdb1.work | fgrep PICT | while read bla; do
    set `echo $bla | tr -d ':+'`
    icat -f fat16 /tmp/sdb1.work "$2" > "/tmp/$3"
    done
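The extraction step above can also be sketched a bit more defensively. This variant is my own rewording, not the original one-liner: it picks up every regular-file entry (r/r) instead of grepping for PICT, and it copes with spaces in file names. The function names list_regular_files and extract_files are made up for illustration, and the fat16 type is taken from this particular card:

```shell
#!/bin/sh
# list_regular_files: read "fls -r" output on stdin and print "inode name"
# for every regular-file entry, stripping the "+" depth markers.
list_regular_files() {
    sed -n 's/^[+ ]*r\/r \([0-9][0-9]*\):[[:space:]]*\(.*\)$/\1 \2/p'
}

# extract_files IMAGE OUTDIR: dump every listed file into OUTDIR with icat.
extract_files() {
    image=$1
    outdir=$2
    fls -f fat16 -r "$image" | list_regular_files |
    while read -r inode name; do
        icat -f fat16 "$image" "$inode" > "$outdir/$name"
    done
}

# Usage:
#   mkdir /tmp/rescued && extract_files /tmp/sdb1.work /tmp/rescued
```

Note that files with the same name in different directories would overwrite each other in this sketch. For a camera card with a single picture directory that is usually not a problem.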

And voila, that's all! The first nine pictures contain random data caused by the filesystem damage, but the other seventy or so are fine. Even if you happen to get media that is in worse condition than the one I was given, never despair: Sleuthkit contains other tools that you can use to find your data, even if it has been deleted. It also lets you search for binary signatures in case the filesystem is completely screwed. For example, you might want to search for some Exif information found in every picture, like "MINOLTA DIGITAL CAMERA" in this case. Just read around on the web site and in the man pages and try things out as you like. You can't damage anything, since you are working on a copy that you can create again in seconds.
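To get a first impression of where such a signature occurs in the image, you don't even need Sleuthkit. The sketch below uses plain grep instead of the Sleuthkit tools and assumes GNU grep (for the -a, -b and -o options); find_signature is a name I made up:

```shell
#!/bin/sh
# find_signature IMAGE STRING  (hypothetical helper)
# Prints the byte offset of every occurrence of STRING in IMAGE.
#   -a  treat the binary image as text
#   -b  print the byte offset of each match
#   -o  print only the matching part, so the offset is that of the match
#   -F  take STRING literally, not as a regular expression
find_signature() {
    LC_ALL=C grep -a -b -o -F -- "$2" "$1" | cut -d: -f1
}

# Usage:
#   find_signature /tmp/sdb1.work 'MINOLTA DIGITAL CAMERA'
```

Each offset marks a spot near the start of a picture, which gives you an idea of how many pictures are still in there even when fls shows nothing useful.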

Good luck, or better, don't buy cheap flash cards. ;-)

Creative fried chicken restaurant owners

Would you ever think about digging into the deeper psychology of naming a fried chicken restaurant without getting in contact with KFC lawyers? Well, some people do, and it's amazingly funny to view the loads of pictures they took of more or less original restaurant signs. Good starting points:

Debian GNU/Linux Desktop Survival Guide

In the recent Debian Weekly News I found a reference to a book project called Debian GNU/Linux Desktop Survival Guide. It seems that the author's intent was to write down everything he would otherwise forget. Other people use weblogs for that. Anyway, I like the book a lot and will recommend it to everyone who keeps bugging me with questions about Debian. A German translation is missing, but I don't want to complain about it because the logical answer would be "Do it yourself!" and it would be hard to object to that.

How to configure spamassassin to use a nonexistent address as a trap for spammers

As I promised in a comment to my previous entry, I want to show you how I configured spamassassin to use a nonexistent mail address as a spam trap in order to improve the quality of the database it uses for the bayesian filtering. One day I was fed up with the false negatives I had to feed to sa-learn manually to keep the bayes database up to date. So I decided to publish a mail address that I will never use for anything other than poisoning spambots, and then let spamassassin score mail to this address so high that it will always be classified as spam. As a result, spamassassin will remember these messages as spam in its database. In my case the trap address is s.pemtrep@rompe.org. I realized that there are some other addresses that seem to exist only in some spammers' address lists, so I simply added them to the same rule. The rule for spamassassin is quite simple:

header  ROMPE_BADRECIPS      To =~ /(kuk|s\.pemtrep|ballepromp)\@rompe\.org/i
score ROMPE_BADRECIPS 9.0
describe ROMPE_BADRECIPS Spam trap recipient

Add something like this to your /etc/spamassassin/local.cf and you are done. Publish the address on the web (but don't forget to mark it as a spam trap, since you don't want humans to write to this address!) and soon the spambots will begin to feed your database with high quality spam. This will, of course, increase your traffic a bit, but it will definitely lower your number of false negatives.
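Before publishing the address you may want to check which recipients the pattern actually catches. The sketch below tries the same expression with grep -E, with the dot in the local part escaped. The helper name is_trap_recipient is made up, and grep's extended regular expressions are only an approximation of the Perl syntax spamassassin uses, although for a pattern this simple the two agree:

```shell
#!/bin/sh
# is_trap_recipient HEADER  (hypothetical helper)
# Succeeds if the given To header contains one of the trap addresses.
is_trap_recipient() {
    printf '%s\n' "$1" | grep -Eiq '(kuk|s\.pemtrep|ballepromp)@rompe\.org'
}

# Usage:
#   is_trap_recipient 'To: s.pemtrep@rompe.org' && echo "would be trapped"
```

After editing local.cf, running spamassassin --lint is a quick way to catch syntax errors in the rule itself.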

The next logical step would be to combine this with something like teergrube and/or temporary host blocking, but we will have to accept the first mail for our database before we can start sanctioning. I will have to think about this. Comments are welcome.

pdfgrep

Inspired by the comments to this NewsForge article about concatenating PDF files, I wrote this little script which allows you to find patterns in PDF documents in exactly the same manner that you know from using the grep utility on plain text files. If you have never heard of grep, then you probably won't be interested in this script.

The syntax is as follows:

pdfgrep [grep options] <pattern> <file> [file ...]

A comment poster hardcoded the grep options "--context=4 --color=always", but I don't hardcode these because I want to be able to give the context size myself (for four lines just try "pdfgrep -4 ...") and for the colour I have this line in my .bashrc anyway:

export GREP_OPTIONS="--color=auto"

Finally, here comes the script:

#!/bin/sh
# 2004-06-22 Ulf Rompe <ulf@@@@rompe.org>
# Updated for filenames containing whitespace 2005-07-18
if [ $# -lt 2 ]; then
        echo 'Syntax: pdfgrep [grep options] <pattern> <file> [file ...]'
        exit 1
fi
# Collect all leading arguments that start with "-" as grep options.
grepopts=""
while [ "`printf %s "$1" | cut -c1`" = "-" ]; do
        grepopts="$grepopts $1"
        shift
done
pat="$1"
shift
# Print file names only when more than one file is searched.
if [ $# -gt 1 ]; then shownames=1; else shownames=0; fi
while [ $# -gt 0 ]; do
        [ "$shownames" = 1 ] && echo "$1:"
        # $grepopts is left unquoted on purpose so it splits into options.
        pdftotext -layout "$1" - | grep -E $grepopts "$pat"
        shift
done
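The way the script separates grep options from the pattern and the file names can also be written with a case statement. The sketch below only demonstrates that splitting step and prints what it found, so it runs without pdftotext. The function name parse_pdfgrep_args is made up and is not part of the script:

```shell
#!/bin/sh
# parse_pdfgrep_args ARGS...  (hypothetical helper, for illustration only)
# Leading arguments starting with "-" are options, the next one is the
# pattern, everything after that is a file name.
parse_pdfgrep_args() {
    opts=""
    while [ $# -gt 0 ]; do
        case "$1" in
            -*) opts="$opts $1"; shift ;;
            *)  break ;;
        esac
    done
    printf 'options:%s\n' "$opts"
    printf 'pattern: %s\n' "$1"
    if [ $# -gt 0 ]; then shift; fi
    for f in "$@"; do
        printf 'file: %s\n' "$f"
    done
}

# Usage:
#   parse_pdfgrep_args -i -4 invoice "my bill.pdf" other.pdf
```

As in the script, a pattern that itself starts with "-" would be mistaken for an option.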
