weblog by Urs Gehrig

A weblog about libre software, law, technology, politics and the like.

22. April 2008

How to OCR multipage PDF files
@ 16:46:53

The OCR described here serves only to make PDF files indexable; the page layout is lost. Nevertheless, the following three steps convert a multipage PDF file into a single text file:

$ convert -density 150 foo.pdf ./tesseract/tmp/p%02d.tif
$ montage.exe ./tesseract/tmp/*.tif -tile 1x -mode concatenate ./tesseract/tmp/foo.tif
$ tesseract.exe ./tesseract/tmp/foo.tif output -l eng

For simplicity, the TIF files p00.tif to pXY.tif are concatenated into a single TIF file that has the width of one page and the height of all pages stacked on top of each other. This at least preserves the order of the text, that is, the text flow. One could also tile all the TIF files into a mosaic instead. A density of 150 dpi gives reasonable results with tesseract.
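
The three steps above could be wrapped in a small script along these lines. This is only a sketch, not the script used for this post: the function name ocr_pdf, the DRY_RUN switch and the file name all.tif are made up for illustration, and it assumes that ImageMagick's convert and montage as well as tesseract are on the PATH. The stacked image is called all.tif and the glob is narrowed to p*.tif so that a second run does not feed the previous result back into montage.

```shell
#!/bin/sh
# Sketch: convert a multipage PDF to one text file via a stacked TIF.
# Assumptions: convert, montage and tesseract are installed; the names
# ocr_pdf, DRY_RUN and all.tif are illustrative only.

ocr_pdf() {
    pdf=$1                      # input PDF, e.g. foo.pdf
    out=$2                      # output base name; tesseract appends .txt
    tmp=${3:-./tesseract/tmp}   # working directory for the TIF files
    run=${DRY_RUN:+echo}        # if DRY_RUN is set, print commands instead of running them

    $run mkdir -p "$tmp" || return 1
    # 1. Render every page to its own TIF at 150 dpi.
    $run convert -density 150 "$pdf" "$tmp/p%02d.tif" || return 1
    # 2. Stack the pages into one tall, single-column image.
    $run montage "$tmp"/p*.tif -tile 1x -mode concatenate "$tmp/all.tif" || return 1
    # 3. Run OCR on the stacked image.
    $run tesseract "$tmp/all.tif" "$out" -l eng
}
```

With DRY_RUN=1 set, a call such as "ocr_pdf foo.pdf output" only prints the commands it would run, which is a cheap way to check the paths before processing a large PDF.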


06. April 2005

Freeflux - Content Management at its best
@ 09:39:05

Good news from Bitflux [1]:
You can now preregister for your free BxCMS. Go to and fill in the simple form.

We will inform you, as soon as it's ready for you.

[Updated on: Tue, 05 April 2005 10:31]


08. February 2005

Wikipedia on CD-ROM
@ 23:45:20

Today I took a look at the CD-ROM version of Wikipedia, also known as the "offline" version [1]. The CD is no longer being reissued, since the Wikipedia DVD comes out on 17 March 2005. Still, the free encyclopedia is worth a look, and it then runs directly from the hard disk. Reading it offline requires the reader software for Windows (Digibib4), Mac OS X (MacDigibib) or Linux (Digibux).

On Windows, the ISO can also be mounted directly with Daemon Tools [2], so there is no need to burn it to a CD-ROM first.

In parallel, there are projects to run Wikipedia directly from CD-ROM on a Knoppix-like system (e.g. via Morphix or Lamppix) [3,4].



Upcoming DRM-less music store
@ 16:16:51

[via Steve, 1] Michael Robertson - founder of Lindowsspire - is going to announce a new, DRM-less online music store next week called MP3tunes [2].



31. January 2005

Why does the web work?
@ 22:52:15

Tell me if I am wrong, but wasn't one of the key features of HTML, from the time of its introduction, that pages are linked with <a href=''>link</a>, from inside page A.html to page B.html? But if one renames page B.html to C.html and neither rewires the links by hand nor uses rewrite tricks, one gets what NZZ Online is currently playing with. Is it intentional, or are they just not aware of it? Please give us the web back. ;)

Unfortunately, [2] is not a working substitute for [1], as one would expect:


Addendum: A really positive change is the new redirection to the print edition of a given page [3]:


Update: Cool URIs don't change - just found that link again [4] ;)
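
One of the "rewrite tricks" mentioned above is a permanent redirect from the old path to the new one. As a rough sketch, assuming an Apache server with mod_alias (nothing is known here about what NZZ Online actually runs), a small helper could emit the directive for a renamed page. The function name redirect_line and the paths are hypothetical.

```shell
#!/bin/sh
# Sketch: print one permanent redirect for a renamed page.
# Assumptions: Apache with mod_alias; redirect_line and the example
# paths are made up for illustration.

redirect_line() {
    old=$1   # path the page used to live at, e.g. /B.html
    new=$2   # path the page lives at now, e.g. /C.html
    printf 'Redirect permanent %s %s\n' "$old" "$new"
}

# Possible usage: append the directive to the site's .htaccess file.
# redirect_line /B.html /C.html >> .htaccess
```

Old links to B.html then keep working, because the server answers them with a pointer to C.html instead of an error page.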



24. January 2005

s9y podcast plugin by Hannes
@ 11:09:31

Hannes [1] hacked a PodCasting plugin for s9y and has it up and running on his test blog [2]. On the other hand, there are some good arguments why PodCasting is not always cool [3].



24. November 2004

University of Bern with Apache Lenya
@ 09:57:38

Motivating all faculties of a university to change their websites [e.g. 1] to a corporate style is quite a bit of work. The University of Bern has chosen a two-step program towards a CMS-based solution. First, all pages are redesigned in the new style; most of that is already done. Second, they intend to move those pages to the CMS, which is said to be Apache Lenya [2]. My first thought was that this is some kind of Sisyphean task, but while redesigning the wiso page I realized that I rather like the two-step concept ;)



16. November 2004

Outlook calendar to RSS
@ 23:22:31

Yet another way to share Outlook calendar events is to use BlogWave [1] in conjunction with a calendar adapter [2]:
BlogWave is an "RSS Generator": a tool which can pull information from a variety of sources and publish it as RSS. This process is very easy to configure and can be scheduled to run automatically. For example, using BlogWave you can create an RSS feed from Sharepoint announcements on your company's internal site. Or you can publish event logs as RSS.
Other approaches are outlined in CalendarTools.



12. November 2004

New EditThisPagePHP Micro-CMS version
@ 14:25:27

Version 0.5b3 of EditThisPagePHP has been released [1].



11. November 2004

Click-fix a PDF with Shimbun
@ 18:12:01

Shimbun offers a pretty simple way to create PDF files [1]. If you would rather convert HTML tables to PDF, have a look at this project [2].



05. November 2004

Bitflux is hiring
@ 10:09:49

chregu [1] of Bitflux GmbH is looking for a developer with strong skills in XSLT and PHP. Bitflux has strong expertise in building custom client projects, mostly using their own Bitflux CMS (BxCMS). Another interesting product is their Bitflux Editor, a WYSIWYG XML editor. Most if not all of their tools are Open Source.



22. October 2004

Google's Desktop Search privacy policy
@ 12:09:57

Read at [1]
What information does Google receive?

By default, Google Desktop Search collects a limited amount of non-personal information from your computer and sends it to Google. This includes summary information, such as the number of searches you do and the time it takes for you to see your results, and application reports we'll use to make the program better. You can opt out of sending this information during the installation process or from the application preferences at any time.

Personally identifying information, such as your name or address, will not be sent to Google without your explicit permission.
The last paragraph seems to be the tricky one. Reading that sentence, it looks as if Google has access to, or is able to access and collect, address information on your computer. The phrase "without your explicit permission" suggests that the end user is only a click away from letting Google actually use that information. We will probably read more about this in the blogosphere sooner or later. ;)

Update: has just another view of the circumstances; on an OS that only pretends security, there is nothing really protected.



26. August 2004

Book on Demand
@ 10:33:17

I am dropping this link here as a reminder for myself: Book on Demand [1]. If this is a more straightforward way for authors to make books, why not. As far as I know, there is another service in Germany, called [2].



12. August 2004

GMX with 1 GB mailbox size
@ 14:20:59

Amazing; while having a look at my GMX mail account I found this notice:
1.6 MB of 1 GB used in Mailbox and MediaCenter...
A nice gesture from GMX for a free mail account. I just found out that I can file this post under "Content Management" now ;-)


09. August 2004

Minimalistic CMS
@ 14:40:13

Edit This Page PHP [1] is a PHP script that can be uploaded to any webhost that supports PHP. It allows for the HTML content of a page to be edited by a link on that page. Only two files are required: the core PHP script (editthispage.php) and a data file for each page. The core file can support as many pages as desired. Get more background information on that lightweight Wiki from Christopher Allen [2], one of its authors.



29. June 2004

Easypay by Swisscom for micro content
@ 22:04:09

Swisscom has a cool product called Easypay [1]. You buy a prepaid card in denominations of CHF 25.00, 50.00 or 75.00 at local stores, kiosks, post offices etc. If you want to buy a product from an online store or use an online service, you simply enter the Easypay number in the window on the provider’s website and press "buy". The Easypay service is also available in combination with mobile phones.

This is very very practical for providers of microcontent such as blogging services with MMS or mp3 music shops. I have no idea how simple it is to implement that service on the provider's side; but it is worth a try.



10. March 2004

CSS by monorom
@ 19:30:28

monorom [1] launched her CSS knowledge site intensivstation; cool and clear designs with a lot of resources about CSS.



09. March 2004

Zulu - a website assembler
@ 23:33:26

The Zulu [1] web generator implementation consists of an Excel table and some Visual Basic code. The layout and structure of the website are adjusted by editing the Excel table. Zulu's main idea is the separation of layout, structure and content of a website. By processing them, the Zulu web generator creates static websites which can be sent to the web server.





Blog stack:

Bill Humphries
Wendy M. Seltzer
Christian Stocker
Roger Fischer
Sandro Zic
Wez Furlong
Ben Hammersley
George Schlossnagle
Joichi Ito
Lawrence Lessig
Derek Slater
Karl-Friedrich Lenz
John Palfrey
Bernhard A.M. Seefeld
Gregor J. Rothfuss
Rainer Langenhan
Elke Engel
Sebastian Bergmann
Simon Willison
Jeremy Zawodny
Udo Vetter
Axel A. Horns
Miguel de Icaza
Andreas Halter
Silvan Zurbrügg
Hannes Gassert
Markus Koller

$Date: 2005/11/05 11:14:30 $