FAIL of the week: uTidyLib unicode error

This week I’ve been having fun using uTidyLib, a Python wrapper for HTML Tidy. All was working swimmingly until I hooked it up to a custom form validation function in Django. The Python process on my Mac kept crashing, and I couldn’t work out why, since the same code was working fine from the CLI.

After looking at the type of the data coming from Django and seeing that it was unicode, I realised it might be something to do with what was being passed in:

Python 2.5.1 (r251:54863, Feb  4 2008, 21:48:13) 
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import tidy
>>> tidy.parseString(u'EPIC FAIL')
Bus error

So it would seem that uTidyLib’s unicode handling is somewhat subpar. I was going to raise a bug report as soon as I got a mo’, but one has already been raised.
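Until that bug is fixed, one possible workaround is to encode unicode data to a plain byte string before it gets anywhere near tidy, since byte strings were what worked for me from the CLI. This is only a rough sketch: `to_utf8` and `tidy_markup` are names I've made up, and I'm assuming that `input_encoding` and `output_encoding` get passed through to tidy as its usual encoding options.

```python
try:
    text_type = unicode  # Python 2: unicode text is a separate type from str
except NameError:
    text_type = str  # Python 3: str is already unicode text


def to_utf8(markup):
    """Return markup as a UTF-8 byte string, encoding it if it is unicode text."""
    if isinstance(markup, text_type):
        return markup.encode('utf-8')
    # Already a byte string, so pass it through untouched.
    return markup


def tidy_markup(markup):
    """Hypothetical wrapper: never hand a unicode object to uTidyLib."""
    import tidy  # imported here so to_utf8 can be used without uTidyLib installed
    # Assumption: these keyword options map onto tidy's input/output encoding settings.
    return tidy.parseString(to_utf8(markup),
                            input_encoding='utf8',
                            output_encoding='utf8')
```

In a Django form validation function you would call something like `tidy_markup(value)` on the submitted value rather than passing it to `tidy.parseString` directly.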

Another quick tip if you’re using uTidyLib on a Mac: it can’t find the tidy library by default until you either symlink the built-in dynamic library file as a .so file or apply the patch found here (credit to Evil Rob for finding this out). To create the symlink:

ln -s /usr/lib/libtidy.dylib /usr/lib/libtidy.so
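If you want to check whether the symlink did the trick, a quick test is to try loading the library by name yourself. This is a small sketch that assumes uTidyLib loads libtidy through ctypes by file name; `can_load` is a helper name I've invented for the purpose.

```python
import ctypes


def can_load(name):
    """Return True if ctypes can load a shared library with the given name."""
    try:
        ctypes.cdll.LoadLibrary(name)
    except OSError:
        # The dynamic loader could not find or open the library.
        return False
    return True
```

Calling `can_load('libtidy.so')` before and after creating the symlink should show whether the .so name can now be resolved.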