Muffinresearch Labs by Stuart Colville

Titlecase.py: Titlecase in python | 22 Comments

Posted in Code on 27th May 2008, 8:29 am by

John Gruber recently published a Perl script that converts strings to title case. It avoids capitalizing small words, following rules from the New York Times Manual of Style, and caters for several special cases.

Before porting the Perl script I tried Python’s built-in title() string method to see how well it handled text according to the various rules:

>>> test="this is zed's favorite outburst"
>>> test.title()
"This Is Zed'S Favorite Outburst"

As you can see, it can’t cope with anything remotely complicated: here it capitalizes the letter after the apostrophe.
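To illustrate why the apostrophe trips up title(), here is a minimal sketch of a regex-based workaround that treats a word plus an optional apostrophe suffix as one unit. The function name apostrophe_aware_title is hypothetical, and this is not the approach titlecase.py takes; it also still mangles words such as AT&T, which is part of why a fuller rule-set is needed.

```python
import re


def apostrophe_aware_title(text):
    """Capitalize each word, keeping an apostrophe suffix ('s) with its word."""
    # A run of letters, optionally followed by an apostrophe and more letters,
    # is capitalized as a single unit.
    return re.sub(
        r"[A-Za-z]+('[A-Za-z]+)?",
        lambda match: match.group(0).capitalize(),
        text,
    )


print(apostrophe_aware_title("this is zed's favorite outburst"))
# The apostrophe case is handled, but explicit capitals are lost:
print(apostrophe_aware_title("what is AT&T's problem?"))
```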

I originally knocked up a direct port of the script, but I found it a tad unwieldy, so I decided on a fresh approach that splits the string on whitespace characters and then processes the resulting words.
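The split-on-whitespace idea can be sketched roughly as follows. This is a simplified illustration, not the code in titlecase.py: the SMALL_WORDS set, the function name simple_titlecase and the exact rules (keep small words lowercase except at the start or end, leave words with internal capitals or inline dots alone) are assumptions made for the example.

```python
# Hypothetical small-word list; the real script's list may differ.
SMALL_WORDS = {
    "a", "an", "and", "as", "at", "but", "by", "en", "for", "if",
    "in", "of", "on", "or", "the", "to", "v", "v.", "via", "vs", "vs.",
}


def simple_titlecase(text):
    """Title-case text word by word after splitting on whitespace."""
    words = text.split()
    result = []
    for index, word in enumerate(words):
        at_edge = index == 0 or index == len(words) - 1
        if word.lower() in SMALL_WORDS and not at_edge:
            # Small words stay lowercase unless they start or end the title.
            result.append(word.lower())
        elif any(char.isupper() for char in word[1:]) or "." in word[1:-1]:
            # Leave explicit capitals (iTunes, AT&T) and inline dots
            # (example.com) exactly as they were typed.
            result.append(word)
        else:
            # Capitalize only the first character; keep the rest untouched.
            result.append(word[:1].upper() + word[1:])
    return " ".join(result)


print(simple_titlecase("this is just an example.com"))
```

Note that joining on a single space discards the original whitespace, so this sketch makes no attempt to preserve newlines or runs of spaces.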

Before starting on either approach I wrote test cases for all of the examples John gave in his post, and later added a few of my own, some based on John’s archive of posts. This made coding a lot easier, as once the tests stopped failing I could stop coding. Of course, with something like this there are likely to be more cases that need catering for; if you have ideas for new test cases, please add them in the comments.

The script is flexible in that you can import it and use the function itself directly, e.g.:

>>> from titlecase import titlecase
>>> titlecase('a thing')
'A Thing'

You can also pass a file of text to stdin:

$ ./titlecase.py  < ~/title-case-examples 
Q&A With Steve Jobs: 'That's What Happens in Technology' 
What Is AT&T's Problem? 
Apple Deal With AT&T Falls Through 
 
This v That 
This vs That 
This v. That 
This vs. That 
 
The SEC's Apple Probe: What You Need to Know 
 
'By the Way, Small Word at the Start but Within Quotes.' 
 
Small Word at End Is Nothing to Be Afraid of 


Starting Sub-Phrase With a Small Word: A Trick, Perhaps? 
Sub-Phrase With a Small Word in Quotes: 'A Trick, Perhaps?' 
Sub-Phrase With a Small Word in Quotes: "A Trick, Perhaps?" 

"Nothing to Be Afraid Of?" 
"Nothing to Be Afraid Of?" 

A Thing 

2lmc Spool: 'Gruber on OmniFocus and Vapo(u)rware'
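For anyone wondering how a script might handle piped input like this, here is a minimal sketch. It assumes the script checks whether stdin is a terminal to decide what to do; that detail, along with the names convert_lines and main, is an assumption for illustration and is not taken from titlecase.py.

```python
import sys


def convert_lines(lines, convert):
    """Apply convert to each non-blank line, keeping blank lines as they are."""
    converted = []
    for line in lines:
        text = line.rstrip("\n")
        converted.append(convert(text) if text.strip() else text)
    return converted


def main(convert, stdin=None, stdout=None):
    """Convert piped input; return False when nothing was piped in."""
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    if stdin.isatty():
        # Assumption: with no piped input the caller does something else,
        # such as running the test suite.
        return False
    for text in convert_lines(stdin, convert):
        stdout.write(text + "\n")
    return True
```

Here convert stands for whatever title-casing function you want to apply, for example the titlecase function imported from the module.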

Finally, running the script from the command line without args results in the tests being run:

$ ./titlecase.py
Testing: a thing ... ok
Testing: Apple Deal With AT&T Falls Through ... ok
Testing: The SEC's Apple Probe: What You Need to Know ... ok
Testing: What Is AT&T's Problem? ... ok
Testing: this is just an example.com ... ok
Testing: this is something listed on an del.icio.us ... ok
Testing: Generalissimo Francisco Franco: Still Dead; Kieren McCarthy: Still a Jackass ... ok
Testing: iTunes should be unmolested ... ok
Testing: "Nothing to Be Afraid of?" ... ok
Testing: "Nothing to Be Afraid Of?" ... ok
Testing: Q&A With Steve Jobs: 'That's What Happens In ... ok
Testing: Seriously, ‘Repair Permissions’ Is Voodoo ... ok
Testing: Sub-Phrase With a Small Word in Quotes: "a Trick, ... ok
Testing: Small word at end is nothing to be afraid of ... ok
Testing: 'by the Way, small word at the start but within ... ok
Testing: Sub-Phrase With a Small Word in Quotes: 'a Trick, ... ok
Testing: Starting Sub-Phrase With a Small Word: a Trick, ... ok
Testing: this v that ... ok
Testing: this v. that ... ok
Testing: this vs that ... ok
Testing: this vs. that ... ok
Testing: Reading Between the Lines of Steve Jobs’s ‘Thoughts on ... ok
Testing: 2lmc Spool: 'Gruber on OmniFocus and Vapo(u)rware' ... ok

----------------------------------------------------------------------
Ran 23 tests in 0.019s
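A table of (input, expected) pairs is one simple way to drive tests like these. The sketch below is an assumption about how such a table could be checked, not the test code shipped with the script; the helper names failing_cases and identity_cases are hypothetical, and str.title is used only as a stand-in function that is known to fail. It also flags cases whose input already equals the expected output, since those prove very little.

```python
def failing_cases(convert, cases):
    """Return (source, expected, actual) for every case that convert gets wrong."""
    failures = []
    for source, expected in cases:
        actual = convert(source)
        if actual != expected:
            failures.append((source, expected, actual))
    return failures


def identity_cases(cases):
    """Return the inputs that are identical to their expected output."""
    return [source for source, expected in cases if source == expected]


# Illustrative cases only; the first two come from the examples in this post.
CASES = [
    ("a thing", "A Thing"),
    ("this is zed's favorite outburst", "This Is Zed's Favorite Outburst"),
    ("What Is AT&T's Problem?", "What Is AT&T's Problem?"),
]

for source, expected, actual in failing_cases(str.title, CASES):
    print("FAIL: %r -> %r (expected %r)" % (source, actual, expected))
print("Weak cases:", identity_cases(CASES))
```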

To get the code, check it out with bzr using the command:

bzr branch lp:titlecase.py

OR:

easy_install titlecase

Or download it from here: https://launchpad.net/titlecase.py/trunk/0.2

  • http://tincorporated.com Tom Watson

    Very cool! Do you know of a Django filter version of this script anywhere?

  • James

    I’m somewhat of a novice but I seem to observe that the tests pass because the inputs are already in title case. If I add text=text.lower() at the start of the function the test cases fail.

  • http://muffinresearch.co.uk Stuart Colville

    @James: It’s not quite that simple, as lowering all of the text at input is likely to break quite a lot of tests: some of the functionality takes into account the case of the input.

    However, you do have a point that a few of the test cases had the same input as the expected output, and in the process of addressing that I found a small bug where a special quote was missing from the punct regex.

    I’ve updated the tests, cleaned up the code and released it as version 0.2.

    https://launchpad.net/titlecase.py/trunk/0.2

    Thanks for your comment; it was definitely a worthwhile observation!

  • http://benspaulding.com/ Ben Spaulding

    @Tom Watson: Have a look at Christian Metts’ Typogrify library. He recently added support for Titlecase.py.

  • Todd E. Bryant

    Sweet! I’ve been looking for this…

    Two questions: (1) If I pass in unicode strings, is any damage done to the unicode characters? (2) Is titling logic applied to the unicode characters where capitalization makes sense?

  • nerkles

    Please put this in the cheese shop.

  • http://muffinresearch.co.uk Stuart Colville

    @nerkles: Done – you can now get it from the cheese shop with:

    easy_install titlecase

  • http://honestpuck.blogspot.com/ Tony Williams

    Stuart,

    You might want to add “my name is o'brien” to your tests, since at the moment it’s a fail :-)

    // Tony

  • http://www.network-labs.org Diederik

    Hi!

    Handy util! I was wondering, would it be possible to add support for cases like WASHINGTON D.C.?

    The script has come in very handy so far!
    best,
    Diederik

  • brian

    Seems like this script is missing things like 34Th Street (capitalizing the T)

  • george

    newlines (more specifically, all characters which satisfy “\s” in regex) are replaced by spaces.

    Otherwise, this is great! Thanks a ton!

    @brian, I really don’t think that that T should be capitalized. At least, I’ve never seen it that way.

  • george

    also, “word/word” becomes “Word/word”. I think it should be “Word/Word”.

  • http://muffinresearch.co.uk Stuart Colville

    @george: I’ve fixed the newlines issue in 0.5.1.

    The biggest problem with this script is that it becomes necessary to special case lots of edge cases. As a result of that I’ll probably look at providing some way for the user to easily add their own customisations.

  • michael

    Further to George’s point: songs in the titles of medleys are customarily joined with a slash, so it is not a particularly esoteric case. “dance with me/let’s face the music and dance” fails. A disappointment to me, ‘cos song titles were what I was hoping to use it for.

  • Troy

    Titles with numbers in them fail (for example “HELLO 10 WORLD”).

  • http://muffinresearch.co.uk Stuart Colville

    @george and michael: The slashed case has now been addressed – see the version in the trunk branch.

  • http://muffinresearch.co.uk Stuart Colville

    @Troy: Based on the rule-set HELLO 10 WORLD will be untouched as it’s been explicitly capitalised.

  • Alix Axel

    Shouldn’t:

    SMALL_LAST = re.compile(r'\b(%s)%s?$' % (SMALL, PUNCT), re.I)

    Be instead:

    SMALL_LAST = re.compile(r'\b(%s)(%s)?$' % (SMALL, PUNCT), re.I)

  • Alix Axel

    Oh… m.group(0).capitalize(), Nevermind! xD

  • Alix Axel

    Fix for “Step-by-Step”: http://codepad.org/e671lRhl

  • Guest

    egrep -i ^ma?c /usr/share/dict/words:

    macaroon
    Macbeth
    Macduff
    MacDonald
    machine
    macro
    Mcintosh
    etc

Insert a tab character in vim when expand tabs is on

I have vim set up to use spaces in place of tabs. Sometimes you need to use an actual tab, e.g. when editing a Makefile. Whilst it’s possible to change settings so that tabs are used for specific files, a quick tip to remember is to simply type in insert mode:

Ctrl+v tab

That is, hold Ctrl and press “V”, then hit the tab key; et voilà, you’ve entered an actual tab.

GNU screen: open tab in current working directory

Here’s a nice trick for having screen open a new tab in the same directory as the one you’re currently in. To use it, add the following to your .screenrc:

# Open new window in current dir.
bind c stuff "screen -X chdir \$PWD;screen^M"
bind ^c stuff "screen -X chdir \$PWD;screen^M"

Hat tip: mteckert on SuperUser.com

© Copyright 2004-13 Stuart Colville, all rights reserved. May contain traces of Muffin. Powered by WordPress. Hosting by Slicehost.com This page was baked in 0.504s.