Feb 2012 16 Thu

Syntax Highlighting for a Rails Blog

I’ll spare you the boring details of the holy language war between Python and Ruby, but the Python world has one great thing going for it: proper syntax highlighting libraries. While there are notable Ruby implementations like CodeRay, even popular Ruby-based shops like GitHub rely on the elephant in the room: Pygments – which is, if you couldn’t tell from the name, based on Python.

Now, my blog is written in Ruby on Rails and runs on Heroku and, let’s put it this way, Heroku’s Ruby VMs and Python aren’t exactly sending each other Valentine’s cards.

However, there’s an unofficial API for Pygments, accessible over HTTP, that you can use from your Rails application to do the syntax highlighting for you. You just POST a blob of text to it and get back a blob of text with syntax highlighting classes applied.

With that in mind, you can totally go down the route of passing your blog post’s body to that service when somebody views a page on your blog, just like Austin Vance blogged about1. And while I adore the moving clouds, it’s not something I would advocate for performance and scalability reasons. Plus, as @hmans put it, you don’t really want to be directly dependent on a third-party service to render a page on your site2.

So I’ve done things in a slightly different way around here.

The Background

My posts are written in Markdown, rendered to HTML through Redcarpet 2 in a before_save callback whenever a post is created or updated. The syntax highlighting through the aforementioned unofficial Pygments API happens right after my Markdown prose has been transformed into HTML and stored in a separate database column along with the original, unhighlighted Markdown.

That way, I’m killing multiple birds with one stone.

First of all, displaying a page on my blog displays pre-generated HTML. It doesn’t slow down the request with calling third-party services or executing expensive regular expressions. While there’s a way to counteract many of these side-effects with caching, this approach also makes sure that future changes in APIs or syntax won’t change any of my old posts without me noticing. I try to be very attentive with my posts and old posts being a potential moving target isn’t an option.

Lastly, doing all the syntax highlighting server-side saves clients from running expensive syntax highlighting in JavaScript (for example, using plugins like jquery-syntax), which noticeably improves the rendering times of your pages.

The Implementation

As I mentioned before, the transformation from Markdown to syntax-highlighted HTML happens in a before_save callback3 in my Post model.

class Post < ActiveRecord::Base
  # [..]
  before_save :generate_body_html

  protected

  def generate_body_html
    return if body.blank?
    markdown = Redcarpet::Markdown.new(
      Redcarpet::Render::HTML.new(:hard_wrap => true),
      :no_intra_emphasis => true,
      :autolink => true,
      :fenced_code_blocks => true)

    self.body_html = Redcarpet::Render::SmartyPants.render(
      SyntaxHighlighter.new(markdown.render(body)).to_s)
  end
end

Basically, a new renderer is instantiated with a variety of my preferred options (fenced_code_blocks being the most important, more on that in a bit). The contents of the post (stored in the body attribute) are then rendered to HTML, passed to the to-be-disclosed SyntaxHighlighter class, and, lastly, the result of that gets the SmartyPants rules applied.

The whole shebang finally gets assigned to the body_html attribute, which will henceforth contain the fully-baked, ready to display HTML code of the post body.

Now, for the meat of it, let’s take a look at the SyntaxHighlighter class.

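require 'net/http'
require 'uri'
require 'nokogiri'

class SyntaxHighlighter
  PYGMENTS_URI = 'http://pygments.appspot.com/'

  def initialize(html)
    # Keep the parsed document in an instance variable so to_s can reach it.
    @doc = Nokogiri::HTML(html)
    # Only code blocks that carry a language in their class attribute.
    @doc.search("pre > code[class]").each do |code|
      response = Net::HTTP.post_form(URI.parse(PYGMENTS_URI),
                   { 'lang' => code[:class],
                     'code' => code.text.rstrip })
      # Swap the whole <pre> element for the highlighted markup.
      code.parent.replace response.body
    end
  end

  def to_s
    @doc.search("//body").children.to_s
  end
end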

I’m using Nokogiri (in my book still the de-facto standard in HTML parsing) to disassemble the post’s HTML structure (remember that we’re passing in a copy of the body that has already been transformed from Markdown into HTML) to find blocks of code wrapped in <pre><code></code></pre> tag pairs, which also have a value assigned to the class attribute of the <code> tag.

And this is where the fenced_code_blocks option to Redcarpet comes in. With this enabled, you can write your code-blocks like this in Markdown:

``` ruby
class Foo
end
```

That means, no more 4-spaces indentation for every line and, more importantly, you can pass along a value for the class attribute to assign to the <code> tag it’s wrapped in, specifying the programming language of the code to highlight.

But back to the code.

Once those code blocks have been located and extracted along with their classes, they’re posted to pygments.appspot.com individually (because you could have multiple code blocks in a post, each in a different programming language) and the result is inserted back into the document structure, replacing the previously extracted code block.

Lastly, the to_s method simply returns the document converted back to a string, which, as you recall, then gets saved in the body_html attribute for later use when displaying a post.

Not to beat a dead horse here, but all of this only ever happens when you create or update a post. It never happens when a post is merely viewed.
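One thing worth keeping in mind with this approach: if the Pygments service is unreachable, Net::HTTP.post_form raises, and the exception surfaces while the post is being saved. Here is a minimal sketch of one way to guard against that. SafeHighlight, its five-second timeout, and the highlighter callable are hypothetical names and values that are not part of the code above; the idea is simply to fall back to the plain, escaped code block whenever highlighting fails.

```ruby
require 'timeout'
require 'cgi'

# Hypothetical helper (an assumption, not part of the original post):
# runs any highlighter callable and falls back to a plain, HTML-escaped
# code block if the callable raises or takes too long.
module SafeHighlight
  DEFAULT_TIMEOUT = 5 # seconds; an assumed value, tune to taste

  def self.call(code, lang, highlighter, timeout = DEFAULT_TIMEOUT)
    Timeout.timeout(timeout) { highlighter.call(code, lang) }
  rescue StandardError
    # Timeout::Error and the usual network errors are StandardErrors,
    # so they all end up here.
    %(<pre><code class="#{CGI.escapeHTML(lang)}">#{CGI.escapeHTML(code)}</code></pre>)
  end
end

# Usage with a stand-in highlighter that simulates an outage:
broken = lambda { |_code, _lang| raise IOError, 'service down' }
puts SafeHighlight.call('a < b', 'ruby', broken)
```

In the SyntaxHighlighter class, the callable would be a lambda wrapping the Net::HTTP.post_form call and returning the response body. A post saved during an outage would then carry unhighlighted code until it is saved again.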

For styling, all the default Pygments style rules apply. If you’re happy with the defaults, Trevor supplies a default.css in his pygments repository. In addition to that, more styles are available all over the web. Also, Favio Manriquez Leon has a way to preview the built-in styles on his blog.

Conclusion

So there you have it. A way to re-use Python’s great syntax highlighting for your Heroku-powered Rails blog in a way that will survive getting fireballed.

Sadly, I couldn’t really tell when, as his post doesn’t seem to have a date. But it must’ve been some time after July 2011, since he references a Railscasts episode from that time. ↩

Even when it’s being run by the commendable Trevor Turk. ↩

Brush up your callbacks knowledge in Rails with this guide ↩

Feb 2012 14 Tue

Proper Language Detection on the Web

A couple of days ago I quipped:

Which, apparently, struck some sort of nerve as it got favorited and retweeted above my usual average. And that prompted me to explore this topic a little more from a technical standpoint.

But, first, a little bit of history.

What originally angered me was the fact that some websites insist on offering me a German language experience of their site, despite the fact that I deliberately switch all of my computers, phones, televisions, and vacuum cleaners to use the English language, because, well, that’s how I roll.

The Difference Between Shopping and Information

Shopping sites like Amazon have every right in the world to send me to their local offerings if they detect that I’m coming from a country where they have a local presence. (More on that detection in a bit.)

Friendly reminder to shop locally


Yet, they don’t. They simply offer a friendly banner at the top of Amazon.com that informs me of a local offering I might be interested in. (And I am.)

Compare and contrast that with sites like Softonic, which slap me in the face with a modal dialog – requiring an immediate decision on my part – suggesting I use their site in German instead of the version that I chose to access and which matches the language of my browser.

Modal language dialog at Softonic


While I appreciate the fact that they may be employing a CDN1 for speedy downloads from a local server, this is an implementation detail that is independent of the language used on their site. And I don’t care for that language to match my geographic location (as opposed to the language of my browser and operating system).

(And I’m not trying to single out Softonic here. This forced language detection based on geographic location instead of browser language seems to be common practice. Softonic just happened to be the site that prompted the tweet embedded above.)

Odd Birds and Travelers

I may well be the odd bird in this case in that my geographical location does not match the preferred language set in my browser.

But what about travelers?

Everyone traveling abroad is probably in the same boat as I am. Potentially even worse, given that their foreign language skills may not be as fluent and a foreign language version of a site (that is not a shopping site) may be of even less interest.

Technical Hurdles

Detecting the geographic location of a visitor to your site is a non-trivial task. It usually involves some kind of third-party service doing a geolocation lookup, which means a round-trip to at least one other server before you can even render the first page view to your visitor, delaying the often crucial first impression the visitor gets by a couple hundred milliseconds.

While geotargeting is a necessary evil for DRM-plagued services such as Netflix and for targeted online advertising, using geolocation to pick the display language of your website is just plain overkill.

Why? Because every modern browser sends its language preference (or, more appropriately, that of its user) along with every HTTP request being made in the form of the Accept-Language header that is part of the HTTP/1.1 RFC.

The Accept-Language request-header field is similar to Accept, but restricts the set of natural languages that are preferred as a response to the request.

It’s even possible to supply multiple languages with appropriate weightings:

Each language-range MAY be given an associated quality value which represents an estimate of the user’s preference for the languages specified by that range.

So without round-tripping to (potentially non-free) third-party services in order to find out where your visitor is coming from, you can simply use the Accept-Language header supplied by the visitor in most situations to make an even more informed decision about what their language preference may be.

Lead by Example

In Rails, the value of the Accept-Language header can be found in request.headers['HTTP_ACCEPT_LANGUAGE'].

This is, for example, the value supplied by Chrome 17 when set to the English language:

    Accept-Language: en-US,en;q=0.8

If you switch it to German, it’ll send:

    Accept-Language: de-DE,de;q=0.8,en-US;q=0.6,en;q=0.4

Basically, this list specifies German as spoken in Germany as the most desired language, followed by any other variety of German, then US English, and, if all else fails, any dialect of English will do.
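To make the weighting concrete, here is a minimal sketch in plain Ruby of how such a header value could be turned into an ordered list of preferences and matched against the languages a site offers. The method names parse_accept_language and preferred_locale, and the default of 'en', are my own assumptions for illustration. It also simplifies the matching rules of the RFC by comparing only the primary language subtag.

```ruby
# Parses an Accept-Language value into lowercase language tags, ordered by
# descending quality value. A missing q counts as 1.0, entries with q=0 are
# dropped, and ties keep the order in which the browser sent them.
def parse_accept_language(header)
  header.to_s.split(',').each_with_index.map { |entry, index|
    tag, *params = entry.strip.split(';').map(&:strip)
    next if tag.nil? || tag.empty?
    q_param = params.find { |param| param.start_with?('q=') }
    quality = q_param ? q_param.sub('q=', '').to_f : 1.0
    [tag.downcase, quality, index]
  }.compact
   .reject { |_tag, quality, _index| quality.zero? }
   .sort_by { |_tag, quality, index| [-quality, index] }
   .map(&:first)
end

# Returns the first preferred language the site offers, comparing only the
# primary subtag ("de-DE" counts as "de"); otherwise returns the default.
def preferred_locale(header, available, default = 'en')
  parse_accept_language(header).each do |tag|
    primary = tag.split('-').first
    return primary if available.include?(primary)
  end
  default
end

chrome_german = 'de-DE,de;q=0.8,en-US;q=0.6,en;q=0.4'
p parse_accept_language(chrome_german)
p preferred_locale(chrome_german, %w[en fr])
```

For the German Chrome header above, the first call yields the tags in the order de-de, de, en-us, en, and a site that only offers English and French would end up serving English.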

Other browsers like Firefox and Safari supply similar values for Accept-Language, albeit not as detailed.

Great further reading on the topic of internationalizing a Rails application is the Guide on the Rails I18n API and Iain Hecker’s http_accept_language gem.

If you’d like to use this header in JavaScript and don’t have a way to pass it to the JavaScript layer from your app (I heard those cases exist), there’s a helpful jQuery plugin from Dan Singerman called jQuery Browser Language, which gives you access to this header otherwise unavailable to JavaScript.

Conclusion

Don’t force people to use your site or web application in a language they didn’t request just because they happen to sit in a spot on this planet where people usually speak it. Don’t use geotargeting for language detection just because you can. Instead, make use of established practices such as the Accept-Language header to deliver a much better user experience that is far more likely to match their expectations.

A Content Delivery Network, supposedly serving files from a geographically close location with better latency to your neck of the woods. ↩

Jan 2012 11 Wed

An External SSD to Boot My iMac

I’m using a Mid 2011 27" iMac at work. When I ordered it back in May of last year, the SSD option would’ve prolonged the delivery estimate by a whopping 4 weeks, which, for an impatient person, is a hefty trade-off, so I passed and ordered the internal 1TB SATA drive only.

But, as they say, once you go SSD somewhere, you’re not going back.

While the machine was definitely fast and usable for the first couple of months, I missed the speed of an SSD soon enough when switching back and forth between my MacBook Air and the iMac at work, especially at boot time and sleep/wake. So I certainly was more than happy when LaCie announced in June that their Little Big Disk would ship with Thunderbolt-connectivity and an SSD inside. This would allow me to get back onto the sacred SSD path while not making me jump through a plethora of suction-cup laden hoops in order to replace the internal drive of my iMac with an aftermarket SSD. (And that’s not even guaranteed to work in the first place, after all.)

Since I’m not located in the land of milk and honey, I had to be patient: for the longest time the Thunderbolt version of the Little Big Disk was available in Germany only in its 1TB and 2TB SATA configurations. Finally, though, the 240GB SSD version went on sale in recent weeks and I got my copy a couple of days ago.

Little Big Disk Thunderbolt


Despite being advertised as a 240GB disk, it actually consists of a striped set of two 120GB SSD drives inside its aluminum chassis. It does not have any on-board RAID intelligence though and simply resorts to Mac OS X’s built-in software RAID functionality. Using Disk Utility, you can reconfigure the Little Big Disk to work as a mirrored set (RAID-1) instead, giving you only 120GB of capacity. Or you can opt to simply use the two drives as individual disks.

To connect the chassis to your Mac, you’ll need to bring your own Thunderbolt cable, since LaCie does not include any cables with its boxes1. Also of note is the fact that the Little Big Disk needs its own power brick, as, apparently, the 10W of supplied bus power through the Thunderbolt bus is not sufficient to power it2.

Of course I was curious about the performance of the cute little box, even though the theoretical speeds of 20Gbit/s of the Thunderbolt bus would hardly be the limiting factor. When I had everything wired up to one of the two Thunderbolt ports of my iMac, I fired up the Black Magic Disk Speed Test and got quite satisfactory results of 250MB/s write speed and 480MB/s read speed, respectively. My Late 2010 MacBook Air, by comparison, clocks in at roughly 100MB/s write and 140MB/s read speed on its internal SSD.

Speed Test Results of the 240GB SSD Little Big Disk


Quite happy with the results I fired up SuperDuper and cloned my internal SATA boot drive onto the Little Big Disk, the former certainly being the limiting factor in this operation.

After setting the external SSD as the new boot drive I rebooted my iMac and then designated the internal SATA drive to be the Time Machine backup drive. At 1TB capacity, it’s giving me more than enough room to store both a significant Time Machine history as well as my iTunes library files on it3.

Given the Time Machine backups I kept my setup on the default stripe-set configuration, giving me the full 240GB of SSD capacity.

At this point, applications launch with a single “bounce” in the Dock, the test suites of my various Rails and Python projects run crazy fast, and, despite the speed increase, my workplace didn’t get any noisier due to the almost complete silence of the Little Big Disk.

Conclusion

As an aftermarket option to speed up an existing system I can highly recommend the Little Big Disk Thunderbolt 240GB SSD. While it does have its price tag ($900 for the disk and $50 for the cable), adding a 256GB SSD built-to-order option to a new iMac (in addition to the default internal 1TB drive) also sets you back $600, so it’s not that much more. Plus, you could theoretically boot any Thunderbolt-equipped Mac from the external SSD should your iMac give you hardware trouble one day.

This meant another trip to the local Apple Store for me, getting one of Apple’s own $49 Thunderbolt cables. ↩

Yes, the Little Big Disk sits in the “Mobile Hard Drives” category on LaCie’s website. ↩

Since this is the work iMac and since most of my music can be streamed via iCloud or internet radio stations, my needs for a bigger media drive are pretty much zero. ↩

Jan 2012 09 Mon

Dude, Where Is My App?

In the last week, a lot has been written about (mis-)management of iOS multitasking and the potentially poor user experience of using the iOS app switcher – the one you get to by double-clicking the home button on any iOS device since the release of iOS 4. This was originally started by Fraser Speirs and was continued, among others, by John Gruber both in writing and on The Talk Show.

This got me thinking about an annoyance that I’m running into more and more often lately, which is the management of open applications on Mac OS X 10.7 Lion.

In his article, Fraser states:

[T]he iOS multitasking bar does not contain “a list of all running apps”. It contains “a list of recently used apps”. The user never has to manage background tasks on iOS.

On Mac OS X, by contrast, the application switcher, invoked with ⌘-Tab, does contain a list of all currently running apps, and running apps only. And it’s been that way since the release of Mac OS X 10.3 Panther.

The Application Switcher on Mac OS X


However, with the advent of Lion, Mac OS X inherited some of the process management features that debuted on iOS before. Namely, suspend and resume, auto-saving, and automatic termination are now part of your every day life with Lion, just like they have been for so long on iPhones and iPads.

On iOS, starting from its very first release 5 years ago1, applications get suspended from execution as soon as the user hits the home button. When you tap the application icon again, the app resumes where it left off (assuming it was a professionally developed app implementing the respective APIs) and it’d seem as if it had continued to run in the background. (Which it hasn’t, most of the time.)

When iOS is in a memory crunch, it will go through the list of suspended apps and purge them from memory, freeing it up for the active task at hand, such as a memory hungry game. This process is better shown than explained, and Fraser does a great job with a follow-up video.

Compare and contrast that with some of the new process management happening on Lion.

Apple applications like Preview and QuickTime, much to the annoyance of some of their users, use suspend and resume so that all your documents are open again when you relaunch the application, be it after a full system reboot or after just quitting and relaunching the application.

Likewise, there’s no need to explicitly save a new document (or existing document, for that matter) to make your changes survive application relaunches. The system takes care of that behind the scenes.

Now, let’s talk about Automatic Termination. As outlined by John Siracusa in his epic Lion review on Ars Technica:

Lion will quit your running applications behind your back if it decides it needs the resources, and if you don’t appear to be using them. The heuristic for determining whether an application is “in use” is very conservative: it must not be the active application, it must have no visible, non-minimized windows—and, of course, it must explicitly support Automatic Termination.

This description is apt (and it wouldn’t be by John Siracusa if it weren’t) — it does neglect, however, a few of the user experience niggles that come with the system’s application of the Automatic Termination feature.

If we recall, the app switcher on iOS allows you to switch between recently used apps and you need not care in which state those apps are. Backgrounded, suspended, not running, it’s all the same. If you tap an icon in the app switcher the app will become the active app. Either resumed from flash memory or started afresh.

As a heavy user of ⌘-Tab for application switching on Lion, however, you will encounter the situation where switching back and forth between applications will leave you with a double take wondering why on earth the application you were just switching from went the way of the dodo.

Preview.app for example, the all-in-one go-to application for media viewing on Mac OS X, naturally supports Automatic Termination because it’s one of the system’s default apps and as such a showcase for the implementation of new technologies2.

So assume for a moment the hypothetical scenario that you’re preparing a document to be mailed to a colleague. You know you’ve already exported a PDF of the document but since it’s a temporary document you failed to properly file it and placed it somewhere temporary.

While Preview is open you switch to it and notice that your PDF copy of the document is not open anymore (neither is any other document). Hastily you switch over to the Finder by clicking once somewhere on your desktop to bring the Finder forward and check if you placed it on the desktop itself. But you didn’t, so you decide to go back to Preview by hitting ⌘-Tab and use the “Open Recent” menu item to go looking for your document some more.

Or so you thought.

Of course, with Preview being intermittently in the background with no open windows, there’s no way to get back to Preview with ⌘-Tab because it just disappeared from the app switcher, by means of being automatically terminated by the system. Since nobody I know keeps Preview in the dock, there’s no way to relaunch it that way either. Frankly, the application just completely disappeared from the user interface without clear means to launch it again.

The end result is that you pretty much end up using the Apple menu ( › Recent Items) to relaunch Preview, launch Preview from the Applications folder again, or use one of your favorite application launchers (and you know it’s Alfred) to do the same.

Undeniably, the example is very much made up. But you can certainly see how the current behavior of the Mac OS X ⌘-Tab switcher (and Dock) can catch users off guard. Neither the Dock nor the ⌘-Tab switcher show applications that you had consciously launched and which were then purposefully quit by the system without your awareness.

And Lion, which is at version 10.7.2 as of this writing, doesn’t necessarily perform automatic termination only when system resources get scarce, either. In fact, here’s a little demonstration video where the application being quit is the only one running (besides the Finder).

Video: http://player.vimeo.com/video/34711608?title=0

I do appreciate the majority of behind the scenes features in Lion and I made a conscious decision to not disable several of the new defaults like “upside down” scrolling and restoring windows when quitting and re-opening apps. But this particular area sure has some room for improvement.

Nobody knows what Apple has3 up its sleeves for future point updates of Lion or even Mac OS X 10.8 and I’m curious what’s happening to both iOS app switching and Mac OS X app switching in future releases.

Pretty much to the day, as the original iPhone was introduced on January 9th, 2007. ↩

Save, maybe, for a skeuomorphic UI. ↩

Let’s not get into the other debate of the last week and call it “Apple have”. ↩

Jan 2012 06 Fri

The Nikon D4

Rob Galbraith:

Nikon has announced the D4, a new pro digital SLR that features a 16.16 million image pixel full-frame CMOS image sensor, 10fps top shooting rate (or 11fps with restrictions), a standard ISO range of 100-12,800 (and an extended range of 50-204,800), a revised 51-point AF system capable of autofocusing with f/8 lenses, all-new 91,000-pixel RGB ambient/flash metering sensor, twin memory card slots (CompactFlash and the emerging XQD format), EXPEED 3 image processing, 1080p video capture with audio monitoring and optional uncompressed video export through the HDMI port, built-in Ethernet, a new EN-EL18 battery, in-camera HDR and timelapse creation, all in a dust and weather sealed magnesium alloy body.

Looks like a great upgrade, available in February of 2012 for a retail price of $5,999.

Anyone interested in a great condition D3?

© 2012 Patrick Lenz

