Armin's post about Whitespace sensitivity in Ruby piqued my interest. It points out that in Ruby, foo[42] is different than foo [42] and that foo/bar is the same as foo / bar but different than foo /bar.

So I wanted to learn more about Ruby, and looked at a bunch of tutorials, finally ending up at Mitch Fincher's Ruby Tutorial with Code Samples, which had the right breezy pace, with no "a variable is like a box for your numbers" stuff in it.

But I had originally gotten to Mitch's page from a Google search for "ruby puts gets". If you try it, you'll see that when you get to Mitch's page, a small box appears near the top, saying,

Welcome. You seem to have come here from a search engine. Your search words (ruby puts gets) are highlighted on this page for your reading pleasure.

I thought "nice," then I thought, "that looks familiar," then I realized it was almost exactly the box that appears at the top of my pages when you visit from a search engine (try it: batchelder white house adventure). In fact, it used the same colors. I looked at his page, and it used near-verbatim copies of my three JavaScript files, though a few years ago I consolidated them into one.

I was amused, and wondered where else the code was being used. But the search engines are smart enough not to index comments in JavaScript files, or names of JavaScript files referenced in HTML pages, so I can't just search for it, unless there's some tricky syntax I don't know about.
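For the curious, the idea behind that box is simple. Here's a minimal sketch of it, in Python rather than JavaScript, and it is not the code from either site. It assumes the referring URL carries the search in a `q` parameter, and the names `search_terms` and `highlight` are made up for illustration.

```python
import html
import re
from urllib.parse import parse_qs, urlparse


def search_terms(referrer):
    """Return the words in the q= parameter of a referrer URL, or []."""
    # Assumption: the search engine puts the query in a parameter named "q".
    query = parse_qs(urlparse(referrer).query)
    return query.get("q", [""])[0].split()


def highlight(text, terms):
    """Escape text for HTML, then wrap whole-word matches of terms in <b>."""
    escaped = html.escape(text)
    if not terms:
        return escaped
    # Escape each term the same way as the text so the two still line up.
    alternatives = "|".join(re.escape(html.escape(t)) for t in terms)
    pattern = r"\b(" + alternatives + r")\b"
    return re.sub(pattern, r"<b>\1</b>", escaped, flags=re.IGNORECASE)


if __name__ == "__main__":
    terms = search_terms("http://www.google.com/search?q=ruby+puts+gets")
    print(terms)
    print(highlight("Use puts and gets in Ruby", terms))
```

A page that finds no search terms in the referrer would simply skip the welcome box.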

PS: about whitespace sensitivity: I've decided that phrase means a programming language needs tokens consisting of only whitespace in order to be parsed properly. Python and Ruby are whitespace-sensitive, and C is not, for example.
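To make that definition concrete for Python, here's a small sketch using the standard `tokenize` module. It lists the layout tokens the tokenizer produces for a three-line snippet. The `layout_tokens` helper and the sample source are mine, for illustration only.

```python
import io
import tokenize

# A tiny sample program: one indented block, then a line back at column 0.
SOURCE = "if x:\n    y = 1\nz = 2\n"


def layout_tokens(source):
    """Return (token name, token text) for the layout tokens in source."""
    found = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        name = tokenize.tok_name[tok.type]
        if name in ("NEWLINE", "INDENT", "DEDENT"):
            found.append((name, tok.string))
    return found


if __name__ == "__main__":
    for name, text in layout_tokens(SOURCE):
        print(name, repr(text))
```

The INDENT token's text is nothing but the four spaces that open the block, which is exactly the kind of whitespace-only token I mean. The matching DEDENT carries no text at all, but it exists only because of how the next line is indented.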


Comments

Blake Winton 9:20 AM on 28 Jul 2010

C isn't? Does that mean that "*a / *b; // */" would be the same as "*a/*b; // */" in C?

Later,
Blake.

Ned Batchelder 9:45 AM on 28 Jul 2010

My definition was that a language is whitespace-sensitive if, after lexical analysis, there are tokens consisting entirely of whitespace characters. The fact that whitespace is needed to separate tokens isn't interesting; after all, "int i = 9" is different than "inti=9" too.

In your example, "/ *" is tokenized as "/", "*", and "/*" is the start of a comment, but there are no tokens that are purely whitespace.

Blake Winton 10:05 AM on 28 Jul 2010

Okay, I think I get it now. Thanks for the clarification!

Later,
Blake.
