Whitespace in Ruby and searching for code

Ned Batchelder, July 2010
Armin's post about Whitespace sensitivity in Ruby piqued my interest. It points out that in Ruby, "foo[42]" is different than "foo [42]", and that "foo/bar" is the same as "foo / bar" but different than "foo /bar".

So I wanted to learn more about Ruby. I looked at a bunch of tutorials and finally ended up at Mitch Fincher's Ruby Tutorial with Code Samples, which had the right breezy pace, with none of the "a variable is like a box for your numbers" stuff in it.

But I had originally gotten to Mitch's page from a Google search for "ruby puts gets". If you try it, you'll see that when you get to Mitch's page, a small box appears near the top, saying, "Welcome. You seem to have come here from a search engine. Your search words (ruby puts gets) are highlighted on this page for your reading pleasure."

I thought "nice," then I thought, "that looks familiar," then I realized it was almost exactly the box that appears at the top of my pages when you visit from a search engine (try it: search for "batchelder white house adventure"). In fact, it used the same colors. I looked at his page, and it used near-verbatim copies of my three Javascript files, even though I consolidated them into one a few years ago.

I was amused, and wondered where else the code is being used. But there's no easy way to find out: the search engines are smart enough not to index comments in Javascript files, or names of Javascript files referenced in HTML pages, unless there's some tricky syntax I don't know about.

PS: about whitespace sensitivity: I've decided that phrase means a programming language needs tokens consisting only of whitespace in order to be parsed properly. For example, Python and Ruby are whitespace-sensitive, and C is not.
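To make the PS's definition concrete, here is a small sketch using Python's standard tokenize module (Python 3 assumed; the helper name whitespace_only_tokens is made up for illustration). It lists every token whose text is nothing but whitespace. For an indented block, Python's lexer emits an INDENT token whose text is the leading spaces themselves, which is exactly the kind of token the definition is about.

```python
import io
import tokenize

def whitespace_only_tokens(source):
    """Return (token type name, token text) for each token whose text is
    non-empty and consists solely of whitespace characters."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokens
        if tok.string and tok.string.isspace()
    ]

# The indented body produces an INDENT token whose text is four spaces.
print(whitespace_only_tokens("if x:\n    y = 1\n"))
```

The NEWLINE tokens (text "\n") also show up in the result, since in Python a line break ends a statement. Both are whitespace the parser cannot do without.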
Comments
C isn't? Does that mean that "*a / *b; // */" would be the same as "*a/*b; // */" in C?
Later,
Blake.
My definition was that a language was whitespace-sensitive if, after lexical analysis, there were tokens consisting entirely of whitespace characters. The fact that whitespace is needed to separate tokens isn't interesting: after all, "int i = 9" is different than "inti=9" too.
In your example, "/ *" is tokenized as "/" followed by "*", while "/*" is the start of a comment. Either way, there are no tokens that are purely whitespace.
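The reply's point can be checked with a toy lexer. This is a hypothetical sketch in Python, not a real C tokenizer: it assumes a C-like language that knows only identifiers, integer literals, the two comment forms, and single-character punctuation. Whitespace is matched so it can be skipped, but it is never emitted as a token, yet the inputs from the comments still lex differently.

```python
import re

# Toy C-like lexer (hypothetical, for illustration only). Alternatives are
# tried in order, so "/*" and "//" are recognized before a lone "/" or "*".
TOKEN_RE = re.compile(r"""
    (?P<ws>\s+)
  | (?P<block_comment>/\*.*?\*/)
  | (?P<line_comment>//[^\n]*)
  | (?P<ident>[A-Za-z_]\w*)
  | (?P<number>\d+)
  | (?P<punct>[^\sA-Za-z_0-9])
""", re.VERBOSE | re.DOTALL)

def tokenize_c(source):
    """Split source into tokens, discarding whitespace, which only
    separates tokens and never becomes one."""
    return [m.group() for m in TOKEN_RE.finditer(source)
            if m.lastgroup != "ws"]

print(tokenize_c("*a / *b; // */"))  # ['*', 'a', '/', '*', 'b', ';', '// */']
print(tokenize_c("*a/*b; // */"))    # ['*', 'a', '/*b; // */']
print(tokenize_c("int i = 9"))       # ['int', 'i', '=', '9']
print(tokenize_c("inti=9"))          # ['inti', '=', '9']
```

In each pair the token streams differ, but in every case the only effect of the whitespace is on where one token ends and the next begins.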
Okay, I think I get it now. Thanks for the clarification!
Later,
Blake.