Stringification

One of the greatest features of modern programming environments is also the most humble: ubiquitous stringification.

toString()

When I first worked in Java, I didn’t think much about the java.lang.Object.toString method. It seemed like a good idea, and made sense, but was sort of the low-tech sibling in the method list. It was hardly “computer science” to be able to turn an object into a string. It seemed almost like a patch over a language deficiency at first: “Why not just have a way to print objects?”

Gradually, I understood. The toString method is Java’s way of printing any object, and it’s the best way to do it. By allowing each class to define its own string representation, but through a common interface, Java gets the best of both worlds: callers have the power to query any object for a stringified version of itself, and implementers can use any techniques they want to stringify their data, all without adding a wart to the language.

What’s the big deal?

This may be one of those features you don’t miss until it’s gone. I didn’t get it until I went back to C++ after using Java and Python.

Being able to always stringify objects without a lot of rigamarole makes it much easier to use those objects in “natural” ways. For example, when adding richness to log messages, it’s much easier to provide more information by just squirting objects into log messages. The easier it is to use objects like this, the more it will get done. If we had to call special-purpose DescribeMe() methods all over the place, or create strings manually from the object at hand, there’d be too many places where it seemed like too much trouble, and it wouldn’t happen.

Language support

As described above, Java has the java.lang.Object.toString method, inspired by Smalltalk’s asString method.

Python provides two special methods for dealing with stringification: __str__() provides the “informal” (or human-readable) string for an object, and is called by the str() built-in function and by print. __repr__() provides the “official” (or computer-readable) string for an object, and is called by the repr() built-in function (and, in Python 2, by the backquotes).

Dynamically typed scripting languages like JavaScript and PHP provide stringification as a feature of the language: values are converted to strings as needed to make them the right type for the operation.

Don’t be lazy

The pitfall in this ubiquitous stringification is that if the language or environment provides an implementation of toString for all objects, you might not write one yourself, and that would be a shame.

The default implementations are boring, by necessity: what information could they use to print something interesting? Generally, the default implementations will give the class of the object and its address (or, in Java, its hash code).

Write your own implementation! It isn’t hard, and once you have it, you’ll use it all over the place.

What to do about C++?

C++ doesn’t have a built-in class hierarchy to provide the toString() interface. (Its proponents claim the lack of a built-in hierarchy as an advantage, because your classes have only what you want in them.)

It does have the ostream class from the iostreams part of the C++ standard library, though, and ostream has inserters. “Inserter” is the fancy term for the << operator. The C++ equivalent of toString() is an ostream inserter for your class. This isn’t quite as good as toString(), since it only works with ostreams, and not in other contexts where you’d like a string. I suppose you could go whole-hog into the standard library and pass std::string around by value to emulate Java and Python. I never have.
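One way to narrow that gap, sketched here rather than recommended: route any value that has an inserter through a std::ostringstream and take the resulting string. The name ToString is hypothetical; nothing like it is provided by the standard library under that name.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (an assumption for illustration, not a standard
// function): turns any value that has an ostream inserter into a std::string.
template <typename T>
std::string ToString(const T & value)
{
    std::ostringstream os;  // an ostream that writes into an in-memory buffer
    os << value;            // reuse whatever inserter the type already defines
    return os.str();        // copy out the accumulated text
}
```

With this in place, any class that has an inserter can also be used where a plain string is wanted, at the cost of a temporary stream per call.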

Inserters are not actually members of your class: they are functions (which can be declared as friends of your class). This may seem a little awkward, and I suppose it is. There is one advantage: you can define inserters for classes you didn’t write. This lets you customize the appearance of someone else’s objects, or lets you add on stringification after the fact to library objects you can’t otherwise change.

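#include <ostream>

// Inserter for CThingy: a free function, not a member of the class.
// Taking the object by const reference lets it print const objects and
// temporaries; this assumes GetUniqueId() is a const member function.
std::ostream &
operator << (std::ostream & os, const CThingy & thing)
{
    return os << "thingy " << thing.GetUniqueId();
}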

If you have your own object hierarchy, you can provide an inserter for the base class, and put yourself in a similar position to your Java and Python brethren: a boring base implementation that you’ll be tempted to rely on rather than implementing real stringification throughout the hierarchy. If your base class has a method like toString, you can always make the ostream inserter call it (or vice-versa!).

Comments

[gravatar]
While I agree the inserter notation is nice, I'm curious why you delve into it particularly. You can certainly write your classes with toString() methods, and it seems like they will be more general (they don't necessarily need an ostream). What makes the inserter notation better?

thanks,
-emile
[gravatar]
If Java does have a class or interface called Inserter, it would be nice to have a link to whichever API or APIs have the documentation about the class.

I searched Google to try to find what an Inserter class is, but so far have been unable to find it. If you have any suggestions, please e-mail me.

I thought the colorization of the words was a great idea, although finding the search words as part of another word was a little annoying.
[gravatar]
Maybe I was unclear: Inserters are C++ constructs, not Java. And they aren't called "Inserter", they are called "operator <<".
[gravatar]
You may want to look at string streams if you want to use the dude_string; // dude_string is "10"

In this way, using the overloaded stream operators (<< and >>), you can convert your objects (and lots of other things) directly to strings.

You can also use this method to convert ints to and from strings, which comes in handy.
[gravatar]
Not sure what happened with that last post. Let's try it in HTML:


#include <sstream>
#include <string>

using namespace std;

int main()
{
    int i = 10;
    string dude;

    stringstream s;
    s << i;    // now you can use s as a string

    s >> dude; // dude will be "10"
    return 0;
}

[gravatar]
On top of the language support, I'd like to mention support by the IDE. Eclipse, for instance, can generate a toString method just as it generates getters and setters.

Stringification comes in handy when debugging: you don't have to inspect the object to see what's inside.
