Two weak typing problems

Thursday 9 August 2007

Recently, I had two demonstrations of the pitfalls of weak typing.

First, my son Max was working on a simple Flash game. He asked for my help fixing it, because the character would move left, but it wouldn't move right. His code looked (roughly) like this:

if (Key.isDown(Key.LEFT)) {
    guy._x -= "10";
}
if (Key.isDown(Key.RIGHT)) {
    guy._x += "10";
}

The problem here is that the _x attribute is an integer. When subtracting the string "10" from an integer, the weak typing coerces the string to an integer, and the subtraction moves the character left. But when moving right, the integer is added to a string, which is a valid string operation, so the integer is coerced to a string, and the two strings concatenated. Setting the _x position to a string doesn't move the object, so the character doesn't move right.

Apart from the usual mystifying behavior of weak typing, the bizarre thing here is how two cases which seem completely symmetric in fact have very different results. Strings have a plus operator, but not a minus operator, so the helpful weak typing chose different paths for the two cases, resulting in the strange left-but-not-right bug.

Changing the "10" constants to integer 10's fixed the problem, of course, since it meant that all operations were the expected integer operations.
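The asymmetry is easy to reproduce outside Flash. What follows is a minimal sketch in plain JavaScript, assuming its coercion rules match the ActionScript behavior described above. The `guy` object here is a hypothetical stand-in for the sprite, not a real movie clip, so unlike Flash it simply keeps whatever value is assigned to `_x`:

```javascript
// Hypothetical stand-in for the sprite; only the _x property matters here.
var guy = { _x: 100 };

guy._x -= "10";            // "10" is coerced to a number: 100 - 10
var afterLeft = guy._x;    // 90, still a number

guy._x += "10";            // 90 is coerced to a string and concatenated
var afterRight = guy._x;   // "9010", now a string

// With numeric constants, both directions are ordinary arithmetic.
guy._x = 100;
guy._x -= 10;
guy._x += 10;
var afterFix = guy._x;     // 100
```

The two statements differ by one character, yet only one of them does arithmetic.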

The second example was in some JavaScript code designed to speed up a slow calculation by caching its results. The cache is a map from strings to lists of objects, but the calculation could return nothing, which was also important to cache, so the string '-' was stored in its place:

var answer = this.cache[question]; 
if (!answer) { 
    answer = long_expensive_calculation(question); 
    if (!answer || (answer.length == 0)) {  
        this.cache[question] = '-'; 
        return null; 
    } 
    else { 
        this.cache[question] = answer; 
        return answer;
    } 
} 
if (answer == '-') {
    return null;
} 
return answer;

This code sped up the calculations, but it still took much longer than it seemed it should. The cache had a really good hit rate (99%), so we only had to look at the path where the cache found the answer. But all that path does is look up a value in a hash, compare the value to a string, and return the value. How can that take so long?

The answer lies in the weak typing of that equality check near the bottom. The answer from the cache is a list of objects. To compare that against a string, JavaScript converts the list to a string, then compares the strings. That string conversion was consuming all the time, and was completely unnecessary. If the answer wasn't a string to begin with, we didn't need to do the comparison at all.

Changing the comparison to:

if (typeof(answer) == 'string' && answer == '-') {

sped up the function by a factor of about 10.
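The hidden conversion can be made visible with a small sketch. The `item` object and its counting `toString` are hypothetical, added only to show how often the comparison asks each list element for its string form:

```javascript
// Count how many times a list element is converted to a string.
var conversions = 0;
var item = {
    toString: function () {
        conversions++;
        return "item";
    }
};

// A cached answer: a list of three objects.
var answer = [item, item, item];

// Loose comparison: the whole list is joined into a string first.
var looseResult = (answer == '-');
var conversionsAfterLoose = conversions;

// Guarded comparison: typeof fails first, so == never runs.
conversions = 0;
var guardedResult = (typeof answer == 'string' && answer == '-');
var conversionsAfterGuard = conversions;
```

Both comparisons come out false, but the loose one stringifies every element on the way there, so its cost grows with the size of the cached list.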

BTW: this function is more complicated than it had to be. The simpler approach, which avoids the sentinel value and its string comparison, is:

var answer = this.cache[question];
if (typeof(answer) == 'undefined') {
    answer = long_expensive_calculation(question);
    if (!answer || (answer.length == 0)) {  
        answer = null; 
    } 
    this.cache[question] = answer; 
}
return answer;
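To check that the simpler version still caches empty results, here is a runnable sketch. The `lookup` object, its `get` method, and the stub `long_expensive_calculation` are all assumptions made up for the demonstration; the stub just counts how often it is called:

```javascript
// Hypothetical stand-in for the slow calculation; counts its own calls.
var calculationCount = 0;
function long_expensive_calculation(question) {
    calculationCount++;
    return question === 'empty' ? [] : [question.toUpperCase()];
}

// Hypothetical holder for the cache, wrapping the simplified logic.
var lookup = {
    cache: {},
    get: function (question) {
        var answer = this.cache[question];
        if (typeof answer == 'undefined') {
            answer = long_expensive_calculation(question);
            if (!answer || answer.length == 0) {
                answer = null;   // null is cached too, and is not undefined
            }
            this.cache[question] = answer;
        }
        return answer;
    }
};

var first = lookup.get('empty');    // runs the calculation, caches null
var second = lookup.get('empty');   // served from the cache
var countAfterEmpty = calculationCount;
var hit = lookup.get('abc');        // a non-empty answer
```

Because a cached null is distinguishable from a missing key, no sentinel string is needed and nothing on the hit path triggers a conversion.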

I use Python, which doesn't do these sorts of magic conversions, but it also forces me to explicitly convert ints to floats if I want a float answer, which is also a pain. I'd kind of like a middle ground: implicit conversion among numeric types is ok, but not between numbers and strings.

Comments

Ian Bicking 4:31 PM on 9 Aug 2007

I think you could argue that the problem with JavaScript's handling of addition with strings isn't so much weak typing as misuse of an operator and overly aggressive coercion. When doing polymorphic programming, it is important that methods have a single conceptual meaning. Addition and concatenation really aren't that similar. For instance, the inverse of addition is subtraction, and it is commutative. Neither is true for concatenation. In comparison, if you mix ints and floats and rationals, it wouldn't be a big deal. Even rectangles could probably be added and subtracted, or sets. But not sequences like a string.

Rene Dudfield 4:46 PM on 9 Aug 2007

PHP uses a separate operator for joining sequences.

To join two strings:
$a = "hello " . "world";

But it converts strings to numbers if you use + - etc.
eg.
$a = "1" + 2;

A string which is not a number turns into 0.

So for a Python person... (and probably others) this PHP is confusing.

$a = "hello " + "world";

That will not make $a === "hello world" at all.

In JavaScript a string which is not a number turns into NaN.

Python's behaviour with strings and the + - operators is almost as confusing.

In a lot of cases using separate operators for adding sequences is useful.

Separate operators for adding sequences are more explicit. However, using the + operator to join strings is what is expected.

Nate 6:02 PM on 9 Aug 2007

Implicit is always worse than explicit, and this shows exactly why.

I really really hate the implicit conversion of strings to ints... it's an awful idea. The code is assuming way too much about what I mean.

Operator overloading is almost always confusing, and when you do it with two seemingly random types for some special purpose, it's just a bad idea.

At least with two types that are the same (thus, strong typing = good stuff), then you have some expectation that the user knows what they're doing... both objects either have a + operator or not.

But with implicit casting (weak typing), one thing could get turned into something else.

What if you have foo + bar, and bar can be implicitly cast into multiple types, each of which have a different meaning when used with the + operator on class foo? Awful.

Rene Dudfield 6:28 PM on 9 Aug 2007

Having an operator + that is for adding numbers, and not joining strings makes the whole strings are numbers thing a little bit more ok. It's implicit, but expected.

Given to a newbie:
2 + "3" -> 5 is the expected result.

The original example in Flash by a child programmer shows this (but doesn't prove it). I think more study would prove this is expected by newbies.

Implicitly turning ints into floats is a little dangerous, but sometimes expected. The whole divide int by another int implicitly turning the result into a float is maybe 50/50 expected. Definitely not expected by most int using people, but expected by people new to ints/floats and people just expecting a number result... not an int result.

Nate 1:50 PM on 10 Aug 2007

See, while it may be true that for a *new* new newbie, 2 + "3" -> 5 .... for anyone with even a rudimentary knowledge of coding, they'd look at that and immediately see 3 as a string. Thus 2 + "string" makes no sense. It shouldn't matter what the content of the string is....

If you got a string from somewhere and want to try to add it to something, first do some explicit conversion, verify that it is indeed convertible to a number, and *then* add the two ints together.

The code you write will be more robust, more legible, and more maintainable.

int AddToUIValue(int x)
{
    try
    {
        return x + Converter.ConvertToInt32( textbox.Text );
    }
    catch ( ConversionException )
    {
        MessageBox.Show( "Text in box must be a number.");
        return x; // fall back to the unchanged value so every path returns
    }
}


Even if you're a newbie and don't do try/catch stuff:

int AddToUIValue(int x)
{
    return x + Converter.ConvertToInt32( textbox.Text );
}

At least then the programmer is forced to realize there's a conversion there, and ConvertToInt32 should have documentation telling you it can throw and under what circumstances.

Calvin Spealman 9:08 AM on 12 Aug 2007

Don't forget, in Python, you can have that middle ground.

from __future__ import division

Now, division, the most common case of converting int to float, implicitly converts ints. This will be the default in 3.0, and we can use the // operator to do floor division, if we really want to.

Peter Goodman 11:46 AM on 11 Dec 2007

Instead of doing:
if (typeof(answer) == 'string' && answer == '-')

Try doing this:
if(answer === '-')

This makes the typeof check unnecessary: === demands identity instead of equality, so no type conversion is done. PHP has the same operator for when you don't want type conversions to be done. It is especially useful when you want to make the distinction between zero and false.
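A short sketch of the difference in JavaScript (the variable names are only for illustration):

```javascript
// Loose equality converts its operands; strict equality does not.
var zeroLooselyFalse = (0 == false);     // true: false is converted to 0
var zeroStrictlyFalse = (0 === false);   // false: number vs. boolean

// A cached list is never strictly equal to the sentinel string,
// and nothing is converted to find that out.
var listVsSentinel = (['a', 'b'] === '-');   // false

// The sentinel itself still matches.
var sentinelHit = ('-' === '-');             // true
```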

Venkman 1:43 PM on 11 Dec 2007

I don't know, but to me the problem is not weak typing in itself. The problem is not realizing that even though variables are weakly typed, the values they hold always have a concrete type. That makes people think that 10 and "10" can be treated the same.

attila 1:47 PM on 11 Dec 2007

these are not due to weak typing, but questionable js semantics...

Leo 1:59 AM on 12 Dec 2007

This seems more of a problem with overloading the + operator than with weak typing in general.
Consider that the + operator is treated as actual different operations under different conditions: concatenation for strings, addition for numbers. Concatenation and addition are not at all analogous, especially since integers certainly can be concatenated.
The bug your son encountered is a result of the design error of overloading the + operator in JavaScript (and, as a result, ActionScript). Had there been a dedicated concatenation operator (say, ..), this wouldn't be an issue.
So long as operators perform analogous functions over different types, weak typing works just fine.

Terr 1:45 PM on 13 Dec 2007

Nate, the problem is that more often than not those values will be stored in weakly-typed variables which you can't "just look at" unless you fire up a debugger and set a breakpoint.

Of all of the potential pitfalls of weak typing, I think that the string/integer one is the most annoying and commonly encountered.
