The scalability of programming languages

Saturday 24 October 2009

Tabblo is written on the Django framework, and therefore, in Python. Ever since we were acquired by Hewlett-Packard two and a half years ago, there’s been a debate about whether we should start working in Java, a far more common implementation language within HP. These debates come and go, with varying degrees of seriousness.

The latest wave of “Java?” debating is upon us, and Mike Vanier’s The Scalability of Programming Languages has been entered into evidence. I found it a very interesting read, especially about static vs. dynamic typing. At one point, Mike says,

What typically happens in large projects written in these languages is that extensive unit tests are written to catch type errors as well as logical errors ...

I think Mike meant this as a negative, but I don’t see how it is. Extensive unit tests are a good thing, especially since they catch logical errors as well as type errors. The static type people either don’t have such tests, in which case nothing is catching their logic errors, or they do have such tests, in which case they didn’t need the static type checking in the first place.

Static type adherents claim that their type declarations give them both documentation of what’s expected, and automatic checking of code. But it only gives you a small amount of either.

For example, a parameter to a function has to be a string, so you declare it as String, and the compiler can guarantee that it is a String. But that’s just one small aspect of the rules about the parameter. Can it be NULL? Can it be empty? What’s it supposed to represent? An IP address? Can it be a wild-carded IP address? Can it be a comma-separated list of such addresses?

The questions beyond “String” go on and on, and static type checking gives us help with none of them. There’s the temptation to slice the universe ever more finely to get the type system to carry some of this information. So you’ll end up with IpAddress types, and WildcardableIpAddress, and so on. Those are good things, since you will likely have methods on IP addresses that you want to perform, so building classes will help. But there are always distinctions between instances that can’t be expressed in the type system. The only way to get at them is at run time. You can decide which run time you want to find them: in tests or in real use. Tests are the better answer.
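A minimal Python sketch of this point, using the standard-library `ipaddress` module. The function name `parse_allowed_hosts` and the comma-separated format are assumptions made up for illustration. The annotation says only `str`; the remaining rules about the parameter are enforced, and tested, at run time.

```python
import ipaddress


def parse_allowed_hosts(spec: str) -> list:
    """Parse a comma-separated list of IP addresses (hypothetical helper).

    The annotation only promises a str. Everything else (non-empty,
    each item a valid address) is checked when the code runs.
    """
    if not spec.strip():
        raise ValueError("spec must not be empty")
    return [ipaddress.ip_address(part.strip()) for part in spec.split(",")]


def test_parse_allowed_hosts():
    hosts = parse_allowed_hosts("10.0.0.1, 192.168.1.20")
    assert [str(h) for h in hosts] == ["10.0.0.1", "192.168.1.20"]

    # All of these are perfectly good strings, and all of them are wrong.
    for bad in ["", "not-an-ip", "10.0.0.*"]:
        try:
            parse_allowed_hosts(bad)
        except ValueError:
            pass
        else:
            raise AssertionError(f"{bad!r} should have been rejected")


test_parse_allowed_hosts()
```

Whether a wild-carded address such as `10.0.0.*` should be accepted is exactly the kind of rule that lives in the test, not in the declaration.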

The rest of the essay is interesting, especially Mike’s postscripts where his changes of viewpoint are recorded. It’s worth a read, if only for its exposition of the considerations that go into programming language design. He doesn’t get caught up in shallow issues like syntax, but gets at the deeper factors in programming languages that affect the outcome of projects that use them.

Comments

[gravatar]
For me type checking really turns out to be more spell checking than anything. When I find a problem in my python code which would have been caught in C++/Java etc it's almost always that I typed the name of something wrong, not that I have a String in a variable that is supposed to be an Int. This is why I have settled on using an IDE with good autocomplete.

Also I find python is a lighter language than C++ in a physical sense. Because I no longer have several long breaks during the day while I wait for compiling/linking, I eat less...
[gravatar]
Wow, Brice, I never would have thought of the connection: compile implies break implies snack implies unhealthy! Python FTW!
[gravatar]
Our perspective on the choice between the two languages has become -
a) Python maximizes the productivity of our 'good' programmers.
b) Java minimizes the damage that poor programmers can do to a system.

While working on a large project with a dozen different people of _vastly_ different styles and abilities, I'm extremely thankful that the various walls exist between packages...
[gravatar]
> "... or they do have such tests, in which case they didn't need the static type checking in the first place ..."

Type checks are a universal constraint, tests are a particular constraint - they aren't the same kind of thing, they don't give the same kind of guarantee.


> "... and static type checking gives us help with none of them. There's the temptation to slice the universe ever more finely ..."

Rather than "static type checking gives us help with none of them" your following words seem to suggest that static type checking gives as much help as you make it give.

> "But there are always distinctions between instances that can't be expressed in the type system. The only way to get at them is at run time."

Please show an example of that.
[gravatar]
@Isaac, you want an example of a distinction between instances that can't be expressed in the type system? Any state in your object is an example of this. The simplest example: you can't prevent divide by zero errors by using the type system. The distinction between zero and non-zero is not expressed in the type of the value. You have to run the code to determine if a divide by zero can happen or not. Another common solution is to use an assert, which just moves the problem from a division exception to an assertion exception, but still happening at runtime.

Any state that you hold in your objects is a distinction that isn't represented in the type system, and therefore is beyond the reach of static type checking to prove correct. Running the code is the only way to know whether the state in your object conflicts with the code, and is therefore a bug.

Automated testing is the best way to pick up where static type checking leaves off. Since static type checking can't cover all possibilities, you will need automated testing. Once you have automated testing, static type checking is redundant.
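A small Python sketch of the divide-by-zero case. The function `average_price` is hypothetical. Both arguments carry the declared types, so a check that looks only at those declarations has nothing to object to; the failure shows up when the code runs, which is where a test can catch it.

```python
def average_price(total: float, count: int) -> float:
    """Hypothetical function: both arguments have the 'right' types."""
    return total / count


def test_average_price_rejects_zero_count():
    # count=0 is a valid int, so the declared types are satisfied.
    try:
        average_price(100.0, 0)
    except ZeroDivisionError:
        return "caught at run time"
    raise AssertionError("expected ZeroDivisionError")


print(test_average_price_rejects_zero_count())  # caught at run time
```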
[gravatar]
Incidentally, as you're talking about Java (rather than some more exotic language) and as you're talking in the context of HP wouldn't it be appropriate to talk about Esc/Java and JML?
[gravatar]
As a practicing Python and Haskell programmer who has dabbled in statically typed languages such as Java and C, I can definitely say: mainstream type systems are selling you short. They are not giving you the full power that is possible in an expressive type system. When I want a type system, I want type inference to cut down on long type declarations and I want light-weight data structures so that I'm encouraged to use different types. I want my type system to not let me pass NULL or a string; I want my language to be expressive enough to say that in the type w/o imposing undue boilerplate.

When I write Python, I write tests to test logic. I don't want to write tests for my boilerplate; I don't want to write tests because I need to get 100% coverage on my scaffolding. Nevertheless, I have to, and my code is improved because of it, because Python doesn't have compelling static analysis tools, but it's not what I want.

Cheers,
Edward
[gravatar]
> you can't prevent divide by zero errors by using the type system

"By default, ESC/Java checks for violations of implicit preconditions of primitive Java operations, such as accessing an array out-of-bounds, dereferencing null pointers, dividing by zero, etc."

> Automated testing is the best way to pick up where static type checking leaves off.

I presume you mean repeatable testing - running hand written test code - rather than automated testing like "Check ’n’ Crash" ;-)

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.123.4597&rep=rep1&type=pdf

> Once you have automated testing, static type checking is redundant.

I guess something's missing from that flat assertion, something about absolutely complete test coverage.

You seem to be fixed on a test versus type check dichotomy - to me it looks like an engineering trade-off, for some things static checks are cheap and easy, and for other things static checks involve too much work and cleverness.
[gravatar]
In the context of Java vs. Python, your comments are correct. However, there are statically typed languages that do allow you to express, for example, "this string can not be null" or "this string must be a valid IP address". Ocaml and Haskell are two such languages.

Also, there is a difference between *fewer* tests and *no* tests. Unit tests have a non-zero cost, both to create and to maintain. It's not that Ocaml and Haskell need no unit tests, it's that they need significantly fewer unit tests, precisely because things guaranteed by the type system don't need to be tested.
[gravatar]
It seems Isaac is assuming that on top of logic tests you also have to write type checking tests. Maybe in theory; in practice you don't and you shouldn't.

You don't have to because asserting (foo > 0) or (bar.isReady()) not only covers the logic, it also covers the interface/type in a single step, you don't have to write tests to check the type of foo or bar.

You shouldn't because in Python you should be checking that the interface suffices, not what type the object is, remember duck typing.
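A rough sketch of that idea in Python. The names `FakePrinter` and `print_report` are invented for illustration. The test asserts on behavior only, and in doing so it also exercises the interface; nothing checks what class the object belongs to.

```python
class FakePrinter:
    """Hypothetical stand-in; it shares no base class with a real printer."""

    def __init__(self):
        self.jobs = []

    def is_ready(self):
        return True

    def send(self, text):
        self.jobs.append(text)


def print_report(device, text):
    # Relies only on the interface: is_ready() and send().
    if not device.is_ready():
        raise RuntimeError("device not ready")
    device.send(text)
    return len(text)


def test_print_report():
    printer = FakePrinter()
    # One assertion on behavior also covers the interface.
    assert print_report(printer, "hello") == 5
    assert printer.jobs == ["hello"]

    # An object without the interface fails as soon as the code runs.
    try:
        print_report(42, "hello")
    except AttributeError:
        pass
    else:
        raise AssertionError("expected AttributeError")


test_print_report()
```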
[gravatar]
@Ken Whitesell:

Static typing is not a cure for incompetence - in fact, it's often an amplifier. As Damien Katz put it, it's like breath mints for a drunk.

Bad programmers should be fired and good programmers should be given tools that allow them to be productive.
[gravatar]
Yes, there are much more advanced type systems than Java's, but there are two things holding them back.

One is that, though they're more powerful than Java, they still aren't powerful enough to check everything you'd like to check. For example, you may have a function in Haskell: "add :: (Num a) => a -> a -> a", but all that can do is constrain the types, not the values; "add 2 2" could evaluate to 5 and it would still pass type checks. You need behavioral testing to verify that "add 2 2" evaluates to 4.
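The same point transposed to Python with type hints, as an illustration only (the example above is Haskell, and this `add` is deliberately broken). The declared types line up, so only a check on values notices the bug.

```python
def add(a: int, b: int) -> int:
    # Deliberately wrong: the types line up, the value does not.
    return a + b + 1


def check_add():
    result = add(2, 2)
    assert isinstance(result, int)  # the type-level property holds
    return result == 4              # the behavioral property does not


print(check_add())  # False
```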

The other is that more power in the type system ultimately runs into a diminishing-returns problem, where each advance in the type system catches a smaller group of errors at the cost of a larger amount of programmer effort. And there can't be a type system that'll catch *every* error, of course, because that would require a type system which can solve the halting problem for programs written in your (presumably Turing-complete) language.

Unit tests strike me as a better solution to this issue, and as the type system becomes more powerful (and hence more complex) unit tests look even better. It's also worth noting that I don't write tests to do things like validate the types of arguments passed into my (Python) functions and methods. In a statically-typed language, one writes code and assumes that others will use it as documented -- if they don't, the compiler will throw type errors at them. In Python, I do the same thing and trust that Python will throw a TypeError at you if you mess up. The fact that you don't see that error until running unit tests is, to me, not an issue; you need to run the tests regardless of the type system of your language, so no matter what you'll see the error before you try to deploy the code.

I do write tests to verify certain trickier constraints (values which must be in a certain range or set, for example), but I look at tests like "value is in some_acceptable_set" as much simpler than setting up corresponding algebraic types and propagating them throughout my program (since, if I'm not careful, a language like Haskell can infer a more general type than I wanted). I don't write tests to verify that a string is a string, though, and tend to look with disapproval on people who do (sadly, I see a lot of that, usually from people who are coming to Python from statically-typed languages and haven't yet gotten over their paranoia).
[gravatar]
I like python and typed languages (java is ok, though it's stuck in 1999).

That being said, I think the inherent documentation of statically typed languages is a huge time saver. If I have a method that takes an interface IFoo as a parameter, I can go look at interface IFoo and know exactly what kinds of methods can be called. This is good both for initial writing of the code, other coders using my method later, and other coders maintaining my method later.

If you don't have a type on the parameter, the comments on the method have to describe every attribute the method is expecting to exist on the class, and those comments have to be kept up to date with changes in the method. That's a headache that is solved nicely by having a type on the parameter. What can be called is then clearly defined by the type.

Of course, most of my professional experience is in typed languages, and only small parts of it have been in python, so maybe in a full-on python house, I'd see how things could be different.
[gravatar]
@Nate

> If you don't have a type on the parameter, the comments on the method have to describe every attribute the method is expecting to exist on the class, and those comments have to be kept up to date with changes in the method.

Ouch! How is this a bad thing?

Personally I find the debate that types must be declared to be tiring. Not that it isn't useful but you can also work perfectly well without it. It's even more tiring considering how static languages like Java or C# make it really hard for the programmer to perform run time introspection, meaning that in fact they prevent the programmer from doing something rather powerful. They get in your way more than dynamic languages.

What I think is the biggest problem these days is that I see too many Java programmers incapable of writing software without their favourite IDE (Eclipse being number one). They seem to be impaired by the tool itself, which is truly worrying in my mind. This isn't something I have seen too much in the Python community yet, though arguably tools like Pydev could lead the way :/
[gravatar]
I agree with Nate. While I love Python and I love unit tests, I find that the static typing in languages like C, Java, and ActionScript 3 makes it very easy to get a grasp of what's going on when I'm reading new-to-me source code. I prefer to write in dynamic languages, and if it's a codebase I'm familiar with then I'm fine with reading the code of dynamic languages. But when I'm reading new code (even in Python) in a dynamic language, I frequently find myself having to stop and carefully peruse the implementation of the code, rather than just "reading the intent", because it's just not always obvious exactly what's going on - the code might be very clean and easy to read, but the flexibility means that there are more possibilities to consider.

That's one reason I like AS3 quite a bit - it's a closure-capable dynamic language where you can type something as Object and leverage duck typing in those infrequent occasions when it's appropriate to do so, but for the vast majority of code I write, the inputs and outputs are straightforward, readable, statically-typed parameters and return values.

(Or maybe this is all just a matter of what I'm used to.)
[gravatar]
I'm happy to argue that Python doesn't scale because it lacks a static type system, I'm just not sure it is actually true. I know it's true in Ruby, but that has more to do with global variables.

Once I get to a project that is more than 20 or so classes big I really start to miss having a 'real' IDE like Eclipse that can refactor and generate code as well as allow library introspection with a press of 'F3'. pyDev comes close, but it doesn't have enough information to be as useful as Eclipse for Java. My biggest reason for this is unit testing: if I change a method/field name etc I want to spend as little time as possible rewriting the tests.

Python is the first dynamic language where that tug for a real IDE and faster code doesn't make me re-implement in Java within the first week of development. It is usually fast enough and the coding conventions mean that bouncing through other people's code is as easy as it can be.

Years into my Python love affair and I'm still not entirely convinced that runtime introspection is as useful to me as static typing. What I really want is Java without the bullshit and braces, a little bit of list comprehension and closures for good measure.
[gravatar]
@rgz "It seems Isaac is assuming that on top of logic tests you also have to write type checking tests..."

No.


@Tom "I'm happy to argue that Python doesn't scale because it lacks a static type system, I'm just not sure it is actually true. ... Once I get to a project that is more than 20 or so classes big ..."

Without static type checking, commercial Smalltalk applications with 1000's of classes are built, put into production, and maintained through multiple versions: for example-

A very large Smalltalk application was developed at Cargill to support the operation of grain elevators and the associated commodity trading activities. The Smalltalk client application has 385 windows and over 5,000 classes. About 2,000 classes in this application interacted with an early (circa 1993) data access framework. The framework dynamically performed a mapping of object attributes to data table columns.

Analysis showed that although dynamic look up consumed 40% of the client execution time, it was unnecessary.

A new data layer interface was developed that required the business class to provide the object attribute to column mapping in an explicitly coded method. Testing showed that this interface was orders of magnitude faster. The issue was how to change the 2,100 business class users of the data layer.

A large application under development cannot freeze code while a transformation of an interface is constructed and tested. We had to construct and test the transformations in a parallel branch of the code repository from the main development stream. When the transformation was fully tested, then it was applied to the main code stream in a single operation.

Less than 35 bugs were found in the 17,100 changes. All of the bugs were quickly resolved in a three-week period.

If the changes were done manually we estimate that it would have taken 8,500 hours, compared with 235 hours to develop the transformation rules.

The task was completed in 3% of the expected time by using Rewrite Rules. This is an improvement by a factor of 36.

from “Transformation of an application data layer” Will Loew-Blosser OOPSLA 2002

Tooling matters. Tooling matters a lot.
[gravatar]
@Parand - ahhh, the familiar voice of one living in an ivory tower.

Type checking is only part of the scaffolding provided by the full Java (& J2EE) environment. If type-checking were the only difference, I would agree with you. But combine it with features like *absolute* control over access to member variables, and you *can* prevent things like "divide by zero" errors, because you can define that zero *can't* be assigned to a particular variable.

Regarding the comment referring to "bad programmers" (which I didn't say - the term I used was "poor" - and there is a difference), there are numerous resource constraints that exist in the real world today. Not every company can open up their wallets and pay 6-figure salaries for top-notch talent to waste their time on grunt-work data processing. There's a lot of stuff that needs to be done in a corporate environment that just wouldn't keep the interest of a truly creative developer.

Not everyone works on "web app development" or "airline scheduling" or "online social networking". There's still a huge percentage of our country's business and financial infrastructure that's built on COBOL and the data that passes through those systems. And those systems need people that can "keep the lights on" at a price a company can afford to pay.
[gravatar]
I dunno COBOL but if static typing means Java/C/C++/C#/VB.NET (as it does for most people) then I'd hope we agree that it is not worth it.

Not only do not all of them support generics, but all of them support, and to a large degree require, casting, the bane of static typing.

I'm writing a mobile app in Java and containers are a pain. HashTables are painful; generics could help there if J2ME had them, but they won't help you with forms. Form instances accept their items in an Item[] argument, and an Item[] can hold Item instances or subclasses, usually subclasses like string labels or TextField instances. But you can't call TextField methods on an element without casting it from Item to TextField, so *I* have to inform the IDE what methods an Item supports so the IDE can tell me what methods I'm allowed to call.

The very official Wireless Messaging API accepts sources dynamically: based on the protocol of the connection URL string, it returns a connection instance specialized for that protocol. Clever, but yet again I have to cast the instance to the class I'm expecting, again having to tell the compiler what methods I'm going to use so it lets me use them.

It happens again in a ubiquitous anonymous class that I subclassed from Thread so it lets me write asynchronous closures abstracting the code that displays a screen with a waiting animation and a cancel button. If I want to jump to the parent screen I just have to do it with parent.load(); if I want to go to the parent's parent screen, that won't do.

The interface doesn't know that parent has its own parent: parent.parent.load() fails, and growing a parent.loadParent() method fails too. What I need is to cast the parent to its own class so I can use the methods/members that I know are there.

One option would be overriding the constructor of the anonymous class, specifying the parent's class explicitly, but that's just an even more verbose way of basically telling the compiler what I can do so it lets me do it.

So I'm unconvinced that static typing makes life easier. If documentation is what you want, Python 3 has type annotations and that's all I need.

@Parand

I hope you are not talking about getters because that's unrelated to static typing, or are you proposing a NotZeroInteger class? Shryeah, I wanna see you doing anything non trivial with that.
[gravatar]
Surprised by the lack of emphasis on language verbosity. Various studies have found that errors/LOC is fairly independent of language, meaning you should default towards less verbose languages. (I suspect modern statically typed language IDEs somewhat compensate for this.) Of course, some concise languages (notably Perl and Lisp) achieve their concision by in effect in-line compression; many Perl "one-liners" are one-paragraphers that have been squashed, which is not at all the same thing.

I do crave better expression of program structure than is given by the function->class->module->package hierarchy in Python.
[gravatar]
Note that a number of major HP software products are coded in javascript - in particular their Service Manager (ticketing solution) for enterprises is almost 100% javascript running on a V8 engine. Dynamic languages certainly aren't foreign to HP...
[gravatar]
@rgz "or are you proposing a NotZeroInteger class? Shryeah, I wanna see you doing anything non trivial with that."

Something like verify program properties?
   subtype Index is Integer range 1 .. 10;
See "Using SMT solvers to verify high-integrity programs" (pdf)


@Sam Penrose Various studies have found that errors/LOC is fairly independent of language ... (I suspect modern statically typed language IDEs somewhat compensate for this.)

afaict the studies are old and (as you note) don't seem to take any account of which tools were used to write the code - as if working with IntelliJIDEA was no less error prone or no more productive than working with Nano!
[gravatar]
@Isaac Gouy "subtype Index is Integer range 1 .. 10;"
That doesn't seem to be Java/C/C++/C#/VB.NET is it? Apparently it's Ada, so at least I hope you agree that those languages are not as suitable to script GUIs as Python/Ruby/Javascript.

Let's say my mobile app is being written in Ada. There is a procedure that displays a catalog page, it accepts an integer which must not be less than 1 but the upper limit is unknown, it is taken from a database. Is Ada capable of expressing such constraints in its type system? I guess not, so let's just say that the argument is subtype PageIndex is Index openrange 1 .. or whatever syntax.

The calling code must be ready to catch the exception if the upper bound is exceeded, but now you must cast the Integer index to a PageIndex type, that casting operation itself is unsafe too so you must be ready to catch that exception too.

Two exception handlers and a cast later you realize this is all pointless: the calling code is an OptionMenu procedure from which the user picks a page, and the menu is built from the database so it will never offer an option not in the database (which is single user), so no casting and no exception catching is necessary.

If the database is multiuser you will need to catch the exception, not in the OptionMenu but rather in the current screen. Which means unchecked exceptions, which are by definition not type safe. Actually Ada doesn't have checked exceptions so it is not type safe either.

In the end the question is not whether static typing, especially in an obscure language, knows any neat tricks or not. The question is, is static typing a requirement for scalable programming? I think not.
[gravatar]
@rgz That doesn't seem to be Java/C/C++/C#/VB.NET is it?

If you'd have read so much as the first page of the paper you'd have seen mention of ESC/Java (which I've already mentioned in this discussion).

@rgz ...at least I hope you agree that those language are not as suitable to script GUIs as...

Have you used AutomationPeers to test .Net GUIs?

@rgz ...but the upper limit is unknown, it is taken from a database...

And is it a blob in the database or an integer with a particular representation? How many bytes?
integer'last
@rgz ...specially in an obscure language...

I guess you don't travel on commercial airliners.

@rgz The question is, is static typing a requirement for scalable programming?

Is run time type checking "a requirement for scalable programming"?
Not if you're a clever BCPL programmer.

The question is whether we can free ourselves from our own prejudices long enough to learn about techniques outside our own experience (both benefits and limitations).

ESC/Java2 workshop (pdf slides)
[gravatar]
BCPL? Are you calling me old fashioned? Wait no:

"The question is whether we can free ourselves from our own prejudices long enough to learn about techniques outside our own experience (both benefits and limitations)."

You are calling me old fashioned AND prejudiced. Please drop the ad-hominem attacks.

I'm giving you a real problem, scripting a GUI in a cast-happy statically typed language where typing is turned on its head because your static analyzer knows less about the program than I do, and you give me a GUI test tool?

If *you* read the page about ESC/Java you'd see the "Limitations" section where it admits that it only works as long as the "properties" (they should call them constraints; properties already mean something in C#/Python) are well specified, that it still fails a lot, and that it is mostly only really useful at (potentially) detecting (potential) NullPointer and IndexOutOfBounds exceptions.

Geez, what are you so excited about? It's almost as if you are arguing against ever having to run your own code, everything must be statically proved by the compiler.

No actually, the common echo in your posts is about using tools. It seems you are getting offended because someone is not using your favorite tool.

You know there are static analyses for dynamic languages right? Take it easy, no one is taking your IDE away ok?
[gravatar]
Whoa, guys! Everyone calm down. I'm losing the thread of this debate because it's all turned into snarky sarcastic asides. If you have a point to make, then simply make it. Something along the lines of, "My programming technology works like this ...," or "Yours is not as good as mine because it can't ..."

Personally, I'm skeptical about the possibilities of a type system fully encompassing the concept of (for example) non-zero integer because there's no way to make its operations closed over the type. That is, if I have two NonZeroIntegers, how do I subtract them in a type-safe way? Is there a programming system that lets me create an EvenInteger type in a convenient and powerful way? How about PrimeInteger? If so, I'd like to learn about it.
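The closure problem can be sketched in a few lines of Python. The class `NonZeroInt` is hypothetical and the check happens at run time, so this only illustrates the shape of the difficulty: multiplication stays inside the type, subtraction cannot.

```python
class NonZeroInt:
    """Hypothetical wrapper that refuses to hold zero."""

    def __init__(self, value: int):
        if value == 0:
            raise ValueError("value must be non-zero")
        self.value = value

    def __mul__(self, other: "NonZeroInt") -> "NonZeroInt":
        # Closed: a product of two non-zero integers is never zero.
        return NonZeroInt(self.value * other.value)

    def __sub__(self, other: "NonZeroInt") -> int:
        # Not closed: 3 - 3 == 0, so the result falls back to plain int.
        return self.value - other.value


three = NonZeroInt(3)
print((three * three).value)  # 9
print(three - three)          # 0
```

The return annotation on `__sub__` is where the guarantee is lost: any code that divides by the difference is back to checking for zero itself.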

ESC/Java looks interesting, but isn't strictly about static type checking. It builds on a statically-typed language, adding other static analysis of the code to find problems. That is made possible by the static typing of the language, but doesn't really get us to a NonZeroInteger type.

I like the idea of strong checking of code before it is run, I really do. I'd be interested to try using a language that inferred types where possible, rather than Java's approach of having me explicitly tell the compiler everything it needs to know.
[gravatar]
@Ned

My point would be that you *can* create "NonZeroInteger" & "EvenInteger" & "PrimeInteger" types in Java.

Convenient? No.

Easy? No.

Verbose? Most definitely.

But, if that's a requirement of your problem, it _is_ possible, and you can later work with instances of that object *knowing* that the appropriate constraints are satisfied.

For example, every life insurance policy has a property commonly referred to as a "payment mode". (Different companies may refer to it by different names, but it always boils down to the same idea.) Briefly, in our case, it refers to the number of months of coverage that a payment provides. It can never be 0.

In that context, there's a very limited range of valid operations to be performed on the mode. You may:
1) Change it
2) Multiply it by a money amount
3) Divide a money amount by it

Not only that, but the range of values is also typically constrained.

(There might be one or two others, but I think you get the idea.)

Anyway, it makes no sense to do anything else with this type - so why expose any operations beyond this well-defined list?
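A rough Python sketch of such a business object, for illustration only. The class name, method names, and the set of allowed month counts are all assumptions, not taken from any real system. It exposes just the three operations listed above and validates the value whenever it is set.

```python
from decimal import Decimal


class PaymentMode:
    """Months of coverage bought by one payment (hypothetical sketch)."""

    ALLOWED_MONTHS = (1, 3, 6, 12)  # assumed range of values

    def __init__(self, months: int):
        self.change(months)

    def change(self, months: int) -> None:
        # 1) Change it: the only way the value is ever set.
        if months not in self.ALLOWED_MONTHS:
            raise ValueError(f"invalid payment mode: {months!r}")
        self._months = months

    def times(self, amount: Decimal) -> Decimal:
        # 2) Multiply it by a money amount.
        return amount * self._months

    def divide(self, amount: Decimal) -> Decimal:
        # 3) Divide a money amount by it; zero is never an allowed value.
        return amount / self._months


quarterly = PaymentMode(3)
print(quarterly.times(Decimal("100.00")))  # 300.00
```

In Python the leading underscore on `_months` is only a convention, so the constraint holds as long as callers respect it; Java's `private` is what turns it into the *absolute* access control described earlier in the thread.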

I'm currently working with a system that has hundreds of these types of business objects defined. Each with a specific set of operations and related types that are valid for use.

Those of you solving more general problems have different needs - granted. My comments only refer to certain types of business environments. Those of us solving these types of needs find it extremely useful to use a language where these types of safeguards exist. We *don't want* to be able to pass an arbitrary object of an undefined type to a method. And, that process of "explicitly telling the compiler everything it needs to know", also informs your *programmers* what *they* need to know as well. (Code always has a higher degree of reliability/accuracy than comments or external documentation.)
[gravatar]
@Ken, I can see that in some domains, static type systems can be very powerful, and I appreciate your approach of choosing the tools based on the needs of the problem. My main point here is that static typing can only solve part of any problem. Some problems, it can solve a larger part than others. And further, the way you solve the rest of the problem with static typing is the same way you solve the rest of it with dynamic typing.
[gravatar]
@rgz You know there are static analyses for dynamic languages right?

As you provide no example, the question is - Do you know any static analysis tools for dynamic languages?


@Ned ESC/Java looks interesting, but isn't strictly about static type checking. It builds on a statically-typed language, adding other static analysis of the code to find problems.

Yes the acronym stands for Extended Static Checker.


@Ned static typing can only solve part of any problem

My guess is that car seat-belts really don't improve the chance of not drowning when a car falls into a river - but that doesn't seem a compelling reason not to use a car seat-belt.

Divide and conquer is a standard problem solving approach.
