Dynamic Typing?
The colloquial usage of "dynamic typing" is only slightly related to
the issues discussed in the previous blog entry. The vast majority of
programmers think of "dynamic typing" as meaning a language that does
not require type declarations, what I call "non-type-declared
languages" (e.g., Lisp, Perl, Python, and csh).
In any case, this is what I want to talk about now, non-type-declared
languages.
There has always been a kind of a tension between people who prefer
type-declared languages and those who prefer the opposite. I know I
certainly looked down on those poor FORTRAN and C programmers as I
built beautiful things in Lisp. How could they not see what they were
missing?
Hard numbers are hard to come by for comparing languages, but I feel
confident in claiming that I could write the "same" program in Lisp
twice as fast as anyone else could write it in C.
Unfortunately, my comparisons are not entirely valid. Or at least
they don't depend exclusively on the language itself. One factor that
grows in importance as the program grows in size is garbage
collection. Keeping track of memory usage is a major issue in C that
doesn't exist in Lisp. That alone gives me my factor of two in many
programs.
A second factor is the programming environment. Lisp had first-class
IDEs from the very early days. InterLisp (ca. 1975), then the MIT Lisp
machines, and finally Sun's SPE (my contribution) were so superior to
the dross that was available for C that there was no comparison. The
modern Java IDEs are only slightly better than what we had in 1980.
A significant part of the value of the Lisp IDEs was the command line
read/eval loop. It was really easy to try out bits of code without
having to write a full test harness. (Some folks refer to this as
being a "command line interpreter", but there's nothing that says it
has to be interpreted.)
Good error messages and simple tracing are other things that Lisp had
and C lacked. I don't think there's any question that a stack trace is
way more useful than a "SEGV".
So these are things that are not part of the language and ought not be
included in this discussion, other than to suggest that language X
would be easier to write in if only it had these. [In my debugger, I
actually wrote a simple read/eval loop for Java.]
Now to the question: Are type declarations good features? Or are they
anti-features?
Python is, to a huge degree, Java without declarations. They are so
close in semantics that we (at the Broad) routinely mix the two, and
translating from Java to Python is straightforward. I feel that
comparing these two is the best we can do for the purposes of our
discussion.
So, is it easier to write programs in Java or in Python?
For very small tasks, I believe Python gets the nod, no questions
asked. A large portion of that advantage is the read/eval loop which
most Java IDEs lack. I sure love being able to run a method directly
without messing around writing a main() and all that.
As the amount of code increases, this changes. When I start using code
packages that other people wrote, I become very concerned about making
sure I know what types of object a method wants, and what types it
returns. And that is exactly what a type declaration gives me. It
tells me, the programmer, what types may be passed to a method and it
even enforces that restriction.
For the cost of typing in a declaration, I am guaranteed that a host
of potential bugs cannot occur. I like cheap bug-checking. That means
that I can concentrate on what my program is supposed to do, not the
details of its syntax.
Indeed, I find that the best-written Python code has type declarations
in it! Admittedly these declarations come as comments and they aren't
enforced by a compiler, but they do make the code a lot easier to use.
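
As a rough illustration of that point, here is one made-up function documented two ways: once with the types described in a docstring comment, and once with Python 3's optional annotation syntax. The function names and the Gene class are invented for this sketch. Annotations are not enforced when the program runs; an external checker has to be run separately to verify them.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Gene:
    """Hypothetical record type, invented for this sketch."""
    gene_id: str
    name: str


def find_genes_commented(table, gene_id):
    """Return the genes stored under gene_id.

    table:   dict mapping str -> list of Gene
    gene_id: str
    returns: list of Gene
    """
    return table.get(gene_id, [])


def find_genes_annotated(table: Dict[str, List[Gene]], gene_id: str) -> List[Gene]:
    """Same behavior; the types now live in the signature."""
    return table.get(gene_id, [])


table = {"BRCA1": [Gene("BRCA1", "breast cancer 1")]}
print(find_genes_annotated(table, "BRCA1"))
```

Either form tells the caller what to pass in. The annotated form has the advantage that tools can read it, too.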
You'll notice that I haven't talked about efficiency. While it is true
that type declarations allow the compiler to produce significantly
better binary code, I don't consider this to be of primary
importance. I am more concerned with how long it takes me to write and
debug the program, than I am about its run time.
A common idiom many Python (and Lisp, etc.) programmers use is to
accept arbitrary arguments and coerce them into the appropriate
types. For example:
    def getGeneFromDatabase(session, id):
        session = coerceToSession(session)
        return session.get("FROM Gene WHERE id = ?", id)
    def coerceToSession(session):
        if isinstance(session, DbSession):  # Python's "instanceof" is isinstance()
            return session
        if isinstance(session, DbContext):
            return session.getSession()
        if isinstance(session, str):
            return DbSession(session)
Contrast that to what I would do in Java:
    List<Gene> getGeneFromDatabase(String alias, String id) {
        return getGeneFromDatabase(new DbSession(alias), id);
    }
    List<Gene> getGeneFromDatabase(DbContext context, String id) {
        return getGeneFromDatabase(context.getSession(), id);
    }
    List<Gene> getGeneFromDatabase(DbSession session, String id) {
        return (List<Gene>) session.get("FROM Gene WHERE id = ?", id);
    }
Which is better? The Python code is 14 characters shorter than the
Java code. Does that make it better?
To figure out what I can pass to the Python code, I have to look at
the coerce method. In Java I just look at the primary methods. If I
want to add a type for the coerce method to manipulate, I only have to
do it in one place for the Python code, whereas I'd have to write a
new method for every such class in Java. Of course if the callers of
coerce have different requirements, I might screw myself up.
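
To make the trade-off concrete, here is a minimal runnable sketch of the coerce idiom. DbSession and DbContext are stub classes invented for this example, not a real database API. The final raise is my addition: without it, an unsupported argument falls through, coerceToSession returns None, and the failure only shows up later at session.get().

```python
class DbSession:
    """Hypothetical stand-in for a database session."""
    def __init__(self, alias):
        self.alias = alias

    def get(self, query, *params):
        # A real session would run the query; this stub just echoes it.
        return [(self.alias, query, params)]


class DbContext:
    """Hypothetical stand-in for an object that owns a session."""
    def __init__(self, session):
        self._session = session

    def getSession(self):
        return self._session


def coerceToSession(session):
    if isinstance(session, DbSession):
        return session
    if isinstance(session, DbContext):
        return session.getSession()
    if isinstance(session, str):
        return DbSession(session)
    # Assumption for this sketch: fail loudly on anything else.
    raise TypeError("cannot coerce %r to a DbSession" % (session,))


def getGeneFromDatabase(session, id):
    session = coerceToSession(session)
    return session.get("FROM Gene WHERE id = ?", id)


print(getGeneFromDatabase("prod", "g1"))
print(getGeneFromDatabase(DbContext(DbSession("test")), "g2"))
```

The list of accepted types lives only inside coerceToSession, which is exactly why a caller has to go read it.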
With Java I am also assured of calling the semantically correct
methods, whereas in Python I am not. For example, I might have two
different classes that happen to use the same name for different
things.
    class MicroWaveOven {
        public void nuke(Food f) {...}
    }
    class NuclearWeaponsControl {
        public void nuke(Country c) {...}
    }
In a Java program, I cannot confuse these two because the compiler
will stop me:
    MicroWaveOven oven = new MicroWaveOven();
    Food turkey = findFood("Turkey");
    oven.nuke(turkey); // Legal
    NuclearWeaponsControl c = new NuclearWeaponsControl();
    c.nuke(turkey); // Compiler error: nuke(Country) does not accept a Food
By contrast, Python only cares if there's a method of the correct name
associated with the object. So this will run:
    c = NuclearWeaponsControl()
    turkey = findFood("Turkey")
    c.nuke(turkey)  # No compiler error!
but we have no idea if it will roast our poultry or wipe
Istanbul off the map.
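
Here is a self-contained sketch of that situation, with toy class bodies that I made up so the result is visible. The return strings and the Food class are assumptions for illustration only.

```python
class Food:
    def __init__(self, name):
        self.name = name


class MicroWaveOven:
    def nuke(self, food):
        return "roasted " + food.name


class NuclearWeaponsControl:
    def nuke(self, country):
        # Any object with a .name attribute is accepted here.
        return "launched at " + country.name


turkey = Food("Turkey")
c = NuclearWeaponsControl()
result = c.nuke(turkey)  # runs without complaint
print(result)            # prints "launched at Turkey"
```

Nothing in the language objects, because a Food happens to have the one attribute the method touches.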
The final thing I'd like to consider is that absolutely marvelous
invention known as "refactoring". The ability to change names or
layouts of a class in a single, simple command is a glorious thing. I
find that as I experiment and expand my programs, I often decide that
my original naming scheme really doesn't match what I'm actually doing.
Via refactoring, I can significantly improve the quality of my code
(making it easier to read and understand) at a very small
cost. Without type declarations, it is not possible to do this in
Python. If I wanted to rename MicroWaveOven.nuke(Food f) to
MicroWaveOven.cook(Food f), it's simple in Java. In Python, the system
wouldn't know if it should rename NuclearWeaponsControl.nuke(Country c)
or not.
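
A small sketch of why the rename is ambiguous. The reheat function is a made-up call site. Here I have renamed MicroWaveOven.nuke to cook by hand and left the other class alone. The untyped call site was missed, and the mistake only surfaces when the code runs.

```python
class MicroWaveOven:
    def cook(self, food):      # renamed from nuke() by hand
        return "cooked " + food


class NuclearWeaponsControl:
    def nuke(self, country):   # deliberately left alone
        return "launched at " + country


def reheat(appliance, leftovers):
    # Nothing here says what type `appliance` is, so a rename tool cannot
    # tell whether this call meant MicroWaveOven.nuke or
    # NuclearWeaponsControl.nuke.
    return appliance.nuke(leftovers)


try:
    reheat(MicroWaveOven(), "pizza")
    outcome = "ok"
except AttributeError:
    outcome = "AttributeError"

print(outcome)  # prints "AttributeError"
```

In the Java version, the declared parameter type would tell the tool which nuke the call site refers to, and the compiler would flag any call the rename missed.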
Python has a few extra quirks that make it harder to work with, but
are not type declaration issues. Using whitespace to define blocks of
code has the twin drawbacks of not being able to change blocks
reliably and not being able to read the code definitively. One of the
indirect consequences of this is that Python methods tend to be very
large and FORTRAN-like. (Java code from the same person is always much
more modular and easier to read.)
My conclusions:
- The primary value of type declarations is documentation, so the
programmer knows what a method can accept.
- Small Python programs are smaller than small Java programs and
easier to write. Large Python programs are larger than large Java
programs and are harder to write.
- Good tools can cover a host of sins. Java has good tools.
At the end of the day, the big question is "Can somebody pick up your
code and figure it out?"
Type declarations make programs more readable and that trumps
everything.
-Bil

