Latency is constant - not!
If you read this blog regularly you've probably heard or read about the 8 fallacies of distributed computing once or twice ... you know, the assumptions architects and designers tend to make when designing distributed systems which prove to be wrong down the road, causing pain and havoc in the project. (Indeed, my paper explaining them is the second most popular download on my site, with just about 50K downloads.)
Originally drafted in 1994 by Peter Deutsch (with one more added by James Gosling in 1997), these fallacies still hold true today. I still see designers make the same old mistakes in modern SOAs, RESTful designs and whatnot - but that's not the reason for this post.
What I want to talk about is the second fallacy, "Latency is zero".
The more I think about it, the more I think this fallacy should be updated to "latency is zero or constant" (or that another fallacy, "latency is constant", should be added on its own).
What's the difference?
Well, the "latency is zero" fallacy means treating remote "things" as if they were the same as local "things". We can't do that - we need to design the API of remote things to take into account the fact that information takes time to get there (e.g. chatty interfaces vs. chunky interfaces). You can read more on that in a post called "Why arbitrary tier-splitting is bad" I wrote about a year ago.
The "latency is constant" fallacy means thinking that if we send several batches of "stuff" to a remote "thing", they may arrive late but at least they'll arrive in order. Or, to move from "things" and "stuff" to more concrete terms: if you send messages over a network from one service to another, they won't necessarily arrive in order.
But wait, isn't this only true for asynchronous messages? If we make synchronous calls we don't really care about this, now do we? That's only true if you and the service you are consuming are alone in the world. In all other cases (i.e. most of the time), even if you make all your calls synchronous, you can't know what other messages (from other senders) will arrive in between your messages - and how they will affect the service's state.
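To make the reordering concrete, here's a minimal sketch of a receiver that detects out-of-order delivery by stamping each message with a per-sender sequence number. The names (`Receiver`, `handle`) are illustrative, not from any real framework:

```python
# Minimal sketch: detecting out-of-order delivery with per-sender
# sequence numbers. Illustrative only - not a real messaging API.

class Receiver:
    def __init__(self):
        self.last_seen = {}  # sender id -> highest sequence number handled

    def handle(self, sender, seq, payload):
        """Return True if the message arrived in order, False if it is
        a late (out-of-order) or duplicate arrival."""
        last = self.last_seen.get(sender, 0)
        if seq <= last:
            return False  # we already saw a later message from this sender
        self.last_seen[sender] = seq
        return True

r = Receiver()
assert r.handle("svc-a", 1, "first") is True
assert r.handle("svc-a", 3, "third") is True    # seq 2 hasn't arrived yet
assert r.handle("svc-a", 2, "second") is False  # arrived late, out of order
```

Detecting the problem is the easy part; the interesting design question is what to do when `handle` returns False - drop, buffer, or compensate.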
Unreliable latency can also mean we'll retry a message because we think it was lost, only to find out later that the receiver got it multiple times.
These are things you really have to take into account when you make multiple related calls - like, say, in a saga. One thing you can do to help is make messages idempotent (which also helps with the "network is reliable" fallacy). You can also trade even more latency for ordering by buffering and reordering messages - something that happens, for example, when streaming video or audio.
What you really need to think about is ACID 2. No, I am not talking about database-transaction ACID but rather about another term I first saw in "Building on Quicksand" (paper (pdf)/ppt) by Pat Helland. In this paper Pat talks about some of the implications of unreliable conditions (such as inconstant latency, failures etc.) for fault tolerance. ACID 2 (which was apparently coined by Shel Finkelstein) stands for Associative, Commutative, Idempotent and Distributed - i.e. messages can be processed at least once, anywhere (on the same machine or across several machines), in any order.
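A toy illustration of why those properties matter: set union is associative, commutative and idempotent, so a state built by unioning incoming messages ends up the same no matter what order they arrive in or how many times they are redelivered. A sketch (the names are mine, not from Helland's paper):

```python
# Sketch: an ACID 2-style operation. Because set union is associative,
# commutative and idempotent, every delivery order - with duplicates -
# converges to the same state.
import itertools

messages = [{"a"}, {"b"}, {"c"}]

def apply_all(msgs):
    state = set()
    for m in msgs:
        state |= m  # union: order and duplication don't matter
    return state

# Try every arrival order, with the first message delivered twice:
results = {frozenset(apply_all(list(p) + [p[0]]))
           for p in itertools.permutations(messages)}
assert results == {frozenset({"a", "b", "c"})}
```

Operations like "set the balance to X" have none of these properties, which is exactly why they are painful to distribute.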
That's harsh, but I think that if you are building distributed systems today (SOA or otherwise) you can't ignore it.