Lisp's Mysterious Tuple Problem

David Jeske

5.00/5 (5 votes)

May 18, 2017

CPOL

12 min read

31745

re: Why Lisp doesn't catch on.

Introduction

When Lisp was created in 1958, it was a groundbreaking language. It offered capabilities that it's main alternative, Fortran, couldn't touch. Over the years, Lisp's flexibility has made it a fertile ground to explore new programming language concepts, before they invariably end up copied and put into other languages. Yet Lisp itself, even with the recent modernization in Clojure, is not nearly as popular as other languages. Why is that?

There is no doubt Lisp is powerful. It's simplicity of implementation and homoiconicity enable working at a level of abstraction akin to being able to easily write your own programming language for every task. Some articles I've read connect this to Lisp's lack of traction, claiming it's too powerful for it's own good. In part I agree with their observation -- when every programmer can trivially build their own abstractions, programming can easily become chaotically fragmented. However, I don't buy into their hubris. Lisp is not some idealistic paragon of power, like creation itself,. which mere mortal programmers just can't handle. No, Lisp has a serious shortcoming which keeps 'the rest of us' from using it.

I call Lisp's biggest shortcoming the mysterious tuple problem. It stems Lisp's idiomatic over-use of lists as product types, which are more commonly called tuples. In this essay, I'll explain the problem, how more popular languages don't suffer from it, and some ways to have the power of Lisp without it.

Here are some mysterious tuples:

(2.3 4.5 2.3)  ;; is it a 3d vector (x,y,z)? a list of 3 floats? or something else?
("Fred" "Jones")  ;; is this a tuple of (firstname,lastname)? or a list of names?

What is a Tuple?

Wikipedia describes a tuple as a finite ordered list of elements. More specifically, it is an list of items, where the position an element holds has meaning, where the types may vary, and where the individual items have no logical names. Often tuples are immutable. A review of a Lisp or Scheme reference will not produce a definition of a tuple because it has no such datatype. Lisp programmers commonly represent tuples as lists. It is only partially facetious to say that in Lisp, all tuples are lists, but not all lists are tuples.

What is an example of a tuple? 3d Vector, commonly represented as (x, y, z). When we process a 3d vector, we need to specifically handle the x,y, and z positions separately. A 3d vector with 2 or 4 elements is invalid, because we're writing code to specifically handle three values. Sorting or otherwise rearranging the values destroys the data. Operations between 3d vectors and other random triples of three numbers also destroys the data.

What type of lists are not tuples? Lists, of anything. A list is a variable length list of items. Most lists are monomorphic, meaning that the items in the list are all the same kind. Sometimes that means the elements of the list are exactly the same type. For example, a list of string names ('apple', 'pear'), or a list of numbers (1, 2, 3, 4, 5). Sometimes elements in the list share a similar quality but are not exactly the same type, such as code-objects that know how to draw themselves. This is called polymorphism. Rarely, lists are hetromorphic, meaning that the items in the list share no type or similarity. In this case, code that handles the list will have to specifically handle every type of item in the list separately. One common thing we can say of all lists is that any code which attributes meaning to a position in this list is flawed, which makes them pretty much the opposite of tuples.

Mysterious Tuple Problem

When looking at Lisp code, it is very hard to know what tuples exist and what is actually in them.

In Lisp is it not only idiomatic to use lists to represent tuples, it's common to do it everywhere. In fact, some Lisp code uses lists as the exclusive aggregate datatype. Not only does the code not carry any type-declaration saying a particular element is a 3d-vector, but the data itself doesn't even know it's a 3d vector. As far as Lisp is concerned it's just a list of three numbers (3 4 -3). Code constructing tuples often doesn't refer to logical names for the individual positions in a tuple (such as x, y, z), because it doesn't need to. In addition, it is common in Lisp for functions to repeatedly deconstruct and reconstruct tuples, often without mentioning the name of the tuple type or the logical names of it's fields. This makes it very hard to know what is supposed to go in, what comes out, or what the functions in the middle are actually doing.

If our program only contained one type of Tuple, 3d vectors, it might be managable. Throw around a five or six kinds of tuples and it becomes tricky to tell them apart. Throw around dozens and dozens of kinds of tuples (like the code for Emacs), and it's complete and utter unmitigated tuple chaos.

Of course Lisp has alternative ways to store tuples. It has structs; it has objects; it has multiple types of structs and objects. However, most Lisp programmers and programs don't use them very much, and later we'll talk about why.

Would you be surprised to learn that programs written in the world's most popular programming languages don't even use tuples? They don't. That includes programs in C, C++, C#, and Java, which are all based around some form of record. In popular dynamic languages where tuples can be expressed, including Python, they are not idomatic. Most Python programs rightly only use tuples for multiple-return arguments, preventing the mysterious tuple problem from spreading to the rest of the program.

Note: C/C++ structs are sort-of tuples. The pre-initialized data syntax can assign data-values to fields based on field position, and the order of fields in the declaration dictates their layout in memory. However, field values can only be read or set using field names, which means in practice they behave more like records than tuples.

The 'rest of us' non-Lisp programmers are not afraid of Lisp's power. We just don't like code that is an unreadable chaos of unknown tuple types. We prefer to use record types, commonly known as structs.

However, before we talk about structs, we're going to talk about something else.

Why Lisp lovers love lists ( but the rest of us don't )

Of course Lisp lovers don't think using lists for everything is chaos. They think it's beauty. The beauty that all data is a list, and thus all data can be uniformly manipulated. The freedom of never having to declare a tuple's structure, and thus be able to trivially produce a new one on every single line of code. The flexibility that you can pass any data into any transformation function, because everything is a list.

There are two reasons this viewpoint isn't the prevailing viewpoint of most software development:

The first reason, is that mysterious tuples make code very hard to read. When looking at a Lisp function, it's usually easy to see that a variable is a list -- perhaps a tuple, or perhaps a list of tuples. What's not so easy is to know what is inside the tuple. We look at where the list came from, often to find not the creation of the data, but a transformation of data from somewhere else. So we look at where that data came from, and so on and so on. Working in foreign Lisp code is like an archaeology dig, unpeeling the layers of change. Sometimes we resort to running the code and printing the data which happens to appear, hoping it'll explain what it is. Sometimes it does. Sometimes it doesn't.

The second reason, is that mysterious tuples make code very hard to change. When changing the middle of a list transformation pipeline, any change to the structure of the nested lists may inadvertently break the expectations of some other code -- and there is nothing in the Lisp system to help you know about it or find it. The best you can do is run the code to see if it breaks. Hopefully you've also written exhaustive tests, so you don't have to test it all manually. Writing tests is good. Having to write them just so you don't create tuple chaos is not so good.

If this problems sounds vaguely like some of the problems logged against dynamic languages in general, you'd be right, because it is. However, objects in Python and Ruby have a class structure; they have runtime types. Idiomatic programming in those langues does not store everything in tuples or lists. When we make a Python 3d-vector, we would often make it as a class, containing the fields x, y, and. z. Data of that type will know it's type at runtime, and code which uses that type will refer to x, y, and z. That structure guides programmers in understanding, changing, and debugging code. Using lists as the baseline data-structure of programming, the way Lisp does, amplifies the fragility of a dynamic program insurmountably.

If making everything a list isn't the answer, then what is? Well, there is no one-size fits all free lunch, but for the next part of our journey, let's talk about records.

One Alternative: Records

A record type, or struct, is a (generally un-ordered) set of fields, where each field is identified by name. Because the fields have names, every access to data carries with it some meaning about what the data is.

In dynamic programming languages like Python and Javascript, these records are usually stored in a hashtable, either as part of a hashmap or class definition which is ultimately backed by a hashmap. Python classes require definitions, while hashmaps do not.

In most static typed programming languages, like C, C++, C#, and Java, classes and structs are record types which are represented more efficiently than a hashmap. Also, they must be explicitly declared, which brings the drawback of extra typing, but the benefit that the compiler can complain when the code doesn't match the intended structure. It also gives them names..

In static type inference programming languages, like ML, OCaml, and Irken, records and objects can be created without definitions, more like dynamic hashmaps -- yet they are represented efficiently and typechecked, more like classes or structs.

Note: In C and C++, a struct is actually ordered, and so it is actually a hybrid of tuple product type and fieleded record type. However, it's order may only be used during data-initializers and when relying on it's in memory representation. All code which accesses elements of the struct must refer to their fields by name. In C#, a struct may be given the attribute [StructLayout(LayoutKind.Sequential)] to ask it to remain in order, so that it matches C's in-memory layout. However, data-initializers in C# must always be done through the constructor argument order, and fields are always accessed by name.

Why don't Lisp programs idomatically use structs?

It's hard to find an abstraction in another programming language which isn't present in Lisp, and so of course Lisp also has structs. However, they are not as commonly used in lisp programs. Let's take a look at why. Here is the Lisp code to declare our 3d vector type, create one, and then print it out:

(defstruct vec3d
  (x 0.0 :type float) 
  (y 0.0 :type float)
  (z 0.0 :type float))

(define point (make-vec3d :x 3 :y 4 :z -3))
(print (format "(~S,~S,~S)" (vec3d-x point) (vec3d-y point) (vec3d-z point)))

A few things jump out in this code snippet. First, we had to make a declaration of what a struct is. Second, our code that creates a vector is substially more wordy and unclean looking than (3,4,-3). Further, access to this struct is also very wordy and comes with the typename each time and every time. Clearly we can see why structs are not a solution to common datastructure needs in Lisp. Compare this to similar facilities in other languages:

// Javascript
point = {x=3, y=4, z=-3}; 
print(format("(%s,%s,%s)", point.x, point.y, point.z)

# Python
class Vec3d:
    def __init__(self,x,y,z): 
        self.x = x; self.y = y; self.z = z
point = Vec3d(3, 4, -3)
print "(%s,%s,%s)" % (point.x, point.y, point.z)

;; clojure (a Lisp, ha ha!)
(def point {:x 3 :y 4 :z -3})
(printf "(%s,%s,%s)" (:x point) (:y point) (:z point))

Note: while Clojure does have some nice syntax for handling maps, which can be used to represent records, it's unfortunately still pretty idiomatic to store tuples in lists in Clojure code.

Another Alternative: Typed Tuples (Variants)

While records are a powerful and clear way to represent tuples, sometimes the verbosity of using field names all the time can obscure the meaning of the program. Some statically typed languages offer a more compact, even Lisp-like, way to represent tuples which retain their type. They are often provided as a part of something known as Algebraic Datatypes, and are available in languages like ML, OCaml, and Haskell. In these languages, a tuple is of a specific type (sometimes called a variant type), and contains a specific product of element types. For example, our 3d vector might be of the (vec3d = float,float,float).

In many static languages you have to declare your types ahead of time, which can be a burden compared to dynamic languages. However, OCaml and Irken have a special kind of typed tuple called a polymorphic variant. These can be created without declarations, by merely specifying a name. The name allows your compiler to typecheck your tuples and assure that you correctly unpack all the elements when you use them, and that uses of the same tuple type, along the same code path, have the same type.

(*  OCaml *)
let point = `Vec3d 3. 4. -3. in
match point with `Vec3d x y z -> Printf.sprintf "(%f,%f,%f)" x y z

;; Irken
(define point (:vec3d 3 4 -3))
(match point with (:vec3d x y z) -> (printf "(" (float x) "," (float y) "," (float z) ")"))

Emulating Variants in Lisp

If someone took away every other programming language, and made me program in Lisp. I would use Irken, a statically typed type-inference hybrid of ML and Lisp.

If they made me program in dynamic Lisp: I would be sad, for sure. I would use Clojure. Then I would merely do what Lisp programmers do best. I would invent my own abstraction -- for explicitly differentiated tuples. How?

It's easy enough in Clojure to write a vector with the type of the tuple as the first element. Further, it's easy to use match expressions to unpack tuples when handling their values. This assures that all tuples know their type, and all places that create or unpack them mention their type. It would also provide some runtime checking that tuples are not accidentally mixed up. It's not as nice as OCaml or Irken's compile time checking, but I'd survive.

;; clojure
(def point [:vec3d 3 4 -3])
(match [point] [:vec3d x y z] (printf "(%s,%s,%s)" x y z))

In my own code I would then have style-guidelines outlawing the use of undifferentiated tuples. Sadly, everyone else would be unlikely to use my convention, so I would have to avoid or contain their soups of mysterious tuples.

Conclusion

I hope you see why I don't think Lisp's relative lack of popularity is because it has overwhelming power that us 'mortals' just can't handle. Instead I think that lisp has a fatal flaw in it's idomatic over-use of undifferentiated tuples stored as lists. Despite Lisp's flexibility, this is the reason it's popularity lags, while the popularity of other dynamic languages, such as Python, Ruby, and Javascript has soared.