|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Announcements
Want a new Job?
Chapters
Services
Feature Zones
|
IntroductionThis article presents an interpreter for the statically typed functional stack-based programming language Cat (http://www.cat-language.com/). Cat can perhaps most succinctly be described as the functional cousin of Forth with a type system. Cat is inspired by a number of languages, the most notable being Joy (another functional cousin of Forth, but without a type system). A list of related languages is available at the end of the article. The version of the Cat interpreter included with this article is version 0.9.6. The most recent version of the interpreter and source code can be accessed publicly at http://www.cat-language.com/download.html. This article and source code should be of interest to people who are interested in implementing parsers, compilers, or interpreters. This article is not a tutorial and hence is not recommended for beginners, but instead is a reasonably well-documented example of a working interpreter which does type inference and type checking which you can hack and modify for your own purposes, either as a stand-alone interpreter or scripting add-on. LicenseThe Cat executables and source code are all public domain. This means it can be used and modified without restriction, requirement, or attribution. You can release any modified version of this code using any license you want, or for any purpose. Feel free to use the source code in any commercial products, and make yourself look great in front of your boss. One caveat though, there is no warranty, express or implied. This means you use the code at your own risk, just like anything you get for free. BackgroundCat was designed to be a simple scripting language with a strong type system and also a portable intermediate language. I found that existing intermediate languages are not easily read and written by humans, and usually lacked a type system. I also found that many scripting languages tended to have burdensome syntax and lacked a static type system. The advantages of Cat I believe are its simplicity, expressiveness, extendability, and portability. The Cat language syntax and semantics are inspired by stack based programming languages (e.g. Forth, Postcript, Joy), intermediate languages (e.g. CIL/MSIL, Java byte code) and statically typed functional languages (e.g. Haskell, OCaml). Cat is a bit of an odd duck for a scripting language because it is statically typed, and does type inference. This is not a unique idea, several languages do it (e.g. Haskell, OCaml, Scala, F#) however you will not find a C# version of their source code posted on CodeProject any time soon. There are several reasons why Cat might be interesting to you:
Static TypingA statically typed language with type inference behaves much like a dynamic language, in that type annotations are not neccessary. As a result a Cat program can be very compact. For example consider the following Cat program which will add one to a value: define inc { 1 + }
In a weakly typed language, you would probably be able to call such a function passing it a string, and have it fail during execution. This is, of course, assuming that the control path is in fact executed. A challenging problem with dynamic typing is that type errors can lay dormant until after the software is released publicly. What is special about type inference is that the fact that this function requires an integer is deduced at compile time, and a call using a non-integer can be detected. Type inference mitigates the overhead of writing type annotations for your functions but without sacrificing the safety of having your compiler identify type errors. Safe and convenient. Cat BasicsThe Cat programming environment consists of a set of definitions of functions (user defined and primitive) and two evaluation stacks. Most operations manipulate the "primary" stack, and a handful use the "auxiliary" stack. When I refer to simply "the stack", I am referring to the primary stack. Cat functions pop their arguments from the stack and push their results onto the stack. Literal numbers (e.g. Binary arithmetic operators (e.g. >> 33 3 *
main stack: 99
>> 1 2 + 5 *
main stack: 99 15
>> >
main stack: true
>> pop
main stack: _empty_
All phrases (sequence of terms terminated by a new line) in the language are either definitions of new functions, or are instructions to be executed by the interpreter. Cat functions are defined as follows: >> define what_is_the_answer { 6 7 * }
main stack: _empty_
>> what_is_the_answer
main stack: 42
For more information about the Cat language there is a detailed Cat tutorial at http://code.google.com/p/cat-language/wiki/Tutorial. There is an even more in-depth and technical description of the language at http://www.cat-language.com/manual.html. A formal definition of the Cat syntax can be found at http://www.cat-language.com/manual.html#syntax. In brief the Cat interpreter accepts commands directly in the form of either: text words, groups of symbols, numbers, strings, or references. I should point out that a nifty feature of Cat is the ability to define new symbols. So if I was a fan of the old fashioned ? operator from my Atari Basic, I could write: >> define ? { write_line }
The Source FilesNothing exciting here, just a list of the files in the project along with a brief description.
The InterpreterHere is a broad overview of how the interpreter works:
The ParserThe parser's job is to convert the source code from a text format to a more easily managed format, usually an abstract syntax tree (AST). Traditionally parsing is a second step, after a lexical analysis phase which separates a text stream into tokens. The lexical analysis (lexxing) phase is not actually neccessary, and is skipped entirely by the Cat language. The AST generated by the Cat parser is represented using the The Cat parser is a straightforward implementation of what is known as a simple recursive descent parser. The Cat parser creates a new tree node to represent each of the different elements of the language, as expressed in the grammar production. There are several different kinds of language elements for which nodes are created, which are differentiated using the public enum NodeType
{
Definition,
StackState,
SimpleType,
FxnType,
Root,
Quotation,
Word,
Number,
String,
ParanGroup,
BraceGroup,
LineComment,
Reference,
VarDecl,
Undefined
};
The language parser uses various cues and will look ahead a single character if necessary to decide what language element is being parsed. This obviates the need to back up during failure in the Cat parser. In more complex languages, a parser would often need to be able to abandon a parsing rule, delete the current node, and try the next one. One big difference about the Cat language which is worth pointing out is that sequences of symbols are labelled as The Type Inferer[Note: this section is only a gross oversimplification of the type inference algorithm, and is very technical. You can easily skip this section, and still understand and use the project] The type inference algorithm for Cat is very complex and the current implementation is somewhat brittle. This is partly due to the fact that this is a brand new experimental type system and under active development. The following algorithm description is only an approximation of the steps taken. I am trying to convey the logic of what happens, rather than the nitty-gritty reality. The type inference algorithm is initialized with a function with a type signature of: (_main_:any*)(_aux_:any*) -> (_main_)(_aux_)
This is called the return type. The action of each operation that makes up the function definition is then abstractly simulated on a stack containing types. That is to say that only the consumption, and production of types is simulated, not the actual operation itself. Type checking and type inference algorithms usually work by simulating the action of functions at the type level. For example, to simulate an integer addition function you would pop two integers from the type stack and push one integer onto the type stack. In the case of the type inference algorithm, the type stack is initialized with a multi-type variable. When a type is popped from the type stack and a multi-type variable is found, it conveys a constraint: the type variable must match that value. This constraint is used to more precisely describe the type variables, through a deductive process. In the Cat type inference algorithm, the simulated type stacks are taken from a
Now that all of the instructions are simulated and all of the constraints are resolved, the The ExecutorThe Executor manages the primary and auxiliary stack, as well as provides the core implementation of the primitive functions. The primitive functions are The executor provides a public function for executing raw strings, but the actual work of processing the string is passed to a FunctionsAll functions are registered with the global scope object Alternatively you can define a standard library function using existing functions. For this purpose you could use the Unit TestsThe unit tests would probably be more accurately called integration tests, since they only test high-level functionality like the type inference mechanism, and the primitive functions. The tests are an excellent way to learn the language by observing what it is expected to do in numerous circumstances, and comparing it against your own expectations. The tests can be turned on and off, and have their behaviour modified, through various flags found in the Known IssuesThere are some tests of the type system which fail under the current implementation. These are located in the function Some known issues with the current Cat release are:
Untyped Cat (Kitten)Cat doesn't have to be used as a statically typed language. Type annotations can be ignored and the type inference feature can be turned off. To do this, simply set the flag Every Cat program will be a valid Kitten program, but the Kitten interpreter will be much more permissive. For example, you would be able to write the following code in Kitten, which would fail in Cat: define f { [42] ["all your base"] if }
The typing requirements of Cat, which are not enforced in Kitten, are that the two functions passed to the It is worth noting that Kitten resembles the Joy language even more closely than Cat. Related WorkHere are some languages which inspired the Cat language, or are very closely related. Stack based languages:
ConclusionI'd be very happy to hear about spin-off projects that result from this work, or ideas for applications of the Cat language which you are interested in seeing implemented. I hope this code serves you well, and that you find the language interesting and helpful.
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||