
mscript: A Programming Language for Scripting Command Line Operations

7 Feb 2022 | Apache License | 5 min read
Replace your nasty .bat files with friendly mscripts for clean and powerful command line operations
mscript is a programming language with general functionality, simple syntax, and command line automation built in. It is useful for scripting that's too much for .bat files, and too little for PowerShell or Python.

Introduction

mscript was initially designed as a teaching language. That didn't work out.

So I set out to make mscript useful to...

  • people who know how to code
  • people who are interested in scripting command line operations
  • people who for whatever reason are not interested in using PowerShell or Python for the task

The thinking is: here's a simple scripting language, and if it can solve your problem, there's no need to get a bigger gun.

mscript was initially written in C#, with all sorts of HTML, IDE, and server... stuff. I took a machete to all of that! I ported it to C++ and made it internally and externally sensible. You get a lot of bang for your buck with an 800 KB EXE script interpreter, with no external dependencies to boot. Just put it in your path and have at it!

The mscript Language

Here is a tiny glance at mscript:

mscript
! Caching Fibonacci sequence
~ fib(n)
	! Check the cache
	? fib_cache.has(n)
		<- fib_cache.get(n)
	}
	
	! Compute the result
	$ fib_result
	? n <= 0
		& fib_result = 0
	? n = 1 || n = 2
		& fib_result = 1
	<>
		& fib_result = fib(n - 1) + fib(n - 2)
	}
	
	! Stash the result in the cache
	* fib_cache.add(n, fib_result)
	
	! All done
	<- fib_result
}
! Our cache is an index, a hash table, any-to-any
$ fib_cache = index()

! Print the first 10 values of the Fibonacci series
! Look, ma!  No keywords!
# n : 1 -> 10
	> fib(n)
}

It is a line-based, pseudo-object-oriented scripting language that uses symbols instead of keywords.

It doesn't care about whitespace. No semicolons.

Objects

In mscript, every variable contains an object (think .NET's Object, but more like VB6's VARIANT).

An object can be one of six types of things:

  1. null
  2. number - double
  3. string - std::wstring
  4. bool
  5. list - std::vector<object>
  6. index - std::map<object, object>, with order of insertion preserved, a vectormap

list and index are copied by reference; the rest are copied by value.
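
To make the copy rule concrete, here is a minimal C++ sketch of how a variant-style object could give lists reference semantics while numbers and strings keep value semantics. It is not taken from the mscript source: Value, ListPtr, makeList, addToList, and listLength are hypothetical names, and only four of the six object kinds are modeled.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-in for mscript's object type (not the real class).
struct Value;
using ListPtr = std::shared_ptr<std::vector<Value>>;

struct Value
{
    // monostate = null, double = number, wstring = string, ListPtr = list
    std::variant<std::monostate, double, std::wstring, ListPtr> data;
};

// Create a fresh, empty list value.
Value makeList()
{
    return Value{ std::make_shared<std::vector<Value>>() };
}

// Append an item to a list value. Copies of the list share one vector,
// so every copy sees the new item.
void addToList(const Value& list, Value item)
{
    std::get<ListPtr>(list.data)->push_back(std::move(item));
}

// Number of items in a list value.
std::size_t listLength(const Value& list)
{
    return std::get<ListPtr>(list.data)->size();
}
```

Copying a Value that holds a list copies only the shared pointer, so adding to the copy is visible through the original. Copying a Value that holds a string duplicates the string, so the two evolve independently.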

mscript Statements

mscript
/* a block
comment
*/

! a single-line comment, on its own line, can't be at the end of a line

> "print the value of an expression, like this string, including pi: " + round(pi, 4)

>> print exactly what is on this line, allowing for any "! '= " 0!')* nonsense you'd like

{>>
every line
in "here"
is printed "as-is"
>>}

! Declare a variable with an optional initial value
! With no initial value, the variable has the null value
$ new_variable = "initial value"

! A variable assignment
! Once a variable has a non-null value, the variable cannot be assigned
! to a value of another type
! So mscript is somewhat dynamically typed
& new_variable = "some other value"

! The O signifies an unbounded loop, a while(true) type of thing
! All loops end in a closing curly brace, but do not start with an opening one
O
	...
	! the V statement is break
	> "gotta get out!"
	V
}

! If, else if, else
! No curly braces at ends of each if or else if clause, 
! just at the end of the overall statement
? some_number = 12
	& some_number = 13
? some_number = 15
	& some_number = 16
<>
	& some_number = -1
}

! A foreach loop
! list(1, 2, 3) creates a new list with the given items
! This statement processes each list item, printing them out
! Note the string promotion in the print line
@ item : list(1, 2, 3)
	> "Item: " + item
}

! An indexing loop
! Notice the pseudo-OOP of the my_list.length() and my_list.get() calls
! This is syntactic sugar for calls to global functions,
! length(my_list) and get(my_list, idx)
$ my_list = list(1, 2, 3)
# idx : 0 -> my_list.length() - 1
	> "Item: " + my_list.get(idx)
}

{
	! Just a little block statement for keeping variable scopes separate
	! Variables declared in here...
}
! ...are not visible out here

! Functions are declared like other statements
~ my_function (param1, param2)
	! do something with param1 and param2

	! Function return values...
	! ...with a value
	<- 15
	! ...without a value
	<-
}

! A little loop example
~ counter(low_value, high_value)
	$ cur_value = low_value
	$ counted = list()
	O
		! Use the * statement to evaluate an expression and discard its return value
		! Useful for requiring deliberate ignoring of return values
		* counted.add(cur_value)
		& cur_value = cur_value + 1
		? cur_value > high_value
			! Use the V statement to leave the loop
			V
		<>
			! Use the ^ statement to go back up to the start of the loop, a continue statement
			^
		}
	}
	<- counted
}

! Load and run another script here, an import statement
! The script path is an expression, so you can dynamically load different things
! Scripts are loaded relative to the script they are imported from
! Scripts loaded in this way are processed just like top-level scripts,
! so they can declare global variables, define functions, and...execute script statements
! Plenty of rope...
+ "some_other_script.ms"

mscript Expressions

mscript statements make use of expressions, some simple, some very powerful.

Binary operators, from lowest to highest precedence:
or || and && != <= >= < > = % - + / * ^

Unary operators: - ! not

An expression can be:
null
true
false
number
string
dquote
squote
tab
lf
cr
crlf
pi
e
variable as defined by a $ statement

Strings can be double- or single-quoted, 'foo ("bar")' and "foo ('bar')" are valid; 
this is handy for building command lines that involve lots of double-quotes; 
just use single quotes around them.

String promotion:
	If either side of a binary expression evaluates to a string, 
	the expression promotes both sides to string

Bool short-circuiting:
	The left expression is evaluated first
		If && and left is false, expression is false
		If || and left is true, expression is true

Standard math functions, for your math homework:
abs asin acos atan ceil cos cosh exp floor 
log log2 log10 round sin sinh sqrt tan tanh

getType(obj) - the type of an object obj as a string
			 - you can also say obj.getType()
			 - see the shorthand?
number(val)	 - convert a string or bool into a number
string(val)  - convert anything into a string
list(item1, item2, etc.) - create a new list with the elements passed in
index(key1, value1, key2, value2) - create a new index with the pairs of keys 
and values passed in

obj.clone() - deeply clone an object, including indexes containing list values, etc.
obj.length() - C++ .size(), string or list length, or index pair count

obj.add(to_add1, to_add2...) - append to a string, add to a list, or add pairs to an index
obj.set(key, value) - set a character in a string, change the value at a key in a list or index
obj.get(key) - return the character of a string, the element in a list, 
or the value for the key in an index
obj.has(value) - returns if string has substring, list has item, or index has key

obj.keys(), obj.values() - index collection access

obj.reversed() - returns copy of obj with elements reversed, including keys of an index
obj.sorted() - returns a copy of obj with elements sorted, including index keys

join(list_obj, separator) - join list items together into a string
split(str, separator) - split a string into a list of items

trim(str) - return a copy of a string with any leading or trailing whitespace removed
toUpper(str), toLower(str) - return a copy of a string in upper or lower case
str.replaced(from, to) - return a copy of a string with characters replaced

random(min, max) - return a random value in the range min -> max

obj.firstLocation(toFind), obj.lastLocation(toFind) - find the first or last location 
of a substring in a string or an item in a list
obj.subset(startIndex[, length]) - return a substring of a string or a slice of a list, 
with an optional length

obj.isMatch(regex) - see if a string is a match for a regular expression
obj.getMatches(regex) - return a list of matches from a regular expression applied to a string

exec(cmd_line) - execute a command line, return an index with keys 
("success", "exit_code", "output")
This is the main function that gives mscript meaning in life.  
You build your command line, you call exec, and it returns an index with all you need to know. 
Write all the script you want around calls to exec, and get a lot done.

exit(exit_code) - exit the script with an exit code

error(error_msg) - raise an error, reported by the script interpreter

readFile(file_path, encoding) - read a text file into a string, using the specified encoding, 
either "ascii", "utf8", or "utf16"

writeFile(file_path, file_contents, encoding) - write a string to a text file with an encoding

A Little Magic

If a function name is neither a built-in function nor a user-defined function, mscript checks whether it is the name of a variable that holds a string. If so, the value of that variable becomes the function name, and that function is executed with the same parameters.

mscript
So if you have...

~ addTogether(one, two)
	<- one + two
}

and

~ powTogether(one, two)
	<- one ^ two
}

...you can then use...

$ func = "addTogether"
$ added = func(2, 3)
! added is 5

& func = "powTogether"
$ powed = func(2, 3)
! powed is 8

Poor man's function pointers. Pretty neat, huh?
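
For the curious, the lookup order described above might be sketched in C++ roughly like this. The Resolver type and its members are hypothetical, not the interpreter's real data structures, and the sketch assumes only one level of indirection (a variable naming a function, not a variable naming another variable).

```cpp
#include <map>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical lookup tables; the article does not show the real ones.
struct Resolver
{
    std::set<std::wstring> builtIns;                       // built-in function names
    std::set<std::wstring> userFunctions;                  // names defined with ~
    std::map<std::wstring, std::wstring> stringVariables;  // string-typed variables only

    // Return the function name that should actually be called.
    std::wstring resolve(const std::wstring& name) const
    {
        // Built-in and user-defined functions win, in that order.
        if (builtIns.count(name) != 0 || userFunctions.count(name) != 0)
            return name;

        // Otherwise, a string variable with this name supplies the function name.
        auto it = stringVariables.find(name);
        if (it != stringVariables.end())
            return it->second;

        throw std::runtime_error("Unknown function name");
    }
};
```

With func set to "addTogether", resolve(L"func") returns L"addTogether"; after reassigning the variable to "powTogether", the same call site resolves to the other function.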

A Little More Magic

As discussed, mscript makes function calls of the form something.function(param1...) "object oriented" by passing something as the first parameter to the function, function(something, param1...). You can't chain these things, and something has to be the name of a variable, not any other kind of expression. This shorthand makes it easier to read and write scripts.
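
As an illustration only, that rewrite can be sketched as a purely textual transformation in C++. The real interpreter presumably works on parsed expressions rather than raw text, and desugarCall is a hypothetical helper, not something from the mscript source.

```cpp
#include <string>

// Rewrite "name.func(args)" as "func(name, args)".
// Text that does not have that simple shape is returned unchanged.
std::wstring desugarCall(const std::wstring& call)
{
    const auto dot = call.find(L'.');
    const auto paren = call.find(L'(');
    if (dot == std::wstring::npos || paren == std::wstring::npos)
        return call;
    if (dot == 0 || dot > paren || call.back() != L')')
        return call;

    const std::wstring target = call.substr(0, dot);
    const std::wstring func = call.substr(dot + 1, paren - dot - 1);
    const std::wstring args = call.substr(paren + 1, call.size() - paren - 2);

    // The target becomes the first parameter; existing parameters follow it.
    std::wstring result = func + L"(" + target;
    if (args.find_first_not_of(L" \t") != std::wstring::npos)
        result += L", " + args;
    result += L")";
    return result;
}
```

So my_list.get(idx) becomes get(my_list, idx), and my_list.length() becomes length(my_list), matching the equivalences shown in the indexing loop example earlier.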

If you want to do a lot with one line of code, you can nest functions to your heart's content:

mscript
$ lines = split(trim(replaced(get(exec("dir"), "output"), crlf, lf)), lf)

Code of Note

popen

The core useful functionality of mscript is the exec function.

In C++...

C++
// Initialize our output object, an index
object::index retVal;
retVal.set(toWideStr("success"), false);
retVal.set(toWideStr("exit_code"), -1.0);
retVal.set(toWideStr("output"), toWideStr(""));

// Run the program.
FILE* file = _wpopen(paramList[0].stringVal().c_str(), L"rt");
if (file == nullptr)
	return retVal;

// Read the program's output into a string
char buffer[4096];
std::string output;
while (fgets(buffer, sizeof(buffer), file))
	output.append(buffer);
retVal.set(toWideStr("output"), toWideStr(output));

// If we stopped at EOF, things went well
retVal.set(toWideStr("success"), bool(feof(file)));

// Close the "file" and get our exit code
int result = _pclose(file);
retVal.set(toWideStr("exit_code"), double(result));
file = nullptr;

// All done
return retVal;

popen is the magic sauce. Call it, read the output, close the file, get the exit code, and return it all. Using an index object in this way allows for a multi-valued return... value.

std::string <=> std::wstring conversions

I've used character encoding code like this for years:

C++
std::string toNarrowStr(const std::wstring& str)
{
	std::wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t> converter;
	return converter.to_bytes(str);
}

std::wstring toWideStr(const std::string& str)
{
	std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
	return converter.from_bytes(str);
}

I noticed that the test scripts were running really slowly. I found that Visual Studio Community Edition includes a profiler (wow!), and it showed that 60% of the execution time of the whole program was in toWideStr, specifically converter.from_bytes(str). Worse, the sample script for this project calls dir on a large and varied directory structure, and the mscript interpreter crashed processing the dir output.

Something had to give.

So I went with Win32 character conversion functions for correctness, and pre-call conversion checks for speed. Many strings need no special processing, especially within mscript, so I optimized those cases:

C++
std::wstring toWideStr(const std::string& str)
{
	bool allNarrow = true;
	{
		const unsigned char* bytes = reinterpret_cast<const unsigned char*>(str.data());
		for (size_t i = 0; i < str.size(); ++i)
		{
			if (bytes[i] > 127)
			{
				allNarrow = false;
				break;
			}
		}
	}

	if (allNarrow)
	{
		std::wstring retVal;
		retVal.reserve(str.size());
		for (auto c : str)
			retVal += wchar_t(c);
		return retVal;
	}

	int needed = MultiByteToWideChar(CP_UTF8, 0, str.data(), int(str.size()), nullptr, 0);
	if (needed <= 0)
		raiseError("MultiByteToWideChar failed");

	std::wstring result(needed, 0);
	MultiByteToWideChar(CP_UTF8, 0, str.data(), int(str.size()), result.data(), needed);
	return result;
}

std::string toNarrowStr(const std::wstring& str)
{
	if (str.empty())
		return std::string();

	bool allAscii = true;
	for (wchar_t c : str)
	{
		if (c <= 0 || c > 127)
		{
			allAscii = false;
			break;
		}
	}

	if (allAscii)
	{
		std::string retVal;
		retVal.reserve(str.size());
		for (auto c : str)
			retVal += char(c);
		return retVal;
	}

	int needed = WideCharToMultiByte(CP_UTF8, 0, str.data(), 
                 int(str.size()), nullptr, 0, nullptr, nullptr);
	if (needed <= 0)
		raiseError("WideCharToMultiByte failed");

	std::string output(needed, 0);
	WideCharToMultiByte(CP_UTF8, 0, str.data(), int(str.size()), 
                        output.data(), needed, nullptr, nullptr);
	return output;
}

I don't have the numbers, but the performance was night and day, and the dir output was processed and usable. One-two punch.

Sample Script: musicdb

This sample script uses dir against your entire music library, parsing out what look like artists, albums, and tracks. You then enter search tokens to find artists of interest and see their catalogs. If arguments are passed to the script on the command line, the script takes those as the search tokens, and silently loads and processes the music library paths, does the search operation, outputs the results, and quits.

mscript
! If we have command line arguments, those are our search terms
! Handle things directly, no "UI", just results
? arguments.length() > 0
	$ lines = loadLines()
	
	$ db = index()
	* processLines(db, lines, false)

	$ matching_artists = getMatchingArtistNames(db, arguments)
	@ matching_artist : matching_artists
		* summarizeArtist(db, matching_artist)
		>>
	}
	! Exit the script with a good exit code
	<- 0
}

>> Loading music files...
$ lines = loadLines()
> "Music Files: " + lines.length()

$ db = index()
* processLines(db, lines, true)
>> All done
* outputStats(db)

O
	>>
	>> Enter artist name search string, as much as you'd like, however you'd like:
	$ matching_artists = getMatchingArtistNames(db, split(input(), " "))

	> "Matching Artists: (" + matching_artists.length() + ")"
	? matching_artists.length() > 0
		> matching_artists.join(lf)
	}

	@ matching_artist : matching_artists
		* summarizeArtist(db, matching_artist)
		>>
	}
}

! Do the dir of the user's Music directory
! Surprisingly fast!
~ loadLines()
	$ result_index = exec('dir /B /S "C:\Users\%USERNAME%\Music"')
	? !result_index.get("success")
		> "Running the dir command failed with exit code " + result_index.get("exit_code")
		<- list()
	}
	<- split(replaced(result_index.get("output"), crlf, lf), lf)
}

! Walk dir output processing each line in turn
~ processLines(db, lines, should_pacify)
	? should_pacify
		$ line_count = 0
		@ line : lines
			* processLine(db, line)
			& line_count = line_count + 1
			? (line_count % 5000) = 0
				> line_count
			}
		}
	<>
		@ line : lines
			* processLine(db, line)
		}
	}
}

! Given a line from the dir output, add a track to our database...
! ...if it's a somewhat valid line
~ processLine(db, line)
	! Split up path, bail if not at least artist\album\track
	& line = line.trim()
	$ parts = line.split('\')
	? parts.length() < 3
		! not deep enough
		<- false
	}
	
	$ filename = parts.get(parts.length() - 1)
	$ dot_index = filename.lastLocation('.')
	? dot_index <= 0
		! not a file
		<- false
	}
	$ track = filename.subset(0, dot_index)

	$ artist = parts.get(parts.length() - 3)
	$ album = parts.get(parts.length() - 2)

	! DEBUG
	!> "Artist: " + artist + " - Album: " + album + " - Track: " + track

	! Ensure the artist is in the DB
	? !db.has(artist)
		* db.add(artist, index())
	}
	$ artist_index = db.get(artist)

	! Ensure the artist has the album
	? !artist_index.has(album)
		* artist_index.add(album, list())
	}
	$ album_list = artist_index.get(album)

	! Add the track to the album
	* album_list.add(track)

	! All done
	<- true
}

! Walk the database of artist collections gathering and outputting stats
~ outputStats(db)
	$ artists = db.keys()
	$ album_count = 0
	$ track_count = 0
	@ artist : artists
		$ artist_index = db.get(artist)
		@ album_name : artist_index.keys()
			& album_count = album_count + 1
			$ album_tracks = artist_index.get(album_name)
			& track_count = track_count + album_tracks.length()
		}
	}
	>>
	> "Artists: " + artists.length()
	? artists.length() > 0
		> "Albums:  " + album_count + " - albums / artist = " + round(album_count / artists.length())
		> "Tracks:  " + track_count + " - tracks / album = " + round(track_count / album_count)
	}
}

! Given search terms, return the names of artists that match all terms
~ getMatchingArtistNames(db, parts)
	! Normalize the input, trimmed and lowered
	$ normalized_parts = list()
	@ part : parts
		$ normalized_part = trim(toLower(part))
		? normalized_part.length() > 0
			* normalized_parts.add(normalized_part)
		}
	}

	! Walk the artists finding matches
	$ matching_artists = list()
	? normalized_parts.length() = 0
		<- matching_artists
	}
	@ artist : db.keys()
		$ artist_lower = toLower(artist)
		$ match = true
		@ part : normalized_parts
			? !artist_lower.has(part)
				& match = false
				V
			}
		}
		? match
			* matching_artists.add(artist)
		}
	}
	<- matching_artists
}

! Output an artist's collection
~ summarizeArtist(db, artist_name)
	> "Artist: " + artist_name
	$ artist_albums = db.get(artist_name)
	@ album : artist_albums.keys()
		> "  Album: " + album
		$ album_tracks = artist_albums.get(album)
		@ album_track : album_tracks
			> "    " + album_track
		}
	}
}

Project Layout

In the attached source.zip, you'll find it all, including the Visual Studio solution file.

Here are the projects that make up the solution:

mscript-lib

The mscript-lib project is where expressions and statements are implemented. All the working code in the solution is in this project.

  • expressions
  • object
  • script_processor
  • symbols
  • utils
  • vectormap
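
The vectormap in that list is what backs the index type, described earlier as a map with order of insertion preserved. A minimal sketch of that idea in C++ might look like the following; OrderedMap and its members are hypothetical and are not the class from the mscript source.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical insertion-ordered map: a vector keeps the order,
// and a map gives fast lookup of each key's position in the vector.
template <typename K, typename V>
class OrderedMap
{
public:
    // Add a new pair, or overwrite the value of an existing key in place.
    void set(const K& key, const V& value)
    {
        auto it = m_positions.find(key);
        if (it == m_positions.end())
        {
            m_positions.emplace(key, m_entries.size());
            m_entries.emplace_back(key, value);
        }
        else
        {
            m_entries[it->second].second = value;
        }
    }

    bool has(const K& key) const { return m_positions.count(key) != 0; }

    const V& get(const K& key) const
    {
        auto it = m_positions.find(key);
        if (it == m_positions.end())
            throw std::out_of_range("key not found");
        return m_entries[it->second].second;
    }

    // Keys in the order they were first inserted.
    std::vector<K> keys() const
    {
        std::vector<K> result;
        result.reserve(m_entries.size());
        for (const auto& entry : m_entries)
            result.push_back(entry.first);
        return result;
    }

private:
    std::vector<std::pair<K, V>> m_entries;
    std::map<K, std::size_t> m_positions;
};
```

With an OrderedMap<std::string, int>, setting "zebra" and then "apple" yields keys() in that order rather than alphabetical order, which is the behavior a plain std::map would not give you.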

mscript-tests

Unit tests

mscript-test-runner

You can only do so much with unit tests without the test code getting large and unwieldy. Instead of making bad unit tests, I made a set of files with script to execute and results to expect. So in mscript-test-scripts, you'll find test files, with statements up top and expected results below, separated by ===.

mscript-test-runner runs all scripts in the directory and validates that it gets the expected results.
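
To illustrate the file format, here is a rough C++ sketch of splitting one test file into its script and its expected output. It assumes the === separator sits on a line by itself and that only the first such line counts; splitTestFile is a hypothetical helper, not the test runner's actual code.

```cpp
#include <sstream>
#include <string>
#include <utility>

// Split test-file text into (script, expected output) at the first
// line that is exactly "===". Assumed format, see the note above.
std::pair<std::string, std::string> splitTestFile(const std::string& contents)
{
    std::istringstream in(contents);
    std::string line, script, expected;
    bool pastSeparator = false;
    while (std::getline(in, line))
    {
        // Tolerate Windows line endings.
        if (!line.empty() && line.back() == '\r')
            line.pop_back();

        if (!pastSeparator && line == "===")
        {
            pastSeparator = true;
            continue;
        }

        if (pastSeparator)
            expected += line + "\n";
        else
            script += line + "\n";
    }
    return { script, expected };
}
```

A runner along these lines would execute the first half, capture what it prints, and compare that against the second half.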

mscript

This is the script interpreter. All the code is in mscript-lib, so this is just a shell around that project.

The tricky bit in the mscript program is loading secondary scripts as requested by + statements. The path to the secondary script is relative to the script doing the importing. So if you tell the interpreter to run c:\my_scripts\mine.ms, and mine.ms imports yours.ms, yours.ms is looked for in c:\my_scripts, not in the current directory or some such thing.
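
The path rule can be sketched with std::filesystem as follows. This is an illustration under assumptions, not the interpreter's code: resolveImport is a hypothetical helper, and passing absolute import paths through untouched is my guess, since the article only describes the relative case.

```cpp
#include <filesystem>

// Resolve an imported script path against the script doing the importing.
std::filesystem::path resolveImport(const std::filesystem::path& importingScript,
                                    const std::filesystem::path& imported)
{
    // Assumption: an absolute import path is used as-is.
    if (imported.is_absolute())
        return imported;

    // Relative imports are looked up next to the importing script,
    // not in the process's current directory.
    return (importingScript.parent_path() / imported).lexically_normal();
}
```

Because the importing script's own path is the anchor, nested imports keep working no matter which directory the interpreter was launched from.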

Conclusion

So that was a deep dive into the mscript language: the symbol-driven list of statement types, the expression syntax with a purpose-built function library and a little magic, and a real-world example.

This is the second draft of mscript. Hopefully, it resonates with you, and you can use it to get big command line jobs done with simple and clean mscripts. I'd love to get your feedback, so please chime in in the comments. Thank you! Enjoy!

History

  • 7th February, 2022: Initial version

License

This article, along with any associated source code and files, is licensed under The Apache License, Version 2.0