Today, more and more cell phones (and other mobile clients) support WAP browsing. To make the browser side more capable, many WAP browsers now support WMLScript. The WMLScript language is based on ECMAScript [ECMA262], but it has been modified to better support low-bandwidth communication and thin clients. WMLScript can be used not only with WML but also as a standalone tool.
One of the main differences between ECMAScript and WMLScript is that WMLScript has a defined bytecode and an interpreter reference architecture, which gives better performance in narrowband and small-memory environments. To make the language smaller and easier to compile into bytecode, many advanced features of ECMAScript have been dropped. WMLScript is a procedural language, and it supports locally installed standard libraries.
What DWmlsc can do
Since WMLScript is compiled into bytecode (usually stored with the .wmlsc file extension), we sometimes need to decompile the bytecode to view the source. So I wrote a tool named "DWmlsc" to do this job.
Resources about WML and WMLScript can be found at:
DWmlsc is an MFC SDI program. When a user opens a .wmlsc file, this function is called:
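void CDWmlscDoc::Serialize(CArchive& ar)
{
    if(ar.IsStoring() == FALSE)   // loading: the user opened a .wmlsc file
    {
        m_codeLen = ar.GetFile()->GetLength();
        m_binCodeBuf = new BYTE[m_codeLen];
        // ... the file contents are read into m_binCodeBuf (not shown in this excerpt) ...
        if(DeCompile(m_binCodeBuf,m_codeLen,m_result_code) == false) { /* ... error handling not shown ... */ }
    }
}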
In the function Serialize, I read the whole file into a buffer and then call the core function:
bool DeCompile(BYTE *bin_code,int len,CList<CString,CString&> &result);
The output parameter result is used to store the decompilation result.
The WMLScript bytecode consists of the following sections:
- HeaderInfo
- ConstantPool
- PragmaPool
- FunctionPool
(Please refer to the WMLScript specification for details.)
DeCompile reads and parses the file into these parts:
- The information read from ConstantPool is stored in a list.
- The information in PragmaPool is mostly ignored.
- The information in FunctionPool is stored in a list.
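As an illustration of the first parsing step, here is a minimal sketch of reading the header that precedes the pools. It is not DWmlsc's code: the type HeaderSketch and the function read_header_sketch are hypothetical names, and the layout (one version byte holding "major minus 1" in the high nibble and "minor" in the low nibble, followed by a multi-byte integer code size) is my reading of the WMLScript specification and should be checked against it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical holder for the header fields; not a DWmlsc type.
struct HeaderSketch {
    unsigned major;           // assumed: high nibble of the version byte, plus 1
    unsigned minor;           // assumed: low nibble of the version byte
    std::uint32_t code_size;  // multi-byte integer that follows the version byte
    std::size_t next;         // offset of the first byte after the header
};

// Reads the header from buf. Returns false if the buffer ends too early.
inline bool read_header_sketch(const std::uint8_t* buf, std::size_t len,
                               HeaderSketch& out)
{
    if (len < 2) return false;
    out.major = (buf[0] >> 4) + 1u;
    out.minor = buf[0] & 0x0Fu;

    // Decode the code size: seven value bits per octet, high bit = "more follows".
    std::uint32_t value = 0;
    std::size_t pos = 1;
    for (;;) {
        if (pos >= len) return false;          // ran out of data mid-integer
        std::uint8_t b = buf[pos++];
        value = (value << 7) | (b & 0x7Fu);
        if ((b & 0x80u) == 0) break;           // last octet of the integer
    }
    out.code_size = value;
    out.next = pos;
    return true;
}
```

Under these assumptions, a file beginning with the bytes 0x01 0x81 0x20 would be read as version 1.1 with a code size of 0xA0, and the pools would start at offset 3.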
Now, we can start to decompile the bytecode in the functions. The following code segment visits each function in turn:
unsigned int func_size;
TransCode will do the real decompiling job:
i = 0;
while(i < func.func_size)
int n = TransCode(func.CodeArray + i, i,func.arg_num);
if(n < 0)
TransCode will translate the bytecode into textual instructions.
int TransCode(BYTE *data,int addr,int arg_num)
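The driver loop around TransCode can be sketched in portable C++ as follows. This is only an illustration under stated assumptions: toy_trans_code is a hypothetical stand-in (its "low two bits give the operand count" encoding is invented for the example and is not WMLScript), and I assume the real TransCode returns the number of bytes it consumed, or a negative value on error.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for TransCode. In this toy encoding the low two bits
// of the first byte give the number of operand bytes that follow.
inline int toy_trans_code(const std::uint8_t* data, std::size_t remaining,
                          std::size_t addr, std::vector<std::string>& out)
{
    std::size_t need = 1u + (data[0] & 0x03u);
    if (need > remaining) return -1;           // truncated instruction
    out.push_back(std::to_string(addr) + ": op " + std::to_string(data[0]));
    return static_cast<int>(need);             // bytes consumed
}

// Walks one function body, translating instruction by instruction.
inline bool walk_function(const std::vector<std::uint8_t>& code,
                          std::vector<std::string>& out)
{
    std::size_t i = 0;
    while (i < code.size()) {
        int n = toy_trans_code(code.data() + i, code.size() - i, i, out);
        if (n < 0) return false;               // stop on a malformed instruction
        i += static_cast<std::size_t>(n);      // advance to the next instruction
    }
    return true;
}
```

Passing the current offset to the translator, as the original loop does with its second argument, lets each output line carry the address of the instruction, which is useful when jump targets have to be read against the listing.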
To make the bytecode smaller, WMLScript uses the "Inline parameters" technique.
| Signature | Available Instructions | Used for |
|---|---|---|
| 00XXXXXX | 63 | The rest of the instructions |
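A signature such as 00XXXXXX can be tested with a bit mask, and an inline parameter can be separated from the opcode bits in the same way. The sketch below is hypothetical and not taken from DWmlsc: is_plain_instruction checks the 00XXXXXX signature shown in the table, while split_inline takes the parameter width as an argument, because the exact width for each inline signature has to be looked up in the WMLScript specification.

```cpp
#include <cassert>
#include <cstdint>

// True when the byte matches the 00XXXXXX signature, i.e. its two most
// significant bits are both zero.
inline bool is_plain_instruction(std::uint8_t op)
{
    return (op & 0xC0u) == 0;
}

// Hypothetical result type: opcode bits (left in place) and inline parameter.
struct InlineParts {
    std::uint8_t opcode;
    std::uint8_t param;
};

// Splits a byte into its high "opcode" bits and its low param_bits-wide
// parameter. The caller supplies param_bits (1..7).
inline InlineParts split_inline(std::uint8_t op, unsigned param_bits)
{
    const unsigned mask = (1u << param_bits) - 1u;
    InlineParts p;
    p.opcode = static_cast<std::uint8_t>(op & ~mask);
    p.param = static_cast<std::uint8_t>(op & mask);
    return p;
}
```

For example, with an assumed 5-bit parameter, the byte 0x85 splits into opcode bits 0x80 and parameter 5, so a short jump or variable index travels in the same byte as the instruction.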
TransCode parses these "Inline parameter" instructions with an "
The other 63 instructions will be parsed by indexing the array:
const int ins_count = sizeof(InArray)/sizeof(InArray[0]);
if(op_code >= ins_count)
Instruction *ip = InArray + op_code;
if(ip->parser == NULL)
int n = ip->parser(data,addr,arg_num);
i = i + n - 1;
What then is "InArray"? See this:
Instruction InArray[] = { /* ... one entry per instruction, not shown in this excerpt ... */ };
JUMP_FW_W and the other entries are all function pointers of type parser_t (see decompiler.h):
typedef int (* parser_t)(BYTE *data,int addr,int arg_num);
The program looks up the parsing function for an instruction by indexing "InArray". This is simple and very fast.
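The table lookup described above can be sketched in standalone C++ like this. All names here (InstructionSketch, kTable, dispatch, OP_A, OP_B and the two parsers) are hypothetical and only mirror the shape of InArray and parser_t; the real table has one entry per WMLScript instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Same shape as parser_t: returns the instruction length, or a negative value.
typedef int (*parser_fn)(const std::uint8_t* data, int addr, int arg_num);

// Hypothetical table entry: a mnemonic plus the function that parses it.
struct InstructionSketch {
    const char* name;
    parser_fn parser;   // null for an opcode that has no parser
};

// Hypothetical parsers for the example.
inline int parse_no_operand(const std::uint8_t*, int, int) { return 1; }
inline int parse_one_byte_operand(const std::uint8_t*, int, int) { return 2; }

// The array index is the opcode, so lookup is a single indexing operation.
static const InstructionSketch kTable[] = {
    { "RESERVED", nullptr },
    { "OP_A", parse_one_byte_operand },
    { "OP_B", parse_no_operand },
};
static const std::size_t kTableCount = sizeof(kTable) / sizeof(kTable[0]);

inline int dispatch(const std::uint8_t* data, int addr, int arg_num)
{
    const std::uint8_t op_code = data[0];
    if (op_code >= kTableCount) return -1;      // opcode outside the table
    const InstructionSketch* ip = kTable + op_code;
    if (ip->parser == nullptr) return -1;       // opcode without a parser
    return ip->parser(data, addr, arg_num);
}
```

Compared with a long switch statement, the table keeps the mnemonic and its parser side by side, and adding an instruction means adding one row.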
Points of Interest
Multi-byte Integer Format
In many places, the bytecode uses the "Multi-byte Integer Format" to represent an integer.
A multi-byte integer consists of a series of octets, where the most significant bit is the continuation flag and the remaining seven bits are a scalar value. The continuation flag is used to indicate that an octet is not the end of the multibyte sequence. A single integer value is encoded into a sequence of N octets. The first N-1 octets have the continuation flag set to a value of one (1). The final octet in the series has a continuation flag value of zero.
The remaining seven bits in each octet are encoded in big-endian order, i.e., most significant bit first. The octets themselves are also arranged in big-endian order, i.e., the most significant seven bits are transmitted first. If the initial octet carries fewer than seven bits of value, all unused bits must be set to zero (0).
For example, the integer value 0xA0 would be encoded with the two-byte sequence 0x81 0x20. The integer value 0x60 would be encoded with the one-byte sequence 0x60.
The function get_mb_uint decodes a "Multi-byte Integer":
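unsigned int get_mb_uint(BYTE *data,int len,int &k)
{
    unsigned int r = 0;
    int i = 0;
    for( ; i < len; i++)                // loop header reconstructed from the surviving lines
    {
        BYTE b = data[i];
        r = (r << 7) | (b & 0x7F);      // append the seven value bits
        if( (b & 0x80)==0 )             // continuation flag clear: this is the last octet
            break;
    }
    k = k + i + 1;                      // advance the caller's offset past the integer
    return r;
}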
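The encoding rules above can be checked with a small round trip. The following sketch is standalone and not part of DWmlsc; encode_mb_uint and decode_mb_uint are hypothetical helper names, and the decoder reports failure instead of reading past the end of the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Encodes value as a multi-byte integer: seven value bits per octet,
// most significant group first, high bit set on every octet but the last.
inline std::vector<std::uint8_t> encode_mb_uint(std::uint32_t value)
{
    std::vector<std::uint8_t> out;
    out.push_back(static_cast<std::uint8_t>(value & 0x7Fu));   // final octet
    value >>= 7;
    while (value != 0) {
        // Prepend the next more significant group with the continuation flag.
        out.insert(out.begin(),
                   static_cast<std::uint8_t>(0x80u | (value & 0x7Fu)));
        value >>= 7;
    }
    return out;
}

// Decodes one multi-byte integer starting at pos. On success stores the
// value, moves pos past the integer and returns true.
inline bool decode_mb_uint(const std::vector<std::uint8_t>& in,
                           std::size_t& pos, std::uint32_t& value)
{
    std::uint32_t r = 0;
    for (std::size_t i = pos; i < in.size(); ++i) {
        r = (r << 7) | (in[i] & 0x7Fu);
        if ((in[i] & 0x80u) == 0) {            // last octet reached
            value = r;
            pos = i + 1;
            return true;
        }
    }
    return false;                              // buffer ended mid-integer
}
```

With these helpers, 0xA0 encodes to the two octets 0x81 0x20 and 0x60 encodes to the single octet 0x60, matching the examples given earlier.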
Name Translation of WMLScript Standard Libraries
WMLScript bytecode uses "lib index" and "func index" to identify which standard library function is to be called.
char * make_call_name(int lindex,int findex);
This function looks up the lib index and function index in an internal string table and returns the resulting name. Users can therefore read the library and function names directly in the decompiled text, instead of checking the documentation.
Currently, DWmlsc can only decompile the bytecode into "WMLScript Assembly Language". In the future, I will enhance it to decompile bytecode into WMLScript source, making it a real "Decompiler" :).
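A name lookup of this kind can be sketched as a table of tables. This is a hypothetical illustration, not DWmlsc's string table: make_call_name_sketch and LibSketch are invented names, and the few entries shown (library 0 as Lang with abs, min, max; library 1 as Float with int, floor, ceil) reflect my understanding of the WMLScript Standard Libraries specification and should be verified there.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical table row: a library name and its function names by index.
struct LibSketch {
    const char* name;
    const char* const* funcs;
    std::size_t count;
};

// Illustrative excerpts only; the real libraries contain more functions.
static const char* const kLangFuncs[] = { "abs", "min", "max" };
static const char* const kFloatFuncs[] = { "int", "floor", "ceil" };
static const LibSketch kLibs[] = {
    { "Lang", kLangFuncs, 3 },
    { "Float", kFloatFuncs, 3 },
};
static const std::size_t kLibCount = sizeof(kLibs) / sizeof(kLibs[0]);

// Returns "Library.function", falling back to numeric placeholders for
// indices that are not in the table.
inline std::string make_call_name_sketch(int lindex, int findex)
{
    if (lindex < 0 || static_cast<std::size_t>(lindex) >= kLibCount) {
        return "lib" + std::to_string(lindex) + ".func" + std::to_string(findex);
    }
    const LibSketch& lib = kLibs[lindex];
    if (findex < 0 || static_cast<std::size_t>(findex) >= lib.count) {
        return std::string(lib.name) + ".func" + std::to_string(findex);
    }
    return std::string(lib.name) + "." + lib.funcs[findex];
}
```

Returning a placeholder for unknown indices keeps the listing readable even when the bytecode calls a library the table does not cover.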