Download Robin.zip - 290.21 KB
Web space for Robin Project
Robin at Sourceforge.net (please check SVN for the newest updates)
Preface

Robin is a kind of bird, nothing special at all. There is an algorithm named Round-Robin which is widely used in operating system design. I took the name from the Round-Robin idea, assuming that this kind of bird (which I have never actually seen) has some periodical habit; maybe it sings at 6:00 AM every day. I truly cannot find any other explanation for the name of the Round-Robin algorithm, so please forgive me if this is a bad guess.
Reality is too complicated; the only way to build a simulation of reality is to make it regular, like the Robin bird singing at 6:00 AM every day. From this point of view, things become reasonably simplified, and you get a very beautiful view of the messed-up world; you may then find that it has a perfect regularity of its own, which is quite surprising and even beyond your imagination.
And then you look back at yourself... you understand that complication is just an illusion. For example, you may find it hard to understand why human beings have the ability to learn, speak, and think, until one day you build a machine that has the same ability (a Turing machine); then you understand why, and you understand yourself. From some points of view, though not all, we are some kind of machines... but no worries, we still have something more.
Real Parallel

Since this has been a long journey, there have been many forks in the road.
I have always been trying to build a real Artificial Neural Network, one that is not simply nodes, thresholds, and feedback. To be honest, I know little about classic ANN theory, because I do not feel it offers any means to solve the language problem. It may be used to recognize human faces or to control body movement, but I cannot see how an ANN system could learn human languages.

So I have been working on other approaches, with other visions, which I will talk about later. I did try to simulate a whole brain of communicating neuron cells, even their living status, which today I cannot even consider possible. But yes, I did this, and I failed.
There was a big problem when I tried to simulate real cells: I could not make them work the way they do in the real world. Why? In the real world, they work side by side; each has its own timeline. I could not truly simulate them with threads (thread programming), because I could not accept so much unpredictability in a system that was already too complicated. I had no choice but to treat them as Robins, yet I could not accept them singing one after another... that is too conflicting.
Until one day, two years ago, I found the solution. I call it True Parallelism, and it is the previous implementation of the Robin framework. Simply put, parallel execution has a condition: if changing the same value within the same period does not matter, then the running bodies that are assumed to run in parallel are qualified to be parallel.
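The condition above can be sketched in a few lines (the names here are my own illustration, not from the Robin code base): a set of steps may be treated as parallel when running them in any order within the same period produces the same final state.

```python
import itertools

def run_period(steps, state):
    """Apply every step of one period to a copy of the state."""
    state = dict(state)
    for step in steps:
        step(state)
    return state

# Each "running body" only writes to its own slot, so order is irrelevant.
def matcher_a(state): state["a"] = state["signal"] + 1
def matcher_b(state): state["b"] = state["signal"] * 2

steps = [matcher_a, matcher_b]
start = {"signal": 3}

# Every ordering of the steps yields the same end state, so it is safe
# to pretend they ran side by side.
results = {tuple(sorted(run_period(list(p), start).items()))
           for p in itertools.permutations(steps)}
assert len(results) == 1
```

If the two matchers instead wrote to the same slot, different orderings would produce different states, and they would no longer be qualified to be parallel.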
Based on this method, I found it quite possible to write a compiler that simulates how human language is processed by the human brain, which is composed of neuron cells all running in parallel. And if simulating human neuron cells brings up too many factors, why not reassemble the compiler with part of that neural ability, and make it better?
Actually, this idea returns to the starting point: letting the computer recognize human language. If that target is still out of reach, then let the computer recognize all computer languages more easily, and let it recognize a human-language-like computer language; in other words, a runtime-designable language, or a super parser.
Parsers and Machine Translation

This is where the Robin project came from.
If you know compilation, you know compilers. Very basically, a modern compiler (since FORTRAN) is made of three parts: the Tokenizer, the Parser, and the Code Generator. Many people combine the Tokenizer and the Parser into a single Parser, which makes sense as well; they do similar jobs.
Why must two similar things be separate? It is only a technical solution to a hard problem. Most high-level languages (BASIC, C, and so on), which are in fact variations of English, have the property that words (identifiers) are separated by space characters, ASCII 32. A tokenizer can therefore easily be introduced to split the input into tokens, which are simpler for the Parser level to consume.
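A minimal tokenizer of the kind described above can be sketched as follows (this is my own illustration, not Robin's implementation): it splits the character stream into tokens that a parser can consume one at a time.

```python
import re

def tokenize(source):
    # Identifiers, integer literals, and single non-space symbol characters.
    token_pattern = re.compile(r"[A-Za-z_]\w*|\d+|[^\s\w]")
    return token_pattern.findall(source)

tokens = tokenize("door = new Door()")
# -> ['door', '=', 'new', 'Door', '(', ')']
```

Notice how completely the sketch leans on whitespace and ASCII conventions; that reliance is exactly what breaks down for other writing systems, as discussed next.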
Human languages do not all follow this convention. For example, two adjacent Chinese words are not separated by spaces when you read a Chinese sentence. Please do not say that you will never code in Chinese; I do not code in Chinese either, but does a parser have the ability to separate Chinese words?
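A tiny demonstration of the point above (the example sentence is my own, chosen for illustration): a space-based tokenizer cannot divide a Chinese sentence at all.

```python
sentence = "我喜欢鸟"  # "I like birds": three words, zero spaces
tokens = sentence.split()
assert tokens == ["我喜欢鸟"]  # one undivided lump, not three words

# The same sentence in English splits trivially on whitespace.
assert len("I like birds".split()) == 3
```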
Modern parsers cannot, as far as I know. Something special can: the technology is called NLP, Natural Language Processing. NLP parsers are basically super parsers. They recognize characters and tag sentences into syntax structures. However, this is some kind of NP problem (a very hard one). If all these NLP parsers worked as we expect, we would already have seen products on the market.
This is not the fault of the researchers; the problem is simply too hard, on the same level as Machine Translation. (If the NLP problem were solved, you would see many Machine Translation products that work as well as human translators. Unfortunately, I have not seen such a product for many years; instead I have often gotten joke-like, very funny translation results.)
So let us just say that NLP is a more complicated kind of parser, one that uses rules, as common parsers do, to recognize characters, words, and syntax.
Modern parsers follow rules to recognize syntax and generate result trees. The rules are written in BNF, Backus-Naur Form. There are two parsing orders: bottom-up (LR parsing) and top-down (LL parsing). In most situations we use LALR parsing, which means bottom-up parsing with a look-ahead of one token. Why? Because it is very hard to determine which rule to use without more information.
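As a concrete illustration (this grammar fragment is my own example, not taken from Robin), a small BNF rule set for arithmetic expressions looks like this:

```
<expr>   ::= <expr> "+" <term> | <term>
<term>   ::= <term> "*" <factor> | <factor>
<factor> ::= "(" <expr> ")" | <number>
```

An LR parser reduces a matched right-hand side back to its left-hand symbol, working bottom-up; an LL parser expands from `<expr>` downward. With only these rules, after reading `1 + 2` the parser cannot know whether to reduce `2` to a `<term>` or wait for a `*`, which is exactly why one token of look-ahead (LALR) is normally needed.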
Parsers are so complicated that modern parsers are actually built by programs, that is, generated by programs. These generators are called compiler-compilers; Yacc and GNU Bison are instances. Almost every compiler, whether for C/C++ or BASIC, is built with tools like Yacc or Bison.
Hard Rock

Modern parsers are programs generated by programs. This means a parser cannot change after its own compilation, which means the language the parser can parse has no chance to change unless the parser itself changes, which in turn means that once the parser is compiled, the language is fixed: no part of it can be added, removed, or modified. That is why a C program is a C program and BASIC is BASIC.
What's the problem?
No problem, if you don't want to write BASIC code in a .c file. Just as food means nothing to you when you are not hungry.
But what if you're hungry?
What if you want to add new grammar to the current parser, and you are not the writer of that parser? What if you want to change your current programming language, either at compile time or at run time?
For example, today you have to write this (C# code):
class Door
{
    void Open()
    {
    }

    static void Main()
    {
        Door door = new Door();
        door.Open();
    }
}
When you want to write this:
class Door
{
    void Open()
    {
    }

    static void Main()
    {
        door is an instance of Door;
        Open door;
    }
}
The C# parser will not allow it, and neither will the C# inventor.
With no ability to change the language rules, there is no way to take even a single step toward making humans understand humans better, let alone toward making computers understand humans better.
Is it possible to make parsers dynamically learn and change their syntax rules?
The answer is theoretically YES. In reality, though, it is too complicated to build. The authors of the compiler books used the words "too complicated" to describe this issue. I did not fully understand what they referred to until I started programming a real parser many years ago; then I saw the problem. It was very nearly impossible. Why? Because our knowledge of human language and the human brain (neurology) was not nearly enough to complete this task at the time those books were written. That was about 40 or 50 years ago, and during the past decades there has been no significant advance in this field. We are still using Yacc, or maybe a better Yacc, to generate parser code, or even writing parsers by hand.

Coding by hand is still a more viable way than no way at all. However, things change as time elapses; now we have an idea to solve the problem better.
Robin Theory

Robin starts a new path to solve the problem. What problem? The problem that we cannot write programs in a form we like, one that is more human-like but cannot be understood by parsers; the problem that a modern parser has no way to ascend into a human language parser.
Why did parsers have to be generated? Because of complication. Why complicated? Because we did not fully understand the human brain. Do we fully understand it now?

No, but much better than ever. We know about neural encoding, sequence matching, neuron cooperation, and main and micro loops (and more that is not yet written in the books, currently known only to the author).
Actually, Robin is a proof of this hypothesis.

How does Robin work?
Robin uses BNF as rules (in XML form), and Robin generates result trees. But Robin is different...

Robin assumes that patterns are recognized by artificial neurons: neurons process signals which are sent from the main loop and micro loops. When a neuron accepts a signal, it reports to the next connected neuron. Most importantly, all neurons work simultaneously (side by side), which makes all pattern matching work simultaneously.
This means that for a given input signal, Robin responds as much as possible in every part that can accept the signal. A given signal may have different "meanings" in different situations and different patterns; Robin keeps them all instead of selecting "the correct choice", which is actually impossible without looking ahead.
Looking ahead is not allowed in Robin. Yes: no look-ahead, signal by signal (a character counts as one signal here); this is an LR(0) parser. Every (computer) language is an LR(0) language, but not every parser is an LR(0) parser. Parsers have to look ahead to know which pattern is the correct one to match, and in practice no common parser can be truly LR(0).
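The no-look-ahead idea can be sketched as a toy (my own illustration, not Robin's code): on each incoming signal, every matcher that can accept it advances, so all candidate patterns survive side by side instead of one being chosen.

```python
class Matcher:
    """One candidate pattern, advanced one signal at a time."""
    def __init__(self, name, pattern):
        self.name, self.pattern, self.pos = name, pattern, 0
        self.alive, self.done = True, False

    def feed(self, signal):
        if not self.alive or self.done:
            return
        if self.pattern[self.pos] == signal:
            self.pos += 1
            self.done = self.pos == len(self.pattern)
        else:
            self.alive = False  # this candidate drops out; others continue

matchers = [Matcher("AB", "AB"), Matcher("ABC", "ABC"), Matcher("BC", "BC")]
for signal in "ABC":            # one signal at a time, no look-ahead
    for m in matchers:          # every pattern sees every signal
        m.feed(signal)

completed = [m.name for m in matchers if m.done]
# Both "AB" and "ABC" were recognized; no single "correct" one was chosen.
```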
Robin is a true LR(0) parser, since it never needs to look ahead. The cost, however, is that Robin does not care which pattern is the correct one. When you do deep research on any language, you will see that trying to find the single correct pattern is impossible, or even undesirable. Why? Because you may guess badly, and that is a very common problem.

Then how does Robin get the correct pattern? Robin simply does not pick the correct pattern at parsing time.
Robin records every possible pattern, in the form of trees. More importantly, these artificial neurons (implemented as Matchers) send signals into micro loops to inform other neurons. This forms a chain reaction, and as a result you get an action comparable to the Reduce of a classical parser.

Robin's rule is actually a Reduce-first rule: search for every possible meaning of the current signal before looking at the next signal. The Reduce-first rule is actually a reflection of the micro loops in neurology.
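A reduce-first sketch under my own simplified assumptions (not Robin's actual implementation): after each signal is taken in, every reduction the rules allow is applied before the next signal is read, and every alternative reading is kept as a separate candidate.

```python
# Rules: a completed right-hand sequence reduces to a new signal.
RULES = {("a", "b"): "X", ("X", "c"): "Y"}

def close_reductions(stacks):
    """Apply reductions exhaustively: the reduce-first step."""
    work, seen = list(stacks), set(stacks)
    while work:
        stack = work.pop()
        for rhs, lhs in RULES.items():
            n = len(rhs)
            if stack[-n:] == rhs:
                reduced = stack[:-n] + (lhs,)
                if reduced not in seen:
                    seen.add(reduced)
                    work.append(reduced)
    return seen

def parse(signals):
    stacks = {()}
    for s in signals:
        stacks = {st + (s,) for st in stacks}  # one scheduled shift...
        stacks = close_reductions(stacks)      # ...after reducing to a fixpoint
    return stacks

# "abc" keeps the raw, partially reduced, and fully reduced readings.
final = parse("abc")
assert ("Y",) in final   # a b -> X, then X c -> Y
```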
Then how does Robin deal with shifting, that is, with moving to the next signal? The shift action is scheduled, and is performed only after all patterns have finished their jobs. This is the idea of Robin, of Real Parallelism: these patterns can work side by side precisely because their order does not matter.
It is as if time were zoomed in: at a very fine sub-level of time, pattern matching works in order, but under the given circumstances this order never produces a bad result. So we can treat the patterns as running in parallel.
During the signal matching and re-generating process, every useful piece of information is recorded in order to build trees dynamically. Finally, at the end of the input signal series, we get a set of trees: all possible understandings of the given input series (the input text).
Final results are generated only after the end of the input; no stable result is produced while signals (characters) are still being read. This behavior is very similar to LL(k) parsing, the other very hard problem, of which only LL(1) is well solved so far. Robin takes a work-around: it does not need a solution to the LL(k) problem, because it already is a solution to the LL(k) problem.

Now you see what I mean, and you may guess what the next step is.
The next step is to trim the trees to eliminate intermediate steps.

If in the end you get a single tree at every level (no level with more than one tree), it means the input text has a unique meaning, with no ambiguity at all. Otherwise, either there is a problem in the BNF definition that describes the language, or the input text really has a double meaning. (There are also situations caused by neither of these two reasons, such as some special dangling cases; those mean Robin has a problem in its implementation, something to be improved.)
Expression and Prefixing

A basic Robin framework has two simple native rules: 1. Reduce-First, 2. Late-Tree-Trimming. This is not enough for more complicated situations, such as parsing an expression.

Why is parsing expressions so important?
Actually, human language is, in some respects, not as complicated as computer languages. For example, even the most complicated human sentence nests no deeper than about three levels of braces "{}"; sentences stay fairly flat rather than deeply nested. Humans cannot accept deep nesting even in long articles. For programmers, however, this kind of code is very common.
Nesting also has a corresponding mechanism in human neurology: prefixing. Prefixing means a set of signals that are always sent before the main signal; they form a prefix to that signal, creating a meaning scope.
Suppose you were blind: how could you easily read this expression?

(((1+2)*3+(4+5)*6)/7)

It is quite hard to match the parentheses even with your eyes focused on them, and hard to be sure there is no mistake at all.
Classical parsers solve this problem by introducing symbol stacks. This solution actually splits parsers from tokenizers, since tokenizers do not require stacks at all. It also forces parsers to select the correct pattern, which is not theoretically possible (in reality, this means there is no way to be sure whether a guess is good or bad; you can only guess an answer).
Robin solves this problem with signal prefixes. Prefixing is very similar to a stack, but there is no need to guarantee a good guess. None of the currently working matchers can handle a newly prefixed signal, so new matchers are generated for the prefixed signals. In effect this backs up the whole parser (where necessary) until the prefixing stage is over. Each new prefix starts a new level of backup right away. You will see how this works if you simply type a complicated expression into Robin's input text field; words cannot beat your own experience.
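The prefixing idea can be sketched as follows (an illustrative simplification of my own, not Robin's mechanism): each "(" sends a prefix that opens a new meaning scope, so inner signals are tagged differently from outer ones and never disturb the outer matchers.

```python
def prefixed_signals(text):
    """Tag every character with its current prefix depth."""
    depth, out = 0, []
    for ch in text:
        if ch == "(":
            depth += 1          # a new prefix level: back up outer matchers
        elif ch == ")":
            out.append((depth, ch))
            depth -= 1          # this prefixing stage is over
            continue
        out.append((depth, ch))
    return out

signals = prefixed_signals("(1+2)*3")
# Matchers for depth 0 only ever see "*3"; "1+2" lives in its own scope.
outer = "".join(ch for d, ch in signals if d == 0)
assert outer == "*3"
```

No matcher ever has to guess where a parenthesis closes; the scope boundary is carried by the prefix itself, which is why no good-guess guarantee is needed.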
Dynamically generating Matchers simulates neural pattern duplication. Along the main loop there are many copies of the same pattern in the form of neuron cell sequences; for example, there may be ten or more sequences that match "ABC". You do not notice this, because all the sequences generate the same result signal, ABC (now a token, not characters). Neurons would encode "ABC" into a single signal, though how they do this is still unknown. We simply simulate it in the computer implementation, with no need to care how the neurons actually work.
Robin Implementation

The implementation of Robin is quite tricky. Frankly, it was coded rather than designed; there is no way to design something that even I myself do not fully understand yet. I had to try many times on a single condition to find out what the problem was, and then think out the solution.
Because of the real-parallelism technique, you will see many levels of for loops, some as deep as eight levels. Why so many? To synchronize patterns and micro loops, to ensure a signal can traverse every pattern so that pattern selection is avoided, and finally to make sure the trees are trimmed correctly. There are also many recursive methods, mostly about trimming.
Since no explanation is better than the code itself (code is self-explanatory), please read the code for design and implementation details.
Conclusion

Ten days of coding completed this stage. I feel much better having solved this problem than carrying it as a heavy load for the past 14 years. That is good news for me.
Robin is still at its starting stage. I would like to put this project under the GPL or LGPL license, so that more people can join and help develop it.
Robin currently needs to be validated against programming languages such as BASIC or C (or more), by feeding it their BNF and examining the output trees. A language converter could then be built for programming languages, and a well-defined programming language could use this ability to provide compile-time syntax extension (like the Door example above).
As the following step, we will try to capture micro-loop signals, and even main-loop signals, to enable dynamic pattern learning. This will give Robin a great ability to learn language. The mechanism is still complicated, and I will talk about it in the future; for now, just know that it is quite possible and near the target.
This means Robin will be not only a super parser...
In the step after that, we will not capture any trees; instead we will redirect the output signal sequence back to the input entry. The learning ability stays at runtime, not at compile time. This way Robin captures its own output signal sequence, which is the result of the input (or even of no input), and processes that result sequence again and again... this is a primitive implementation of a real thinking machine. I will not give a proof here, only the direction.

No proof is better than an implementation; however, that is something that will happen in the future, not now.
By Yilin (yyl_20020115@yahoo.com.cn)
[20080317-01:13]