This would be interpreted as a p element with one attribute, attrib1, whose value is value1, and the text (attrib1=value1).
But how can I set the text to be (attrib1=value1) without attributes? Maybe I missed something, but I don't think the article mentions that parentheses must be escaped. So how does the parser tell the difference?
I admit that escaping the parenthesis would do the trick (I presume any character can be escaped, since \ has to be written as \\). Writing an empty () after the element name would also work.
My only point here is: is this mentioned somewhere? Or am I wrong?
how can I set the text to be (attrib1=value1) without attributes
Both solutions you mentioned are actually supported (but it's not mentioned in the article). You can write: [p() (attrib1=value1)]
or: [p \(attrib1=value1)]
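To illustrate how a parser can tell the three cases apart, here is a minimal Python sketch. It is not the actual pXML parser: the function name parse_simple_node is hypothetical, and the sketch assumes a single flat node without nesting, unquoted attribute values, and that a backslash escapes whatever character follows it.

```python
def parse_simple_node(src):
    """Parse one flat node such as "[name (a=b) text]" into (name, attributes, text)."""
    if not (src.startswith("[") and src.endswith("]")):
        raise ValueError("node must be enclosed in [ and ]")
    body = src[1:-1]

    # The name ends at the first whitespace character or "(".
    i = 0
    while i < len(body) and not body[i].isspace() and body[i] != "(":
        i += 1
    name = body[:i]

    # Skip whitespace between the name and whatever follows.
    while i < len(body) and body[i].isspace():
        i += 1

    attributes = {}
    # Only an unescaped "(" in this position opens the attribute list.
    if i < len(body) and body[i] == "(":
        close = body.index(")", i)
        for pair in body[i + 1:close].split():
            key, _, value = pair.partition("=")
            attributes[key] = value
        i = close + 1
        while i < len(body) and body[i].isspace():
            i += 1

    # Everything left is text; a backslash escapes the next character.
    text = []
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            text.append(body[i + 1])
            i += 2
        else:
            text.append(body[i])
            i += 1
    return name, attributes, "".join(text)
```

With this sketch, both "[p() (attrib1=value1)]" and "[p \(attrib1=value1)]" yield a p node with no attributes and the text (attrib1=value1), because the attribute list is only recognized when an unescaped ( directly follows the name.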
Notes:
- pXML has been renamed to Practical Data and Markup Language (PDML), and there is now a dedicated website for it.
All those formats are for computer-to-computer communication. So who gives a sh*t about layout? Indentation spaces and newline characters can all be left out. That makes it even better for transmission: less to encrypt, less to transmit.
All you need in case of trouble is a program that makes the message readable for people.
It is less verbose than XML. It has less whitespace, fewer brackets, and fewer equal signs than JSON.
For adoption, you (or the community) needs to write a serializer and a deserializer in the top 10 most popular languages: Java, Python, .NET, JavaScript, etc.
Even though you don't need a shift, [] are a distance from the home keys and are typed with the pinky. ] requires a slow stretch and is one of the harder and more error-prone keys to type (high mistake rate). <> are typed with the middle and ring fingers. They require shift, but they are directly under the home keys and have a lower mistake rate. I would prefer <>, and I can type them much faster than [].
And for the French, who have a just complaint, what if [] were the default, but there was a syntax you could put on the first line that would change the bracket type?
#BRACKETS=[] // default; you can leave this out
#BRACKETS={}
#BRACKETS=<>
Just a thought. Soon you would see the community deciding which brackets are best.
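As a rough illustration of the proposal, a reader could inspect the first line before parsing anything else. This is a minimal Python sketch under stated assumptions: the #BRACKETS directive is only the suggestion made in this comment, not part of pXML, and the function name read_brackets_header is hypothetical.

```python
def read_brackets_header(document, default="[]"):
    """Return (brackets, remaining_document) for an optional first-line directive."""
    prefix = "#BRACKETS="
    first_line, _, rest = document.partition("\n")
    if not first_line.startswith(prefix):
        # No directive: keep the default pair and leave the document untouched.
        return default, document

    # Drop an optional trailing "// comment" and surrounding whitespace.
    value = first_line[len(prefix):].split("//", 1)[0].strip()
    if value not in {"[]", "{}", "<>"}:
        raise ValueError("unsupported bracket pair: " + repr(value))
    return value, rest
```

For example, read_brackets_header("#BRACKETS=<>\n<doc>") returns the pair "<>" together with the remaining text "<doc>".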
needs to write a serializer and a deserializer in the top 10 most popular languages
I am currently working on:
- an open-source serializer/deserializer in Java
- an article to introduce the serializer/deserializer (to be published here on codeproject)
- a dedicated pXML website, where users can submit implementations in other languages
All is planned to be published before end of next month (May 2021).
rhyous wrote:
Even though you don't need a shift, ...
Good explanation showing why different people have different preferences when it comes to the choice of brackets.
rhyous wrote:
a syntax you could put on the first line that would change the bracket type
That's an interesting idea worth pondering about.
However, adding such a parameter would increase complexity. The rules for escaping become more complicated, readers/writers risk becoming less efficient, pXML documents become less uniform, and incompatibility issues might occur in practice.
Suppose the following pXML that uses standard brackets []:
[doc
[foo This text contains \[, \], < and >]
]
A user who prefers <> would write the same code like this:
#BRACKETS=<>
<doc
<foo This text contains [, ], \< and \>>
>
Unescaping now becomes more complex for deserializers (readers), and possibly less efficient, because they now have to consider the #BRACKETS parameter.
Complexity is also increased for serializers (writers), because they have to apply escape rules that depend on the #BRACKETS parameter.
If somebody decides to change the brackets in a document, he/she must be careful to also adapt escape sequences.
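The dependency between the bracket pair and the escape rules can be sketched in a few lines of Python. This is only an illustration, not code from any pXML implementation: the function names are hypothetical, and the sketch assumes that only the backslash and the two active bracket characters need escaping in text.

```python
def escape_text(text, brackets="[]"):
    """Escape the backslash and the two active bracket characters."""
    open_bracket, close_bracket = brackets
    special = {"\\", open_bracket, close_bracket}
    return "".join("\\" + ch if ch in special else ch for ch in text)


def unescape_text(escaped):
    """Remove escapes; a backslash makes the next character literal."""
    out = []
    i = 0
    while i < len(escaped):
        if escaped[i] == "\\" and i + 1 < len(escaped):
            out.append(escaped[i + 1])
            i += 2
        else:
            out.append(escaped[i])
            i += 1
    return "".join(out)


def change_brackets(escaped, new_brackets):
    """Re-escape text for a different bracket pair."""
    return escape_text(unescape_text(escaped), new_brackets)
```

Applied to the text of the foo node, escape_text produces the [] form by default and the <> form when called with brackets="<>". Every writer, and every tool that switches brackets, would need a step like change_brackets.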
To avoid confusion, one could decide that all types of brackets must always be escaped, but that would be very inconvenient for people who use a lot of brackets in their texts.
Moreover, if pXML snippets from different sources are merged (a pXML feature for later), things get even more complicated when different snippets use different brackets. Readers must then be able to change brackets on the fly.
I experienced this complexity myself a while ago when I had to create a parser with this kind of flexibility.
Unfortunately, it seems there is no one-size-fits-all solution for brackets.
The basic pXML syntax should be kept as simple as possible, because this makes it easier for people to create readers/writers. Maybe a brackets parameter could later be added as an optional extension or an experimental feature, before taking a final decision based on community feedback.
Thanks, your article hit upon an issue I have been dealing with. I really appreciate the analysis of [] versus alternatives and the thought given to making it easy to read. My ambition was to create something much simpler, with just a few attributes, to mark up code for UI help etc., all of which is generated. Good job.
My assumption is that this would convert to PXML as:
[tag (A=1 B=2 C=3)[D 4][E 4]]
Which would be fine and not lossy; however, in your paper you said that the PXML parser would surface both attributes and child nodes as child nodes. So would the node object then have a type (attribute or child)? Otherwise, it would seem we'd lose the knowledge of which was which.
My assumption is that this would convert to PXML as:
[tag (A=1 B=2 C=3)[D 4][E 4]]
Yes, correct!
When a pXML document is parsed into memory, each node has a flag to indicate if it has been defined with the attributes or tag syntax. Hence the information is not lost, and it is considered when pXML is converted to XML, or vice versa.
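A minimal Python sketch of this idea follows. It is not the real parser's data model: the class Node, the field from_attribute_syntax, and the function to_xml are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from xml.sax.saxutils import escape, quoteattr


@dataclass
class Node:
    name: str
    text: str = ""
    children: list = field(default_factory=list)
    # True if the node was written as (name=value), False if written as [name value].
    from_attribute_syntax: bool = False


def to_xml(node):
    """Serialize a node to XML, using the flag to restore attributes."""
    attributes = "".join(
        " " + child.name + "=" + quoteattr(child.text)
        for child in node.children
        if child.from_attribute_syntax
    )
    content = escape(node.text) + "".join(
        to_xml(child) for child in node.children if not child.from_attribute_syntax
    )
    return "<" + node.name + attributes + ">" + content + "</" + node.name + ">"


# In-memory form of [tag (A=1 B=2 C=3)[D 4][E 4]]
tag = Node("tag", children=[
    Node("A", "1", from_attribute_syntax=True),
    Node("B", "2", from_attribute_syntax=True),
    Node("C", "3", from_attribute_syntax=True),
    Node("D", "4"),
    Node("E", "4"),
])
```

Here all five values are child nodes in memory, yet to_xml(tag) still writes A, B and C as XML attributes and D and E as child elements, so the round trip is not lossy.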
In my next article about the parser I will show examples of this.
I will also add a note in this article to clarify this.
Sorry, I was in a hurry that morning and the formatting is terrible.
I think it's easier to read the code if it matches the old XML syntax as closely as possible. So, please keep the <> braces. [] and {} are hard to type on a German QWERTZ keyboard, where <> is very easy to type.
Next, keep the attributes the old style. In order to do that, we need a separator for the start of the contents of an element. I'd pick the ':' colon character for that, but I'm open for better ideas. Using the '\' for escaping is excellent.
So the XML/XHTML
<a href="link" target="blank" >Click here > </a>
[] and {} are hard to type on a German QWERTZ keyboard
As stated already in other comments, there is unfortunately no bracket pair that is easy to type on all keyboards. Therefore I chose the pair that is easiest to type for most people in the world. Maybe a parameter could be added to the parser, so that users can choose their preferred bracket pair ([], <>, {}, or ()). However, this solution has disadvantages too, as explained in another comment.
Gernot Frisch wrote:
I'd pick the ':' colon character for that, but I'm open for better ideas.
':' is problematic, because it's used already as namespace separator. <foo:bar> is parsed as an empty element with name 'bar' in namespace 'foo'. So I'll continue my comment with '/' as separator (arbitrary choice).
Gernot Frisch wrote:
keep the attributes the old style. In order to do that, we need a separator for the start of the contents of an element.
Your syntax reduces verbosity for elements that have only attributes, or attributes and text:
But it makes elements with text only less user-friendly:
pXML: [div text]
your syntax: [div / text]
or: [div/text]
I compared many other examples. An important point is to consider typical markup code (e.g. XHTML):
pXML: [p This is [i italic] and [b bold]]
Your syntax: [p/This is [i/italic] and [b/bold]]
In this case the pXML code is easier to read and write.
As with brackets, there is no absolute-best-one-size-fits-all syntax. The challenge is to choose (among the infinite set of possible syntaxes) a syntax that is well suited in most cases.
In the context of pXML, it is important to have a user-friendly syntax for markup code and config data.
Moreover, it is always possible to provide optional lenient parsing for specific domains. However, lenient parsing requires look-ahead parsing (and maybe also regexes), which makes parsing more complex and less efficient. Lenient parsing also requires more rules, and sometimes there must be specific rules for specific tags. Therefore lenient parsing should not be part of basic pXML.
However, it can make the syntax much more user-friendly in specific use cases. As explained in the article, I use lenient parsing in PML (it makes PML code succinct). Here is an example:
strict pXML: [image (source=ball.png title="Red ball")]
lenient PML 1: [image source=ball.png title="Red ball"] // parentheses not required
lenient PML 2: [image source=ball.png title=Red ball] // quotes not required, even if value contains spaces
final PML: [image ball.png title=Red ball] // name not required for default attribute
Your bracket choice of [ ] may be right for a QWERTY keyboard, but here in France we use an AZERTY one, where it requires CTRL and ALT. I do not like to trigger this pair: DEL (SUPPR on a French keyboard) is not far from reach!
How many people in the world are using an AZERTY (French) keyboard? And how many use a QWERTY (English) keyboard?
If a designer has to favor one or the other, he/she will select the keyboard that most people use.
Unfortunately there is no bracket pair that is easy to type on all keyboard layouts (as can easily be verified here).
To minimize the need for typing inconvenient key combinations, it might be useful to use an editor/tool that allows you to reconfigure your keyboard, use hotkeys, predefined code snippets, auto-completion, etc.
modified 22-Apr-21 3:35am.