|
You lost me at:
<i>foo</i>
becomes
<i foo>
because now the text content of i has become an attribute with no value. These are not the same thing: text is free-form, whereas attributes are constrained by the DTD. If you change the rule to say "any unrecognised attribute is the text", that's not great:
- what happens when there are multiple unknown attributes?
- what happens when some short text becomes confused with a new attribute in the future?
Your solution after this is to put attribute pairs in parentheses (), but what if the text contains parentheses? I suppose we could escape the parens in text, but you don't mention doing so, so as it stands I'd expect undefined behavior.
XML may not be great, but I'm not really seeing the win here, sorry. XML is verbose as-is, but it compresses incredibly well because of the repeated content, so size over the wire from any modern web server that implements gzip compression isn't an issue. Readability is not improved either (personal opinion), much like how LISP isn't "easier to read" than XML (you've got something LISP-like here, with square brackets instead of parens).
You could introduce a new node, eg t
------------------------------------------------
If you say that getting the money
is the most important thing
You will spend your life
completely wasting your time
You will be doing things
you don't like doing
In order to go on living
That is, to go on doing things
you don't like doing
Which is stupid.
|
Davyd McColl wrote: now the text content of i has become an attribute with no value
The code <i foo> in the article is just an intermediary step (neither valid XML nor valid pXML code) towards achieving the final pXML syntax: [i foo]. BTW, an "attribute with no value" would be invalid in XML and pXML.
Davyd McColl wrote: Your solution after this is to put attribute pairs in parentheses ()
Exactly. Therefore there is no ambiguity with [i foo] in pXML. foo is clearly text, and surely not an "attribute with no value".
Davyd McColl wrote: but what about if text contains parentheses?
That's a good question. I forgot to mention it in the article.
Suppose that the text content of a node named foo is (a=b).
Then [foo (a=b)] doesn't work, because this is the syntax for assigning b to attribute a.
There are two solutions:
1. Write [foo() (a=b)] to make it clear that there are no attributes
2. Escape the ( with \( , like this: [foo \(a=b)]
Both methods are already implemented in the pXML parser (to be published next month).
I will update the article in the coming days to mention this edge case.
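To make the two workarounds concrete, here is a minimal Python sketch of how a parser could tell text from an attribute list in a single flat node. It is not the pXML parser mentioned above: the rules it encodes (an attribute list is a parenthesised group right after the name, () means "no attributes", \( is a literal parenthesis, and one separator space before the text is dropped) are assumptions drawn from this thread, and nesting, quoted values containing ) and other escapes are ignored.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    attributes: dict = field(default_factory=dict)
    text: str = ""


# name=value pairs; a value is either "double-quoted" or a run of non-space characters
ATTR_RE = re.compile(r'([\w.-]+)\s*=\s*(?:"([^"]*)"|(\S+))')


def parse_flat_node(src: str) -> Node:
    """Parse one flat node such as '[foo (a=b) some text]' (no child nodes)."""
    if not (src.startswith("[") and src.endswith("]")):
        raise ValueError("a node must be enclosed in [ ]")
    body = src[1:-1]
    name_match = re.match(r"[\w.-]+", body)
    if name_match is None:
        raise ValueError("missing node name")
    name = name_match.group(0)
    rest = body[name_match.end():]

    attributes = {}
    candidate = rest.lstrip()
    # An unescaped '(' right after the name starts the attribute list.
    if candidate.startswith("("):
        close = candidate.find(")")
        if close == -1:
            raise ValueError("unclosed attribute list")
        for m in ATTR_RE.finditer(candidate[1:close]):
            quoted, bare = m.group(2), m.group(3)
            attributes[m.group(1)] = quoted if quoted is not None else bare
        rest = candidate[close + 1:]

    # One separator space before the text is syntax, not content.
    if rest.startswith(" "):
        rest = rest[1:]
    # '\(' in text stands for a literal '('.
    return Node(name, attributes, rest.replace("\\(", "("))


for src in ["[i foo]", "[foo (a=b)]", "[foo() (a=b)]", r"[foo \(a=b)]"]:
    print(src, "->", parse_flat_node(src))
```

With these rules, [foo (a=b)] yields the attribute a=b and no text, while both [foo() (a=b)] and [foo \(a=b)] yield the text (a=b).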
Davyd McColl wrote: XML is verbose as-is, but compresses incredibly well
The goal of the pXML syntax is to make it more human-friendly: easier to read and write for humans. And because pXML is less verbose, it will probably also produce smaller compressed sizes than XML.
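Both compression claims are easy to check on real data. The Python sketch below builds the same synthetic customer list (made-up records, not taken from the article) in XML and in the pXML form used in this thread, then prints raw and gzip-compressed sizes. No particular outcome is claimed here, since the ratio depends on the data.

```python
import gzip


def xml_record(i: int) -> str:
    return f'<customer id="{i}"><name>Customer {i}</name><city>Zurich</city></customer>'


def pxml_record(i: int) -> str:
    # pXML form assumed from the syntax discussed in this thread
    return f"[customer (id={i}) [name Customer {i}] [city Zurich]]"


def sizes(doc: str) -> tuple:
    """Return (raw bytes, gzip-compressed bytes) for a UTF-8 document."""
    raw = doc.encode("utf-8")
    return len(raw), len(gzip.compress(raw))


xml_doc = "<customers>" + "".join(xml_record(i) for i in range(500)) + "</customers>"
pxml_doc = "[customers " + "".join(pxml_record(i) for i in range(500)) + "]"

for label, doc in (("XML", xml_doc), ("pXML", pxml_doc)):
    raw, packed = sizes(doc)
    print(f"{label}: {raw} bytes raw, {packed} bytes gzipped")
```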
|
Quote: The code <i foo> in the article is just an intermediary step (neither valid XML nor valid pXML code) towards achieving the final pXML syntax: [i foo]. BTW, an "attribute with no value" would be invalid in XML and pXML.
- Incorrect: XML attributes may be empty (see "Can an XML attribute be the empty string?" on Stack Overflow, or ask anyone who has written directives / custom attributes for front-end frameworks).
- The intermediate representation still suffers from the same problem. Changing the syntax from angle brackets to square brackets doesn't fix the inherent flaw.
Quote: The goal of the pXML syntax is to make it more human-friendly:
With the requirement to step carefully around empty attributes, and the inability to match the closing tag of a large node to its opening tag by name (sure, you can use an editor to match brackets, but if you have a giant node, e.g. "<customer-data>", and it ends off-screen at "</customer-data>"...), I don't think you're necessarily achieving your stated goal.
If this works for the set of problems you have to deal with, great! I don't find this more readable and I'm definitely not introducing a new parser to working code.
The power of existing formats comes largely from how easy it is to consume them. XML and JSON (and now even YAML) parsers are a dime a dozen and found for free in practically every programming environment. In addition, they're formats that I can confidently hand off to a third party to deal with. To disrupt that, you're going to need to provide such an astounding edge as to make it impossible to refuse your format.
And I'm simply not convinced.
Again, I don't want to be "that guy". If this tool works for the tasks you have lined up for it and you don't have to share with anyone else and you don't need support on a plethora of programming environments, then I wish you all the best, friend (:
|
Davyd McColl wrote: incorrect: xml attributes may be empty
In your first comment you spoke about "an attribute with no value", referring to the syntax <i foo>.
I replied that such an "attribute with no value" would be invalid in XML and pXML. Which is true (<i foo> generates an error in an XML validator).
However, now you are talking about XML attributes that "may be empty" (e.g. <i foo="" />). Of course attributes can be empty (in XML and pXML) by assigning an empty string. But these are two different cases, unless I totally misunderstand your point.
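The difference between the two cases can be checked with any conforming XML parser. This Python snippet uses the standard library's xml.etree.ElementTree (backed by expat); is_well_formed is just an illustrative helper name.

```python
import xml.etree.ElementTree as ET


def is_well_formed(doc: str) -> bool:
    """True if Python's bundled XML parser accepts the document."""
    try:
        ET.fromstring(doc)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<i foo></i>"))           # False: attribute without a value
print(is_well_formed('<i foo=""></i>'))        # True: attribute whose value is empty
print(ET.fromstring('<i foo=""></i>').attrib)  # {'foo': ''}
```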
Davyd McColl wrote: I wish you all the best, friend (:
Thank you.
|
I think the first mistake is equating HTML to XML. Only the out-of-fashion XHTML requires strict adherence to XML.
HTML doesn't require self-closing tags: <br> comes to mind, or, if its use offends you, <link> or <input>. Also, it is extremely common to use “disabled” as an attribute with no value.
These are not exotic examples. I like the idea of a simpler syntax to HTML, but pXML seems to only fit a particular use case very well, and others less so. In contrast, XML fits many more uses all equally “well”.
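For contrast with XML, Python's standard html.parser module accepts both of these HTML habits: a valueless attribute comes back with the value None, and <br> needs no closing tag. The recorder class below is only an illustration.

```python
from html.parser import HTMLParser


class StartTagRecorder(HTMLParser):
    """Collects (tag, attributes) for every start tag encountered."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None when absent
        self.start_tags.append((tag, attrs))


recorder = StartTagRecorder()
recorder.feed('<input type="checkbox" disabled><br>')
recorder.close()
print(recorder.start_tags)
```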
|
Andre_Prellwitz wrote: I think the first mistake is equating HTML to XML
When I refer to "HTML" in the article, I mean of course XHTML (because this article is about XML syntax), but I should indeed have been more explicit (e.g. writing "XML/XHTML", instead of "XML/HTML"). As far as I know, all modern popular browsers support XHTML syntax, so pXML could be used to create web pages with a pXML-to-XML converter.
Andre_Prellwitz wrote: pXML seems to only fit a particular use case very well, and others less so. In contrast, XML fits many more uses all equally “well”
Sorry, I have to disagree, unless I misunderstand your point. Could you please show an example of XML code that cannot be written with the pXML syntax?
|
The config file example at the end is, to my eyes, missing a ) between green" and ].
[config
[size XL]
[colors (background=black foreground="light green"]
[transparent true]
]
Cheers,
Peter
Software rusts. Simon Stephenson, ca 1994. So does this signature. me, 2012
|
Well spotted! Thanks a lot.
It's now fixed.
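A bracket checker can catch this kind of slip mechanically. The Python sketch below is a hypothetical helper, not part of the pXML tooling mentioned in the article, and its rules (a backslash escapes the next character, ) outside an attribute list is plain text, quoted attribute values are skipped) are simplifying assumptions. Run on the original config example, it reports the ] on line 3 that arrives while the ( is still open.

```python
def check_brackets(src: str) -> list:
    """Report unbalanced [ ] and ( ) in pXML-like text (simplified rules)."""
    stack = []  # open delimiters as (char, line number)
    errors = []
    in_quotes = False
    for line_no, line in enumerate(src.splitlines(), start=1):
        escaped = False
        for ch in line:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"' and stack and stack[-1][0] == "(":
                in_quotes = not in_quotes  # quotes only matter inside ( )
            elif in_quotes:
                pass
            elif ch in "[(":
                stack.append((ch, line_no))
            elif ch == ")":
                if stack and stack[-1][0] == "(":
                    stack.pop()
                # otherwise ')' is ordinary text in this sketch
            elif ch == "]":
                if stack and stack[-1][0] == "(":
                    opened_on = stack.pop()[1]
                    errors.append(f"line {line_no}: ']' reached while '(' from line {opened_on} is still open")
                if stack and stack[-1][0] == "[":
                    stack.pop()
                else:
                    errors.append(f"line {line_no}: ']' has no matching '['")
    for ch, line_no in stack:
        errors.append(f"line {line_no}: '{ch}' is never closed")
    return errors


broken = "\n".join([
    "[config",
    "    [size XL]",
    '    [colors (background=black foreground="light green"]',
    "    [transparent true]",
    "]",
])
fixed = broken.replace('green"]', 'green")]')
print(check_brackets(broken))  # ["line 3: ']' reached while '(' from line 3 is still open"]
print(check_brackets(fixed))   # []
```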
|
The two links to pml-lang.dev in pXML Predecessor are not working.
You say "element names within metadata don't need to be prefixed with #"; however, I would instead lean towards saying the prefix should in fact be mandatory.
The biggest obstacle to adoption is getting browsers to support the format natively. Open source converters are a good first step, and I'd recommend adding a task to Rosetta Code, as long as you can keep it reasonably simple, with examples in both Java and PPL. Converters in C++, JavaScript and Python are probably key to widespread adoption, along with a formal and complete test/verification set. Like any good idea, it would need ramming down people's throats.
Before you get ahead of yourself: similar replacements for CSS, XQuery, XPath, and XML and JSON Schema, all native rather than "install x, convert to y, then z"... well, you see the problem.
I think I have spotted a potential Achilles heel: in HTML+JS, pretty much the only thing you need to look out for is (say)
k = src.indexOf("</script>"); ==> k = src.indexOf("</" + "script>");
However, with [script ...] you are well and truly hosed (escape every single ] in JS? No thanks). Therefore I suggest a more hybrid approach, with <script></script> and similar still valid in cases that need it.
While somewhat moot, the XML <phones><phone>123</phone><phone>456</phone></phones> equates to the invalid JSON "phones": {"phone": "123", "phone": "456"}; either that, or the given vice-versa of "phones": [ "123", "456"] to <phones><phone>123</phone><phone>456</phone></phones> ain't quite right.
Pete Lomax
modified 16-Mar-21 8:48am.
|
Glad you liked the article. Thank you.
Pete Lomax Member 10664505 wrote: The two links to pml-lang.dev in pXML Predecessor are not working.
I just tried all 6 links in the chapter 'pXML Predecessor', and they all worked. Maybe a server was down when you tried. If you still encounter problems, could you please tell me which two links don't work?
Pete Lomax Member 10664505 wrote: I would instead lean towards saying they should in fact be mandatory.
Interesting point. Could you please explain why you think the # should be mandatory in child elements?
Pete Lomax Member 10664505 wrote: I'd recommend adding a task to
rosettacode
Thanks for the tip.
Pete Lomax Member 10664505 wrote: the given vice-versa of
"phones": [ "123", "456"] to <phones><phone>123</phone><phone>456</phone></phones> ain't quite right
True. But XML doesn't have native arrays (or lists), unlike JSON.
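Because XML has no native arrays, any XML-to-JSON mapping has to pick a convention. A common one, sketched here in Python with the standard library (the function name and the convention are illustrative, not part of pXML), turns leaf elements into their text and groups repeated child tags into a list, so the element name "phone" survives as a key.

```python
import json
import xml.etree.ElementTree as ET


def to_plain(elem: ET.Element):
    """Convert an element to JSON-ready data.

    Convention assumed here (one of several in common use): a leaf element
    becomes its text, and children sharing a tag are grouped into a list.
    Attributes and mixed content are ignored to keep the sketch short.
    """
    children = list(elem)
    if not children:
        return elem.text or ""
    grouped = {}
    for child in children:
        grouped.setdefault(child.tag, []).append(to_plain(child))
    # Unwrap groups that contain a single element.
    return {tag: values[0] if len(values) == 1 else values for tag, values in grouped.items()}


root = ET.fromstring("<phones><phone>123</phone><phone>456</phone></phones>")
print(json.dumps({root.tag: to_plain(root)}))  # {"phones": {"phone": ["123", "456"]}}
```

Note that a document with a single phone comes out as {"phone": "123"} rather than a one-element list, which is exactly the kind of ambiguity that arises when mapping between the two formats.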
|
Links: OK, it seems my ISP (O2) is probably blocking that domain.
#mandatory: consistency. Should you accidentally clip any ] after [#GUI_data, it will treat [content] as an attribute, breaking "No confusion" and probably eventually resulting in an error much further away from where it could/should be.
I also corrected/encoded <script></script> in my original post, it should make more sense now.
Pete Lomax
modified 16-Mar-21 8:56am.
|
Pete Lomax Member 10664505 wrote: accidentally clip any ] after [#GUI_data
In that case the document becomes invalid, and the parser reports an error. So there would be no risk of accidentally turning content into metadata.
However, a problem could arise if a child node of metadata (without #) is copy-pasted into another place, and the user forgets to add the #. That risk could indeed be avoided by making the # mandatory.
Also, in case of big metadata elements with lots of child-elements, looking at a child-element without # in the middle or end of the metadata-tree makes it less obvious to the human eye that it's looking at metadata.
So, maybe it would be better to make the # in child nodes mandatory, as you suggested. It could still be made optional with a parser flag.
Pete Lomax Member 10664505 wrote: eventually resulting in an error much further away
That's true, and it can be annoying, especially in big documents. As said in the article, this can be avoided in two ways:
1. Use the more verbose closing tag syntax ][/tag] for big nodes.
2. Quote: "Note, however, that this problem can be largely mitigated when elements are indented, and the parser emits a warning if the indentation of the opening [ and closing ] are different."
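The second mitigation could look roughly like this. The Python sketch below is a hypothetical helper, not the actual pXML parser: it remembers the indentation of the line on which each [ opens (counting leading spaces only) and warns when a multi-line node closes at a different indentation. In the example, that points at the forgotten ] on line 2 instead of only failing at the end of the document.

```python
def indentation_warnings(src: str) -> list:
    """Warn when a ']' closes a '[' from an earlier line with a different indentation."""
    stack = []  # (line number, indentation) of each open '['
    messages = []
    for line_no, line in enumerate(src.splitlines(), start=1):
        indent = len(line) - len(line.lstrip(" "))
        escaped = False
        for ch in line:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == "[":
                stack.append((line_no, indent))
            elif ch == "]":
                if not stack:
                    messages.append(f"line {line_no}: ']' has no matching '['")
                    continue
                open_line, open_indent = stack.pop()
                # Only nodes spanning several lines are checked.
                if open_line != line_no and open_indent != indent:
                    messages.append(
                        f"line {line_no}: ']' is indented {indent}, "
                        f"but its '[' on line {open_line} is indented {open_indent}"
                    )
    for open_line, _ in stack:
        messages.append(f"line {open_line}: '[' is never closed")
    return messages


doc = "\n".join([
    "[customer-data",
    "    [name Alice",
    "    [city Zurich]",
    "]",
])
for message in indentation_warnings(doc):
    print(message)
```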
|
From practice, I've seen that the reason why you want to keep your child data as elements and not attributes is because child data is bound to be restructured. For instance, today the data may be available as a 1:1 relationship, but that doesn't mean it can't change. The data could very well have a 1:many relationship, and thus keeping it as an attribute is impossible.
My own practice when using XML as a structured data file has been to default all child data to attributes unless there is more data structure (i.e., more children), and to change that structure into elements only when needed to support it. I avoid using the PCDATA of an element to hold data and prefer an attribute instead, as it is more clearly defined. If the nature of that child data changes in the future, you can just revise the XSD and let your subscribers know that a new revision of the XSD is available.
After working with JSON Schema, I find that it is a very good practice, but I have yet to see it gain wider acceptance. In fact, I can very quickly gauge the competence of a development team that insists on using JSON as an exchange format without also insisting on validation with JSON Schema.
The other benefit I've found from this structure is that with verbose starting and ending tags, it is fairly obvious when the XML is malformed, even without white space and indentation. It's a lot more difficult to spot this in JSON.
Another issue I've found with JSON is the range of supported encodings. By default, JSON is UTF-8, while almost any encoding can be used for XML as long as it is specified in the declaration.
I've used YAML in the past and found the same problem that you did: with larger data structures, it was difficult to follow the structure because it relies so heavily on white space and indentation.
I will take a deeper look into the pXML that you are proposing. I'm not likely to change my current practice, mostly because it is practical and succinct enough as it is and once compressed, the XML and JSON files are practically the same size. And XML Schema is very mature and robust in just about every development language.
|
Pragmatic and very nice approach, thanks for sharing.
|
I love the idea of what you're proposing. I just hate the idea of what Microsoft, Google, and all of the other big vendors would turn it into - assuming you can get their attention at all.
For all its faults, XML has given us a fantastic structured data format with excellent tools (XPath, XQuery, XSLT) for data retrieval and transformation.
JSON is nice for when you need to pass simple data structures to a user interface, especially if that UI is written in Javascript.
YAML gives us nice whitespace issues and a very awkward place for storing commands.
SignalR and Proto have given us high-performance wire transfers for making RPC calls performant and scalable.
XML could certainly stand to be improved - I don't see why we couldn't have something like <item Toothbrush /> where the schema would define that there's a string defined inside item. Maybe a mashup of Relax NG is in order.
Another useful mashup might be introducing DolDoc, but I don't think the web kids are ready for that.
|
I've never heard of 'DolDoc'. Will have a look at it. Thanks for sharing.
|
I've changed my previous opinion, because I think I read your article too quickly!
I focused on the differences between pXML and JSON, and not on HTML, where indeed everything is a character string...
Sorry.
modified 11-Mar-21 3:44am.
|
DidierO wrote: The format that you offer does not guarantee the restitution of the original data
I'm sorry, but I don't understand what you mean. Could you give us an example please, and explain exactly what you mean by "does not guarantee the restitution of the original data".
DidierO wrote: no guarantee on the type
All values in XML documents are just strings. That's how XML works, and therefore the same is true in pXML. There is no native way in standard XML to specify 'types'. You can add 'type information' with metadata, and you can define XML schemas to validate string values. But it's not like in JSON, where native values can be strings, integers, boolean, null. It seems that you are not aware of the fundamental basic differences between XML and other formats.
DidierO wrote: loss of white characters at the ends of the values
That's simply not true, unless I totally misunderstand your point. If you write [name foo ] in pXML, then the trailing space after "foo" is part of the value of name. Please provide an example if this is not what you are talking about.
I honestly think that your vote is totally unjustified (because your arguments are wrong). You might consider reevaluating your arguments and vote.
|
ChristianNeumanns wrote: You can add 'type information' with metadata
Isn't that true for any format when you serialize for storage?
|
Jörgen Andersson wrote: Isn't that true for any format when you serialize for storage?
Yes, true for most formats.
My intention is to (later) add types as an optional extension to pXML. Besides predefined types like boolean, number variations, date, time, list, map, etc. it must be easy for a user to add customized types. I have a very concrete idea about how to do that (without changing pXML's syntax), and I might publish a "Suggestion for types in pXML" article in the future, and consider feedback from the community.
|
DidierO wrote: Sorry.
No problem. Glad you changed your mind. Cheers.
|
Most large documents are created by WYSIWYG editors. Style is preset by the developers of the editor and difficult to change. The current solution is Cascading Style Sheets (CSS). These can easily become a maintenance nightmare. What is needed is named blocks — sort of like subroutines in code. A syntax is needed to define a name and its pXML code block, both with and without the use of an external style file.
An important feature to simplify maintenance is to prevent redefinition of a block name using different code within a document. This prevents a block named StyleFoo from being redefined in a sub-sub-document and screwing up the formatting from that point on. This problem often arises when multiple documents are merged into a larger document, such as short stories into an anthology or chapters into a user manual.
In my experience, the designers of XML documents design a style sheet which they know and understand and use very effectively. Years later, maintenance must modify the document, but the time to understand the style sheets is not available, so the maintainers use local formatting for the modifications. When the style sheets change, such as happens when two companies merge or the company's graphics change, the document becomes an instant mess. I have never seen management budget for the time required to fix these document issues.
__________________
Lord, grant me the serenity to accept that there are some things I just can’t keep up with, the determination to keep up with the things I must keep up with, and the wisdom to find a good RSS feed from someone who keeps up with what I’d like to, but just don’t have the damn bandwidth to handle right now.
© 2009, Rex Hammock
|
Jalapeno Bob wrote: What is needed is named blocks — sort of like subroutines in code.
Could you please provide an example of such a 'subroutine' (maybe pseudo-code), and explain its benefits. Thank you.
|
Cascading Style Sheets is a good example. They were not mentioned in the description of the proposed syntax.
The problem with CSS is that styles can be redefined, causing the document to screw up after the redefinition. For a style that is used only occasionally, finding the redefinition can be time-consuming, and management never allocates sufficient (if any) time to document modification.
I suggest that if a named block definition is repeated identically, a warning should be displayed. If the definition differs, an error should be displayed and the original definition should be retained.
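A minimal Python sketch of that rule, assuming a single registry shared by all merged sub-documents. The class name, the block name and the pXML-like block bodies are illustrative only, and no block-definition syntax is implied. Errors are collected rather than raised, so that one bad chapter does not stop the rest of a merged document from being checked.

```python
import warnings


class BlockRegistry:
    """Hypothetical registry for named blocks: an identical redefinition only
    warns, a different one is recorded as an error, and the first definition wins."""

    def __init__(self):
        self._blocks = {}
        self.errors = []

    def define(self, name: str, body: str) -> None:
        if name not in self._blocks:
            self._blocks[name] = body
        elif self._blocks[name] == body:
            warnings.warn(f"block {name!r} redefined identically")
        else:
            # Keep the original definition and record the conflict.
            self.errors.append(f"block {name!r} redefined with different content; original kept")

    def lookup(self, name: str) -> str:
        return self._blocks[name]


registry = BlockRegistry()
registry.define("StyleFoo", "[font (family=serif size=12)]")
registry.define("StyleFoo", "[font (family=serif size=12)]")  # warning only
registry.define("StyleFoo", "[font (family=sans size=9)]")    # error, ignored
print(registry.lookup("StyleFoo"))  # [font (family=serif size=12)]
print(registry.errors)
```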
I have seen a hierarchy of CSS files redefine the style for the same element — usually <title> or <hn> and, of course, various table elements — multiple times. Of course, changing a CSS file to fix one document may well break another document that relies on the same file.
Disclaimer: I am a software developer and maintainer who uses XML codes in documentation. I do not have the time to study the chain of CSS files used by existing documents that I have to modify. I am not, by any stretch of the imagination, an expert in XML document tags.
modified 12-Mar-21 20:34pm.
|