|
For some kinds of very simple data, some dialect of CSV may be OK. But in three decades of software development I've rarely found it to be an adequate solution, much less the optimal one.
|
|
|
|
|
I retired after 4 decades in software engineering in 2014.
Though I used XML extensively during my career, I found it more of a nuisance than anything else. XML and JSON merely add layers of software to handle the formats, making them both rather inefficient. And both are text-based.
Comma-delimited data is text-based as well, but without all of the extra metadata, so when encrypted it produces smaller files or data packets for transmission.
For most situations, with a little ingenuity, one can use comma-delimited data in the same ways as XML, without all the extra metadata.
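The size claim is easy to check. A quick Python sketch comparing the same records in CSV, JSON, and XML (the sample rows and field names are invented for illustration):

```python
import csv, io, json
import xml.etree.ElementTree as ET

rows = [{"id": str(i), "name": f"part-{i}"} for i in range(100)]

# CSV: one header row of metadata, then bare values.
buf = io.StringIO()
w = csv.DictWriter(buf, fieldnames=["id", "name"])
w.writeheader()
w.writerows(rows)
csv_bytes = len(buf.getvalue().encode())

# JSON: field names repeated in every record.
json_bytes = len(json.dumps(rows).encode())

# XML: element names repeated as both opening and closing tags.
root = ET.Element("rows")
for r in rows:
    e = ET.SubElement(root, "row")
    for k, v in r.items():
        ET.SubElement(e, k).text = v
xml_bytes = len(ET.tostring(root))

assert csv_bytes < json_bytes < xml_bytes
```

For flat, tabular data like this the ordering holds; the tradeoff, as the replies below note, is that CSV carries no structure or self-description at all.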
Steve Naidamast
Sr. Software Engineer
Black Falcon Software, Inc.
blackfalconsoftware@outlook.com
|
|
|
|
|
Well, the metadata is something I find incredibly useful, and this is also why I tend to prefer XML over JSON as well when I need a robust way to transfer data of more than trivial complexity.
|
|
|
|
|
I use it all the time, but I also always describe my message formats in XSD. That is the easiest way to get the code generators on all platforms to properly parse one’s messages.
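For what it's worth, a minimal sketch of what such a message XSD might look like; the message name, namespace, and fields here are invented, not from the post, but a schema of this shape is what tools like xsd.exe or JAXB's xjc consume to generate parsing classes:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com/orders"
           xmlns="http://example.com/orders"
           elementFormDefault="qualified">
  <xs:element name="Order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Id"       type="xs:int"/>
        <xs:element name="Customer" type="xs:string"/>
        <xs:element name="Total"    type="xs:decimal"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```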
|
|
|
|
|
Chris Maunder wrote: as something that is not seen or edited by humans
I could've sworn when I first started reading about XML, it was being sold based on the idea that it was trivially easy for people to read and write.
|
|
|
|
|
Ah, marketing...
cheers
Chris Maunder
|
|
|
|
|
Chris Maunder wrote: Ah AAARGH! marketing...
FTFY.
|
|
|
|
|
My favorite part is when the schema doesn't work.
XML - Another solution in search of a problem.
|
|
|
|
|
My professor said that XML exists because Microsoft was afraid of being sued because JSON was too much like Java. Kind of like the reason C# exists. I think he was being sarcastic, but I'm not totally sure. He did show the history of the two, and which came first is debatable. Both have roots that run back a long, long time.
So many years of programming I have forgotten more languages than I know.
|
|
|
|
|
Every big company is afraid of being sued, but I doubt that's the reason. Microsoft would more likely choose a competing solution in order to lock out a competitor. The story doesn't seem "right" but who knows.
The Microsoft of today is a very, very different company than the Microsoft of 2000. (and it makes them better and worse)
cheers
Chris Maunder
|
|
|
|
|
michaelbarb wrote: My professor said that XML exists because Microsoft was afraid of being sued because JSON was too much like Java
This doesn't hold up, even if only because it seems backwards. If Wikipedia's accurate, work on XML started in 1996, and XML became a W3C recommendation in 1998, while JSON only started showing up in the early 2000s (granted, with some references to work starting in 1999, but it was still very early in its design by then).
And how is JSON in any way "like Java"? One's a data storage file format. The other's a full-blown programming language.
|
|
|
|
|
As one of our exercises working in Java, we were to take a JSON file and convert it to a routine that could be compiled into the program. It was to contain data that the program loaded. As I remember from the early '20s, it was quite easy.
So many years of programming I have forgotten more languages than I know.
|
|
|
|
|
michaelbarb wrote: My professor said that XML exists because Microsoft was afraid of being sued because JSON was too much like Java. Then your professor, by "XML exists", presumably meant "XML did not die" rather than "XML was created".
XML predates the first JSON RFC by nearly a decade. And XML was in use for several years before it was formally standardized.
I really do not see how Microsoft gets into this. MS certainly defined neither XML nor JSON, and I never saw Microsoft as a very active promoter of XML. C# was created by MS. I am not (yet) able to find on the net any documentation of the MS/Sun controversy, but twenty years ago "everybody knew" that C# was a response to Sun not allowing MS to use Java as it wanted. (If my memory is correct, MS wanted to add language features that Sun did not approve of.) So C# is a very different story from XML/JSON.
XML syntax borrows a lot from far older formats: typesetting systems of the late 70s (maybe even older) used the same style of bracketed keywords, e.g. to delimit paragraphs and specify paragraph formatting. You can see a selection of such tags e.g. in the 1982 Historical Manuals: Guide to Typesetting at the UKCC[^], at pages 14-15.
In the typesetting systems I touched, the brackets were displayed as common brackets, but had a different internal representation, and distinct keys on the dedicated terminals. So there was no need for escaping or other special handling of the common brackets (or math smaller/greater than).
|
|
|
|
|
I always hated working with XML. JSON is a godsend. I had to serialize from JSON to XML for a file upload. It was mandated that way, not my choice. Anyway, I now have an XML serializer that is stupid simple to use.
Keep It Simple, keep it moving.
|
|
|
|
|
XML wasn't originally written for web service data transfer or serialization/deserialization. It was written by the W3C to replace HTML while still being HTML-like. XML is a markup language, hence it has markup. Markup makes it readable by humans, but it is also a standard readable by machines. XML was then hijacked for use by SOAP web services with serialization/deserialization. Then someone realized that JSON was better for serialization/deserialization, especially since readability by both humans and machines wasn't necessary; it only needs to be read by machines. JSON also has room for improvement in verbosity, and as soon as a good replacement exists, people will say the same things: why JSON when new-thing is better?
|
|
|
|
|
That brought back XHTML nightmares...
cheers
Chris Maunder
|
|
|
|
|
I've had to generate a file from a database, but the built-in methods didn't work for me, so I also just constructed it all by adding to a string.
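Hand-rolled string building does work when the serializers fight you; escaping is the part that usually bites. A minimal Python sketch (the row data is invented), leaning on the stdlib helpers rather than raw concatenation:

```python
from xml.sax.saxutils import escape, quoteattr

rows = [(1, "Fish & Chips"), (2, 'Say "hi"')]

parts = ["<rows>"]
for row_id, name in rows:
    # escape() handles & < > in text content; quoteattr() quotes
    # and escapes attribute values, including embedded quotes.
    parts.append(f"<row id={quoteattr(str(row_id))}>"
                 f"<name>{escape(name)}</name></row>")
parts.append("</rows>")
xml = "".join(parts)

assert "&amp;" in xml  # the bare & was escaped in text content
```

The usual failure mode of pure string concatenation is forgetting exactly this escaping, which produces output that looks fine until the first ampersand shows up in the data.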
|
|
|
|
|
Hungary's National Tax and Customs Agency requires real-time XML invoice reporting[^], so we do it.
It requires the schema designer to know his/her art, because xsd.exe[^] can choke on brainless designs where the same <thing> element appears both on its own and nested inside an <otherthing>.
Another pain was that while the XML standard is happy with a default namespace, XPath requires a prefix[^].
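That pain is easy to reproduce. XPath 1.0 simply has no way to address a default namespace, so you must invent a prefix and bind it yourself; a small Python sketch with `xml.etree.ElementTree` (the namespace URI and element names are made up):

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<invoice xmlns="http://example.com/invoice">'
    '<total>100</total></invoice>'
)

# The default namespace still applies to every element, so an
# unprefixed path finds nothing:
assert doc.findall('total') == []

# Bind the default namespace URI to an invented prefix, and use it:
ns = {'inv': 'http://example.com/invoice'}
assert doc.find('inv:total', ns).text == '100'
```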
|
|
|
|
|
Do you remember what we had before XML? Talk about misery!
When I joined here, I was working at an Ace Hardware, and trying to get our in-house system to integrate with the Ace corporate online ordering system (no Internet then, direct dial-up connection) required the patience of Job, along with a love of self-abuse. I'm still grateful for XML!
Will Rogers never met me.
|
|
|
|
|
Roger Wright wrote: Do you remember what we had before XML? Well ... What if I do?
I was studying ASN.1 in the very early 1980s. The scheme is mandatory; the legal constructions are always specified. Great!
It is abstract: ASN.1 specifies the logical structure of files/documents, with no concern for a specific representation or format. Great!
An ASN.1 document/file may be represented in a handful of well specified, clearly identified concrete encodings - functionally 100% identical; you may read in one encoding and write back in another encoding, with no loss of information. Great!
You must have access to the scheme, which ensures a proper interpretation with no guesswork. You know what you get. Great!
A data stream not adhering to the scheme is like a transmission ruined by noise: It is worth nothing. Any non-ruined document honors the ASN.1 scheme. Unconditionally. Great!
The 'Tag' part is binary. You display it to the user e.g. by mapping it to local language terms (since you must have access to the scheme, you have an opportunity to set up a meaningful mapping) - I worked with a handful of applications providing mappings of tags to several different languages. Great!
The representation essentially being binary required the use of an ASN.1 data editor - vi wouldn't suffice. As a result, you never forgot to add the closing tag (there was no closing tag). Great!
You never misspelled a tag name, but selected from those allowed by the scheme. Great!
You never got the length wrong - that was handled by the ASN.1 editor and concrete coder. Great!
The format was space efficient (although dependent on the encoding), BER excessively so, according to some critics. (Some other concrete encodings, such as the XML encoding, could be quite wasteful, though.) Great!
The 'Value' part was completely unrestricted, a binary blob, with no need for escapes, quoting or anything resembling 'character entities'. (Or Base64, QP, AtoB/BtoA, BinHex, UUencode, or whathaveyou.) Great!
If you wanted to edit an ASN.1 document, you had to use an ASN.1 editor that made sure that the scheme was honored; the document couldn't be arbitrarily tampered with outside scheme control. Great!
I sure miss both the abstract ASN.1 side, and the BER encoding.
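The BER idea described above, a binary tag, an explicit length, a raw value, and no closing delimiter, can be sketched in a few lines of Python. This handles only the INTEGER type with short-form lengths; real ASN.1 tooling does vastly more:

```python
def ber_integer(n: int) -> bytes:
    """Encode a non-negative int as a BER/DER tag-length-value triple."""
    # Value: big-endian two's-complement; prepend 0x00 if the high
    # bit is set, so the number is not read back as negative.
    body = n.to_bytes((n.bit_length() + 7) // 8 or 1, 'big')
    if body[0] & 0x80:
        body = b'\x00' + body
    # Tag 0x02 = INTEGER, then a short-form length octet (< 128).
    return bytes([0x02, len(body)]) + body

# 42 encodes as tag 02, length 01, value 2A: no closing tag to forget,
# and the length is computed, never hand-maintained.
assert ber_integer(42) == b'\x02\x01\x2a'
```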
|
|
|
|
|
IDKW people are bad-mouthing XML OR barking that JSON is the magic carpet of serialization. XML was an attempt by some fairly smart people (smarter than me) to make a standardized, text-based, human-readable serialization standard. Is it 'easily' human-readable? Most but not all of the time, but still useful. Can you just open it in a text editor and read it? Yes; I didn't say you'd enjoy the experience, but you could, can, and do. Is it a good standard? Well, people are still using it today, it works, and 'works' gives programmers and interested parties (those with the money) a warm fuzzy feeling inside.
And consider this: you can make some really simple XML, read, serialize, and transmit it, and you can make some seriously complex XML, read, serialize, and transmit that. Things you couldn't imagine serializing in JSON. That's what the X stands for: eXtensible. Nevertheless, just like someone else said on this post, EVERYONE IS USING XML EVERY DAY. It's called the internet; it's called HTML, which shares the same SGML roots as XML, and it transmits some of the greatest amounts of information around the world billions and billions of times over. It just works; not perfectly, but it does.
|
|
|
|
|
Yes, the discussion certainly drifted a little from "I am losing my mind trying to coerce a class to serialise to XML using the .NET serialisers", and the dumb hacks you do when you just need to get something done and don't actually need the magic the classes provide.
XML, to me, followed the classic arc of new-tech-to-solve-well-defined-problem, into the you-can-use-it-everywhere! right into we're-using-it-everywhere!-Even-my-cat-uses-it and then into the gutter of why-on-earth-is-it-being-used-here.
I see the same thing with AI to be honest. Amazing idea, finally hit its stride, and now you can't swing said cat without hitting half a dozen products that use AI for no reason other than to have "uses AI" in their marketing (or they really mean they use a Bayesian model or even just basic statistical analysis).
cheers
Chris Maunder
|
|
|
|
|
I like your reply. If there was ever a red flag about some technology, to use or not use or abuse, it's those knee-jerk reactions that flood one side of an argument like a holy war. Engineering is really just a bunch of tradeoffs in favor of the most 'optimum' solution given the time and available smarts. Just look at what they did with a simple adjective like 'agile', and all those tussles over whether or not to normalize data. Thanks for this lively discussion.
|
|
|
|
|
chrisseanhayes wrote: Is it 'easily' human-readable? Most but not all the time, still useful. Almost twenty years ago, when XML was super-hyped, I was involved in digital library projects. Everyone was praising XML as The Savior, the greatest thing since sliced bread. I went to a Digital Libraries conference: Of the first eleven papers presented, ten were making a big issue of XML adoption being crucial to their project's success...
I got XML up to here. Much because of the extreme hype, the total lack of any critical evaluation of its suitability, and stereotypical praise of such "qualities" as "human readable". You don't even need a scheme - the tags are self-documenting!
They are? I asked one of the Sami-speaking guys to give me a list of Sami language terms for chapter, section, paragraph, table of contents etc., as well as some Sami text, and composed a sample Sami XML document. This I frequently used to illustrate the "readability" of XML.
(Note that the Sami culture in Norway is quite strong, and it is certainly to be expected that a digital library receives XML documents according to a scheme specifying Sami tags. Or, if your library handles XML documents of Asian origin, don't be surprised if tags contain, say, Chinese or Thai characters.)
A second example I used involved a 'p' tag. What can we expect it to identify? A paragraph? A part number? A person reference? I could show actual examples of all three interpretations, but made up other possible uses: A point, a position, a product name, a page number ... When you see a 'p' tag, you immediately understand that it has something to do with the letter p, most likely some concept that starts with 'p' ... in some language. It doesn't have to be English. In an international world, you cannot take for granted that the scheme designer prioritizes readability for native English speakers over readability for the native speakers of the language of the document.
This project I was on was focused on long term document archival: As far as possible, a faithful reproduction of the original should be possible fifty years from now, a hundred years, or more. So, from different document format specifications, I collected no less than fourteen different parameters affecting the formatting of a paragraph. Some day many years from now, you are to interpret a document with a lot of 'avsnitt' tags. After some searching, you realize that 'avsnitt' is Norwegian for 'paragraph', but which of the fourteen formatting/layout parameters applies, and in which way, for making a faithful reproduction of this document?
The 'readability' and 'self describing' properties of XML may be for HelloWorld level examples, but for general real world, full blown applications, it is clearly nowhere close to good enough. (Or if you want: A joke.)
For simplistic, limited scope documents in a limited context, XML may be sufficient. But certainly not for a general digital document library.
|
|
|
|
|
When I say 'human-readable', I mean "it's not a binary file that only a proprietary algorithm can decipher; you can output the text to your output device of choice and actually read the information therein, and with some cognitive overhead understand what is being stored/transmitted".
What I don't mean is that it will be like reading Edgar Allan Poe or your favorite Robert Ludlum novel. I mean a human can get in there, do cursory searches, and nail down information or even a bug. "Oh, look, John sent a thingabob instead of a hoopadadoop; now I know why the serializer crashed."
And again, XML isn't a fix-all magic drug that can solve all serialization problems, nor is JSON. Engineering is a set of tradeoffs: this is better than that, at this moment in time, for this problem. I'm glad I knew about XML at that particular time for that particular need. The same goes for any other serialization. I'm glad I knew about binary serializers; they got the info across fast, in a situation where humans were really never going to need to read the intermediate data anyway.
|
|
|
|
|