Text representation does not completely evade endianness - at least not with UTF-16!
If you consider UTF-8 an alternative: The multibyte encoding of a code point is just a compression method for an integer. You can use that for the Tag and Length fields as well - that could save you a few bytes when tags are few and lengths short, and it solves endianness equally well for 32-bit tags and lengths as it does for text files. I have been considering this solution, but there hasn't been any need for it yet.
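A minimal sketch of that idea in Python, assuming the UTF-8 bit layout is simply continued past the four bytes that UTF-8 allows today. The names `encode_uint`, `decode_uint` and `encode_tlv` are made up for illustration. The five- and six-byte forms follow the original UTF-8 definition; the seven-byte form, needed to reach a full 32 bits, is an assumed extension of the same pattern and not part of any UTF-8 standard.

```python
def encode_uint(n):
    """Encode a non-negative integer (up to 36 bits) using the UTF-8 bit layout."""
    if n < 0:
        raise ValueError("only non-negative integers can be encoded")
    if n < 0x80:
        return bytes([n])                      # single byte: 0xxxxxxx
    for k in range(2, 8):                      # try the 2..7 byte forms
        payload_bits = (7 - k) + 6 * (k - 1)   # bits in lead byte + continuation bytes
        if n < (1 << payload_bits):
            out = bytearray(k)
            for i in range(k - 1, 0, -1):      # fill continuation bytes from the end
                out[i] = 0x80 | (n & 0x3F)     # 10xxxxxx
                n >>= 6
            lead_prefix = (0xFF << (8 - k)) & 0xFF   # k one-bits, then a zero-bit
            out[0] = lead_prefix | n
            return bytes(out)
    raise ValueError("value needs more than 36 bits")


def decode_uint(buf, pos=0):
    """Decode one integer starting at pos; return (value, next_pos).

    Overlong encodings are not rejected in this sketch.
    """
    lead = buf[pos]
    if lead < 0x80:
        return lead, pos + 1
    k = 0
    while k < 8 and lead & (0x80 >> k):        # count leading one-bits
        k += 1
    if k < 2 or k > 7:
        raise ValueError("invalid lead byte")
    if pos + k > len(buf):
        raise ValueError("truncated sequence")
    value = lead & (0x7F >> k)                 # payload bits of the lead byte
    for b in buf[pos + 1:pos + k]:
        if b & 0xC0 != 0x80:
            raise ValueError("invalid continuation byte")
        value = (value << 6) | (b & 0x3F)
    return value, pos + k


def encode_tlv(tag, value):
    """Tag and Length are both variable-length; Value is raw bytes."""
    return encode_uint(tag) + encode_uint(len(value)) + value
```

With small tags and short values, Tag and Length take one byte each: `encode_tlv(5, b"hi")` is four bytes long. The byte sequence is the same on any CPU, since it is defined byte by byte rather than as a machine word.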
You obviously still have an endianness issue if the value field contains any binary numeric value at all (including UTF-16 characters). A large group of application/data formats are mainly targeted at user environments where CPUs of one given endianness are dominant. Defining that as The byte order for your format, and clearly indicating to readers / writers of the opposite endianness that they have to flip bytes (some CPUs have special instructions for that!), is, in my opinion, a far better solution than converting everything to text.
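As a rough illustration in Python, assuming the format is declared little-endian (the constant `FORMAT_BYTE_ORDER` and the helper names are hypothetical): the byte order is stated once, every reader and writer uses it explicitly, and a host of the opposite endianness does the flip.

```python
import struct
import sys

# Assumption for this sketch: the format declares little-endian as The byte order.
FORMAT_BYTE_ORDER = "<"


def pack_u32(n):
    """Write a 32-bit unsigned integer in the format's byte order."""
    return struct.pack(FORMAT_BYTE_ORDER + "I", n)


def unpack_u32(raw):
    """Read a 32-bit unsigned integer, regardless of the host CPU."""
    return struct.unpack(FORMAT_BYTE_ORDER + "I", raw)[0]


def swap_u32(n):
    """The explicit byte flip that a host of the opposite endianness performs."""
    return int.from_bytes(n.to_bytes(4, "little"), "big")


def host_must_flip():
    """True when the host CPU's native order differs from the format's order."""
    return sys.byteorder != "little"
```

Here `pack_u32(0x12345678)` always yields the bytes `78 56 34 12`, whatever the host, and `swap_u32(0x12345678)` gives `0x78563412`.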
Text doesn't solve all format problems either, unless you define one of many alternate formats as The format (analogous to defining the endianness of the format). How do you represent dates? 05/19 is unambiguous (but must be converted to e.g. ISO standard before presenting to a Norwegian user). The date a week earlier, 05/12, is ambiguous unless the representation is explicitly defined. Time: AM/PM is virtually unknown in many languages/cultures. Numerics: Is 1,500 one and a half, or fifteen hundred?
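The ambiguity is easy to demonstrate. This is only a sketch: `date_readings` and `read_number` are hypothetical helpers, and the year is an arbitrary filler because the text carries none.

```python
from datetime import datetime


def date_readings(text, year=2000):
    """Return the ISO 8601 form of every valid reading of an 'NN/NN' date.

    The year is not part of the text, so an arbitrary one is supplied.
    """
    readings = {}
    for label, fmt in (("month/day", "%m/%d/%Y"), ("day/month", "%d/%m/%Y")):
        try:
            parsed = datetime.strptime(f"{text}/{year}", fmt)
        except ValueError:
            continue                      # e.g. there is no month 19
        readings[label] = parsed.date().isoformat()
    return readings


def read_number(text, decimal_sep):
    """Interpret a number once the decimal separator has been defined."""
    thousands_sep = "," if decimal_sep == "." else "."
    return float(text.replace(thousands_sep, "").replace(decimal_sep, "."))


print(date_readings("05/19"))    # one valid reading
print(date_readings("05/12"))    # two valid readings
print(read_number("1,500", "."), read_number("1,500", ","))
```

"05/19" has a single valid reading, "05/12" has two (12 May and 5 December), and "1,500" is 1500.0 or 1.5 depending on which separator convention the format defines.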
Text: How do you represent characters beyond ASCII? 8859-1? 8859-x, with x specified in metadata? UTF-16? UTF-8? Maybe you will stick to ASCII and use QP, or Base64? HTML character entities? (named, # or either?) Backslash escapes? (hex, decimal, octal, or any of those?) URL percent-encoding? Which characters do not need to be escaped? How are newline and end of string represented - is NUL accepted as a fill byte, in accordance with ISO standards?
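To make the number of choices concrete, here is a small Python sketch listing several of these representations for one character. The function name `representations` is made up, and the example assumes a character that exists in ISO 8859-1, such as "é".

```python
import base64
import html
import quopri
import urllib.parse


def representations(ch):
    """Return several byte-level representations of one ISO 8859-1 character."""
    latin1 = ch.encode("iso-8859-1")
    utf8 = ch.encode("utf-8")
    return {
        "ISO 8859-1": latin1,
        "UTF-8": utf8,
        "UTF-16 little-endian": ch.encode("utf-16-le"),
        "UTF-16 big-endian": ch.encode("utf-16-be"),
        "QP over ISO 8859-1": quopri.encodestring(latin1),
        "Base64 over UTF-8": base64.b64encode(utf8),
        "HTML numeric entity": ch.encode("ascii", "xmlcharrefreplace"),
        "backslash escape": ch.encode("unicode_escape"),
        "percent-encoding over UTF-8": urllib.parse.quote(ch).encode("ascii"),
    }


for name, raw in representations("é").items():
    print(f"{name:28} {raw!r}")

# A reader accepting HTML entities must handle named, decimal and hex forms alike.
assert html.unescape("&eacute;") == html.unescape("&#233;") == html.unescape("&#xE9;")
```

Every one of these is "text" in some sense, and a reader must know which one the writer chose.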
And so on and so on. Text representation certainly doesn't solve all problems. (I'd say that binary encoding solves more!)
In the days when I was working with ASN.1 and BER, a BER string had to be inspected using a BER reader (which should have access to the ASN.1 to provide symbolic names). The readability was a lot better than with XML! When I went from BER to XML, I was considering making a similar XML reader to make it readable; I never got around to doing that.
Today, most systems for displaying plain text have some facilities for improving readability, starting with collapsing inner structures, then highlighting of tags, and so on. You could say that such functions illustrate that the plain text format is not good enough. If I need a display tool that parses and transforms XML or whatever into something readable, it might as well transform some TLV format into something readable.
There is one issue that still remains, though: How self-describing the file should be. TLV tags are usually opaque, just some integer number. When you see an XML "p" tag, you know that it may have to do with a person, a product, a paragraph or something associated with the "p" (usually as the initial letter). For one presentation on handling arbitrary XML documents, I had a Sami colleague give me Northern Sami terms for chapter, section, picture and so on, for me to use in the examples: The tags were just for illustration (something like Lorem ipsum), but it was difficult for the audience to relate to this as a document.
I made one TLV format a few years ago: The file contained zero or more tag name tables, providing symbolic tags for presentation purposes; each table was headed by a language code. For simplicity, in that format, tags were unique. If partial structures could have had "locally defined" tags (as allowed e.g. in ASN.1), a more complex scheme would be required, easily growing into a complete schema representation. In this case, that would be overkill; global tags were far easier and fully acceptable.
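The following Python sketch shows how such tag name tables could work. It is not the actual format described above: the reserved tag `NAME_TABLE_TAG`, the record layout (16-bit tag, 32-bit length, big-endian) and all function names are assumptions chosen for illustration.

```python
import struct

NAME_TABLE_TAG = 0xFFFF   # hypothetical reserved tag marking a tag name table


def tlv(tag, value):
    """One record: 16-bit tag, 32-bit length (both big-endian), then the value."""
    return struct.pack(">HI", tag, len(value)) + value


def name_table(lang, names):
    """A name table record: language code, then (tag, name) entries."""
    code = lang.encode("ascii")
    body = bytes([len(code)]) + code
    for tag, name in names.items():
        raw = name.encode("utf-8")
        body += struct.pack(">HB", tag, len(raw)) + raw
    return tlv(NAME_TABLE_TAG, body)


def parse_name_table(body):
    """Return (language code, {tag: name}) from a name table value."""
    n = body[0]
    lang = body[1:1 + n].decode("ascii")
    pos, names = 1 + n, {}
    while pos < len(body):
        tag, size = struct.unpack_from(">HB", body, pos)
        pos += 3
        names[tag] = body[pos:pos + size].decode("utf-8")
        pos += size
    return lang, names


def read_records(data):
    """Yield (tag, value) for each top-level record."""
    pos = 0
    while pos < len(data):
        tag, length = struct.unpack_from(">HI", data, pos)
        pos += 6
        yield tag, data[pos:pos + length]
        pos += length


def dump(data, lang):
    """List records with symbolic tag names in the requested language, if available."""
    tables, records = {}, []
    for tag, value in read_records(data):
        if tag == NAME_TABLE_TAG:
            code, names = parse_name_table(value)
            tables[code] = names
        else:
            records.append((tag, value))
    names = tables.get(lang, {})
    return [(names.get(tag, f"tag {tag}"), value) for tag, value in records]


data = (name_table("en", {1: "chapter", 2: "picture"})
        + name_table("nb", {1: "kapittel", 2: "bilde"})
        + tlv(1, b"Intro") + tlv(2, b"\x89PNG"))
print(dump(data, "nb"))
```

Because the tags are global and unique, one flat table per language is enough; the data records themselves carry only the numeric tags.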
Such issues do not arise at all with textual tags; they are at least at some level self-describing. But they raise issues of e.g. case significance, allowed character set, and a bunch of other issues that a numeric tag evades.
When ASN.1/BER was at war with other alternatives, the lack of symbolic tag names in BER, requiring the receiver to have access to the ASN.1 schema for interpretation, was one of the strongest criticisms of BER (/DER/CER). Later, we got XML and JSON encoding rules, encoding symbolic names from the ASN.1 schema into the stream, but this was only a half-way solution: Matching (and keeping in synchronization) an ASN.1 schema to an XML schema is, for all practical purposes, impossible, certainly over time. So it mostly served as a poor man's BER reader.
I see a lot of areas where computer guys are rather unwilling to seriously assess the commonly used solutions, asking critically if they really are the best. Textual encoding is one of those. We use it because that's the way we do it. Because textual encoding is there, not because it came out with the highest evaluation score. Sure, it is there, we have to accept it when exchanging data with others. But in "local" contexts (such as private files for an application), I tend to use other alternatives.