|
.
Seems to me I've come across notes on encoding issues in these forums.
I've seen editors unable to load a TDL-related file, or load it with two strange initial characters: sometimes only those two characters, sometimes followed by the rest of the file.
If I change the 'utf-16' in the header to 'utf-8' or ANSI or something, and change the file's encoding to match, the files load into the editors correctly.
(I use Notepad++.)
Could someone explain the issues involved as to how they impact TDL use, including stylesheet development, such that I can cut and paste it into the wiki?
e.g. What's going on, and what to do about it / how to work around it. (Including any necessary reverses to work it back in.)
- presumably one way is to choose editors that understand {what} exactly?
- if a change is made to utf-8 or whatever, does it ever need to be changed back (lest something elsewhere become unhappy, later)?
.
|
_BS_ wrote: load with two strange initial characters,
These are called a 'BOM' (byte order mark) and are Unicode identifying characters.
By default TDL encodes tasklists as Unicode to accept multi-lingual characters.
The 'File > Save As' dialog can be used to turn off Unicode-ness.
ps. I've not had a problem loading Unicode tasklists into Notepad++.
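To see which BOM (if any) a file starts with, a check along these lines might help. This is a minimal sketch, not anything TDL ships: the function name `sniff_bom` is made up for illustration, and it only inspects the first few bytes.

```python
import codecs

# Known byte order marks. The UTF-32 marks are listed first because the
# UTF-32-LE mark begins with the same two bytes as the UTF-16-LE mark.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(data: bytes):
    """Return (encoding_name, bom_length), or (None, 0) if no BOM is found.

    sniff_bom is a hypothetical helper name, not part of TDL or Python.
    """
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0

# Python's "utf-16" codec writes a BOM in the machine's native byte order.
sample = "<TODOLIST/>".encode("utf-16")
print(sniff_bom(sample))
print(sniff_bom(b"<TODOLIST/>"))  # plain single-byte text, no BOM
```

For a real file, pass in the first few bytes, e.g. `sniff_bom(open(path, "rb").read(4))`. An editor that does not recognise the mark shows those two bytes as stray characters at the top of the file.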
|
.
.dan.g. wrote: p.s. I've not had a problem loading Unicode tasklists into Notepad++.
Sorry didn't mean to imply it in particular was.
Merely, so many tools, so many days, I have seen this in various apps.
Sorry, but "By default TDL encodes tasklists as Unicode to accept multi-lingual characters." is meaningless. Like saying I wrote Shazam in Arial (instead of Times New Roman). Still doesn't tell me what Shazam means.
A wiki bit would be good - when a file fails to load ("usually older code editors" ?), 'this' is the problem, 'this' is what you can do about it. When done beating on the file, you may need to do 'that' to get it back over 'there'. ('Lest your changes be misinterpreted at next load, or overwritten.' ?)
.
|
_BS_ wrote: is meaningless
It means that you can combine Arabic, Asian, Cyrillic and English languages in the same tasklist.
|
I get that. What I mean is, the ramifications of that, or what to do with it, per the OP question / requested wiki article, isn't answered by what you've said. (However factually correct.)
|
As to "ramifications", is this what you mean?
Open a TDL CSV with Excel.
Make a couple changes.
Save.
You get warned that you will lose formatting, which is fine.
The doc saves.
Now exit Excel.
It gives you the same warning.
Tell it to save.
It doesn't. Just exit Excel, you've already saved it.
I believe that has to do with the unicode.
Here's another "ramification".
Open the TDL CSV with notepad or wordpad.
Make a change and save.
It might not re-save as unicode.
Update the file with TDL.
Now view it.
It might look like Chinese. The first couple of bytes that were removed get replaced, and now double bytes are added where single bytes were before. The top part of the document is trashed because the next time anything looks at it, it tries to interpret every two bytes as something meaningful, and it renders as Chinese (Big5). The bottom looks fine because the two bytes up top are good and so is the multibyte data written later by TDL.
I've accidentally stumbled on this a few times but can't accurately describe how to repeat it because I really don't want to repeat it right now.
HTH
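The "looks like Chinese" effect can be reproduced in isolation. The sketch below is only an illustration of the general mechanism (single-byte text read as if it were two-byte UTF-16); it is not a reproduction of the exact Notepad/TDL sequence described above, and the sample header line is invented.

```python
# A plain ASCII line, such as a CSV header, written one byte per character
# (what an editor that is not Unicode-aware might save).
text = "Task ID,Title\n"
single_byte = text.encode("ascii")

# A reader that still believes the file is UTF-16 pairs the bytes up,
# so every two ASCII characters become one unrelated character.
misread = single_byte.decode("utf-16-le")

print(len(text), len(misread))
print([hex(ord(ch)) for ch in misread])
print(misread)
```

Each pair of ASCII bytes lands on an arbitrary code point, and many of those fall in the CJK ranges, which is why the damaged part of the file tends to look like Chinese while the parts that really are two-byte text still read correctly.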
|
> I really don't want to repeat it right now.
> believe that has to do with the unicode.
I don't think so. I think that's a spreadsheet thing. I see the same thing in both MS Office and Open/LibreOffice. I think it has more to do with the fact that you're saving as .csv and are about to lose formatting, so it lets you know. Then, when exiting, it assumes you want to keep your work, probably in a native ('rich') format, defaults to .doc/.odt/.xls/.ods/.{whatever} [point is, not .csv], and warns you all over again. I don't believe this has anything to do with character set or unicode.
> As to "ramifications", is this what you mean?
Really, really, close, Thank you!
What I'd like to have in the wiki, presumably titled "Encoding", ("Yes, it's due to unicode."), this truth will bite you if you're -really- careless, here's how, here's why, and here's what you can do about it. i.e. After it's bitten you, do {these} things.
[I'm also recognizing that I'm confusing encoding, character sets, and probably something else. Whatever utf-16 vs utf-8 is.] Because of my uncertainty, I'm not comfortable just going ahead and writing a blurb. Nor do I wish to put something out there, someone take it as gospel, and get damaged before I get as far as making any corrections. Should I even ever become aware of them.
- and I'd like to be able to warn them off / which things are likely to do the bad deeds. I think (guessing) that means non-unicode aware editors / apps, likely those more than a few 'stylistic' versions old. e.g. It feels like xmlcookbook, xtrans, and xmlcopyeditor are all not only from a different era (pre-unicode?), they each themselves appear to be from different eras.
Like you, I don't remember where I got, when, with what (let alone what I did about it) - but coming out of it later I recalled it as being an issue, don't know the specifics, and thought it worthwhile to get something written down. So the next time it happens to me, *I* know where to go to figure out what's what. Too.
e.g. Having gotten gibberish, or two characters, with / without following characters, somewhere along the line, I believe I changed the encoding in Notepad++ (utf-8, ansi, ???), saved, reopened in whatever I was using, and got on with my day. Independently I probably changed utf-16 to utf-8 in the header. Don't know if that helped, made it all better, or made it worse. (I'm just so confused!)
Dan has nicely pointed out the first two funny characters are BOM. One inexplicable bit down - thanks. However, if I leave those characters alone and get on with my day, and resave - are they preserved? Of course, nobody knows (which editor I was using at the time, how it behaves, which environment, and nobody can know all possible combinations.)
I haven't noticed a case where TDL was unhappy with the resulting file - so I'm guessing TDL can roll with whatever it's handed, and I'm guessing it puts out whatever it got. (But a new file by default will be unicode. Nice.) But these are guesses, and it would be nice to nail down the specifics in a wiki article - for the next poor schlob ... which will probably be me again. Given my luck.
- thanks to your note ... that was another aspect of confusion. Single byte vs multi-byte {stuff} [BIG5???]
= not that the user much cares what it is, they just want to know what to do about it, how to avoid it (the problem), and what to look for (criteria) when looking at an app / trying to avoid it.
So ...
TDL - Encoding and character sets...
TDL is a unicode app making use of the utf-16 character set (?) and {x} (?) - but will read most anything handed to it and save the same way. (?)
If you open or transform .tdl you will want to use a unicode/{x} app by preference, for compatibility and ease of use. e.g. Notepad++, msxsl, this that or the other thing.
If you view the file and get gibberish, your editor is not {xyz} aware. Either change editors, or use something like notepad++ to change the encoding to {abc} and save.
- if you view the file and only get two BOM {cue wp reference} characters ...
- if you view the file, get those two characters, and the rest of the file, you are (will be, may be?) good to go, just don't delete the first two characters. (Or can you?)
= test by ... closing / reopening / doing a {lmno} run. If the run is unsuccessful, you will need to change editors. Sorry.
If you choose not to do so, and instead use notepad++ to change the {ghi}, doing so will bite you {somewhere here down the road}, so before {going there}, change the {abc} back to {def}, beforehand.
Or something like.
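For the "change it, then change it back" step in the outline above, a round trip could look roughly like this. It is a sketch under assumptions: `reencode_xml` is a hypothetical helper, the XML declaration is assumed to carry an `encoding="..."` value as described earlier in the thread, and nothing here confirms whether TDL actually requires the file to be converted back.

```python
import re

def reencode_xml(data: bytes, src: str, dst: str) -> bytes:
    """Decode XML bytes as `src`, update the declared encoding, encode as `dst`.

    Hypothetical helper for illustration only. It keeps the bytes and the
    encoding named in the XML declaration in agreement with each other.
    """
    text = data.decode(src)
    text = text.lstrip("\ufeff")  # drop a leftover BOM character, if any
    text = re.sub(
        r'(<\?xml[^>]*encoding=")[^"]*(")',
        lambda m: m.group(1) + dst + m.group(2),
        text,
        count=1,
    )
    return text.encode(dst)

original = '<?xml version="1.0" encoding="utf-16" ?>\n<TODOLIST></TODOLIST>'.encode("utf-16")

as_utf8 = reencode_xml(original, "utf-16", "utf-8")   # for an older editor
restored = reencode_xml(as_utf8, "utf-8", "utf-16")   # back to where it started

print(as_utf8[:40])
print(restored == original)
```

The point of doing both changes together is that the header and the actual bytes never disagree; editing only one of them is the situation that produces gibberish on the next load.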
-----
On a related, but separate note, a number of apps have reformatters, usually tidy. (Sadly they're not also fixing the style braces.) I've found such very handy. Course, different programs are using different options. So, yet another learning curve to overcome. (Yes, tidy config usually changeable somewhere within that particular editor.)
It would be useful to include in the above as to whether the use of a formatter will botch TDL somehow.
It would also be useful to note what TDL itself prefers / uses. e.g. Managing expectations - format to your liking, make a change in TDL, you will get {xyz [tidy settings equivalent]} when you get back in.
|
Feels to me like an AUTHOR / MAINTAINER / MAINTAINEDBY list attribute would be useful.
Not precisely sure whether it should be a simple field, e.g. "Dan G.", or multi-line (much like an e-mail signature) where address, e-mail, web-site, etc. could be entered. Much like a CONTACT field, I suppose.
If you look at a file's properties in windows explorer, other candidate fields present themselves for consideration.
- is the (windows) file metadata available to a stylesheet? i.e. Present outside the internal schema of the actual file contents / apps control?
Are there standards for file properties? I can't help but think of Exif, XMP, or IPTC metadata (IPTC Information Interchange Model) for images? (Better than re-inventing the wheel.)
.
|
.
How could a single list record be extracted in a stylesheet, then excluded from later 'all' processing?
- e.g. search for task title 'MAINTAINER' (category METADATA?), take the title of the next (sub-)task for actual value?
Such would make the concept user-extensible / not something Dan would have to maintain at every turn. In essence, custom list level fields.
Actually, a single task title = METADATA, with however many sub-tasks with one element per 'pair' would make some sense?
Sample stylesheet?
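Not a stylesheet, but the same selection logic can be sketched in Python. Everything about the file layout here is an assumption made for illustration: tasks are taken to be nested `TASK` elements with a `TITLE` attribute, and the sample data is invented. The idea is a top-level task titled METADATA whose sub-tasks are the keys, each with one sub-task holding the value.

```python
import xml.etree.ElementTree as ET

# Invented sample; the TASK/TITLE layout is an assumption, not a TDL spec.
SAMPLE = """
<TODOLIST>
  <TASK TITLE="METADATA">
    <TASK TITLE="MAINTAINER"><TASK TITLE="Dan G."/></TASK>
    <TASK TITLE="PURPOSE"><TASK TITLE="Wiki notes"/></TASK>
  </TASK>
  <TASK TITLE="Write encoding article">
    <TASK TITLE="Collect examples"/>
  </TASK>
</TODOLIST>
""".strip()

def split_metadata(root):
    """Return (metadata dict, ordinary top-level tasks).

    The METADATA task is read once and then left out of the task list,
    so later 'all tasks' processing never sees it.
    """
    meta, tasks = {}, []
    for task in root.findall("TASK"):
        if task.get("TITLE") == "METADATA":
            for key_task in task.findall("TASK"):
                value_task = key_task.find("TASK")
                value = value_task.get("TITLE") if value_task is not None else ""
                meta[key_task.get("TITLE")] = value
        else:
            tasks.append(task)
    return meta, tasks

meta, tasks = split_metadata(ET.fromstring(SAMPLE))
print(meta)
print([t.get("TITLE") for t in tasks])
```

In XSLT terms the equivalent would be one template that matches the METADATA task and a predicate that skips it everywhere else.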
.
|
TDL actually already supports tasklist-level metadata, the design intention being to allow importers (and other plugins) to embed domain-specific info in the tasklist which they could later read when exporting.
Same goes for tasks. The possible uses I considered would allow a mind-map plugin to store coordinate information, or a database plugin to store table-row keys (I'm currently working on an ODBC plugin).
The format (which I wasn't expecting to reveal) is:
<METADATA>
<KEY1>VALUE1</KEY1>
<KEY2>VALUE2</KEY2>
...
</METADATA>
Might be worth having a play with...
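For anyone playing with it, reading such a block back out might look like the following sketch. It assumes the `METADATA` element sits directly under the root `TODOLIST` element, which the post above does not actually state, and `read_metadata` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

def read_metadata(root):
    """Return {tag: text} for the children of a direct <METADATA> child.

    Assumes METADATA is an immediate child of the element passed in;
    returns an empty dict when there is none.
    """
    node = root.find("METADATA")
    if node is None:
        return {}
    return {child.tag: (child.text or "") for child in node}

doc = ET.fromstring(
    "<TODOLIST>"
    "<METADATA><KEY1>VALUE1</KEY1><KEY2>VALUE2</KEY2></METADATA>"
    "</TODOLIST>"
)
print(read_metadata(doc))  # {'KEY1': 'VALUE1', 'KEY2': 'VALUE2'}
```

The same function could be pointed at a task element instead of the root if task-level metadata uses the same shape.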
|
(Aspect 1)
"(I'm currently working on an ODBC plugin)."
Does that mean you think the tdl database can be rendered in a structured form / relational database?
i.e. Possible to migrate away from xml in the future?
I ask / all I'm thinking here is there is a bevy of 3rd party tools surrounding most databases, and an apparent dearth of popularly comprehensible tools for xml.
Easy example, csv - from dbf a long solved problem. But you/someone had to write a similar xml to csv xsl, 'reinventing that wheel.'
If output is a sore point for tdl (and we all live and work in team / group environments), and if 'report writers' off other database sources is a solved problem, then changing underlying database ties you into those already solved problems and ecosystems.
What I don't know is whether the inherent self-referent / reentrant nature of the data (sub-tasks) is even possible in other underlying database forms.
|
_BS_ wrote: i.e. Possible to migrate away from xml in the future?
I've no plans to do this (unfortunately for you), which is why it's a plugin, but I can definitely understand that people may prefer a more centralised data storage system. We'll have to see how it turns out in practice.
_BS_ wrote: What I don't know is whether the inherent self-referent / reentrant nature of the data (sub-tasks) is even possible in other underlying database forms.
Currently I make one assumption in this, and that is that the parent of a given task is defined by storing the parent's database 'Task' key as a column in the 'Tasks' table.
I also don't use SQL directly to read the task hierarchy, I just pull all the required row data with a simple SELECT statement, sort the rows to ensure that parents always precede children and then build the tree sequentially.
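The approach described above (flat rows with a parent key, sorted so parents come first, then assembled in one pass) can be sketched as follows. This is not the plugin's code: the column names `id`, `parent_id` and `title`, the in-memory SQLite database, and the sort-by-depth ordering are all assumptions chosen for the example. It also assumes every parent row exists and there are no cycles.

```python
import sqlite3

def build_tree(rows):
    """Build nested task dicts from (id, parent_id, title) rows."""
    rows = list(rows)
    parent_of = {task_id: parent_id for task_id, parent_id, _ in rows}

    def depth(task_id):
        # Number of ancestors; top-level tasks have depth 0.
        d = 0
        while parent_of.get(task_id) is not None:
            task_id = parent_of[task_id]
            d += 1
        return d

    # Sorting by depth guarantees a parent is seen before its children.
    ordered = sorted(rows, key=lambda row: depth(row[0]))

    nodes, roots = {}, []
    for task_id, parent_id, title in ordered:
        node = {"id": task_id, "title": title, "children": []}
        nodes[task_id] = node
        if parent_id is None:
            roots.append(node)
        else:
            nodes[parent_id]["children"].append(node)
    return roots

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tasks (id INTEGER PRIMARY KEY, parent_id INTEGER, title TEXT)")
conn.executemany(
    "INSERT INTO Tasks VALUES (?, ?, ?)",
    [(3, 2, "Collect examples"), (1, None, "Wiki"),
     (2, 1, "Encoding article"), (4, None, "Inbox")],
)
rows = conn.execute("SELECT id, parent_id, title FROM Tasks").fetchall()
roots = build_tree(rows)
print([r["title"] for r in roots])
```

So a self-referencing hierarchy fits in a single table without recursive SQL; the recursion is handled by the code that reads the rows.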
|
> I've no plans to do this (unfortunately for you)
Don't mean this to come out as an attack - not unfortunately for me, for everyone. And all I really mean by that is, you yourself have comments in this forum that you regretted going with XML in the first place.
- it being something that seemingly has given you grief every step of the way, since.
(Not that I'm disagreeing with the choice, hindsight is 20:20. Another good idea the execution of which didn't pan out.)
[For those same ideas/reasons, Linux has still not moved to a centralized information bit like the windows registry - textual .ini file equivalents everywhere have just worked out too well for everyone. Despite their shortcomings. I don't expect that to change any time soon. Why would anyone have thought human-readable XML wasn't a good idea? Especially given btree's success.]
How many inquiries / how much time have you spent over the years, just in 'printing' or 'stylesheet' issues, alone? Just non-productive, gets old really fast, time suckage, in even having them cross your monitor, wading through the distraction, let alone in answering them?
> I can definitely understand that people may prefer a more centralised data storage system
People (users) don't care what's under the hood, they just want to get what they need to do done, and get on with their day. So you could store the data as smoke signals, for all they care, as long as they can get on with their day quickly. However, given that every user (it seems) thinks their own special printed output format is the cat's meow, and given the learning curve presented by XML and the resulting impediment preventing users from getting on with their day, what's under the hood begins to matter, to them as well.
Assuming there is 'something else' that would let them get on with that, if there were a different back end.
Given the other Freemind thread, I do wonder ... how much effort would it take for TDL to use the 'Freemind' back end instead? IIRC, it's also XML, and its back end should be a super-set of TDL's. Shouldn't it be little more than a matter of changing a header file and getting on with your day? Fields are fields, databases are databases. The TDL special sauce is not the database, but the data relationships and processing, data interpretation and business rules, not data storage mechanisms. (?)
And would not accomplishing such, instantly let you and your users leverage all the other richness that will be present in that community? Such as reports, data integrity, documentation, community - and take a load off of you for a lot of the fiddly bits most every ecosystem needs? (Let alone bring a richness of functionality and simplicity that Freemind, et al, so desperately needs, itself.)
Even just using a schema - without it, how do you do data integrity checks? With it, XML tools gain an understanding of the data structure, for you to leverage and not reinvent wheels. You aren't (apparently) interested in exposing the user tools, necessary by your own comment to be able to filter data, via command line, and encourage the use of 3rd party tools for direct interpretation of the .tdl data file. But without the schema adherence, those very same 3rd party tools are hamstrung in their usefulness - so this is all very confusing. Chicken and egg, even.
> I also don't use SQL directly to read the task hierarchy, I just pull all the required row data with a simple SELECT statement, sort the rows to ensure that parents always precede children and then build the tree sequentially.
SQL, select, it's all still a rose. [Please substitute (generic) database query wherever you see 'SQL'. It's all fields and rows, regardless of the specific language used to talk to it.]
|
(Aspect 2)
"The format (which I wasn't expecting to reveal) is:"
- this tag applies to both list level and task level?
- interface?
i.e. You noted earlier / elsewhere, if I just slap a field into the .tdl, tdl will overwrite it.
So by what means would I get the data in so that it's maintained?
In either case, presumably this is just a list of key/data pairs. (i.e. your controls / objects are already present, just no subtasks permitted.)
- ouch: maybe not. doc.author, doc.copyright, doc.tags, doc.permissions, also seem not unreasonable.
So, right-click on a task or 'the list', choose properties, and a simple list dialog is presented?
In the mean time, it's easy enough to code an xsl for these fields, but how to get the data in and have it stick?
|
> - interface?
You note in another response, plugin.
So does this mean if I manually put in something like:
<TODOLIST ...
<METADATA>
<Maintainer>Dan G.</Maintainer>
</METADATA>
It will stick / be preserved / not stripped out by TDL?
|
Thanks. Played. Works for me.
Except ... TDL is not preserving the lines.
Can you advise that they should be, or will be going forward?
|
I recall Dan was working on a tasklist owner a while back. Not sure what became of that.
|
.
If I throw a field into the tdl, is it just going to be overwritten?
e.g. I could throw an AUTHOR= attribute into the task list and get on with my day in the meantime.
Pointless if it's just going to disappear at next TDL write, though.
.
|
This is what 'Custom Attributes' are for: 'Tools > Add/Edit Custom Columns...'.
And, Yes, adding tags manually to a .tdl file will get them overwritten.
|
.
By chance, did you miss my point? These are list attributes, not task attributes.
Wouldn't custom columns potentially add a field to every task, not just a single field to the list?
.
|
_BS_ wrote: Wouldn't custom columns potentially add a field to every task, not just a single field to the list?
Sure.
If you want to assign an author to a whole list try this:
1. Menu: View>'Show project name field' to show the field
2. Enter a combination of the list's name and the author (John Doe - listname or Listname (Doe, John) etc.)
3. Menu: View>'Show project name field' to hide the field.
Done.
This way you do not have to rename the list on the HDD.
Jochen
|
Unless you're already using that for its intended purpose.
It is probably reasonable that there are entire list attributes, like maintainer. Arguably even the list itself deserves a comment field. e.g. Purpose.
In my case I'm looking for:
{Project Name}
------
{Maintainer} - {Last Modified} ({Rev.})
Arguably under the name could go a comment of: A project to complete the blah blah blah, see {link}. Project to be completed by, having the constraints of ...
|
_BS_ wrote: Unless you're already using that for its intended purpose.
What is the purpose - in your opinion - of the project name field?
Usually if I create a new list I save it and by doing that I give it a name. That name is shown by ToDoList all the time in the wintitle.
Is the project name field just good for giving a list a very long name? Or use characters that are not allowed in file names?
Jochen
|
TCP_JM wrote: Is the project name field just good for giving a list a very long name? Or use characters that are not allowed in file names?
Pretty much, yes.
|
Project name is good for giving the project a name.
Projects have many more attributes than just their name. Some are already present, such as last modified.
Do file / properties on something, a document or spreadsheet, for example, and you will see examples of many other useful 'list level' attributes. Or check out IPTC or Exif data for similar file metadata. Not that I think shutter speed, or aperture, applies to a project file, but you should get the idea.
In the end, by definition, no one lives in a vacuum or an island - we are members of teams and work in groups - and so must share and communicate what we do and are on about. So if I hand you a copy of my list, where (whom) it's from, what it's all about, when last updated, are just a small subset of what might be useful to be able to communicate / print / display about a list.
Given the current organization of TDL (e.g. Multiple categories, 1-9?), it could be quite reasonable to simply make CUSTOMLISTATTRIBUTES 1 - 9. (Perhaps with tags? i.e. labels for the fields?) CLA0_Name = 'Author', CLA0_Data = 'Dan'. Over time, use may lead to a (unmandated?) convention that CLA0 is used for 'Author'?
|