I am trying to read different tag values (like tags 259 (Compression), 33432 (Copyright), 306 (DateTime), 315 (Artist) etc.) from a TIFF image in Java 11.

What I have tried:

I tried with ImageIO as follows:

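import java.io.File;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.metadata.IIOMetadata;
import javax.imageio.plugins.tiff.TIFFDirectory;
import javax.imageio.plugins.tiff.TIFFField;
import javax.imageio.stream.ImageInputStream;

    File tiffFile = new File(tiffFileName);

    // Open the file and pick the first reader that can handle it
    ImageInputStream input = ImageIO.createImageInputStream(tiffFile);
    ImageReader reader = ImageIO.getImageReaders(input).next();

    reader.setInput(input);
    IIOMetadata metadata = reader.getImageMetadata(0);

    // Look up tag 33432 (Copyright) in the first image file directory
    TIFFDirectory ifd = TIFFDirectory.createFromMetadata(metadata);
    TIFFField myTag = ifd.getTIFFField(33432);
    String tagString = myTag.getAsString(0);
    // problem here: tagString already shows � where the umlaut should be

    //String[][] replacements = { { "ä", "ae" }, { "ü", "ue" }, { "ö", "oe" }};
    String[][] replacements = {{"\u00C4", "Ae"}, {"\u00DC", "Ue"}, {"\u00D6", "Oe"},
          {"\u00E4", "ae"}, {"\u00FC", "ue"}, {"\u00F6", "oe"}, {"\u00DF", "ss"} };

    // replace() is enough here: the search strings are literals, not regular expressions
    for (String[] replacement : replacements) {
       tagString = tagString.replace(replacement[0], replacement[1]);
    }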


But it does not give the exact value of the tag. For non-ASCII characters (ö, ü, ä, etc.), a question-mark symbol appears instead of the real character. TIFFField.getAsString(0) returns values like Universit�t, but I want Universität.

Can anyone tell me how to get the byte values of the tag and then decode them as UTF-8 to get the exact tag value?

Suggestions for an alternative Java library for reading TIFF images are also welcome. I just need to read the exact tag values, including non-ASCII characters.
Updated 11-Nov-20 3:35am
Comments
Richard MacCutchan 6-Nov-20 7:32am
   
The values are correct, it is your display code that is producing the strange characters. You need to know the language that is being used in the text and adjust your display font to match it.
Member 12213239 6-Nov-20 8:58am
   
Any idea how to handle the display font?
Richard MacCutchan 6-Nov-20 11:20am
   
That depends on how you are displaying the results.
Member 12213239 6-Nov-20 11:45am
   
I want to replace the umlauts (ä, ö, and ü) with equivalent characters like ae, oe and ue. My problem here is that TIFFField.getAsString(0) returns values like Universit�t, not the exact value Universität. Can you tell me specifically how to get the exact value, including the umlaut?
Richard MacCutchan 6-Nov-20 12:02pm
   
No, they do not return "Universit�t", that is produced by you trying to display a character in a font that has no equivalent for that character's value. You need to examine the character's actual value. It is no use trying to print it and hoping for the best. Look at the Character Map application in the Windows Accessories folder on the start menu. That will show you what characters are equivalent in different language fonts.
Member 12213239 6-Nov-20 12:29pm
   
I am not printing the values here. When I debug the code (in IntelliJ IDEA), it shows Universit�t, not the exact value Universität. I just need to read the tag value and replace the umlauts with equivalent characters (like ae for ä, ue for ü) in the string. If I can't read the umlaut, I can't replace it with the equivalent value. Can you give any hints on how to read the exact umlaut here?
Richard MacCutchan 6-Nov-20 12:38pm
   
Stop looking at them as displayed, and look at the actual numeric value of the character, that is what determines what will be displayed. For example, in Unicode the character ä has the value 0x00E4. And if your display font set uses a different mapping then you will get whatever character is at that value in the font set. All of this information can be found in the Character Map application I referred to above.
Richard MacCutchan 7-Nov-20 3:45am
   
A string is just an array of bytes. ASCII characters are represented by 8-bit byte values, and Unicode by 16-bits. I suggest you get a book on computer basics and learn how data is stored and manipulated.
Member 12213239 8-Nov-20 11:51am
   
I am new to Java programming. Can you help, please?
Richard MacCutchan 8-Nov-20 12:05pm
   
This has nothing to do with Java, it is about understanding computers and how data is stored and manipulated, and what each byte, or sequence of bytes, may represent. If you do not understand the basics you are going to struggle more and more.
Member 12213239 8-Nov-20 14:00pm
   
I got your point. Here I am using replaceAll() to replace the umlauts in the string, but myTag.getAsString(0) is not returning the exact value. What am I missing here? How can I manipulate the string differently?
Richard MacCutchan 9-Nov-20 3:52am
   
What do you mean by "But myTag.getAsString(0) is not returning the exact value"? I cannot guess what is happening in your system.
Member 12213239 9-Nov-20 5:18am
   
Please have a look at my updated code above. Here myTag.getAsString(0) returns values like Universit�t, but the exact value is Universität. How can I replace the umlaut if I don't get the exact value? Can you please tell me how to access the byte values and replace the umlauts with the equivalent values, like ae for ä, ue for ü and so on?
Richard MacCutchan 9-Nov-20 5:21am
   
At the risk of repeating myself ad nauseam: look at the actual values of each character in the returned data.
Member 12213239 9-Nov-20 16:58pm
   
You mean like this? String[][] umlautReplacements = { { "\u00C4", "Ae" }, { "\u00DC", "Ue" }, { "\u00D6", "Oe" }, { "\u00E4", "ae" }, { "\u00FC", "ue" }, { "\u00F6", "oe" }, { "\u00DF", "ss" } };
Richard MacCutchan 10-Nov-20 3:27am
   
Yes.
Member 12213239 9-Nov-20 17:00pm
   
I checked each character using its Unicode value (hexadecimal), but it is showing the same result.
Richard MacCutchan 10-Nov-20 3:33am
   
Sorry, I have no idea what that means. I have just tested your code and it works correctly.
Member 12213239 8-Nov-20 14:33pm
   
I tried to convert the string into a byte array and replace the umlaut, but it is not working.
Member 12213239 8-Nov-20 15:31pm
   
Can you give me a few hints?
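
The advice in the comments above (look at the numeric value of each character rather than at how it is displayed) can be sketched in Java roughly as follows. This is only an illustration: the class name CodePointDump and the method dump are made up for this example, and the two sample strings stand in for whatever getAsString(0) actually returns. If the dump shows U+00E4, the string really contains ä and only the display is wrong. If it shows U+FFFD, the string already holds the Unicode replacement character, so the original byte was lost during decoding and no replace call on the string can bring it back.

```java
import java.util.stream.Collectors;

// Hypothetical helper, not part of ImageIO: lists the Unicode code points of a string.
class CodePointDump {

    // Returns every code point of s formatted as U+XXXX, separated by spaces.
    static String dump(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // A string that really contains the umlaut (ä is U+00E4).
        System.out.println(dump("Universit\u00E4t"));
        // A string where decoding already failed (U+FFFD is the replacement character).
        System.out.println(dump("Universit\uFFFDt"));
    }
}
```

Calling dump(tagString) right after getAsString(0) shows which of the two cases applies.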

1 solution

Quote:
Can anyone tell me how to get the byte values of the tag and then decode them as UTF-8 to get the exact tag value?

First, you need to understand that before Unicode (in the DOS era), the byte values 128-255 (often called extended ASCII) were used for special characters, and code pages determined which character set those values mapped to.
See: ASCII Code - The extended ASCII table
One of the reasons TIFF uses this is that TIFF was created before Unicode/UTF existed; at the time, other ways were needed to encode non-ASCII characters.
- So to know what was read, you need to display it as hexadecimal.
Your read is probably: 55 6E 69 76 65 72 73 69 74 84 74. In DOS code pages such as CP437, ä is usually encoded as 84.
- You need to understand how your data is encoded and then call a function that converts it to the encoding your app uses.
- If you want to update this data, you will need to do the conversion in reverse.
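
As a rough sketch of the "display as hexadecimal" step in Java: the class name HexDump and the method toHex are invented for this example, and the byte array simply reproduces the values guessed above. How to obtain the raw bytes of the tag from the TIFF reader is not shown here.

```java
// Hypothetical helper: prints bytes as two-digit upper-case hex values.
class HexDump {

    // Formats each byte as two hex digits, separated by spaces.
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask with 0xFF so values above 0x7F are not printed as negative numbers.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed sample data: the byte sequence suggested in this answer.
        byte[] raw = {0x55, 0x6E, 0x69, 0x76, 0x65, 0x72, 0x73, 0x69, 0x74, (byte) 0x84, 0x74};
        System.out.println(toHex(raw));
    }
}
```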

In your case, you probably need a conversion from CP437 to UTF-8.
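
A minimal sketch of such a conversion in Java, under several assumptions: the tag bytes really are CP437 (this is only a guess, as said above), you already have them as a byte array, and your Java runtime provides the IBM437 charset (it belongs to the extended charsets, so a trimmed-down runtime may not include it). The class and method names are made up for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: converts between CP437 bytes, Java strings and UTF-8 bytes.
class Cp437Conversion {

    // Assumption: the IBM437 charset is available in this runtime.
    static final Charset CP437 = Charset.forName("IBM437");

    // Interprets the raw bytes as CP437 text.
    static String decode(byte[] raw) {
        return new String(raw, CP437);
    }

    // Re-encodes CP437 bytes as UTF-8 bytes.
    static byte[] toUtf8(byte[] raw) {
        return decode(raw).getBytes(StandardCharsets.UTF_8);
    }

    // The reverse direction, needed when writing a value back.
    static byte[] encode(String text) {
        return text.getBytes(CP437);
    }

    public static void main(String[] args) {
        // Assumed sample data: the byte sequence suggested in this answer.
        byte[] raw = {0x55, 0x6E, 0x69, 0x76, 0x65, 0x72, 0x73, 0x69, 0x74, (byte) 0x84, 0x74};
        String text = decode(raw);
        // Once the string holds a real ä (U+00E4), the replacement from the question works.
        System.out.println(text.replace("\u00E4", "ae"));
    }
}
```

If the hex dump of your data shows something other than 84 for the umlaut, the data uses a different encoding and the charset name has to be changed accordingly.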
   

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



