How do I handle a UTF-8 text file without a byte order mark (BOM) that contains Japanese text when converting it with MS Word?

Hi, I have a text file that contains the Japanese character て.
How can I handle this Japanese character during MS Word conversion?

What I have tried:

byte[] utf8Preamble = System.Text.Encoding.UTF8.GetPreamble(); // EF BB BF
byte[] theData = new byte[utf8Preamble.Length];
int bytesRead;
// Read the start of the same file that Word will open; 'using' closes the stream even on error
using (var fs = new System.IO.FileStream(paramSourceDocPath, System.IO.FileMode.Open, System.IO.FileAccess.Read))
{
    // Read() can return fewer bytes than requested (e.g. a file shorter than the BOM)
    bytesRead = fs.Read(theData, 0, theData.Length);
}

// Process only if the UTF-8 preamble (BOM) does not exist.
// EncodingCompare (not shown) compares the two byte arrays.
bool utf8PreambleExists = bytesRead == utf8Preamble.Length && EncodingCompare(utf8Preamble, theData);
if (!utf8PreambleExists)
{
    // Open the document and let Word detect the encoding
    WordDocument = Documents.OpenNoRepairDialog(FileName: paramSourceDocPath,
        ConfirmConversions: ConfirmConversion, ReadOnly: ReadOnly,
        OpenAndRepair: OpenAndRepair, NoEncodingDialog: NoEncodingDialog);

    // Check whether Word treated the file as UTF-8
    if (WordDocument.OpenEncoding == Microsoft.Office.Core.MsoEncoding.msoEncodingUTF8)
    {
        object SaveChanges = false;
        object RouteWB = false;
        WordDocument.Close(ref SaveChanges, ref ParamMissing, ref RouteWB);

        // Reopen the document with the desired encoding.
        // Note: Western (Windows-1252) has no Japanese characters, so the three
        // UTF-8 bytes of て will appear as several Latin characters instead.
        object westernEncoding = Microsoft.Office.Core.MsoEncoding.msoEncodingWestern;
        WordDocument = Documents.OpenNoRepairDialog(FileName: paramSourceDocPath,
            ConfirmConversions: ConfirmConversion, ReadOnly: ReadOnly, Encoding: westernEncoding,
            OpenAndRepair: OpenAndRepair, NoEncodingDialog: NoEncodingDialog);
    }
}
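A missing BOM does not mean the file is not UTF-8, and that is exactly the case the code above has to decide. One possible check, sketched here under the assumption that the file is small enough to read into memory, is to decode the bytes with a `UTF8Encoding` configured to throw on invalid sequences: if decoding succeeds, the bytes are at least well-formed UTF-8 (plain ASCII also passes), which could inform which `MsoEncoding` value to pass when reopening. The class and method names (`Utf8Check`, `IsValidUtf8`) are hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical helper: reports whether a byte array is well-formed UTF-8.
public static class Utf8Check
{
    // UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true)
    // makes GetString throw instead of silently substituting U+FFFD for bad bytes.
    private static readonly UTF8Encoding StrictUtf8 = new UTF8Encoding(false, true);

    public static bool IsValidUtf8(byte[] data)
    {
        try
        {
            StrictUtf8.GetString(data);
            return true;
        }
        catch (ArgumentException) // DecoderFallbackException derives from ArgumentException
        {
            return false;
        }
    }

    public static void Main()
    {
        byte[] te = { 0xE3, 0x81, 0xA6 };            // て encoded as UTF-8, no BOM
        byte[] latin1 = { 0x63, 0x61, 0x66, 0xE9 };  // "café" encoded as Windows-1252
        Console.WriteLine(IsValidUtf8(te));          // True
        Console.WriteLine(IsValidUtf8(latin1));      // False
    }
}
```

The strict decoder is used rather than a heuristic because a Windows-1252 file containing accented letters almost always produces an invalid UTF-8 sequence, while a genuine UTF-8 file never does.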
Updated 29-Jul-20 1:11am
Comments
Garth J Lancaster 29-Jul-20 4:48am
   
OK, if you have a text file (no BOM) containing the character て, how is that represented in the file, e.g. when viewed with a hex display tool?
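For reference, て is U+3066 (HIRAGANA LETTER TE). A small sketch that prints its bytes in two built-in .NET encodings and dumps the start of a file the way a hex viewer would; the names `ByteView` and `HeadHex` are hypothetical. A BOM-less UTF-8 file would show E3 81 A6 and a UTF-16LE file 66 30. In Shift_JIS, a common legacy Japanese encoding, the same character is 82 C4; that encoding is left out of the code because .NET Core / .NET 5+ only provides it after registering `CodePagesEncodingProvider`.

```csharp
using System;
using System.IO;
using System.Text;

public static class ByteView
{
    // Hypothetical helper: first `count` bytes of a file as "XX-XX-..." hex.
    public static string HeadHex(string path, int count)
    {
        using (var fs = File.OpenRead(path))
        {
            var buffer = new byte[count];
            int read = 0, n;
            // Loop because Read may return fewer bytes than requested.
            while (read < count && (n = fs.Read(buffer, read, count - read)) > 0)
                read += n;
            return BitConverter.ToString(buffer, 0, read);
        }
    }

    public static void Main()
    {
        string te = "\u3066"; // て
        // GetBytes never emits a BOM; it only encodes the characters.
        Console.WriteLine("UTF-8:    " + BitConverter.ToString(Encoding.UTF8.GetBytes(te)));    // E3-81-A6
        Console.WriteLine("UTF-16LE: " + BitConverter.ToString(Encoding.Unicode.GetBytes(te))); // 66-30

        // Write a BOM-less UTF-8 file and inspect its first bytes.
        string path = Path.GetTempFileName();
        File.WriteAllText(path, te, new UTF8Encoding(false));
        Console.WriteLine("File:     " + HeadHex(path, 16)); // E3-81-A6
        File.Delete(path);
    }
}
```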
Richard MacCutchan 29-Jul-20 5:51am
   
What happens when you run the above code?

1 solution

Depending on how the character is represented by byte(s) in the file (which my question was trying to ascertain), you could try:

1) Creating a new, empty Word doc
2) Reading the text file as a byte stream
3) Decoding the byte stream from (2) with the matching encoding and adding the result as a new paragraph in the Word doc
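Steps (2) and (3) depend on decoding the bytes yourself, so Word's encoding detection never comes into play. A minimal sketch of step (2), assuming the file turns out to be UTF-8 (a Shift_JIS file would need a different `Encoding`); `TextFileReader` and `ReadUtf8` are hypothetical names. Step (3) is only a comment because it needs Office Interop; with the Word object model, one option would be `Documents.Add()` followed by inserting the string through the document's `Content` range (for example `Range.InsertAfter`), which is untested here.

```csharp
using System;
using System.IO;
using System.Text;

public static class TextFileReader
{
    // True if `data` begins with every byte of `prefix`.
    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
            if (data[i] != prefix[i]) return false;
        return true;
    }

    // Step 2: read the file as raw bytes, then decode them explicitly as UTF-8.
    // Encoding.UTF8.GetString keeps a BOM as U+FEFF, so skip it if present.
    public static string ReadUtf8(string path)
    {
        byte[] bytes = File.ReadAllBytes(path);
        byte[] bom = Encoding.UTF8.GetPreamble(); // EF BB BF
        int offset = StartsWith(bytes, bom) ? bom.Length : 0;
        return Encoding.UTF8.GetString(bytes, offset, bytes.Length - offset);
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        File.WriteAllBytes(path, new byte[] { 0xE3, 0x81, 0xA6 }); // て, UTF-8, no BOM
        string text = ReadUtf8(path);
        Console.WriteLine(text == "\u3066"); // True
        // Step 3 (not shown): hand `text` to Word as a new paragraph via Interop.
        File.Delete(path);
    }
}
```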
   

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



