Hi, I am trying to read an MS Word file from a server path with the following code:
string wordHTML = System.IO.File.ReadAllText(path);

but the content shows up like this. Please suggest how to read the Word file content as it is.

PK!...[Content_Types].xml ...PK!..._rels/.rels ...PK!...word/_rels/document.xml.rels ...PK!...word/document.xml...


What I have tried:

string wordHTML = System.IO.File.ReadAllText(path);
Updated 22-Jun-18 8:29am

You need to be able to correctly interpret the file format. It's easiest to use a library that can read the files for you:

GitHub - xceedsoftware/DocX: Fast and easy to use .NET library that creates or modifies Microsoft Word files without installing Word.
 
 
An MS Word document is not just a text document. Up to version 2003 it was a binary file (.doc). A document from MS Word 2007 or later (.docx) is just a zipped set of XML files. ;)
Take a look here:
About the Open XML SDK 2.5 for Office | Microsoft Docs
Understanding the Open XML file formats | Microsoft Docs

So, if you'd like to read or change an MS Word document, please follow this: Word processing (Open XML SDK) | Microsoft Docs
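
As a rough illustration of the Open XML SDK route, here is a minimal sketch that prints the text of each paragraph in a .docx file. It assumes the DocumentFormat.OpenXml NuGet package is referenced and that the file really is a Word 2007+ document. The class name and the command-line argument are only placeholders for this example.

```csharp
using System;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

class ReadDocxText
{
    static void Main(string[] args)
    {
        // Placeholder: path to the .docx file, passed as the first argument.
        string path = args[0];

        // Open the package read-only (second argument = isEditable: false).
        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, false))
        {
            Body body = doc.MainDocumentPart.Document.Body;

            // InnerText concatenates the text of all runs inside the paragraph.
            foreach (Paragraph paragraph in body.Descendants<Paragraph>())
            {
                Console.WriteLine(paragraph.InnerText);
            }
        }
    }
}
```

This only extracts plain text. Formatting, tables, and images would need extra handling.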
 
 
Quote:
Trying to read MS word file

An MS Word file is not plain text, which is why one uses a library to read, change, or save such a file.

The 'PK' you see at the beginning of the file is the signature of a zip file. The Word file is a zip archive: each individual file in the archive starts with a 'PK' signature, and you find that file's name a few positions after it.
So these are the 4 file names one can see in your dump:
[Content_Types].xml
_rels/.rels
word/_rels/document.xml.rels
word/document.xml
C#
string wordHTML = System.IO.File.ReadAllText(path);

By the way, such a file is to be read in binary mode.
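
To see this for yourself, here is a small sketch that reads the file as bytes, checks for the 'PK' signature, and lists the entries of the archive. It assumes .NET 4.5 or later with a reference to System.IO.Compression. The class name and the command-line argument are placeholders for this example.

```csharp
using System;
using System.IO;
using System.IO.Compression;

class InspectDocx
{
    static void Main(string[] args)
    {
        // Placeholder: path to the .docx file, passed as the first argument.
        string path = args[0];

        // Read the file in binary mode instead of as text.
        byte[] bytes = File.ReadAllBytes(path);

        // A zip archive starts with the two bytes 'P' and 'K'.
        bool startsWithPk = bytes.Length >= 2
            && bytes[0] == (byte)'P'
            && bytes[1] == (byte)'K';
        Console.WriteLine("Starts with PK: " + startsWithPk);

        // Open the same bytes as a zip archive and print each entry name.
        using (var stream = new MemoryStream(bytes))
        using (var archive = new ZipArchive(stream, ZipArchiveMode.Read))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                Console.WriteLine(entry.FullName);
            }
        }
    }
}
```

For a typical .docx the list should include names such as word/document.xml, which holds the main document text as XML.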
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


