|
|
Comments and Discussions
|
|
 |
|

|
I tried to use this code to generate a pdf document. It works when debugging but once I publish and try to use this function it doesn't work. Why is that?
I think the program can't find the Word document...
|
|
|
|

|
That could be true. If you don't put the Word document in the location where the application is looking for it then your application will fail. Since this was a demo application, I hard-coded the location but in a real-world application you would want to specify the file and path at runtime.
Also, don't forget that you need Microsoft Word installed wherever you run this application (not just on the computer you build it on).
|
|
|
|

|
Thanks for the answer.
Yes, I have it installed and it works fine when debugging. So probably the best way is for the user to set the location of the .docx file before creating the pdf. Is that it?
I would prefer that the user didn't have to deal with the Word document, but Environment.CurrentDirectory does not seem to work, and I don't know how to set the location (I'm a newbie at programming). Do you know of a way to do this?
|
|
|
|

|
Don't forget that the relative path to a file changes between debug and release. Since you are using just one file, I would recommend placing the name of the file in your config file and then reading that value to find it (so the path can be hard-coded but still changeable) or, if you want to do things the easy but hard-to-maintain way, put the hard-coded full path to the file in the code.
As for the Environment.CurrentDirectory not working, I think that this might just be a simple coding mistake. Try outputting the path that it gives you as well as the full path to the file that you then create. My guess is that you left off a slash or doubled one up accidentally. To output the path, it would probably be easiest to use the MessageBox, since that will show up even when you aren't debugging the application or running it from the command line.
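A minimal sketch of the diagnostic suggested above, assuming the template sits next to the executable. The class name `TemplateLocator`, the method `Resolve`, and the config key mentioned in the comment are hypothetical; the sketch writes to the console, and in a desktop application the same string could be passed to a MessageBox instead.

```csharp
using System;
using System.IO;

public static class TemplateLocator
{
    // Builds the full path to the template. A rooted (absolute) name is used
    // as-is; a relative name is resolved against the supplied base folder.
    public static string Resolve(string baseFolder, string configuredName)
    {
        return Path.IsPathRooted(configuredName)
            ? configuredName
            : Path.Combine(baseFolder, configuredName);
    }

    public static void Main()
    {
        // Hypothetical value; in a real application it might come from
        // ConfigurationManager.AppSettings["TemplateFile"] (assumed key name).
        string configuredName = "MyDocument.docx";

        // Folder that contains the executable. Unlike
        // Environment.CurrentDirectory, it does not depend on how the
        // application was launched.
        string baseFolder = AppDomain.CurrentDomain.BaseDirectory;

        string fullPath = Resolve(baseFolder, configuredName);
        Console.WriteLine("Looking for template at: " + fullPath);
        Console.WriteLine("File exists: " + File.Exists(fullPath));
    }
}
```

Using Path.Combine rather than string concatenation avoids the missing or doubled slash mentioned above.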
|
|
|
|

|
The problem was that when I published, the MyDocument.docx file was not being included in the release. I managed to solve that by setting the Build Action option of the file to Content (it was set to None).
Thanks again for your help, Tim.
|
|
|
|

|
Free is the keyword, 'cos if it wasn't nobody would even bother reading it here
|
|
|
|

|
Hey, I just copied and pasted your code, but it's not converting to PDF. I get this error:
Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
Please reply ASAP
|
|
|
|

|
My best guess would be that you are putting this code in a web application and it is having a trust issue. However, that is just a guess based upon the little you told me. Can you provide me with more details on what you are doing? Also, did you try downloading the source solution and running that as a demo? Maybe if that runs you can compare the differences and see why your application isn't working.
|
|
|
|

|
Hello, I have run the solution on my server and I get the following error: "The system cannot find the file specified". When I run the solution on my local PC, it works great.
I added every possible right to the folder for everyone and shared the folder completely, just to make sure it is not a rights issue.
Any Ideas?
|
|
|
|

|
When you say server, do you mean web server or just another computer that is a server in your environment? If you are running this application on IIS, it is going to be tricky. You will have to play around with which user the website runs as and how you path to your file. If you are just talking about another computer that acts as a server, it is probably something simple (like the path being a bit different from that location). What I would suggest is either remotely debugging your app and stepping through the code when it tries to find the file or, if that is impractical, putting logging messages around the suspected error location in your code. For example, output the full path and file name the application is using to open the file. Then copy and paste that path into Explorer to see if the file comes up. If it does, the issue is probably the permissions of the account that the application is running under. Another simple method would be to convert what sounds like a hard-coded path to the file over to an open file dialog box. Browse to the file you want and select it using the OFD. That way you know the path is right.
|
|
|
|

|
It does not work with Word 2003 because there is no ExportAsFixedFormat method.
|
|
|
|

|
Correct. I patterned the article and the information I listed based upon Word 2010. I believe it will work with Word 2007 but you might need to install an add-on pack. However, that is as far back as you can go with compatibility.
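A rough sketch of a guard for the version limit described above. It assumes the version string comes from the interop property Application.Version, which reports values such as "11.0" for Word 2003, "12.0" for Word 2007, and "14.0" for Word 2010. The class and method names are hypothetical, and a result of true for Word 2007 still assumes the PDF add-on mentioned above is installed.

```csharp
using System;
using System.Globalization;

public static class WordVersionCheck
{
    // Word 2007 reports major version 12; Word 2003 reports 11.
    private const int FirstVersionWithPdfExport = 12;

    // Takes a version string such as "14.0" and reports whether that
    // release is new enough to offer ExportAsFixedFormat.
    public static bool SupportsPdfExport(string versionText)
    {
        if (string.IsNullOrWhiteSpace(versionText)) return false;

        // Only the part before the first dot matters here.
        string majorPart = versionText.Trim().Split('.')[0];

        int major;
        return int.TryParse(majorPart, NumberStyles.None,
                            CultureInfo.InvariantCulture, out major)
            && major >= FirstVersionWithPdfExport;
    }

    public static void Main()
    {
        foreach (string version in new[] { "11.0", "12.0", "14.0" })
        {
            Console.WriteLine(version + " -> " + SupportsPdfExport(version));
        }
    }
}
```

Checking once at startup lets the application show a clear message instead of failing inside the export call.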
|
|
|
|
|
|

|
Thanks for writing an absolutely excellent article!
I wonder how you manage to stay sane after having to explain a thousand times just what you meant by "free"
This is a very interesting approach for us, this article presents an excellent starting point!
|
|
|
|

|
Nice article, only you're introducing lots of new problems because of Office Interop. You introduce gc-issues, office version problems and system administration problems.
In the end those will probably cost you more than buying a normal PDF add-in or using a free one like iTextSharp, which has a steeper learning curve.
|
|
|
|

|
Well, I understand what you are saying but I guess I would disagree. First, there isn't an Office version problem, since you can target the lowest common denominator if you want wider version support. Obviously, if you want to use functionality native to newer versions of Office, you wouldn't be able to use it on machines with older versions of Office. As for GC issues, the GC itself isn't the problem; you only run into trouble if you try to force it to do something it isn't designed to do. I'm not sure what you mean by system administration problems.
I have no problem with using other options. However, as I stated in the article, this solution uses existing tools to their full extent. If you do it right, you won't have an issue. I've been using this solution in production for three years now (over 1,000 pdf creations a year in an automated environment) without an issue.
|
|
|
|

|
I have worked a lot with these kinds of tools. Actually, I have mostly worked on fixing problems others created by using Office Interop. The Office assemblies from v9 do not work very nicely with Office v10, et cetera. In some cases it might work, but you will definitely run into weird problems. Secondly, what I mentioned is that installing Office on a server is a big no-no in the system administration world for a lot of reasons (for example: http://support.microsoft.com/kb/257757).
|
|
|
|

|
A useful technique and a well written article.
Just because the code works, it doesn't mean that it is good code.
|
|
|
|
|

|
Word is not free? ...no $h1t.
Tim offers us a viable solution to work with and cultivate ideas on how to get around the PDF issue, and all you steak-heads can complain about is how Word is not free.
You’re all a bunch of d-bags, screw you!
I guess you can call the next article "Word isn't free, you cheap f#cks!"
Tim nice work, thanks for ideas!
Peace
|
|
|
|

|
lol, thanks for the comments and the support. I'm glad you found the ideas in my article useful.
-Tim
P.S. - you know Word isn't free, right?
|
|
|
|

|
Good inside look at Microsoft.Office.Interop
|
|
|
|

|
Simple and easy bridging between systems to create pdf.
|
|
|
|

|
I believe the article is misleading but it does show the reader what objects are available from Word to get the job done.
|
|
|
|

|
I believe the name of this article is misleading. Microsoft Word is not a free product. Your article should be named "Scripting MS Word to create PDF documents". You should investigate using Open Office and see if it exposes an OLE automation model and script against it. Secondly, the remark about using preexisting widgets instead of new ones doesn't really apply here. Using Word for this task is much like using Thor's hammer to swat a fly. Word is hardly a widget. As far as the server scenario goes, you could host your code from within an Outlook VBO script and trigger it based on an email. A typical scenario might be to email a Word document and have the PDF emailed back to you. You could take it even further with code by investigating the SMTP solutions on CodeProject, which would let you code an object that acts as a proxy to an MS Outlook email server. This object would automate the process of sending and receiving mail, so that your solution could then truly be considered viable, since the whole world wouldn't need Word to use it. You could then have a generic application widget that does the complete task. Just my thoughts here; I mean no disrespect to your work.
I'm glad you have a tool at your disposal that helps you day in/day out but I don't think it is prime time enough yet to call it a generic solution. I have particular insight into creating PDF documents and I have investigated many vendor products which do this and even wrote my own wrapper by using the Ghostscript DLL's from within a C++ project.
|
|
|
|

|
Hi,
I have faced the same issue (generating template-based pdf reports) and I spent days looking for the best solution. From my research I gathered that the best/most flexible solution is to use Office OpenXML. In that way you don't call a COM component - don't need to have MsWord installed, you just manipulate xml documents.
Here's my solution.
A SOA approach to dynamic docx report generation
Cheers.
Erion
|
|
|
|

|
Erion,
I reviewed your solution before I posted mine. It really isn't what I was going for. Yes, if you want to modify a Word document then modifying the file directly is a good solution. However, I wanted only the template to be in Microsoft Word format; I didn't want to have a bunch of Word documents saved with different information in them. The actual output I wanted to be in PDF format, which is something that your solution cannot do.
|
|
|
|

|
Sorry, you're right, I thought I had put in the bottom of my article something more about pdf conversion. Will do it soon...
Just a few days after I wrote the article I integrated automated pdf conversion into the solution. I achieved it through BullZip pdf (http://www.bullzip.com/products/pdf/info.php), which is a freeware, full-featured, programmable and very well documented pdf printer. They give you a package that you can call from managed code to automatically print to pdf virtually any file - including docx.
My PrintToPdf method loads the printer settings from a static file, it "reads" the docx file from a temporary directory, creates the pdf file and then destroys the original docx. In my application I have a docx templates directory from which I pick the docx files, I generate the customized docx reports, save them in a temporary directory and then batch print them to pdf, destroying the original docx reports in the end.
Here's my PrintToPdf method.
using System;
using System.IO;
using System.Linq;
using System.Collections.Generic;
using System.Diagnostics;
using System.ComponentModel;
using System.Configuration;
using System.ServiceModel;
using Bullzip.PdfWriter;

namespace DocxGenerator.SL.WCF
{
    public class PdfMaker
    {
        // Prints a temporary .docx file to PDF through the Bullzip printer
        // and returns the PDF content as a byte array.
        internal static byte[] PrintToPdf(string appFolder, string tempDocxFileName)
        {
            try
            {
                string tempFolder = appFolder + @"\temp";
                string tempDocxFilePath = tempFolder + @"\" + tempDocxFileName;

                // Load the static printer settings and point the output at the temp folder.
                PdfSettings pdfSettings = new PdfSettings();
                pdfSettings.PrinterName = ConfigurationManager.AppSettings["PdfPrinter"];
                string settingsFile = pdfSettings.GetSettingsFilePath(PdfSettingsFileType.Settings);
                pdfSettings.LoadSettings(appFolder + @"\App_Data\printerSettings.ini");
                pdfSettings.SetValue("Output", tempFolder + @"\<docname>.pdf");
                pdfSettings.WriteSettings(settingsFile);

                PdfUtil.PrintFile(tempDocxFilePath, pdfSettings.PrinterName);
                string tempPdfFilePath = tempFolder + @"\Microsoft Word - " + tempDocxFileName + ".pdf";

                // Poll for the PDF in one-second waits and give up after 60 attempts
                // (an arbitrary limit) so a failed print job cannot block forever.
                bool fileCreated = false;
                for (int attempt = 0; attempt < 60 && !fileCreated; attempt++)
                {
                    fileCreated = PdfUtil.WaitForFile(tempPdfFilePath, 1000);
                }
                if (!fileCreated)
                {
                    throw new TimeoutException("PDF was not created: " + tempPdfFilePath);
                }

                // Read the result, then remove both temporary files.
                byte[] pdfBytes = File.ReadAllBytes(tempPdfFilePath);
                File.Delete(tempDocxFilePath);
                File.Delete(tempPdfFilePath);
                return pdfBytes;
            }
            catch (Exception ex)
            {
                throw new FaultException("WCF ERROR!\r\n" + ex.Message);
            }
        }
    }
}
|
|
|
|

|
Erion,
Very nice. I'll have to look into that solution. Thanks for sharing.
|
|
|
|
|
|

|
The reason is that you are creating COM objects by chaining calls inline. The intermediate object is never assigned to a variable that you can release, so it remains alive until the GC gets around to collecting it.
Here is an example:
oDoc = oWord.Documents.Open(strWordDoc, oFalse, oTrue);
When you call oWord.Documents, "Documents" is placed in a temporary variable, and you cannot control its disposal. Instead, try this:
Word.Documents docs = null;
Word.Document doc = null;
try
{
    docs = oWord.Documents;
    doc = docs.Open( strWordDoc, oFalse, oTrue );
}
finally
{
    if( null != doc )
    {
        Marshal.ReleaseComObject( doc );
        doc = null;
    }
    if( null != docs )
    {
        Marshal.ReleaseComObject( docs );
        docs = null;
    }
}
(this is totally off my head - you get the idea, but use at your own risk)
Also, be careful of loops - creating objects in loops without realising it is very easy (the Range objects you are creating are not being disposed at the end of each loop iteration).
The easiest way to confirm that you are closing all of your objects is to comment out all of your Word code and confirm Word exits when you call Quit (and call Marshal.ReleaseComObject( wordAppVariable );). Then re-introduce parts of your code to see which sections you need to work on.
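One way to keep the release calls in one place is a small helper, sketched below. The class `ComCleanup` and the method `ReleaseAll` are hypothetical names, not part of the article's code. The helper relies only on Marshal.IsComObject and Marshal.ReleaseComObject and skips anything that is not a COM wrapper, so it can be exercised without Word installed.

```csharp
using System;
using System.Runtime.InteropServices;

public static class ComCleanup
{
    // Releases each argument that is a COM wrapper, in the order given, and
    // returns how many were released. Nulls and ordinary .NET objects are skipped.
    public static int ReleaseAll(params object[] candidates)
    {
        int released = 0;
        if (candidates == null) return released;

        foreach (object candidate in candidates)
        {
            if (candidate != null && Marshal.IsComObject(candidate))
            {
                Marshal.ReleaseComObject(candidate);
                released++;
            }
        }
        return released;
    }

    public static void Main()
    {
        // None of these arguments is a COM wrapper, so nothing is released.
        int count = ReleaseAll("plain string", null, new object());
        Console.WriteLine("Released: " + count);
    }
}
```

With the variables from the example above, the finally block could call ReleaseAll(doc, docs), child objects first, and inside a loop each Range could be passed to the helper at the end of the iteration.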
Good luck!
|
|
|
|

|
Close, but no cigar.
I've already got a macro that sets up bookmarks and hyperlinks within a Word Doc converted from a PDF. That doesn't solve my problem either.
My problem is converting from PDF->Word->PDF messes up some of the formatting, specifically the pagination.
I'm looking for an automated tool that can read a PDF, edit it with automation, then write a PDF without converting it to Word or any other format in between. Any suggestions on that?
|
|
|
|

|
5 Globes ....
|
|
|
|

|
Great article, very informative. Don't know why folks keep saying "It's not free....". It's obviously free if you have Word already installed, which you clearly state several times.
I can see rolling a driver client app with template locating functionality, choice of tag formats, inputs for runtime entry of tags/values, etc.
Could you please answer / clarify a few points?
- Please expand on why the COM component is buggier than the .Net component.
- You said to verify the existence of the Word doc, its extension and Office version. You only showed the first in your code. While it's obvious how to check the second, could you please provide sample of how to verify the office version of the document?
- Although I've been automating Word since VB6 days, I'm somewhat of a newbie using Microsoft.Office.Interop.Word. In the old days, I could late-bind to Word and not worry about which version. If I need to code this DLL to support multiple versions of Word (say 2003, 2007, 2010), and have one dev PC, how can I create multiple versions of DLL? Do I need to have all 3 versions of Word installed on dev box? Do they even coexist on same PC? Will Add Reference let me choose a particular version of the Interop if I have multiple Word versions installed?
Thanks again for a great article...
|
|
|
|

|
To answer your question about the differences between the COM component and the .Net component, I'm not sure why this is the case. In fact, to be honest I am basing my recommendations off of my experience. I have extensively used both and I consistently had more problems when I was referencing the COM component. Behind the scenes, they are both just a COM wrapper so I'm not sure why one would perform differently than another. The only thing I could think of is if .Net treats the COM component differently than it would a .Net component.
That is an excellent question...and one I don't yet have an answer for. I believe you would need to crack into the file to see (unless you see it is saved as a doc file - then you know it is in 2003 or earlier format). I'm currently researching the more in-depth method of file modification based upon direct XML manipulation. I'm not sure yet how to output this information except as a Word document (I want a PDF format) but doing this would probably expose that information. The good news is that unless you are using the features released in a later version of Word, you should be fine with looking at the file extension.
Creating multiple DLLs can be tricky but you could do so by using multiple development machines. As I was researching this issue (since I haven't worried about it in the past), I came across some answers from Microsoft that helped shed some light. Basically, you can specifically target the lowest version and all should work fine for each version. The newer Interop assemblies add more features but they try to keep the existing method calls the same. If you use a more recent version, you may be ok with backward compatibility but you might have an issue if you use any of the newer calls. I have used my application on a machine with Office 2007 (I developed it for 2010) without an issue. I haven't tried it with 2003 but it should fail since 2003 didn't have a Save As PDF option. The Add Reference box does allow you to choose which version you target so you could just target the lowest version and you should be fine.
Great questions. Thanks for the feedback. Let me know if you have more questions or want clarification on any of these items.
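As a possible starting point for "cracking into the file", the sketch below looks at the first bytes of a document. It relies on two facts: legacy .doc files are OLE compound files that begin with the signature D0 CF 11 E0 A1 B1 1A E1, and .docx files are ZIP packages that begin with "PK" followed by the bytes 03 04. The class and enum names are hypothetical. The check identifies only the container format, not the exact Word release, and other Office files share the same signatures.

```csharp
using System;
using System.IO;

public static class WordFileSniffer
{
    public enum WordFormat { Unknown, LegacyBinary, OpenXmlPackage }

    // OLE compound file signature used by .doc files (Word 97-2003).
    private static readonly byte[] OleMagic =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // ZIP local file header signature used by .docx packages.
    private static readonly byte[] ZipMagic = { 0x50, 0x4B, 0x03, 0x04 };

    // Classifies a file from its leading bytes.
    public static WordFormat Detect(byte[] header)
    {
        if (StartsWith(header, OleMagic)) return WordFormat.LegacyBinary;
        if (StartsWith(header, ZipMagic)) return WordFormat.OpenXmlPackage;
        return WordFormat.Unknown;
    }

    // Reads up to eight bytes from the file and classifies them.
    public static WordFormat DetectFile(string path)
    {
        byte[] header = new byte[8];
        using (FileStream stream = File.OpenRead(path))
        {
            int read = stream.Read(header, 0, header.Length);
            Array.Resize(ref header, read);
        }
        return Detect(header);
    }

    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data == null || data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    public static void Main(string[] args)
    {
        if (args.Length == 1)
        {
            Console.WriteLine(DetectFile(args[0]));
        }
    }
}
```

This catches a file whose extension was changed by hand, which the extension check alone would miss.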
|
|
|
|

|
Hi,
Thanks for the interesting solution. A few months ago we chose a very similar approach: a Word document as a template, custom tags for marking, and XML output from our application. With Word 2003 we were able to work with almost any element in the document that was marked with a custom tag, even tables. The bonus of this solution was that the very same flow worked for sheets.
Unfortunately, there will not be custom tag support in Word in the future.
Our problem now is tables. Have you tried, or are you planning to work on, marking "somehow" the tables (with an unknown line count at templating time) or table rows?
Thanks for your answer.
|
|
|
|

|
Good question. I haven't tried this yet myself (I will be trying it soon) but I figured I would post the clue here and let you see if it will work in your situation now while I try to get the full solution documented. Microsoft has posted a document on how to work with the Word Interop. In that document they posted a way to insert a table. Here is the code they give:
Word.Table oTable;
Word.Range wrdRng = oDoc.Bookmarks.get_Item(ref oEndOfDoc).Range;
oTable = oDoc.Tables.Add(wrdRng, 3, 5, ref oMissing, ref oMissing);
oTable.Range.ParagraphFormat.SpaceAfter = 6;
int r, c;
string strText;
for(r = 1; r <= 3; r++)
    for(c = 1; c <= 5; c++)
    {
        strText = "r" + r + "c" + c;
        oTable.Cell(r, c).Range.Text = strText;
    }
oTable.Rows[1].Range.Font.Bold = 1;
oTable.Rows[1].Range.Font.Italic = 1;
That is a direct copy from http://support.microsoft.com/kb/316384
To make this work with my code, since I didn't use bookmarks, I believe you would need to place this code to run when it found a certain "tag" that you created. I would have to play around with it to get the proper syntax/implementation logic but that should get you started.
Do you have a link to the place where they say that custom tagging is going away? What exactly do you mean by custom tags? The method I use simply does a find and replace on the text I give it. I call the items tags just for clarity but the reality is that it is doing a replace all command.
|
|
|
|
|
|

|
Very readable and well explained.
|
|
|
|

|
word is not free
Also, you should call Marshal.ReleaseComObject() to release the object and then call the GC on exit, so that there is no extra word.exe running.
|
|
|
|

|
I didn't say Word was free. I said this solution would allow you to create PDFs for free if you had Word.
Calling Marshal.ReleaseComObject() does nothing further to release the object beyond what has already been done with the Quit command. The Quit command executes Marshal.ReleaseComObject() inside of its call. As for calling the GC when you are exiting, this is a BAD idea. You should try to avoid ever calling the Garbage Collector. The whole point of the GC is to work on its own. If you study how the GC works, you will find that it learns how your application operates and adapts to better clean up the system based upon this knowledge. Calling the GC destroys this optimization and makes it start over. The only time you should call it is if your application will change its overall operation and you want to clear out the optimizations in order to have the system learn new ones. As far as this specific instance is concerned, the GC has nothing to do with why winword.exe would hang. As I explained, even calling the GC will just mean the application will hang (or at least the thread will) until the COM object closes out. The issue here isn't with .NET objects not being released. The slowness is inside the COM wrapper. Unless you have caused an error, this should close on its own. That is why I specified a number of steps to ensure that everything has been done properly on your end. While we cannot control what happens inside the COM wrapper, we can control what we give it, which is almost as good.
|
|
|
|

|
You should change the title, it's not free.
And second, by using the Interop you force the users to have a specific version of Office.
|
|
|
|

|
Well, I understand where you are coming from but here is why I disagree: the solution is free as long as you meet certain pre-existing conditions. I would not, and do not, advocate purchasing anything to use this solution. If you have to buy Microsoft Word to make this work, don't do it. However, if you already have Microsoft Word (and there are a lot of Windows users who do), this solution will work for free.
You make a good point about the Interop requiring a specific Office version. That is one of the things that I grumble about too. However, if you are developing for an environment that has multiple versions of Word, you would simply need to create a separate DLL for each version with the reference to one specific version. If you are using a factory pattern or other methods of loosely-coupled applications, it will be simple to point to the correct version of the DLL based upon which version of Word is installed. The code does not change, only the reference call changes.
|
|
|
|

|
After further review, it looks like you can add the lowest Interop assembly in order to provide compatibility across multiple versions. For instance, I have two options when I add the Interop reference: version 12 and version 14. If I add version 12, I can work with Word documents from 2003 on. The other solution works but it is definitely not as clean as this one. The only thing that one adds that this one does not is the ability to use the new features provided in the latest Interop library. If you absolutely need those new features for your new documents but also want to manipulate older documents then you would need to create multiple DLLs.
|
|
|
|

|
Hi,
First I would like to say that it is a very good article. If Microsoft Office is already installed on the PC where your application runs it is very good. But unfortunately Microsoft office is not free and some people prefer using the free Open Office alternative.
Someone already wrote a similar word->pdf article a couple of years ago:
Generate PDF Using C#
Valery.
|
|
|
|

|
Valery,
Thanks for pointing that article out. For those who use Open Office, this looks like it might be a good alternative. You would still need to figure out how to replace values in the document before outputting it to the PDF but it definitely gives you a good start.
|
|
|
|
 |
|
|
|
Learn how to automate Microsoft Word in C# to create PDF files based upon a template document that can be modified at runtime to reflect the dynamic information
| Type | Article |
| Licence | CPOL |
| First Posted | 3 Jan 2011 |
| Views | 46,122 |
| Downloads | 1,204 |
| Bookmarked | 113 times |
|
|