Product Evaluation: Aspose.Words

4 Apr 2013

This Review is from our sponsors at CodeProject. These reviews are intended to provide you with information on products and services that we consider useful and of value to developers.

Introduction

In 2004, I was approached by a client to help write a web application. A multi-page wizard collected information from the user, and at the end, generated a Word document containing pages of best practices and engineering specifications for installing the company’s products. Our solution worked, and hundreds (if not thousands) of documents have been generated for their customers. However, we ran Office Automation from ASP.NET, which Microsoft did not support, at least at that time, because Word was executing on the server.

Fast forward to the present day, and I was invited to evaluate Aspose.Words for .NET. While I personally don’t get many requests for Office Automation-type projects these days, as a consultant, it is good to have a go-to library to use should the need ever arise. So, I agreed to take a look at the product.

The Unboxing Experience

These days, software typically does not come in a box, so there is no unboxing experience involved to establish that first impression. But, there is a lot to be said about how easy the installation process is for software, as well as how discoverable the documentation is. Aspose.Words can be installed by means of a traditional MSI install wizard, or as a NuGet package.

The MSI option provides a full experience, including:

  • Installs the Aspose.Words assemblies (multiple assemblies to support different versions of the .NET Framework)
  • Installs the demo projects with source code
  • Installs the documentation locally on the developer’s machine
  • Adds the assembly to the Add Reference dialog box in Visual Studio

After the installation, the developer must manually add a reference to the assembly. Locally-installed documentation is available by means of Windows HTML Help, which is completely available and searchable while disconnected from the Internet.

The NuGet option is intended to be a bit more task-oriented. Instead of modifying the developer’s machine to support general-purpose development with the Aspose.Words library, it merely copies the .NET 2.0 and .NET 3.5 Client Profile assembly versions to the project’s directory, and then adds the necessary reference to the project. The package will need to be added to each project that uses the library.

There is no help file installed locally with the NuGet option, so the developer is left with the online documentation located at: http://www.aspose.com/docs/display/wordsnet/Home.

Hello World

One challenge that we had with writing the document generator all those years ago was finding an efficient way to insert formatted text into the document. Specifically, the resulting document contained sections of text, each with paragraphs and special formatting. Because of time constraints, the decision was made to not build a document on-the-fly by inserting text into it, but instead start with a document that had every possible section already in it (nicely formatted by a human), and then just delete sections based on the user’s input.

So, in evaluating Aspose.Words, I wanted to see how easy it would be to just insert whole sections of pre-formatted text, written as HTML, into a document. As it turns out, besides the rich Document object (which is very similar to Microsoft Word’s object model), there’s also a DocumentBuilder object that abstracts away the layers of nodes that make up a document, and lets you focus on the task at hand.

Conveniently, DocumentBuilder has an InsertHtml method that looks like it will do exactly what I want. But, will it handle all aspects of HTML, like images and hyperlinks? Let’s find out!

C#
var doc = new Aspose.Words.Document(); 
var builder = new Aspose.Words.DocumentBuilder(doc); 
var html = @"<div> 
             <img src='http://thetabletshow.com/dnr_photos/JasonFollas.png'> 
             <a href='http://jasonfollas.com/'>Testing</a> 
             </div>"; 
			 
builder.InsertHtml(html); 
doc.Save(filename_or_Stream, Aspose.Words.SaveFormat.Docx);

The result? I was expecting the image not to be included (since that’s another fetch that Aspose.Words would have to perform), but it worked flawlessly!

Image 1

Note: I received a license, but had not yet applied it when this demo executed. The screenshot shows the default behavior of the evaluation mode, in which evaluation text is inserted into the document.

Licensing

When not associated with an active license, Aspose.Words will run in an Evaluation Mode that injects red text into the documents that are produced (see the screenshot of my “Hello World” experiment). Developers evaluating the product can obtain a 30-day license in order to work with the fully unlocked behavior.

Licenses are distributed as XML files, and applications must set the license before working with the API in order to disable the Evaluation Mode. Though this may sound like an inconvenience, it’s actually not that bad. For example, to unlock the product for use by an ASP.NET application, simply place the .lic file into the /bin directory, and add the following to global.asax:

C#
protected void Application_Start(object sender, EventArgs e) { 
   Aspose.Words.License license = new Aspose.Words.License(); 
   license.SetLicense("Aspose.Words.Product.Family.lic"); 
}

File Types

One very impressive aspect of Aspose.Words is the vast number of document formats that are supported for loading and saving documents. With two lines of code, the library could be used as a format converter to open a Word Document and save it as a PDF:

C#
var doc = new Aspose.Words.Document("Document.doc"); 
doc.Save("Document.pdf", Aspose.Words.SaveFormat.Pdf);

Building upon this concept, you could start with a Word Document that was authored by a business person, open it on the web server, insert/modify/delete content within the document, and then send a PDF version to the user.
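
As a rough illustration of that open, modify, and save-as-PDF flow, here is a minimal sketch. It assumes the Aspose.Words assembly is referenced by the project; the class name, method name, and parameters are hypothetical and chosen only for this example. Without a license applied, the output would also contain the evaluation text described earlier.

```csharp
using Aspose.Words;

public static class ReportConverter
{
    // Hypothetical helper: opens an existing Word document, appends one
    // paragraph at the end, and saves the result as a PDF.
    public static void AppendNoteAndSaveAsPdf(string wordPath, string pdfPath, string note)
    {
        // Load the document that a business person authored in Word.
        var doc = new Document(wordPath);

        // DocumentBuilder starts at the beginning of the document,
        // so move the cursor to the end before writing.
        var builder = new DocumentBuilder(doc);
        builder.MoveToDocumentEnd();
        builder.Writeln(note);

        // Convert by saving in a different format.
        doc.Save(pdfPath, SaveFormat.Pdf);
    }
}
```

In a web application, the final Save call could instead target a stream or the HTTP response rather than a file on disk.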

Load Formats

  • Microsoft Word 97-2003 document
  • Microsoft Word 97-2003 template
  • Office Open XML WordprocessingML Macro-Free Document
  • Office Open XML WordprocessingML Macro-Enabled Document
  • Office Open XML WordprocessingML Macro-Free Template
  • Office Open XML WordprocessingML Macro-Enabled Template
  • Flat OPC document
  • RTF format
  • Microsoft Word 2003 WordprocessingML format
  • HTML format
  • MHTML (Web archive) format
  • OpenDocument Text
  • OpenDocument Text Template
  • MS Word 6 or Word 95 format

Save Formats

  • Doc: Microsoft Word 97 - 2007 Document
  • Dot: Microsoft Word 97 - 2007 Template
  • Docx: Office Open XML WordprocessingML Document (macro-free)
  • Docm: Office Open XML WordprocessingML Macro-Enabled Document
  • Dotx: Office Open XML WordprocessingML Template (macro-free)
  • Dotm: Office Open XML WordprocessingML Macro-Enabled Template
  • FlatOpc: Office Open XML WordprocessingML stored in a flat XML file instead of a ZIP package
  • FlatOpcMacroEnabled: Office Open XML WordprocessingML Macro-Enabled Document stored in a flat XML file instead of a ZIP package
  • FlatOpcTemplate: Office Open XML WordprocessingML Template (macro-free) stored in a flat XML file instead of a ZIP package
  • FlatOpcTemplateMacroEnabled: Office Open XML WordprocessingML Macro-Enabled Template stored in a flat XML file instead of a ZIP package
  • RTF: Rich Text Format
  • WordML: Microsoft Word 2003 WordprocessingML format
  • Pdf: Adobe Portable Document
  • Xps: XML Paper Specification
  • XamlFixed: Extensible Application Markup Language (XAML) format as a fixed document
  • Swf: Adobe Flash Player
  • Svg: Scalable Vector Graphics
  • Html
  • Mhtml: Web archive
  • Epub: IDPF EPUB format
  • Odt: ODF Text Document
  • Ott: ODF Text Document Template
  • Text: Plain text format
  • XamlFlow: Beta. Saves the document in the Extensible Application Markup Language (XAML) format as a flow document
  • XamlFlowPack: Beta. Saves the document in the Extensible Application Markup Language (XAML) package format as a flow document
  • Tiff: Renders a page or pages of the document and saves them into a single or multipage TIFF file
  • Png: Renders a page of the document and saves it as a PNG file
  • Bmp: Renders a page of the document and saves it as a BMP file
  • Emf: Renders a page of the document and saves it as a vector EMF (Enhanced Meta File) file
  • Jpeg: Renders a page of the document and saves it as a JPEG file

A More Complex Example

Since I was being asked to provide an honest evaluation of the product, I wanted to find a way to stress the document generation ability a little bit more. So, after a bit of brainstorming, I came up with the idea of creating a PDF containing a Reddit post (http://reddit.com) along with its nested comments. Note: This exercise is mostly academic in nature.

Reddit provides a RESTful API to permit third-party software access to its content. For .NET languages, there is an open source library called RedditSharp (https://github.com/SirCmpwn/RedditSharp) that abstracts away the details of the networking and transport, and allows the developer to focus on the data.

C#
var reddit = new RedditSharp.Reddit();
var iama = reddit.GetSubreddit("/r/IAmA");
var firstPost = iama.GetPosts()[0];
var comments = firstPost.GetComments();

Comments on Reddit can nest deeply at times, so in order to make the document readable, I made two design decisions: set the page orientation to Landscape, and use a left indent instead of a bulleted list so that I could have better control over how much space each indent uses.

C#
var doc = new Aspose.Words.Document();
var builder = new Aspose.Words.DocumentBuilder(doc);
builder.PageSetup.Orientation = Aspose.Words.Orientation.Landscape;

The post’s title and author are written at the top of the document. The Heading 1 style is applied to the title, while a small italic font is used to display the author:

C#
builder.ParagraphFormat.StyleIdentifier = 
	Aspose.Words.StyleIdentifier.Heading1;

builder.Writeln(firstPost.Title);

builder.ParagraphFormat.StyleIdentifier = 
	Aspose.Words.StyleIdentifier.BodyText;

builder.Font.Name = "Arial";
builder.Font.Size = 8;
builder.Font.Italic = true;
builder.Writeln(" - " + firstPost.Author.Name);

Next, the comments collection must be crawled. I chose to use a recursive function to output each level of comments, and call itself if a given comment has comments of its own (i.e., nested comments). Each level of comments is indented using the ParagraphFormat.LeftIndent property, setting a value in points.

C#
iterate(comments, builder);

...

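// Current nesting depth; a field on the class that contains iterate().
private int indent = 0;

private void iterate(List<RedditSharp.Comment> comments, 
                     Aspose.Words.DocumentBuilder builder)
{
    indent++;

    foreach (var c in comments)
    {
        if (c.ContentHtml != null)
        {
            // Apply this level's indent for every comment (12 points per level),
            // because a nested call below leaves the builder at a deeper indent.
            builder.ParagraphFormat.LeftIndent = indent * 12;

            // Separate each comment from the previous one with a dotted top border.
            builder.ParagraphFormat.Borders.Top.LineStyle = 
                Aspose.Words.LineStyle.Dot;

            builder.ParagraphFormat.Borders.Top.DistanceFromText = 6;

            // Strip the wrapping <div> and the <p> tags so that InsertHtml
            // does not start new paragraphs, which would lose the indent.
            var html = Server.HtmlDecode(c.ContentHtml)
                             .Replace("<div class=\"md\">", "")
                             .Replace("</div>", "")
                             .Replace("<p>", "")
                             .Replace("</p>", "<br/><br/>");

            // Comment body
            builder.Font.Name = "Times New Roman";
            builder.Font.Size = 10;
            builder.Font.Italic = false;

            builder.InsertHtml(html);

            // Author line
            builder.Font.Name = "Arial";
            builder.Font.Size = 8;
            builder.Font.Italic = true;

            builder.Writeln(" - " + c.Author);

            // Recurse into nested comments, one level deeper.
            if (c.Comments.Count > 0)
                iterate(c.Comments, builder);
        }
    }
    indent--;
}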
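
The function above tracks nesting depth in a field. As an alternative sketch, the depth can be passed as a parameter, so that each comment's indent is derived from its own level and nothing needs to be restored when the recursion unwinds. The CommentNode type and the IndentWalker class below are hypothetical stand-ins written for this example; they are not RedditSharp or Aspose.Words types, and the sketch only computes the indent values rather than writing to a document.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for a comment with nested replies.
public sealed class CommentNode
{
    public string Text { get; }
    public List<CommentNode> Replies { get; } = new List<CommentNode>();

    public CommentNode(string text) { Text = text; }
}

public static class IndentWalker
{
    // Same spacing as in the article's code: 12 points per nesting level.
    public const double PointsPerLevel = 12;

    // Walks the comment tree depth-first and returns, in document order,
    // the left indent (in points) that each comment should receive.
    public static List<(double IndentPoints, string Text)> Flatten(
        IEnumerable<CommentNode> comments, int depth = 1)
    {
        var rows = new List<(double IndentPoints, string Text)>();
        foreach (var c in comments)
        {
            rows.Add((depth * PointsPerLevel, c.Text));

            // Replies are one level deeper; the caller's depth is unchanged.
            rows.AddRange(Flatten(c.Replies, depth + 1));
        }
        return rows;
    }
}
```

For a top-level comment with one reply followed by a second top-level comment, Flatten returns indents of 12, 24, and 12 points, so a sibling that follows a nested thread returns to its own level.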

When I first wrote this function, I tried using RedditSharp’s Comment.Content property (instead of .ContentHtml). I had to replace hard returns (\n) in the text with vertical tabs (\v) to keep new paragraphs from being started in the document (otherwise, the indent was lost). Even so, the result looked strange to me, since comments in Reddit are entered in plain text using Markdown (http://en.wikipedia.org/wiki/Markdown), but they are intended to be rendered as rich text.
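
The hard-return replacement described above can be sketched as a small helper. The class and method names are hypothetical, and normalizing Windows-style line endings first is an extra step assumed here, not something the original code is known to have done.

```csharp
public static class PlainTextBreaks
{
    // Replaces hard returns with vertical tabs so that the text stays
    // within a single paragraph instead of starting new ones.
    public static string ToLineBreaks(string text)
    {
        return text
            .Replace("\r\n", "\n")   // assumption: treat CRLF the same as LF
            .Replace("\n", "\v");
    }
}
```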

The rendered HTML version of the comments is wrapped in a <div> tag, and uses <p> elements for each paragraph. However, similar to the hard return issue with Markdown, the indenting was lost when the DocumentBuilder.InsertHtml() function encountered the <div> and <p> elements, because these were handled as new paragraphs by Aspose.Words. So, the easy solution seemed to be to clean the HTML before inserting it into the document (by removing the <div> and <p> tags, and inserting two <br> elements in place of the closing </p> tag, etc.).
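
The cleanup step can be pulled out into a standalone helper, which makes it easy to try against sample input outside of a web page. This is a sketch under a few assumptions: the class and method names are hypothetical, and it uses System.Net.WebUtility.HtmlDecode in place of the Server.HtmlDecode call from the article's code so that it does not depend on ASP.NET.

```csharp
using System.Net;

public static class CommentHtmlCleaner
{
    // Decodes the HTML-encoded comment body, removes the wrapping <div>
    // and the opening <p> tags, and turns each closing </p> into two
    // line breaks so paragraphs stay visually separated.
    public static string Clean(string encodedHtml)
    {
        return WebUtility.HtmlDecode(encodedHtml)
            .Replace("<div class=\"md\">", "")
            .Replace("</div>", "")
            .Replace("<p>", "")
            .Replace("</p>", "<br/><br/>");
    }
}
```

Plain string replacement is enough here only because the wrapper markup is fixed. If the <div> or <p> tags carried other attributes, they would pass through untouched.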

Finally, after all of the comments are rendered, it is time to save the results. I wrote my program as part of an ASP.NET web application, so saving the document is really just streaming it to the user’s browser. The Document object’s Save() method has an overload that accepts an HttpResponse object as the first parameter, and makes the task of saving/streaming to the user very straightforward:

C#
doc.Save(Response, "iama.pdf", Aspose.Words.ContentDisposition.Inline, null);

Note: This method overload is not available in the 3.5 Client Profile assembly, but if you need to use the Client Profile, then chances are that you will just be saving the results to an actual file anyway.

Image 2

Closing Thoughts

Overall, I was impressed by the power and ease of use that Aspose.Words provides. It didn’t always do everything the way I thought it should, but that is probably due more to my limited understanding of how the Word document model works than to a flaw in the library.

The DocumentBuilder object is the real powerhouse of the library, and the InsertHtml function makes it a breeze for web developers to add entire chunks of content to Word documents. This technique may get someone like me to the 80% mark with very little effort, but it remains to be seen how much additional code tweaking would be required to produce a 100% pixel-perfect document.

Disclosure of Material Connection: I received one or more of the products or services mentioned above for free in the hope that I would mention it on my blog. Regardless, I only recommend products or services I use personally and believe my readers will enjoy. I am disclosing this in accordance with the Federal Trade Commission’s 16 CFR, Part 255: “Guides Concerning the Use of Endorsements and Testimonials in Advertising.”

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

