How can I override the writeString method of the PDFTextStripper class of PDFBox in C#?

I want to extract the font size and style of the contents of a PDF file using PDFBox. When I searched, I found that I should override the writeString method of the PDFTextStripper class.

I am new to C# and I have not been able to implement the override.

The code that I used is as follows:

C#
PDFTextStripper stripper = new PDFTextStripper(){

protected override void writeString(String text, List<TextPosition> textPositions)
        {
            String prevBaseFont = "";
            StringBuilder builder = new StringBuilder();

            foreach (TextPosition position in textPositions)
            {
                String baseFont = position.getFont().getBaseFont();
                if (baseFont != null && baseFont != (prevBaseFont))
                {
                    builder.Append('[').Append(baseFont).Append(']');
                    prevBaseFont = baseFont;
                }
                builder.Append(position.getCharacter());
            }

            writeString(builder.ToString());
        }

};


Please help me.

1 solution

To override a method, you need to create your own class that inherits from PDFTextStripper, provided that PDFTextStripper is not sealed.
The method you want to override also has to be marked virtual in the base class.
C#
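public class MyPDFTextStripper : PDFTextStripper
{
    // The parameter types must match the base class declaration exactly.
    protected override void writeString(String text, List textPositions)
    {
    }

    // Another option is to hide the original method by using the keyword new.
    // A class cannot declare both versions with the same signature, so pick one.
    // A hidden method is not called by code inside the base class, which still
    // calls its own version.
    // protected new void writeString(String text, List textPositions)
    // {
    // }
}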

See Knowing When to Use Override and New Keywords[^]
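
To make the difference between the two keywords concrete, here is a minimal, self-contained sketch. BaseStripper, OverridingStripper, HidingStripper, and Demo are hypothetical stand-ins, not PDFBox types; GetText plays the role of a public method that calls a protected virtual method internally.

```csharp
using System;

// Hypothetical stand-in for a library class; not part of PDFBox.
public class BaseStripper
{
    // Public entry point that calls the protected virtual method internally.
    public string GetText()
    {
        return WriteString("text");
    }

    protected virtual string WriteString(string text)
    {
        return "base:" + text;
    }
}

public class OverridingStripper : BaseStripper
{
    // override replaces the base implementation, so GetText() reaches this method.
    protected override string WriteString(string text)
    {
        return "override:" + text;
    }
}

public class HidingStripper : BaseStripper
{
    // new only hides the base method for code that sees HidingStripper directly;
    // GetText() in the base class still dispatches to BaseStripper.WriteString.
    protected new string WriteString(string text)
    {
        return "hidden:" + text;
    }
}

public static class Demo
{
    public static void Main()
    {
        Console.WriteLine(new OverridingStripper().GetText()); // override:text
        Console.WriteLine(new HidingStripper().GetText());     // base:text
    }
}
```

If the goal is for the library's own code to call your version, as it is when a public method invokes the protected one internally, override is the keyword that achieves it; new does not.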

Here is an article that might be helpful Working with PDF files in C# using PdfBox and IKVM[^]
 
 
Comments
Biju P Dais 25-Oct-15 9:33am    
Thank you George, it saved my day.
George Jonsson 25-Oct-15 9:37am    
You are welcome.
Biju P Dais 26-Oct-15 8:07am    
Hi George,
I tried the methods you mentioned, but the new function I wrote does not override the existing one. I am unable to use the override keyword: the compiler reports that there is no suitable method to override.

Below is the new function that I am using:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using java.io;
using java.util;
using java.util.regex;
using org.apache.pdfbox.cos;
using org.apache.pdfbox.pdmodel;
using org.apache.pdfbox.pdmodel.interactive.documentnavigation.outline;
using System.IO;

namespace org.apache.pdfbox.util
{
    class CustomPDFTextStripper : PDFTextStripper
    {
        protected new void writeString(String text, List<TextPosition> textPositions)
        {
            //MessageBox.Show();
            StringBuilder builder = new StringBuilder();
            //String builder = "";
            String prevBaseFont = "";
            foreach (TextPosition position in textPositions)
            {
                String baseFont = position.getFont().getBaseFont();
                System.Console.WriteLine("hello");
                if (baseFont != null && baseFont != (prevBaseFont))
                {
                    builder.Append('[').Append(baseFont).Append(']');
                    /*builder += '[';
                    builder += baseFont;
                    builder += ']';*/
                    prevBaseFont = baseFont;
                }
                builder.Append(position.getCharacter());
                //builder += position.getCharacter();
            }

            //writeString(builder.ToString());
            writeString(builder.ToString());
        }
    }
}



And the old function is like this:
protected internal virtual void writeString(string text);
protected internal virtual void writeString(string text, List textPositions);

Please help
George Jonsson 26-Oct-15 10:12am    
If the method has the keyword virtual, it should be possible to override it. I am not sure whether internal interferes in this case.
Because writeString is protected, you can never call it on an instance from outside code, only from within the class or from its inheritors.
So the question is: which public method are you calling that internally calls writeString? Maybe that could be a starting point.
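
The "no suitable method to override" error is consistent with a parameter-type mismatch rather than a problem with virtual. The base declaration quoted above takes a non-generic List (presumably java.util.List, since the file imports java.util), whereas List<TextPosition> in a file that also imports System.Collections.Generic refers to the .NET generic list, which is a different type. The sketch below reproduces that shape with hypothetical stand-ins (JavaList, FakeTextPosition, FakeStripper, CustomStripper, SignatureDemo); none of them are the real PDFBox or IKVM types.

```csharp
using System;
using System.Collections;
using System.Text;

// Hypothetical stand-in for a non-generic, Java-style list.
public class JavaList
{
    private readonly ArrayList items = new ArrayList();
    public void add(object item) { items.Add(item); }
    public int size() { return items.Count; }
    public object get(int index) { return items[index]; }
}

// Hypothetical stand-in for TextPosition.
public class FakeTextPosition
{
    public readonly string BaseFont;
    public readonly string Character;
    public FakeTextPosition(string baseFont, string character)
    {
        BaseFont = baseFont;
        Character = character;
    }
}

// Hypothetical stand-in for PDFTextStripper, using the two signatures quoted above.
public class FakeStripper
{
    private readonly StringBuilder output = new StringBuilder();

    // Public method that calls the protected virtual method internally.
    public string getText(JavaList positions)
    {
        output.Clear();
        writeString("", positions);
        return output.ToString();
    }

    protected internal virtual void writeString(string text)
    {
        output.Append(text);
    }

    protected internal virtual void writeString(string text, JavaList textPositions)
    {
        writeString(text);
    }
}

public class CustomStripper : FakeStripper
{
    // The parameter list matches the base declaration exactly, so override is accepted.
    // Declaring the second parameter as System.Collections.Generic.List<FakeTextPosition>
    // instead would fail with "no suitable method found to override".
    // Access is "protected internal" here because both classes are in one assembly;
    // when the base class comes from another assembly, C# requires "protected override".
    protected internal override void writeString(string text, JavaList textPositions)
    {
        string prevBaseFont = "";
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < textPositions.size(); i++)
        {
            // The non-generic list returns object, so each element needs a cast.
            FakeTextPosition position = (FakeTextPosition)textPositions.get(i);
            if (position.BaseFont != null && position.BaseFont != prevBaseFont)
            {
                builder.Append('[').Append(position.BaseFont).Append(']');
                prevBaseFont = position.BaseFont;
            }
            builder.Append(position.Character);
        }
        // Calls the one-argument overload, which appends to the output.
        writeString(builder.ToString());
    }
}

public static class SignatureDemo
{
    public static string Run()
    {
        JavaList positions = new JavaList();
        positions.add(new FakeTextPosition("Arial", "H"));
        positions.add(new FakeTextPosition("Arial", "i"));
        positions.add(new FakeTextPosition("Times", "!"));
        return new CustomStripper().getText(positions);
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // [Arial]Hi[Times]!
    }
}
```

Against the real library, the corresponding declaration would presumably be protected override void writeString(string text, java.util.List textPositions), with each element cast to TextPosition; that is an assumption based on the signatures quoted in this thread, not something verified here.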
Biju P Dais 14-Nov-15 22:35pm    
Hi George, sorry for the late response.

This is the code that I use:

// Load the PDF file
PDDocument doc = PDDocument.load(str_file_path);
try
{
    // CustomPDFTextStripper contains the overriding version of writeString
    CustomPDFTextStripper stripper = new CustomPDFTextStripper();
    //stripper.hello();

    return stripper.getText(doc);
}
finally
{
    doc.close();
}

So, stripper.getText(doc) is the public method that calls this function internally.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


