I want to convert a PDF file into Excel. It is possible in C#, but I want to do it in C++.
Can anyone help me with this?
Thanks in advance.

What I have tried:

I have tried to do the conversion in C#, but all my other work is in C++, so I want this in C++ as well.
Updated 21-Sep-17 8:50am
Comments
Richard MacCutchan 2-Jun-16 8:56am    
As far as I know there is no open source library that allows you to read PDF files from C++. The only way would be to write all the code yourself, based on the Adobe specifications; not a trivial exercise.
Sergey Alexandrovich Kryukov 2-Jun-16 10:53am    
...worse, the whole problem means that the major mistake has already been made. Data that may need any processing should never be stored in PDF. PDF is like paper: it is designed to be read by humans, that's all; it is not structured in any semantically sensible way. When the data is extracted, it remains unstructured.
—SA
Richard MacCutchan 2-Jun-16 11:06am    
Well who knew?
Sergey Alexandrovich Kryukov 2-Jun-16 11:15am    
That's the thing: it was clear from the very beginning that PDF is a dead end in data processing. I have already faced this in real life; it is just a business anti-pattern. We had a client who had a database of, in particular, electronic components. There are suppliers of such components, manufacturers or just sellers, and some deliver information on what they offer only in PDF. At least two businesses failed to come to any reasonable agreement on delivering catalog data. Maybe the seller maintains the same kind of database as this client internally. And then there are companies trying to be parasites on human stupidity. People purchased an application that parses PDF and puts the data in tables, more or less accurately, and that client wanted to integrate it, and so on. The question is: do those sellers want to sell effectively or not?
—SA
Mohibur Rashid 3-Jun-16 0:08am    
I was going to say that. Either you are reading data that you are not supposed to, or you put your engineer on fire.

1 solution

This is a very complex task. PDF and Excel documents are different in nature, but since anything placed in a PDF can also be placed in an Excel file, it is doable.
Start by downloading this open source PDF library.
You can then open a PDF file and access its elements such as text, font, text's attributes, pictures, etc.
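Text read from a PDF typically arrives as positioned fragments rather than as table cells, so some grouping step is needed before the data can go into a spreadsheet. The following is only a rough sketch of one way to do that, not the API of any particular PDF library: the `TextFragment` struct, the `group_into_rows` function, and the `y_tolerance` parameter are all hypothetical names, and it assumes the library gives you an x/y position for each piece of text, with y growing down the page.

```cpp
#include <algorithm>
#include <cmath>
#include <string>
#include <vector>

// Hypothetical record for one piece of text extracted from a page.
// Assumption: x and y are page coordinates, with y increasing downward.
struct TextFragment {
    double x;
    double y;
    std::string text;
};

// Group fragments into rows: fragments whose y differs from the first
// fragment of the current row by at most y_tolerance share a row, and
// each row is then ordered left to right by x.
std::vector<std::vector<std::string>> group_into_rows(
        std::vector<TextFragment> frags, double y_tolerance = 2.0) {
    std::sort(frags.begin(), frags.end(),
              [](const TextFragment& a, const TextFragment& b) { return a.y < b.y; });

    std::vector<std::vector<TextFragment>> lines;
    for (const TextFragment& f : frags) {
        if (lines.empty() ||
            std::fabs(f.y - lines.back().front().y) > y_tolerance) {
            lines.emplace_back();  // start a new row
        }
        lines.back().push_back(f);
    }

    std::vector<std::vector<std::string>> rows;
    for (std::vector<TextFragment>& line : lines) {
        std::sort(line.begin(), line.end(),
                  [](const TextFragment& a, const TextFragment& b) { return a.x < b.x; });
        std::vector<std::string> row;
        for (const TextFragment& f : line) {
            row.push_back(f.text);
        }
        rows.push_back(row);
    }
    return rows;
}
```

Real documents usually need more than this (merged cells, wrapped lines, multi-column layouts), so treat the tolerance-based grouping as a starting point only.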
Then, the easiest approach would be to create a plain-text file with a .csv extension. Excel can open such a file directly, and you can then save it as a native Excel file. The more difficult part would be generating a native Excel file directly.
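For the CSV route, the main thing to get right is quoting: a field that contains a comma, a double quote, or a line break has to be wrapped in double quotes, with any embedded quotes doubled. Here is a minimal sketch using only the standard library. The function names `csv_escape` and `write_csv_row` are made up for this example, and it assumes a comma separator, which Excel may not use by default in every locale.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Quote a field only when needed: wrap it in double quotes if it contains
// a comma, a double quote, or a line break, and double any embedded quotes.
std::string csv_escape(const std::string& field) {
    if (field.find_first_of(",\"\r\n") == std::string::npos) {
        return field;
    }
    std::string out = "\"";
    for (char c : field) {
        if (c == '"') {
            out += '"';  // an embedded quote is written as two quotes
        }
        out += c;
    }
    out += '"';
    return out;
}

// Write one row as comma-separated fields terminated by CRLF.
void write_csv_row(std::ostream& os, const std::vector<std::string>& row) {
    for (std::size_t i = 0; i < row.size(); ++i) {
        if (i != 0) {
            os << ',';
        }
        os << csv_escape(row[i]);
    }
    os << "\r\n";
}
```

To produce a file, pass a `std::ofstream` opened in binary mode (so the CRLF is not translated again on Windows) as the `os` argument and call `write_csv_row` once per row.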
You can use this paid product, or a free one like this one.
Here is a simple example of creating your first native Excel file using the free library:
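#include "xlsxwriter.h"

int main() {
    /* Create a new workbook and add one worksheet with the default name. */
    lxw_workbook  *workbook  = workbook_new("hello_world.xlsx");
    lxw_worksheet *worksheet = workbook_add_worksheet(workbook, NULL);

    /* Arguments are (worksheet, row, column, value, format); rows and columns are zero-based. */
    worksheet_write_string(worksheet, 0, 0, "Hello", NULL);  /* cell A1 */
    worksheet_write_number(worksheet, 1, 0, 123, NULL);      /* cell A2 */

    /* Closing the workbook writes the file to disk; report failure via the exit code. */
    lxw_error error = workbook_close(workbook);
    return error == LXW_NO_ERROR ? 0 : 1;
}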
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


