
Simplest PDF Generating API for JPEG Image Content

By Hao Hu, 19 Dec 2008
 

Introduction

I was working on a project in which I needed to wrap a JPEG file in PDF format. The program had to be written in C, and after searching the Internet, I could not find anything to use as a reference. Most open source PDF engines are based on either Java or PHP, and the few C PDF engines are huge and would add a lot of unnecessary code to my project. So I decided to write this simple JPEG-to-PDF wrapper. It is the result of reverse-engineering the simplest PDF file that contains a single JPEG image. I am sharing this API so you can grab it and use it if you have a similar requirement.

Using the Code

Here is an example that demonstrates how to generate a 2-page PDF file from 2 JPEG files. Please refer to testMain.c for details. The idea is:

PJPEG2PDF pPDF;
int pdfByteSize, pdfOutByteSize;
unsigned char *pdfBuf;

/* pdfW, pdfH: page size in inches (1 inch = 25.4 mm); Letter size is 8.5 x 11 */
pPDF = Jpeg2PDF_BeginDocument(8.5, 11);

if (NULL != pPDF) {

    Loop For All JPEG Files {
        ... Prepare the current JPEG file to be inserted.
        /* You need to know the pixel dimensions of the JPEG image
           and its size in bytes */
        Jpeg2PDF_AddJpeg(pPDF, JPEG_IMGW, JPEG_IMGH, JPEG_BYTE_SIZE,
                         JPEG_DATA_POINTER, IS_COLOR_JPEG);
    }

    /* Call this after all of the JPEG images have been inserted.
       The return value is the PDF file size in bytes */
    pdfByteSize = Jpeg2PDF_EndDocument(pPDF);

    /* Allocate the buffer for the PDF output */
    pdfBuf = malloc(pdfByteSize);

    /* In: the size of pdfBuf. Out: the number of bytes actually written */
    pdfOutByteSize = pdfByteSize;

    /* Output the PDF to pdfBuf */
    Jpeg2PDF_GetFinalDocumentAndCleanup(pPDF, pdfBuf, &pdfOutByteSize);

    ... Do whatever you want with the PDF file in memory.
}

There are several settings that you can fine-tune in the Jpeg2PDF.h file:

#define MAX_PDF_PAGES      256                         /* Currently supports fewer than 256 images */

#define PDF_TOP_MARGIN     (0.0 * PDF_DOT_PER_INCH)    /* Currently no top margin */
#define PDF_LEFT_MARGIN    (0.0 * PDF_DOT_PER_INCH)    /* Currently no left margin */

That's it, guys. Enjoy.

History

  • Updated [2008-12-19] - Added some extra code to automatically scan the current folder and obtain each JPEG image's dimensions from the JPEG file itself, instead of using hard-coded values as before.
    The JPEG image dimension code is borrowed from here.
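
The dimension lookup mentioned above can be sketched roughly as follows. This is not the borrowed code itself, only a minimal illustration that assumes the whole JPEG is already in memory; the function name jpeg_dimensions is hypothetical. It walks the marker segments after the SOI marker until it reaches a start-of-frame (SOFn) segment, which stores the image height and width as big-endian 16-bit values.

```c
#include <stddef.h>

/* Hypothetical helper (not part of Jpeg2PDF): finds the pixel dimensions of
   a JPEG held in memory. Returns 0 on success, -1 if no frame header is found. */
int jpeg_dimensions(const unsigned char *data, size_t size,
                    unsigned *width, unsigned *height)
{
    size_t i = 2;

    if (size < 4 || data[0] != 0xFF || data[1] != 0xD8)
        return -1;                              /* missing SOI marker */

    while (i + 4 <= size) {
        unsigned marker, seg_len;

        if (data[i] != 0xFF)
            return -1;                          /* not positioned on a marker */
        marker = data[i + 1];
        if (marker == 0xFF) {                   /* fill byte before a marker */
            i++;
            continue;
        }
        if (marker == 0xD9 || marker == 0xDA)
            return -1;                          /* EOI or SOS reached: no SOF seen */
        if (marker == 0x01 || (marker >= 0xD0 && marker <= 0xD7)) {
            i += 2;                             /* standalone marker, no length */
            continue;
        }
        seg_len = ((unsigned)data[i + 2] << 8) | data[i + 3];
        if (seg_len < 2)
            return -1;                          /* corrupt segment length */

        /* SOF0..SOF15, excluding DHT (C4), JPG (C8) and DAC (CC) */
        if (marker >= 0xC0 && marker <= 0xCF &&
            marker != 0xC4 && marker != 0xC8 && marker != 0xCC) {
            if (seg_len < 7 || i + 9 > size)
                return -1;                      /* truncated frame header */
            *height = ((unsigned)data[i + 5] << 8) | data[i + 6];
            *width  = ((unsigned)data[i + 7] << 8) | data[i + 8];
            return 0;
        }
        i += 2 + seg_len;                       /* skip marker plus segment */
    }
    return -1;
}
```

The two values it returns are the kind of width and height that Jpeg2PDF_AddJpeg() expects, together with the byte size of the same buffer.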

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).

About the Author

Hao Hu
Software Developer
United States
Member
No Biography provided

Comments and Discussions

Question: patches done: keep image's aspect ratio & automatic page rotate for best fit | member dab1826 Aug '11 - 6:55
How do I merge them into the project?
Is there a forum, a bug tracker, or SVN access to https://jpeg2pdf.svn.sourceforge.net/svnroot/jpeg2pdf ?
Question: Remove the max 250 pages limitation? | member monday2000 | 15 Mar '11 - 1:54
How can I remove the 250-page maximum? Such a limitation is far too severe.
Answer: Re: Remove the max 250 pages limitation? | member Hao Hu | 15 Mar '11 - 7:15
One of the principles of programming is the trade-off between simplicity and limitation.

I prefer to pursue simplicity with reasonable limitations.
You can always change the maximum page limit to whatever number you want, e.g. 10000.

If you do want to remove the limitation entirely, go ahead and change the code as you desire.
Question: How to get DPI from JPEG? | member monday2000 | 11 Mar '11 - 5:22
Hello Hao Hu,

I need to get the DPI from a JPEG programmatically. How can I do it? Your code only gets the pixel dimensions.
Answer: Re: How to get DPI from JPEG? | member Hao Hu | 11 Mar '11 - 5:39
I might not have time to show you how to do it, but I can point you in a direction. Basically, a JPEG file has a header that contains many markers. The code here just reads the marker that includes the dimension info. You want to check the JPEG header format and find the marker that includes the DPI info. However, since a JPEG is not strictly tied to a particular physical size, the DPI info might be useless (e.g. zero). Good luck.
General: Re: How to get DPI from JPEG? | member monday2000 | 11 Mar '11 - 10:09
Thanks. I already found a simple solution at http://stackoverflow.com/questions/4001719/getting-jpeg-resolution-without-decoding-the-image
 
Here's how I modified your function:
 
// Gets the JPEG size from the array of data passed to the function. File reference: http://www.obrador.com/essentialjpeg/headerinfo.htm
static int get_jpeg_size(unsigned char* data, unsigned int data_size, unsigned short *width, unsigned short *height)
{
    // Check for a valid JPEG image
    int i = 0; // Keeps track of the position within the file
    if(data[i] == 0xFF && data[i+1] == 0xD8 && data[i+2] == 0xFF && data[i+3] == 0xE0)
    {
        i += 4;
        // Check for a valid JPEG header (null-terminated JFIF)
        if(data[i+2] == 'J' && data[i+3] == 'F' && data[i+4] == 'I' && data[i+5] == 'F' && data[i+6] == 0x00)
        {
            // http://stackoverflow.com/questions/4001719/getting-jpeg-resolution-without-decoding-the-image
            unsigned int dpi_mode = data[i+9];
            // 0 == no units (the densities only give the pixel aspect ratio)
            // 1 == 'DPI'; // Dots Per Inch
            // 2 == 'DPC'; // Dots Per Cm

            unsigned short HorzRes = data[i+10] * 256 + data[i+11]; // X density (horizontal); since I consider only "Y-DPI == X-DPI", this is enough.

            // Retrieve the length of the first block, since the first block does not contain the image size
            unsigned short block_length = data[i] * 256 + data[i+1];

            printf("dpi_mode = %u\n", dpi_mode);

            printf("HorzRes = %u\n", HorzRes); // this is it - the DPI I was looking for.

            while(i<(int)data_size)
            {
..........
 
I need the JPEG DPI because I want to set each PDF page size according to the JPEG's DPI.
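
For reference, a self-contained sketch of reading the density fields from a JFIF APP0 header might look like the following. The names JfifDensity, jfif_density and density_to_dpi are made up for illustration, and the code assumes the APP0 segment directly follows the SOI marker, which is the same assumption the snippet above makes. The byte offsets follow the JFIF layout: units at offset 13, X density at 14-15, Y density at 16-17.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the JFIF density fields. */
typedef struct {
    unsigned units;      /* 0 = aspect ratio only, 1 = dots per inch, 2 = dots per cm */
    unsigned x_density;  /* horizontal density */
    unsigned y_density;  /* vertical density */
} JfifDensity;

/* Reads the density fields from a JPEG that starts with SOI + JFIF APP0.
   Returns 0 on success, -1 if the header is not in that form. */
int jfif_density(const unsigned char *data, size_t size, JfifDensity *out)
{
    if (size < 18)
        return -1;                               /* too short for an APP0 header */
    if (data[0] != 0xFF || data[1] != 0xD8 ||    /* SOI */
        data[2] != 0xFF || data[3] != 0xE0)      /* APP0 */
        return -1;
    if (memcmp(data + 6, "JFIF\0", 5) != 0)      /* identifier, NUL included */
        return -1;

    out->units     = data[13];
    out->x_density = ((unsigned)data[14] << 8) | data[15];
    out->y_density = ((unsigned)data[16] << 8) | data[17];
    return 0;
}

/* Converts a density value to dots per inch.
   Returns 0.0 when the file carries no physical unit (units == 0). */
double density_to_dpi(unsigned units, unsigned density)
{
    if (units == 1)
        return (double)density;                  /* already dots per inch */
    if (units == 2)
        return density * 2.54;                   /* dots per cm -> dots per inch */
    return 0.0;
}
```

With a non-zero DPI, the page width in inches would be the pixel width divided by the DPI, which is the unit Jpeg2PDF_BeginDocument() takes. When the result is 0.0, falling back to a fixed page size is probably the safest choice.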
General: https://sourceforge.net/projects/jpeg2pdf/ | member Hao Hu | 26 Aug '10 - 12:27
I'm using SourceForge to host my updates. Please check the following URL for the newest version:

https://sourceforge.net/projects/jpeg2pdf/
General: Re: https://sourceforge.net/projects/jpeg2pdf/ | member monday2000 | 12 Mar '11 - 21:20
What's new there? It would be good to post the update descriptions here.

By the way, I hope the limit on the maximum number of files to encode into a PDF can be removed, maybe by using a temporary file.
Question: What about the XMP manipulation? [modified] | member monday2000 | 17 Aug '10 - 23:11
Hello, Hao Hu.

It would also be nice if your program could manipulate the XMP metadata inside PDF files.

XMP is contained inside another PDF object, using XML syntax.

XMP manipulation is useful because XMP provides metadata about a PDF document, such as Author, Title, Keywords, etc.
 
The XMP specification is available from Adobe:
 
   http://www.adobe.com/devnet/xmp/
 
Here is an example of XMP inside PDF:
 
380 0 obj<</Subtype/XML/Length 3247/Type/Metadata>>stream
<?xpacket begin="?»?" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="3.1-701">
   <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
         <rdf:Description rdf:about=""
                  xmlns:xap="http://ns.adobe.com/xap/1.0/">
            <xap:CreateDate>2009-11-03T17:06:02Z</xap:CreateDate>
            <xap:ModifyDate>2009-11-03T20:08:04+03:00</xap:ModifyDate>
            <xap:MetadataDate>2009-11-03T20:08:04+03:00</xap:MetadataDate>
         </rdf:Description>
         <rdf:Description rdf:about=""
                  xmlns:pdf="http://ns.adobe.com/pdf/1.3/">
            <pdf:Producer>ABBYY FineReader 9.0 Professional Edition</pdf:Producer>
         </rdf:Description>
         <rdf:Description rdf:about=""
                  xmlns:dc="http://purl.org/dc/elements/1.1/">
            <dc:format>application/pdf</dc:format>
         </rdf:Description>
         <rdf:Description rdf:about=""
                  xmlns:xapMM="http://ns.adobe.com/xap/1.0/mm/">
            <xapMM:DocumentID>uuid:6c762139-bdde-4faa-9adc-b1e164c6db73</xapMM:DocumentID>
            <xapMM:InstanceID>uuid:a2270d76-823e-405c-b377-b3a7f72c4575</xapMM:InstanceID>
         </rdf:Description>
   </rdf:RDF>
</x:xmpmeta>
<!-- whitespace padding omitted -->
<?xpacket end="w"?>
endstream
endobj
 

 
-- Modified Wednesday, August 18, 2010 5:17 AM
Answer: Re: What about the XMP manipulation? | member Hao Hu | 18 Aug '10 - 6:09
It would be a nice feature, but I don't have any plans to expand this library, mainly because I don't have this kind of requirement and I lack spare time.
If you need this feature, please just add it. I guess other people might also need it if you share it with them.
General: Desired enhancements | member monday2000 | 14 Aug '10 - 0:09
Hello, Hao Hu.

I'm planning to enhance your program a bit.

First of all, the 256-page limitation needs to be removed. This might be done by using a temp file instead of RAM. It is currently the main flaw: the maximum number of files shouldn't be limited at all.

Also, I'd like to take the files' real DPI into account instead of the hard-coded 72 dpi.

Besides, it is not a good idea to use memory buffers of a preset size. What guarantees that their length will always be enough?

And I used the A4 paper size in my clone of your program. I write the canvas size to the PDF as a double (%.6f), whereas you write it as an integer (%.d), which is not precise. I am from Russia, and in Russia we do not use inches; we use only millimeters.

Also, in your program a small JPG is stretched to the whole canvas. I will add an option to keep the JPG's original size (placing it in the center of the canvas).

I also found a similar PDF library, libharu.org, but I don't know how to compile it in my MS VC++ 6.0 for Windows.
General: Re: Desired enhancements | member Hao Hu | 14 Aug '10 - 15:12
Thanks a lot for your enthusiasm about this small library.

The original purpose of the library is to provide a starting point for anyone who needs simple library code like this. Feel free to change it and use it at your own risk.

Regarding some of your concerns:

Computers always involve trade-offs. As an example, UNIX timestamps also have a limited time range. The price of changing 256 to an unlimited number is the simplicity of the program. For my original motivation for writing this piece of code, 256 was enough. So I just want to keep the code simple.

Regarding 72 DPI, I'm not 100% sure, but I think this is the predefined value in the PDF spec. So I think changing this number to something else will only cause you trouble.

The fixed memory buffer size is also a trade-off for coding simplicity. I don't think you will find a case that causes a buffer overflow, since all the estimated sizes are safe enough. To give you an example, 2.45 m is the current world record for the men's high jump. So 3 m is a safe enough height to say that no human can jump over it. With this kind of safe estimate, you might waste a little memory, but you don't need dynamic memory allocation, which takes extra time and also requires a lot of error-checking logic.

As for the paper size, I think that even if you use a floating-point number, PDF will actually only use the integer value. You can easily verify this by opening any A4 document generated by another PDF engine. I guess the reason is very simple: the error will be at most 0.5*25.4/72 = 0.176 mm, which is not significant at all.
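
The arithmetic behind that estimate can be checked with a few lines of C. This is only an illustrative sketch: the helper names are hypothetical, and it assumes 72 units per inch, the hard-coded value discussed in this thread. A4 (210 x 297 mm) comes out as 595.28 x 841.89 units, which rounds to 595 x 842, and the worst-case rounding error of half a unit is about 0.176 mm.

```c
#define POINTS_PER_INCH 72.0   /* assumed page unit: 1/72 inch */
#define MM_PER_INCH     25.4

/* Converts millimeters to page units (1/72 inch). */
double mm_to_points(double mm)
{
    return mm / MM_PER_INCH * POINTS_PER_INCH;
}

/* Converts page units (1/72 inch) back to millimeters. */
double points_to_mm(double pt)
{
    return pt / POINTS_PER_INCH * MM_PER_INCH;
}

/* Rounds a non-negative size in page units to the nearest integer. */
long round_points(double pt)
{
    return (long)(pt + 0.5);
}
```

For example, round_points(mm_to_points(210.0)) gives 595 and round_points(mm_to_points(297.0)) gives 842, while points_to_mm(0.5) is the 0.176 mm bound quoted above.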
 
Of course, there are many PDF libraries that might provide more features than the code I have here. If you need those features, you should consider switching.
General: Re: Desired enhancements | member monday2000 | 15 Aug '10 - 19:44
"Computers always have trade-off."
"So I just want to keep the code simple."
I totally agree with you. I will also do my best to keep the derived code as simple as possible.
"The trade-off of changing 256 to a unlimited number is the simplicity of the program."
By the way, I still don't understand where this limitation comes from. Is it perhaps a RAM size limitation? If so, then introducing a temp file would not break the trade-off significantly. The code would remain simple and understandable enough.
"Of course, there are many PDF libraries, they might provide more features that the code I have here, if you need those features, you should consider to switch."
That is largely illusory, actually. I need something specifically for C++, not C# and not Java. And the libraries that qualify are too complicated even to compile (not to mention anything else).

The trade-off you are speaking about may well break if I stop using memory buffers of a preset length. So I will think this issue through thoroughly before making the change.
General: I made my program based on this one | member monday2000 | 13 Aug '10 - 1:12
Hello, Hao Hu.

I made my own program based on your utility.

My program is called "fi2pdf". See more details here:

https://sourceforge.net/projects/freeimage/forums/forum/36111/topic/3721193/index/page/1

My program uses the FreeImage library to create PDFs.
General: Re: I made my program based on this one | member monday2000 | 16 Mar '11 - 22:03
I made a new version of my program:
 
fi2pdf v2.1
 
See the details in the same place:
 
https://sourceforge.net/projects/freeimage/forums/forum/36111/topic/3721193/index/page/1
General: Re: I made my program based on this one | member Hao Hu | 18 Mar '11 - 20:16
That's great.
The idea of posting code here is just to share it with others.
And I'm happy to see that people can use it and change it as necessary.

In fact, the original reason for developing this code was to generate PDF files in an embedded environment, so I didn't have the luxury of a file system. Everything needed to be done in memory: the JPEGs are buffered in memory, and the PDF has to be generated in memory and then sent out over the network.

I saw that you took advantage of the file system to reduce memory usage. That's perfectly fine for the kind of utility you want.
General: Bug found | member monday2000 | 13 Aug '10 - 1:08
Hello, Hao Hu.

I found a bug in your program.

Inside the function

STATUS Jpeg2PDF_GetFinalDocumentAndCleanup(PJPEG2PDF pPDF, UINT8 *outPDF, UINT32 *outPDFSize)

you should comment out the statement:

if(outPDF && (*outPDFSize >= pPDF->currentOffSet))

To understand why, initialize *outPDFSize with zero (in the calling function), and your program will stop working (at least on large 24-bit JPEGs).

It currently relies on *outPDFSize happening to hold a random non-zero value, which works in Debug but not in Release (on large 24-bit JPEGs).
General: Re: Bug found [modified] | member Hao Hu | 13 Aug '10 - 5:43
Thanks for your message.

However, I think the place you point out is not the right one. The issue is in the test code.
Basically, when calling Jpeg2PDF_GetFinalDocumentAndCleanup(), I want the user to pass in their buffer size to avoid an overflow. So *outPDFSize should be initialized by the caller to the size of the outPDF buffer. Later, I change that value to let the caller know the exact number of bytes that have been used. This is a very common way for an API to get and then set a buffer size.

It seems complicated for me to update this article right now. (CodeProject asks me to submit my change to their editors for the update, which is much more complicated than before.)

So what should be done is in testMain.c:
Just remove the declaration of pdfFinalSize and replace every pdfFinalSize with pdfSize.
pdfSize then becomes an In/Out variable.
In: the caller lets jpeg2pdf know the byte size of pdfBuf.
Out: jpeg2pdf lets the caller know the actual byte size that has been used.
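
The In/Out size convention described here can be illustrated with a small stand-alone sketch. The function copy_document below is hypothetical and is not the library's code; it only mirrors the idea that the size argument carries the buffer capacity in and the number of bytes used out. Reporting the required size on failure is an extra choice made for this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration of an In/Out size parameter.
   In:  *io_size is the capacity of `out` in bytes.
   Out: *io_size is the number of bytes written on success, or the number
        of bytes required when the buffer is too small.
   Returns 0 on success, -1 on failure. */
int copy_document(const unsigned char *doc, size_t doc_size,
                  unsigned char *out, size_t *io_size)
{
    if (io_size == NULL)
        return -1;
    if (out == NULL || *io_size < doc_size) {
        *io_size = doc_size;        /* tell the caller how much is needed */
        return -1;
    }
    memcpy(out, doc, doc_size);
    *io_size = doc_size;            /* tell the caller how much was used */
    return 0;
}
```

A caller that leaves the size at zero, as in the bug report, gets a failure instead of a silent overflow, which is exactly the protection the check is meant to provide.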
 
Thanks.

modified on Saturday, August 14, 2010 4:26 AM

Question: May I re-use your code under the "GPL 2 and later" license? | member monday2000 | 2 Jul '10 - 1:32
Hello Hao Hu.
 
May I re-use your code under the "GPL 2 and later" license?
Answer: Re: May I re-use your code under the "GPL 2 and later" license? | member Hao Hu | 2 Jul '10 - 7:33
Although I wrote this small module for an existing product, I don't see any problem that would prevent you from using it without worry. Feel free to use it, even in a commercial product.

Thanks a lot for the 5 stars.
General: My vote of 5 | member monday2000 | 1 Jul '10 - 23:06
I desperately sought a solution like this. It's the only one on the whole Web! Thanks a lot.
General: Bmp 2 pdf function | member Junson_Feng | 13 May '10 - 20:32
Hi Hao:

Thanks for sharing this; it's useful for me to study.

But can you add a function that converts BMP images to PDF?
 
Jason.Feng
General: Re: Bmp 2 pdf function | member Hao Hu | 13 May '10 - 21:20
Hi, Jason,
 
What you need is a BMP to JPEG converter. Then you can pack JPEG into PDF.
NConvert is a very good image conversion program, it's available here:
[^]
 
Good Luck.
Question: How to keep image's aspect ratio? | member Super Garrison | 6 Nov '09 - 17:50
I'd appreciate it if someone could share a solution.
The aspect ratio changes, and the PDF doesn't look good.
 
Super.
Answer: Re: How to keep image's aspect ratio? | member Hao Hu | 6 Nov '09 - 20:27
This code inserts one JPEG file per page.
So the possible solutions are:
* If all your image files are the same size, define the PDF page with the same aspect ratio.
* You can also set the correct margins based on your image files in the
"/* Contents Object in Page Object */" section of the Jpeg2PDF.c file.

Or, if you really need a complicated PDF generator, find a full-size PDF library.
 
Good luck.
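
As a rough sketch of the margin approach, the rectangle that keeps an image's aspect ratio and centers it on the page could be computed as below. The Rect type and fit_image function are hypothetical and not part of Jpeg2PDF; the sketch assumes the page and image sizes are positive and that the result would be used to derive the margins and the drawn image size in the contents section mentioned above.

```c
/* Hypothetical result type: position and size of the image on the page,
   in the same unit as the page size. */
typedef struct {
    double x, y;   /* lower-left corner, i.e. left and bottom margins */
    double w, h;   /* drawn width and height */
} Rect;

/* Scales the image uniformly so that it fits inside the page, then centers it. */
Rect fit_image(double page_w, double page_h, double img_w, double img_h)
{
    Rect r;
    double scale_w = page_w / img_w;
    double scale_h = page_h / img_h;
    double scale = (scale_w < scale_h) ? scale_w : scale_h;  /* tighter fit wins */

    r.w = img_w * scale;
    r.h = img_h * scale;
    r.x = (page_w - r.w) / 2.0;
    r.y = (page_h - r.h) / 2.0;
    return r;
}
```

For a 640 x 480 image on a Letter page of 612 x 792 units, this gives a drawn size of about 612 x 459 with equal top and bottom margins of about 166.5 units. Capping scale at 1.0 would instead keep small images at their original size rather than stretching them.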

Last Updated 19 Dec 2008
Article Copyright 2008 by Hao Hu
Everything else Copyright © CodeProject, 1999-2013