|
In the function Jpeg2PDF_GetFinalDocumentAndCleanup(), the address of the pdfFinalSize variable is passed to be assigned with the final PDF file size.
However, the conditional block at the top of the function compares the variable's value instead of the variable's address:
if (outPDF && (*outPDFSize >= pPDF->currentOffSet))
Shouldn't it be outPDFSize instead of *outPDFSize ?
Thank you.
|
|
|
|
|
The code should be correct.
outPDFSize is a pointer to UINT32 and pPDF->currentOffSet is of type UINT32.
So comparing *outPDFSize with pPDF->currentOffSet is perfectly natural.
I'm not exactly sure why the code fails at this line, but a call stack would probably explain it.
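To illustrate the point about pointer versus value, here is a minimal sketch. The helper name buffer_is_large_enough is hypothetical and is not part of the library; it only mirrors the comparison in the quoted condition, using uint32_t in place of UINT32.

```c
#include <stdint.h>

/* Hypothetical stand-in for the library's size check; not the real function. */
static int buffer_is_large_enough(const uint32_t *outPDFSize, uint32_t currentOffSet)
{
    /* outPDFSize is the address of the caller's variable;
       *outPDFSize is the number stored at that address.
       Both sides of the comparison are therefore plain 32-bit values. */
    return *outPDFSize >= currentOffSet;
}
```

Because the dereferenced value is what gets compared, the variable whose address is passed in needs to hold a meaningful number before the call.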
|
|
|
|
|
Thank you very much for your speedy reply. If I may just clarify one other point, please.
Is that statement comparing memory addresses or their contained values? Because:
1. the function is called with the address of pdfSize:
Jpeg2PDF_GetFinalDocumentAndCleanup(pdfId, pdfBuf, &pdfSize);
2. *outPDFSize would return the contained value (uninitialized) while outPDFSize would return the address.
Perhaps I'm wrong in my understanding of point #2. Thank you so much for your time.
|
|
|
|
|
Thanks for your article.
I am new to PDF and imaging stuff, so one quick question.
I would like to know if it is possible to insert images according to their size. It seems that images are shown full screen.
However, I know that I have read somewhere that PDF is resolution independent.
Could you shed some light on how to adjust the image size according to the actual scanned JPEG image size? That would be great, if it can be done. Thanks again.
|
|
|
|
|
Why is the limit 256 images and not some other number?
|
|
|
|
|
This is great stuff in its simplicity.
You have saved me a lot of time. 
|
|
|
|
|
How do I merge it into the project?
forum? bugtracker? svn access to https://jpeg2pdf.svn.sourceforge.net/svnroot/jpeg2pdf ?
|
|
|
|
|
How can I remove the max 250 pages limitation? Such a limitation is too severe.
|
|
|
|
|
One of the principles of programming is the trade-off between simplicity and limitation.
I prefer to pursue simplicity with reasonable limitations.
You can always change the max page limit to whatever number you want, e.g. 10000.
If you do want to remove the limitation, then go ahead and change the code as you wish.
|
|
|
|
|
Hello Hao Hu,
I need to get the DPI from a JPEG programmatically. How can I do that? Your code gets only the pixel dimensions.
|
|
|
|
|
I might not have time to show you how to do it, but I can point you in the right direction. Basically, a JPEG file has a header that contains many markers. The code here only reads the marker that holds the dimension info. You'll want to check the JPEG header format and find the marker that holds the DPI info. However, since a JPEG is not strictly tied to a particular physical size, the DPI info might be useless (e.g. zero). Good luck.
|
|
|
|
|
Thanks. I already found a simple solution on http://stackoverflow.com/questions/4001719/getting-jpeg-resolution-without-decoding-the-image
Here's how I modified your function:
// Gets the JPEG size from the array of data passed to the function, file reference: http://www.obrador.com/essentialjpeg/headerinfo.htm
static int get_jpeg_size(unsigned char* data, unsigned int data_size, unsigned short *width, unsigned short *height)
{
    // Check for valid JPEG image
    int i = 0; // Keeps track of the position within the file
    if (data[i] == 0xFF && data[i+1] == 0xD8 && data[i+2] == 0xFF && data[i+3] == 0xE0)
    {
        i += 4;
        // Check for valid JPEG header (null terminated JFIF)
        if (data[i+2] == 'J' && data[i+3] == 'F' && data[i+4] == 'I' && data[i+5] == 'F' && data[i+6] == 0x00)
        {
            // http://stackoverflow.com/questions/4001719/getting-jpeg-resolution-without-decoding-the-image
            unsigned int dpi_mode = data[i+9]; // JFIF density units
            // 0 == no units (the density values give only the pixel aspect ratio)
            // 1 == 'DPI'; // Dots Per Inch
            // 2 == 'DPC'; // Dots Per Cm.
            unsigned short HorzRes = data[i+10] * 256 + data[i+11]; // X (horizontal) density; since I consider only "Y-DPI == X-DPI", this is enough.
            // Retrieve the block length of the first block since the first block will not contain the size of file
            unsigned short block_length = data[i] * 256 + data[i+1];
            printf("dpi_mode = %u\n", dpi_mode);
            printf("HorzRes = %d\n", HorzRes); // this is it - the DPI I was looking for.
            while (i < (int)data_size)
            {
..........
I need the JPG DPI because I want to change every PDF page size according to JPG DPI.
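For anyone who wants the density lookup on its own, here is a self-contained sketch. It assumes the file starts with SOI followed directly by a JFIF APP0 segment, as in the snippet above. The names get_jfif_density and pixels_to_points are made up for illustration, and the fallback to 72 DPI when no usable density is stored is my own assumption, not something the library does.

```c
#include <stddef.h>

/* Hypothetical helper: reads the density fields of a JFIF APP0 segment.
   Returns 1 on success, 0 if the data does not start with SOI + JFIF APP0. */
static int get_jfif_density(const unsigned char *data, size_t size,
                            unsigned *units, unsigned *xdensity, unsigned *ydensity)
{
    if (size < 18)
        return 0; /* too short to hold SOI plus the APP0 density fields */
    if (data[0] != 0xFF || data[1] != 0xD8 || data[2] != 0xFF || data[3] != 0xE0)
        return 0; /* not SOI followed by APP0 */
    if (data[6] != 'J' || data[7] != 'F' || data[8] != 'I' || data[9] != 'F' || data[10] != 0x00)
        return 0; /* APP0 segment is not JFIF */
    *units    = data[13];                   /* 0 = none, 1 = dots/inch, 2 = dots/cm */
    *xdensity = data[14] * 256u + data[15]; /* horizontal density, big-endian */
    *ydensity = data[16] * 256u + data[17]; /* vertical density, big-endian */
    return 1;
}

/* Hypothetical helper: converts a pixel count to PDF points (1/72 inch).
   Falls back to 72 DPI (1 pixel = 1 point) when no usable density is given. */
static double pixels_to_points(unsigned pixels, unsigned units, unsigned density)
{
    if (density == 0)
        return (double)pixels;
    if (units == 1)
        return pixels * 72.0 / density;          /* dots per inch */
    if (units == 2)
        return pixels * 72.0 / (density * 2.54); /* dots per cm */
    return (double)pixels;                       /* aspect ratio only */
}
```

A 2480-pixel-wide scan at 300 DPI would come out at roughly 595 points, which is about the width of an A4 page.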
|
|
|
|
|
|
What's new in there? It would be good to post the update descriptions here.
By the way, I hope the limitation on the max number of files to PDF-encode can be removed, maybe by using a temporary file.
|
|
|
|
|
Hello, Hao Hu.
It would also be nice if your program could manipulate the XMP metadata inside PDF files.
XMP is contained inside just another PDF object, using XML syntax.
XMP manipulation is useful because XMP provides various metadata about a PDF document, such as Author, Title, Keywords, etc.
The XMP specification is available from Adobe:
http://www.adobe.com/devnet/xmp/
Here is an example of XMP inside PDF:
380 0 obj<</Subtype/XML/Length 3247/Type/Metadata>>stream
<?xpacket begin="?»?" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="3.1-701">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about=""
xmlns:xap="http://ns.adobe.com/xap/1.0/">
<xap:CreateDate>2009-11-03T17:06:02Z</xap:CreateDate>
<xap:ModifyDate>2009-11-03T20:08:04+03:00</xap:ModifyDate>
<xap:MetadataDate>2009-11-03T20:08:04+03:00</xap:MetadataDate>
</rdf:Description>
<rdf:Description rdf:about=""
xmlns:pdf="http://ns.adobe.com/pdf/1.3/">
<pdf:Producer>ABBYY FineReader 9.0 Professional Edition</pdf:Producer>
</rdf:Description>
<rdf:Description rdf:about=""
xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:format>application/pdf</dc:format>
</rdf:Description>
<rdf:Description rdf:about=""
xmlns:xapMM="http://ns.adobe.com/xap/1.0/mm/">
<xapMM:DocumentID>uuid:6c762139-bdde-4faa-9adc-b1e164c6db73</xapMM:DocumentID>
<xapMM:InstanceID>uuid:a2270d76-823e-405c-b377-b3a7f72c4575</xapMM:InstanceID>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>
endstream
endobj
-- Modified Wednesday, August 18, 2010 5:17 AM
|
|
|
|
|
It would be a nice feature, but I don't have any plans to expand this library, mainly because I don't have this kind of requirement and I lack the spare time.
If you need this feature, please just add it in. I guess other people might also need it if you share it with them.
|
|
|
|
|
Hello, Hao Hu.
I'm planning to enhance your program a bit.
First of all, the limitation of 256 pages needs to be removed. This could be done by using a temp file instead of RAM. It is currently the main flaw: the max number of files shouldn't be limited at all.
I'd also like to take the files' real DPI into account instead of the hard-coded 72 dpi.
Besides, it is not a good idea to use memory buffers of a pre-set size. What guarantees that their length will always be enough?
I also used the A4 paper size in my clone of your program. I write the canvas size to the PDF as a double (%.6f), whereas you write it as an integer (%.d), which is not precise. I am from Russia, and in Russia we do not use inches; we use only millimeters.
Also, in your program a small JPG is stretched to the whole canvas. I will add an option to keep the JPG's original size (placing it in the center of the canvas).
I also found an analogous PDF library, libharu.org, but I don't know how to compile it in my MS VC++ 6.0 for Windows.
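The "keep the original size and center it" option mentioned above could be sketched roughly like this. This is only an illustration under my own assumptions: the Placement struct and center_on_page are hypothetical names, the page size is given in PDF points (1/72 inch), and the DPI is assumed to be known and non-zero.

```c
/* Hypothetical result type: position and size of the image on the page, in points. */
typedef struct {
    double x, y;          /* lower-left corner of the image */
    double width, height; /* image size on the page */
} Placement;

/* Hypothetical helper: keeps the image at its physical size (pixels / dpi inches)
   and centers it on a page of pageW x pageH points. Assumes dpi > 0. */
static Placement center_on_page(double pageW, double pageH,
                                unsigned pxW, unsigned pxH, double dpi)
{
    Placement p;
    p.width  = pxW * 72.0 / dpi;       /* pixels -> points */
    p.height = pxH * 72.0 / dpi;
    p.x = (pageW - p.width) / 2.0;     /* equal margins left and right */
    p.y = (pageH - p.height) / 2.0;    /* equal margins top and bottom */
    return p;
}
```

If the image is larger than the page, x and y come out negative, so a real implementation would probably want to scale the image down in that case.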
|
|
|
|
|
Thanks a lot for your enthusiasm about this small library.
The original purpose of the library is to provide a starting point for anyone who needs a simple piece of library code like this. Feel free to change it and use it at your own risk.
Regarding some of your concerns:
Computers always involve trade-offs. As an example, UNIX timestamps also have a limited time range. The trade-off of changing 256 to an unlimited number is the simplicity of the program. For my original motivation for writing this piece of code, 256 is enough for me. So I just want to keep the code simple.
Regarding 72 DPI, I'm not 100% sure, but I think this is the predefined value in the PDF spec. So I think changing this number to something else will only give you trouble.
The memory buffer size is also a trade-off made for coding simplicity. I think you might not be able to find a case that causes a buffer overflow, since all the estimated sizes are safe enough. To give you an example, 2.45m is the current world record for the men's high jump, so 3m is a safe enough height to say no human can jump over. With this kind of safe estimate you might waste a little bit of memory, but you don't need dynamic memory allocation, which costs extra time and also needs a lot of error-checking logic.
For the paper size, I think that even if you use floating-point numbers, PDF will actually only use the integer value. You can easily verify this by opening any A4 document generated by another PDF engine. I guess the reason is very simple: the error will only be 0.5*25.4/72=0.176mm, which is not significant at all.
Of course, there are many PDF libraries, they might provide more features than the code I have here, if you need those features, you should consider switching.
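The arithmetic behind that 0.176mm figure can be checked with a small sketch. The function names below are made up for illustration; the only facts used are that one PDF point is 1/72 inch and one inch is 25.4 mm.

```c
/* 1 PDF point = 1/72 inch, 1 inch = 25.4 mm. */
static double mm_to_points(double mm)
{
    return mm * 72.0 / 25.4;
}

static double points_to_mm(double pt)
{
    return pt * 25.4 / 72.0;
}

/* Rounds a non-negative point value to the nearest whole point. */
static long round_to_points(double pt)
{
    return (long)(pt + 0.5);
}
```

For A4 (210 x 297 mm), mm_to_points gives about 595.28 x 841.89 points, which round to 595 x 842. Rounding is off by at most half a point, and points_to_mm(0.5) is about 0.176 mm.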
|
|
|
|
|
Computers always involve trade-offs.
So I just want to keep the code simple.
I totally agree with you. I also will do my best to keep the derived code as simple as possible.
The trade-off of changing 256 to an unlimited number is the simplicity of the program.
By the way, I still don't understand where this limitation comes from. Is it perhaps a RAM size limitation? If so, then introducing a temp file would not break the trade-off significantly. The code would remain simple and understandable enough.
Of course, there are many PDF libraries, they might provide more features than the code I have here, if you need those features, you should consider switching.
That is actually rather illusory. I need exactly something for C++, not C# and not Java. And the libs that qualify are too complicated to even compile (to say nothing of anything else).
The trade-off you are speaking about may well break if I stop using memory buffers of a pre-set length, so I will think this issue through thoroughly before applying it.
|
|
|
|
|
Hello, Hao Hu.
I made my own program based on your utility.
My program is called "fi2pdf". See more details here:
https://sourceforge.net/projects/freeimage/forums/forum/36111/topic/3721193/index/page/1
My program is intended to use the FreeImage library to create PDFs.
|
|
|
|
|
I made a new version of my program:
fi2pdf v2.1
See the details in the same place:
https://sourceforge.net/projects/freeimage/forums/forum/36111/topic/3721193/index/page/1
|
|
|
|
|
That's great.
The idea of posting code here is just to share it with others.
And I'm happy to see that people can use it and make changes as necessary.
In fact, the original reason for developing this code was to generate PDF files in an embedded environment, so I didn't have the luxury of a file system. Everything needs to be done in memory: e.g. the JPEGs are buffered in memory, the PDF is generated in memory, and then it is sent out over the network.
I saw that you took advantage of the file system to reduce memory usage, which is totally fine for the kind of utility you want.
|
|
|
|
|
Hello, Hao Hu.
I found a bug in your program.
Inside the function
STATUS Jpeg2PDF_GetFinalDocumentAndCleanup(PJPEG2PDF pPDF, UINT8 *outPDF, UINT32 *outPDFSize)
you should comment out the statement:
if(outPDF && (*outPDFSize >= pPDF->currentOffSet))
To understand why, initialize *outPDFSize to zero (in the calling function) and your program will stop working (at least on large 24-bit JPEGs).
It currently relies on *outPDFSize happening to hold a random non-zero value, which works in Debug but not in Release (on large 24-bit JPEGs).
|
|
|
|
|
Thanks for your message.
However, I think the place you point out is not the right one. The issue is in the test code.
Basically, when calling Jpeg2PDF_GetFinalDocumentAndCleanup(), I want the user to pass in
their buffer size to avoid an overflow. So *outPDFSize should be initialized
in the caller to the size of the outPDF buffer. Later, I change that value to let the caller
know the exact number of bytes that have been used. This is a very common way for an API to get and then set
a buffer size.
It seems complicated for me to update this article right now. (CodeProject asks me to submit my change to their editor for the update, which is much more complicated than before.)
So what should be done in testMain.c is:
Just remove the declaration of pdfFinalSize and replace all uses of pdfFinalSize with pdfSize.
So pdfSize will be an In/Out variable.
In: The caller lets jpeg2pdf know the byte size of pdfBuf.
Out: jpeg2pdf lets the caller know the actual byte size that has been used.
Thanks.
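The In/Out convention described above can be sketched as follows. Note that copy_document is a hypothetical function written only to show the pattern; it is not the library's Jpeg2PDF_GetFinalDocumentAndCleanup(), and its return codes are my own choice.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical function illustrating an in/out size parameter.
   In:  *outSize holds the capacity of the out buffer.
   Out: *outSize holds the number of bytes actually written.
   Returns 0 on success, -1 if the buffer is missing or too small. */
static int copy_document(const uint8_t *doc, uint32_t docSize,
                         uint8_t *out, uint32_t *outSize)
{
    if (out == NULL || outSize == NULL || *outSize < docSize)
        return -1;       /* refuse to overflow the caller's buffer */
    memcpy(out, doc, docSize);
    *outSize = docSize;  /* report how much was used */
    return 0;
}
```

The caller would set the size variable to sizeof its buffer before the call and read the used byte count from the same variable afterwards. If that variable is left uninitialized, the capacity check compares against garbage, which matches the Debug versus Release behaviour reported above.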
modified on Saturday, August 14, 2010 4:26 AM
|
|
|
|
|
Hello Hao Hu.
May I re-use your code under the "GPL 2 and later" license?
|
|
|
|