I'm having trouble decompressing a gzipped HTTP response. I separated the body from the headers, but the gzip header and the compressed data contain \0 bytes, and this seems to cause problems when copying the gzipped data.
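Because the gzip header itself contains \0 bytes (RFC 1952: ID1 = 0x1f, ID2 = 0x8b, CM = 8 for deflate, then a FLG byte and a 4-byte MTIME that are often zero), the body has to be handled as a pointer plus a byte count, never as a C string. A minimal sketch with two hypothetical helpers, not taken from the original post:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if buf starts with a gzip member header
 * (ID1 = 0x1f, ID2 = 0x8b, CM = 8 for deflate), else 0.
 * The length is passed explicitly because the header may contain \0 bytes. */
int looks_like_gzip(const unsigned char *buf, size_t len) {
    assert(buf != NULL || len == 0);
    return len >= 3 && buf[0] == 0x1f && buf[1] == 0x8b && buf[2] == 0x08;
}

/* Hypothetical helper: appends len bytes of src to dst at *offset.
 * memcpy() copies \0 bytes like any other byte; strcpy()/strcat() would stop at the first one. */
void append_bytes(unsigned char *dst, size_t *offset, const unsigned char *src, size_t len) {
    memcpy(dst + *offset, src, len);
    *offset += len;
}
```

For a header whose FLG byte is 0x00, strlen() on the buffer reports only 3 bytes, while append_bytes() still copies the full count it is given.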

I've tried libcurl, but it is slower than using plain C sockets.

Here is part of a sample response:

HTTP/1.1 200 OK
Cache-Control: private, max-age=0
Content-Type: text/html; charset=utf-8
P3P: CP="NON UNI COM NAV STA LOC CURa DEVa PSAa PSDa OUR IND"
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 12605
Date: Mon, 05 Mar 2012 11:46:30 GMT
Connection: keep-alive
Set-Cookie: _FP=EM=1; expires=Wed, 05-Mar-2014 11:46:29 GMT; domain=.bing.com; path=/

---BINARY DATA---
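The question extracts Content-Length with a regex and locates the body with strstr(), which only works while the bytes being searched contain no \0. A minimal sketch of a byte-count-based alternative, assuming the headers are plain ASCII as in the sample above (the helper names are hypothetical): search for "\r\n\r\n" with memcmp() and read Content-Length with strtol(). The header name is compared case-insensitively because HTTP header names are case-insensitive.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns the offset of the first body byte (just past
 * "\r\n\r\n"), or -1 if the blank line is not in buf yet. It works on an
 * explicit length, so \0 bytes after the headers do not matter. */
long find_body_start(const char *buf, size_t len) {
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i + 4 <= len; i++) {
        if (memcmp(buf + i, "\r\n\r\n", 4) == 0)
            return (long)(i + 4);
    }
    return -1;
}

/* Returns 1 if the header line starts with `name` followed by ':'. */
static int header_is(const char *line, const char *name) {
    size_t n = strlen(name);
    for (size_t i = 0; i < n; i++) {
        if (tolower((unsigned char)line[i]) != tolower((unsigned char)name[i]))
            return 0;
    }
    return line[n] == ':';
}

/* Hypothetical helper: returns the Content-Length value from the header block
 * buf[0..hdr_len), or -1 if the header is missing or has no digits. */
long parse_content_length(const char *buf, size_t hdr_len) {
    const char *p = buf, *end = buf + hdr_len;
    while (p < end) {
        const char *eol = memchr(p, '\n', (size_t)(end - p));
        if (eol == NULL)
            break;
        if (header_is(p, "Content-Length")) {
            const char *val = p + strlen("Content-Length:");
            char *stop;
            long v = strtol(val, &stop, 10);   /* skips leading spaces, stops at '\r' */
            return (stop != val && v >= 0) ? v : -1;
        }
        p = eol + 1;
    }
    return -1;
}
```

With the sample response, parse_content_length(buf, find_body_start(buf, len)) would yield 12605, and the body starts at the returned offset even though it contains \0 bytes.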

Sample code:

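#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

#define MAXDATASIZE 1024

// sockfd (connected socket) and clarr (Content-Length text) come from code not shown
char *recvData = NULL;        // holds the entire gzip body (cl bytes)
char recvBuff[MAXDATASIZE];   // holds one received chunk
int offset = 0;               // body bytes copied into recvData so far
int cl = 0, clfnd = 0, datasplit = 0;
ssize_t recvBytes;
long totalRecvBytes = 0;

while (1) {
    // leave one byte free so the chunk can be '\0'-terminated for strstr()
    recvBytes = recv(sockfd, recvBuff, MAXDATASIZE - 1, 0);
    if (recvBytes <= 0)
        break;                // 0 = connection closed, -1 = error
    recvBuff[recvBytes] = '\0';
    totalRecvBytes += recvBytes;

    // cl = content length; clarr is extracted from the headers with a regex (not shown)
    if (!clfnd) {
        cl = atoi(clarr);
        clfnd = 1;
        recvData = malloc(cl);
        memset(recvData, 0, cl);   // not sizeof recvData, which is only the pointer size
    }

    char *src = recvBuff;
    int n = (int)recvBytes;
    // the first chunk starts with the headers; later chunks contain only body data
    if (!datasplit) {
        char *datastrt = strstr(recvBuff, "\r\n\r\n");
        if (datastrt == NULL)
            continue;         // assumes the blank line arrives within a single chunk
        src = datastrt + 4;
        n -= (int)(src - recvBuff);
        datasplit = 1;
    }
    // memcpy() copies embedded \0 bytes like any others; clip n so recvData cannot overflow
    if (n > cl - offset)
        n = cl - offset;
    memcpy(recvData + offset, src, n);
    offset += n;

    // a short recv() does not mean the body is complete; stop once cl bytes have arrived
    if (offset >= cl)
        break;
}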

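// the 4x output size is only a guess; inf() reports Z_BUF_ERROR if the inflated data does not fit
char *outData = malloc(offset * 4);
memset(outData, 0, offset * 4);   // sizeof outData would only be the pointer size
int ret = inf(recvData, offset, outData, offset * 4);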
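With a blocking socket, a recv() that returns fewer bytes than requested does not mean the body is complete; TCP can split the 12605-byte body at arbitrary points. One way to make the read independent of chunk sizes is to keep calling recv() until Content-Length bytes have arrived. A minimal sketch, assuming a POSIX system and a blocking socket (recv_exact is a hypothetical name):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: reads exactly `need` bytes from a blocking socket into buf.
 * Returns the number of bytes read, which is less than `need` only if the peer
 * closed the connection or recv() failed. A short recv() alone is not treated
 * as the end of the data. */
size_t recv_exact(int fd, char *buf, size_t need) {
    assert(buf != NULL || need == 0);
    size_t got = 0;
    while (got < need) {
        ssize_t n = recv(fd, buf + got, need - got, 0);
        if (n < 0 && errno == EINTR)
            continue;            /* interrupted by a signal: retry */
        if (n <= 0)
            break;               /* 0 = connection closed, <0 = error */
        got += (size_t)n;
    }
    return got;
}
```

After the header split, the rest of the body could then be read with something like `offset += recv_exact(sockfd, recvData + offset, cl - offset);`, using the question's variable names.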


Inflate function:

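#include <stdio.h>
#include <zlib.h>

// Inflates a complete gzip stream in a single inflate() call.
// Returns the number of bytes written to dst, or a negative zlib error code.
int inf(const char *src, int srcLen, char *dst, int dstLen){
    z_stream strm;
    strm.zalloc = Z_NULL;
    strm.zfree = Z_NULL;
    strm.opaque = Z_NULL;

    strm.avail_in = srcLen;
    strm.avail_out = dstLen;
    strm.next_in = (Bytef *)src;
    strm.next_out = (Bytef *)dst;

    // MAX_WBITS + 16: expect a gzip header and trailer instead of a zlib wrapper
    int err = inflateInit2(&strm, MAX_WBITS + 16);
    if (err != Z_OK)
        return err;                // init failed, nothing to clean up

    // Z_FINISH: all input and the whole output buffer are supplied at once
    err = inflate(&strm, Z_FINISH);
    int ret = (int)strm.total_out;
    inflateEnd(&strm);

    if (err != Z_STREAM_END)
        return err < 0 ? err : Z_DATA_ERROR;   // e.g. Z_BUF_ERROR when dst is too small

    // dst is not '\0'-terminated, so print exactly ret bytes
    printf("%.*s\n", ret, dst);
    return ret;
}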
Posted; updated 7-Mar-12 3:04am (v4)

1 solution

Problem solved! memcpy() was copying the \0 bytes correctly all along; the Eclipse debugger displayed the buffer as a C string and stopped at the first \0, so the gzip chunk only looked truncated. I've updated the code above.
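Since the debugger's string view was what hid the data, one way to inspect such buffers is to print them as hex, where \0 bytes stay visible. A minimal sketch (hex_dump is a hypothetical name):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes `len` bytes of buf into out as space-separated hex
 * (e.g. "1f 8b 08 00"), so \0 bytes do not end the output the way they end a
 * char* string view. out must hold at least 3 * len + 1 bytes. */
void hex_dump(const unsigned char *buf, size_t len, char *out) {
    assert(out != NULL);
    out[0] = '\0';
    for (size_t i = 0; i < len; i++)
        sprintf(out + 3 * i, i + 1 < len ? "%02x " : "%02x", buf[i]);
}
```

In GDB, which Eclipse CDT typically uses as its debugger backend, `x/16xb recvData` shows the first 16 bytes of the buffer in hex in the same way.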
 
Comments
d3llt4 19-Jun-13 11:39am    
I have some questions for AZ rescuer. I am doing something similar: I am trying to get HTTP response data from captured packets. 1. How do I differentiate between two different pages? 2. How do I combine and parse these packets in real time using C? 3. How do I write a C program that captures only the body, i.e. the content after the headers in the first packet, and how does the program keep track of the subsequent packets? 4. How does the program know when to stop?
CHill60 19-Jun-13 19:15pm    
It might be a good idea to post a separate question; more people are likely to spot an unanswered question than an old, resolved post.
d3llt4 20-Jun-13 8:24am    
I did, but the admin closed that thread.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900