Click here to Skip to main content
14,971,553 members
Please Sign up or sign in to vote.
1.00/5 (1 vote)
See more:
Hi,

Am facing problem.
I have to write a fucntion comp() that reads a text file of ASCII characters and writes in compressed form of those characters to another file.

The compression logic for comp() should leverage the fact that ASCII only uses the bottom (leastsignificant) seven bits of an 8-bit byte. The compression logic should simply squeeze out the 8th bit. The algorithms is described below for this compression logic to be implemented in comp().

Algorithm:

The program should consider the output to be a stream of bits and the 7 data bits from each input byte should simply be sent to the output bit stream, with the 8th bit being discarded. The 7 data bits should be sent to the output stream in order of increasing significance, bit 0 (least significant) first, bit 1 next, and so on.
Of course, the output is really a sequence of bytes so the logical output bit stream must be packed into output bytes. The bit stream is packed into bytes by assigning bits to increasing bit position (increasing significance) within a byte, and going to the next byte when the upper bit has been filled in the current byte. At end of file, if there are empty bit positions in the current byte, then fill them with zeros before outputing the byte.

As an example, a file consisting of the bytes 53 7F 63 4B should be "compressed" to the file containing D3 FF 78 09. (This file is too small to gain anything from this simple compression technique.

---------
Another function decomp(), that will read a compressed file (can be produced by function comp())
and will decompress it. The program should simply reproduce the ASCII file from which its input was derived (assuming the original file was a valid ASCII file with the 8th bit always 0).
Posted
Updated 28-Mar-18 2:57am
v2

If this isn't an assignment, and the algorithm isnt set in concrete, i would suggest taking a look at huffman coding[^].

If it is a set algorithm for you to implement then we generally don't do your homework but here goes
#include <stdio.h>
#include <string.h>

//comp()
void Compress(unsigned char *szOut, const char *szMessage) {
	unsigned long long nBuffer = 0; //This is enough to store 8 uncompressed characters or 9 compressed. We will only use 8 tho.
	char nBitsInBuffer = 0;
	while (*szMessage != 0) {
		nBuffer |= (unsigned long long)(*szMessage++ & 0x7F) << nBitsInBuffer;
		nBitsInBuffer += 7;
		if (nBitsInBuffer == 7 * 8) { //We have 8 chars in the buffer, dump them
			while (nBitsInBuffer > 0) {
				*szOut++ = nBuffer & 0xFF;
				nBuffer >>= 8;
				nBitsInBuffer -= 8;
			}
			//The following should be redundant, but just to be safe
			nBitsInBuffer = 0;
			nBuffer = 0;
		}
	}
	//Write out whatever is left in the buffer
	while (nBitsInBuffer > 0) {
		*szOut++ = nBuffer & 0xFF;
		nBuffer >>= 8;
		nBitsInBuffer -= 8;
	}
}

//uncomp()
//nCompressedLen is the number of bytes in the compressed buffer.
//NOTE: the compressed buffer does not have a NULL terminating character
void Uncompress(char *szOut, const unsigned char *szCompressed, unsigned nCompressedLen) {
	unsigned long long nBuffer = 0; //This is enough to store 8 uncompressed characters or 9 compressed. We will only use 8 tho.
	char nBitsInBuffer = 0;
	while (nCompressedLen) {
		while (nCompressedLen && nBitsInBuffer < 7 * 8) {
			nBuffer |= (unsigned long long)*szCompressed++ << nBitsInBuffer;
			nBitsInBuffer += 8;
			--nCompressedLen;
		}
		while (nBitsInBuffer > 0) {
			*szOut++ = nBuffer & 0x7F;
			nBuffer >>= 7;
			nBitsInBuffer -= 7;
		}
		//The following should be redundant, but just to be safe
		nBitsInBuffer = 0;
		nBuffer = 0;
	}
}

int main() {
	//char szMessage[] = "\x53\x7F\x63\x4B";
	char szMessage[] = "hello world. this is a compressed long string";
	static const unsigned nCompressedSize = sizeof(szMessage) * 7 / 8; //This does not include the NULL terminating character from the string
	unsigned char pCompressedBytes[nCompressedSize];
	char szUncompressed[sizeof(szMessage)];
	printf("     Message: %s\n", szMessage);
	Compress(pCompressedBytes, szMessage);
	printf("  Compressed: ");
	for (int nByte = 0; nByte < nCompressedSize; ++nByte) {
		printf("%02X ", pCompressedBytes[nByte]);
	}
	printf("\n");
	Uncompress(szUncompressed, pCompressedBytes, nCompressedSize);
	szUncompressed[sizeof(szMessage) - 1] = 0; //We need to terminate the string. The NULL terminator is not stored in the compressed bytes
	printf("Uncompressed: %s\n", szUncompressed);
	//Now just verify that we ended up with the same message we started with
	if (strcmp(szMessage, szUncompressed) == 0) {
		printf("Compression works\n");
	} else {
		printf("Compression failed\n");
	}
	return 0;
}


It works by using a 64bit integer as a buffer for the bits in order to fit 8 uncompressed characters or 7 compressed characters into the 1 integer. This really simplifies things as we can use standard bitwise operations on the integer. The compiler will convert the bitwise math into math that a x86 computer can work with (i.e. convert it into 2 32 bit numbers)

I have given basic explanations. You can figure out the rest.
   
Comments
Espen Harlinn 6-Feb-11 7:27am
   
Good effort, 5+
Not exactly a difficult problem; what is your question?
   
Comments
Member 13751349 28-Mar-18 9:21am
   
Is it possible to compress the string to 6 bit instead of 7 bit?
Richard MacCutchan 28-Mar-18 9:27am
   
Think about it.
CPallini 28-Mar-18 12:35pm
   
Beating the old horse? :-)
Richard MacCutchan 28-Mar-18 12:49pm
   
Yes I know, but was just replying to the message above, which arrived today.
ManKirat Kaur 29-Aug-20 12:49pm
   
which algo is used?

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)




CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900