4-bit Encoder/Decoder
Code for a 4-bit encoder that stores strings built from up to 15 different symbols more compactly
Introduction
Converts a string of 8-bit characters into a sequence of 4-bit codes (at most 15 different characters are allowed).
In other words, two 8-bit characters are packed into one 8-bit character.
Through this conversion, a string can be stored in roughly half the size of an ordinary string (plus half a byte for the end marker, rounded up to a whole byte). This can be useful for large amounts of data that use at most 15 different characters, such as phone numbers.
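To make the packing concrete, here is a minimal sketch of how two 4-bit table indices share one byte. It assumes the article's default table (index 0 is the END marker, digits '0' to '9' sit at indices 1 to 10, '-' at index 11). The helper names `pack_pair`, `unpack_byte` and `packed_size` are hypothetical and not part of the class.

```python
from typing import Tuple


def pack_pair(high: int, low: int) -> int:
    """Combine two 4-bit indices (0-15) into one byte value (0-255)."""
    if not (0 <= high <= 15 and 0 <= low <= 15):
        raise ValueError("indices must fit in 4 bits")
    return (high << 4) | low  # first symbol in the high nibble


def unpack_byte(value: int) -> Tuple[int, int]:
    """Split one byte value back into its high and low 4-bit indices."""
    return (value >> 4) & 0x0F, value & 0x0F


def packed_size(n_symbols: int) -> int:
    """Bytes for n symbols plus the END marker, rounded up to a whole byte."""
    return (n_symbols + 2) // 2


# '0' has index 1 and '6' has index 7 in the default table
b = pack_pair(1, 7)
print(hex(b))           # 0x17
print(unpack_byte(b))   # (1, 7)
print(packed_size(11))  # 6, e.g. for "067-845-512"
```

The size formula reflects the END marker: n symbols plus one terminator give n + 1 nibbles, padded to an even count.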
Background
I was thinking that storing telephone numbers in a database as strings is a waste of memory. But storing them as integers is not possible either, because leading zeros and separators such as '-' would be lost. My solution was to use an encoded string.
Using the Code
Below is the implementation of the class. At the bottom, a test() function shows how to use the code.
To customize the symbols that can be encoded, change Encode4Bits._mappingTable. Index 0 is reserved for the END marker, so never use more than 15 custom values.
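One way to enforce the 15-symbol limit is to build the table through a small validating helper and assign its result to `_mappingTable`. This is only a sketch: `build_mapping_table` is a hypothetical function, not part of the article's class, and the symbol set in the example is an assumption chosen for illustration.

```python
from typing import List, Sequence

END = "\0"  # index 0 is reserved for the END marker
SLOTS = 16  # a 4-bit index can address 16 slots


def build_mapping_table(symbols: Sequence[str]) -> List[str]:
    """Return a 16-entry table: END at index 0, then the symbols, then unused ''."""
    if len(symbols) > SLOTS - 1:
        raise ValueError("at most 15 symbols fit next to the END marker")
    if len(set(symbols)) != len(symbols):
        raise ValueError("symbols must be unique")
    for s in symbols:
        if len(s) != 1 or s == END:
            raise ValueError(f"invalid symbol {s!r}")
    table = [END, *symbols]
    table += [""] * (SLOTS - len(table))  # pad unused slots with ''
    return table


# Hypothetical table for numbers written like "+43 (1) 234-5678"
table = build_mapping_table(list("0123456789") + ["+", "-", " ", "(", ")"])
print(len(table))  # 16
```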
```python
class Encode4Bits:
    def __init__(self):
        # Index 0 is always the "END" marker; indices 1-15 hold the
        # encodable symbols. Unused slots are left as empty strings.
        self._mappingTable = ['\0',
                              '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
                              '-', '', '', '', '']

    def _encodeCharacter(self, char):
        """@return index of element or None, if not exists"""
        # Start at 1 so input data can never produce the END index 0.
        for p in range(1, len(self._mappingTable)):
            if char == self._mappingTable[p]:
                return p
        return None

    def encode(self, string):
        # ===== 1. map all chars to an index in our table =====
        mappingIndices = []
        for char in string:
            index = self._encodeCharacter(char)
            if index is None:
                raise ValueError("Could not encode '" + char + "'.")
            mappingIndices.append(index)
        mappingIndices.append(0)  # END marker
        # ===== 2. Make num values even =====
        # 4 bit => 2 chars in one byte. Therefore: need even num values
        if len(mappingIndices) % 2 != 0:
            mappingIndices.append(0)
        # ===== 3. pack two indices into each byte =====
        ret = bytearray()
        for i in range(0, len(mappingIndices), 2):
            # high nibble = first symbol, low nibble = second symbol
            ret.append((mappingIndices[i] << 4) | mappingIndices[i + 1])
        # Return bytes (not str) so each packed value occupies exactly one byte
        return bytes(ret)

    def decode(self, data):
        ret = ""
        for byte in data:  # iterating over bytes yields ints
            for index in ((byte & 0xF0) >> 4, byte & 0x0F):
                if index == 0:
                    return ret  # END marker reached: stop, don't emit '\0'
                ret += self._mappingTable[index]
        return ret


def test():
    numberCompressor = Encode4Bits()
    original = "067-845-512"
    encoded = numberCompressor.encode(original)
    decoded = numberCompressor.decode(encoded)
    assert decoded == original
    print(len(decoded))  # 11
    print(len(encoded))  # 6


if __name__ == "__main__":
    test()
```
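Whether the size saving actually materializes depends on how the packed values are stored. In Python 3, `chr()` returns a one-code-point `str`, and code points from 128 to 255 take two bytes once the string is encoded as UTF-8 text. Keeping the packed values as `bytes` (for example in a binary/BLOB column) keeps them at one byte each. A quick check, using the pair ('9', '-'), which maps to indices 10 and 11 in the default table:

```python
# Indices 10 ('9') and 11 ('-') packed into one value: (10 << 4) | 11 == 0xAB == 171
value = (10 << 4) | 11

as_text = chr(value)       # one code point, U+00AB
as_bytes = bytes([value])  # one raw byte

print(len(as_text))                  # 1
print(len(as_text.encode("utf-8")))  # 2: code points >= 128 need two bytes in UTF-8
print(len(as_bytes))                 # 1
```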
History
- 8th February, 2019: Initial version