Multiple Character Modification for Huffman Algorithm

Wilham, Reyssan (2020) Multiple Character Modification for Huffman Algorithm. S1 thesis, Universitas Mataram.

Text: The_4th_ICST_2019_paper_12.pdf (287kB)
Restricted to Repository staff only

Abstract

Data compression reduces the size of a file, which in turn speeds up data transmission between devices. The Huffman algorithm is a lossless data compression method that uses the frequency of each character's occurrence to map characters to bit strings: the more frequent the character, the shorter its bit string. In a text file, certain sequences of characters (letters) may occur frequently. These sequences consist of twin characters or combinations of two consonants (e.g. 'aa', 'gg', 'ny', and 'ng') that form part of a word. Replacing a sequence with a new symbol outside the alphabet increases the vocabulary. However, if the new symbols appear frequently in the text, they will be coded to shorter bit strings, which we believe will improve the Huffman compression ratio for text. We conducted experiments in which some character sequences in a text file were converted to new symbols. To find the recommended sequences, we converted only those sequences whose probability was higher than a given threshold, and we tested several thresholds to determine which sequences are worth converting to symbols. Our test data consisted of five text files: three raw text files and two files with the doc extension. To verify whether converting some character sequences improved the Huffman compression ratio, Huffman compression was applied to both the original test data and the converted test data. The experimental results showed that a threshold of 1% performed best. On the original test data, the Huffman compression ratio was 44.88% for raw text and 88.84% for doc files. On the converted test data, the ratio rose to 45.93% and 89.09% respectively, which shows that converting some character sequences to symbols benefits the Huffman algorithm.
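
The idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation, and it rests on several assumptions that the abstract does not confirm: a sequence's "probability" is taken as its share of all adjacent character pairs, every pair of adjacent letters is a candidate (not only twin characters and consonant pairs), new symbols are drawn from the Unicode private use area, the original size is counted as 8 bits per character, and the "ratio" is computed as the space saved. All function names (`huffman_code_lengths`, `frequent_pairs`, `substitute`, `space_saving`) are hypothetical, and the cost of storing the code table and symbol table is ignored.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code over `symbols`."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        # Degenerate case: a single distinct symbol still needs one bit.
        return {next(iter(freq)): 1}
    tiebreak = count()  # keeps heap entries comparable when weights tie
    heap = [(w, next(tiebreak), {s: 0}) for s, w in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; every symbol in them gets one bit deeper.
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


def encoded_bits(symbols):
    """Total number of bits needed to Huffman-encode the symbol sequence."""
    lengths = huffman_code_lengths(symbols)
    return sum(lengths[s] for s in symbols)


def frequent_pairs(text, threshold):
    """Adjacent letter pairs whose share of all adjacent pairs exceeds `threshold`.

    Assumption: overlapping pairs are all counted, e.g. 'aaa' contains 'aa' twice.
    """
    pairs = Counter(
        a + b for a, b in zip(text, text[1:]) if a.isalpha() and b.isalpha()
    )
    total = max(len(text) - 1, 1)
    return [p for p, n in pairs.most_common() if n / total > threshold]


def substitute(text, pairs):
    """Greedy left-to-right replacement of each selected pair by one new symbol.

    Assumption: the text itself contains no private use area characters.
    """
    table = {p: chr(0xE000 + i) for i, p in enumerate(pairs)}
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in table:
            out.append(table[pair])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return out, table


def space_saving(text, symbols):
    """Fraction of space saved relative to 8 bits per original character."""
    return 1 - encoded_bits(symbols) / (8 * len(text))


if __name__ == "__main__":
    sample = "nganga nyanyi anggun, panggang ngengat"
    converted, _ = substitute(sample, frequent_pairs(sample, 0.01))
    print("plain Huffman :", space_saving(sample, list(sample)))
    print("with symbols  :", space_saving(sample, converted))
```

On a short string like the sample the two figures say little, because the overhead of the symbol table is not counted here; the sketch only shows where the threshold and the substitution step enter the pipeline.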

Item Type: Thesis (S1)
Keywords: Data Compression, Huffman Algorithm
Subjects: T Technology > T Technology (General)
Divisions: Fakultas Teknik
Depositing User: Rini Trisnawati
Date Deposited: 18 Jun 2020 01:25
Last Modified: 18 Jun 2020 01:25
URI: http://eprints.unram.ac.id/id/eprint/15729
