Informatica logo


Login Register

  1. Home
  2. Issues
  3. Volume 37, Issue 1 (2026)
  4. Context-Aware Word-Based Forward Coding

Informatica

Information Submit your article For Referees Help ATTENTION!
  • Article info
  • Full article
  • Related articles
  • More
    Article info Full article Related articles

Context-Aware Word-Based Forward Coding
Volume 37, Issue 1 (2026), pp. 229–249
Igor Zavadskyi ORCID icon link to view author Igor Zavadskyi details   Shmuel T. Klein ORCID icon link to view author Shmuel T. Klein details   Dana Shapira ORCID icon link to view author Dana Shapira details  

Authors

 
Placeholder
https://doi.org/10.15388/26-INFOR621
Pub. online: 6 March 2026      Type: Research Article      Open accessOpen Access

Received
1 June 2025
Accepted
1 February 2026
Published
6 March 2026

Abstract

Forward-looking coding has recently been introduced as a source modelling paradigm that exploits predictions of forthcoming symbols. In this paper, we extend this methodology to word-level alphabets, enabling improved compression performance for large and variable-length symbol sets. We present a space-efficient scheme for encoding header information, with particular emphasis on the accurate representation of symbol frequency distributions. In addition, we propose an alternative ordering strategy for word-based dictionaries that leverages the adaptive nature of forward-looking compression. We further show how these techniques can be integrated with a word-based Prediction by Partial Matching model of order one, while avoiding the zero-frequency problem. Experimental results confirm the effectiveness of the proposed approach across multiple datasets.

References

 
Adiego, J., Martínez-Prieto, M.A., de la Fuente, P. (2009). High performance word-codeword mapping algorithm on PPM. In: Data Compression Conference, DCC 2009, Snowbird, UT, USA, pp. 23–32.
 
Aljehane, N.O., Teahan, W.J. (2017). Word-based grammars for PPM. International Journal of Advanced Computer Science and Applications, 8(10), 1–11.
 
Avrunin, R.M., Klein, S.T., Shapira, D. (2022). Combining forward compression with PPM. SN Computer Science, 3(239), 239.
 
Baruch, G., Klein, S.T., Shapira, D. (2017). A space efficient direct access data structure. Journal of Discrete Algorithms, 43, 26–37.
 
Baruch, G., Klein, S.T., Shapira, D. (2020). Accelerated partial decoding in wavelet trees. Discrete Applied Mathematics, 274, 2–10. https://doi.org/10.1016/j.dam.2018.07.016.
 
Bratley, P., Choueka, Y. (1982). Processing truncated terms in document retrieval systems. Information Processing and Management, 18(5), 257–266.
 
Burrows, M., Wheeler, D.J. (1994). A Block-Sorting Lossless Data Compression Algorithm. Technical Report 124. Digital Equipment Corporation.
 
Cleary, J., Witten, I. (1984). Data compression using adaptive coding and partial string matching. IEEE Transactions on Communications, 32(4), 396–402.
 
Elias, P. (1975). Universal codeword sets and representations of the integers. IEEE Transactions on Information Theory, 21(2), 194–203.
 
Ferragina, P., Rotundo, M., Vinciguerra, G. (2025). Two-level massive string dictionaries. Information Systems, 128, 102490
 
Fruchtman, A., Gross, Y., Klein, S.T., Shapira, D. (2021). Backward weighted coding. In: Bilgin, A., Marcellin, M.W., Serra-Sagristà, J., Storer, J.A. (Eds.), 31st Data Compression Conference, DCC 2021, Snowbird, UT, USA, March 23–26, 2021. IEEE, pp. 93–102.
 
Fruchtman, A., Gross, Y., Klein, S.T., Shapira, D. (2022). Weighted forward looking adaptive coding. Theoretical Computer Science, 930, 86–99.
 
Fruchtman, A., Gross, Y., Klein, S.T., Shapira, D. (2023). Bidirectional adaptive compression. Discrete Applied Mathematics, 330, 40–50.
 
Grossi, R., Vitter, J.S. (2005). Compressed suffix arrays and suffix trees with applications to text indexing and string matching. SIAM Journal on Computing, 35(2), 378–407. https://doi.org/10.1137/S0097539702402354.
 
Grossi, R., Gupta, A., Vitter, J.S. (2003). High-order entropy-compressed text indexes. In: SODA ’03: Proceedings of the 14th Annual SIAM/ACM Symposium on Discrete Algorithms (SODA), pp. 841–850.
 
Huffman, D. (1952). A method for the construction of minimum redundancy codes. Proceedings of the IRE, 40(9), 1098–1101.
 
Klein, S.T., Saadia, S., Shapira, D. (2020). Forward looking Huffman coding. Theory of Computing Systems, 65, 593–612.
 
Klein, S.T., Shapira, D. (2009). On the usefulness of backspace. In: Proceedings of the Prague Stringology Conference, 2009, Prague, Czech Republic, August 31–September 2, pp. 80–89.
 
Moffat, A. (1989). Word-based text compression. Software: Practice and Experience, 19(2), 185–198.
 
Navarro, G. (2016). Compact Data Structures: A Practical Approach. Cambridge University Press, Cambridge, UK.
 
Seward, J. (2019). bzip2 and libbzip2, version 1.0.8. A program and library for data compression.
 
Vitter, J.S. (1987). Design and analysis of dynamic Huffman codes. Journal of the ACM, 34(4), 825–845.
 
Zavadskyi, I.O., Anisimov, A.V. (2020). Reverse multi-delimiter compression codes. In: Bilgin, A., Marcellin, M.W., Serra-Sagristà, J., Storer, J.A. (Eds.), Data Compression Conference, DCC 2020, Snowbird, UT, USA, March 24–27, 2020. IEEE, pp. 173–182.
 
Zavadskyi, I.O., Klein, S.T., Shapira, D. (2024). Word-based forward coding. In: Data Compression Conference, DCC 2024, Snowbird, UT, USA, pp. 352–361.
 
Zipf, G.K. (1949). Human Behavior and the Principle of Least Effort: An Introduction to Human Eoclogy. Addison-Wesley Press, Cambridge, UK.

Biographies

Zavadskyi Igor
https://orcid.org/0000-0002-4826-5265
ihorzavadskyi@knu.ua

I. Zavadskyi, PhD, DSc, professor at the Department of Mathematical Informatics, Faculty of Computer Science and Cybernetics, Taras Shevchenko National University of Kyiv, Ukraine. He received his PhD in computer science from the National Academy of Sciences of Ukraine in 2001 and his DSc degree in 2020. He is the author of more than 100 research papers and 25 textbooks and instructional manuals in computer science. His current research interests include information and coding theory, data compression, database management systems, and e-learning technologies. Prof. Zavadskyi is a member of the programme committee of the Data Compression Conference (DCC).

Klein Shmuel T.
https://orcid.org/0000-0002-9478-3303
tomi@cs.biu.ac.il

S.T. Klein received a BSc in mathematics and statistics from the Hebrew University of Jerusalem in 1978, an MSc degree in applied mathematics, and a PhD in computer science, both from the Weizmann Institute of Science in Rehovot, Israel, in 1981 and 1988, respectively. He then spent three years as a visiting assistant professor at the University of Chicago, and returned in 1990 to Israel, where he has been with the Department of Computer Science of Bar Ilan University in Ramat Gan, Israel.

Prof. Klein’s research focuses on text processing and, in particular, data compression. He has worked on two of the largest full-text information retrieval systems: the Responsa Project at Bar Ilan and the Trésor de la Langue Française at the University of Chicago. He has authored two books, more than a hundred refereed journal and conference papers, and thirteen patents. He is currently a professor emeritus at Bar Ilan University, where he has been the chair of the Computer Science Department for several years.

Shapira Dana
https://orcid.org/0000-0002-2320-9064
shapird@g.ariel.ac.il

D. Shapira received the BSc (cum laude), MSc (magna cum laude), and PhD degrees in mathematics and computer science from Bar Ilan University, Israel, in 1993, 1996 and 2001, respectively. She was then a postdoctoral fellow with Brandeis University, Waltham, MA, USA. She is currently a professor at Ariel University, Israel. She has coauthored over 50 refereed articles in scientific journals and over 70 in conference proceedings, and has been granted 4 patents. Her research interests include data compression, algorithm development and analysis and scheduling theory. She has served as the head of the Department of Computer Sciences, and is currently the head of the MSc Degree Program. Prof. Shapira has been a member of the programme committee of the Data Compression Conference (DCC) since 2011.


Full article Related articles PDF XML
Full article Related articles PDF XML

Copyright
© 2026 Vilnius University
by logo by logo
Open access article under the CC BY license.

Keywords
data compression adaptive coding arithmetic coding entropy forward coding

Metrics
since January 2020
176

Article info
views

39

Full article
views

42

PDF
downloads

21

XML
downloads

Export citation

Copy and paste formatted citation
Placeholder

Download citation in file


Share


RSS

INFORMATICA

  • Online ISSN: 1822-8844
  • Print ISSN: 0868-4952
  • Copyright © 2023 Vilnius University

About

  • About journal

For contributors

  • OA Policy
  • Submit your article
  • Instructions for Referees
    •  

    •  

Contact us

  • Institute of Data Science and Digital Technologies
  • Vilnius University

    Akademijos St. 4

    08412 Vilnius, Lithuania

    Phone: (+370 5) 2109 338

    E-mail: informatica@mii.vu.lt

    https://informatica.vu.lt/journal/INFORMATICA
Powered by PubliMill  •  Privacy policy