<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">INFORMATICA</journal-id>
<journal-title-group><journal-title>Informatica</journal-title></journal-title-group>
<issn pub-type="epub">1822-8844</issn><issn pub-type="ppub">0868-4952</issn><issn-l>0868-4952</issn-l>
<publisher>
<publisher-name>Vilnius University</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">INFOR592</article-id>
<article-id pub-id-type="doi">10.15388/25-INFOR592</article-id>
<article-categories><subj-group subj-group-type="heading">
<subject>Research Article</subject></subj-group></article-categories>
<title-group>
<article-title>Open Llama2 Models for the Lithuanian Language</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Nakvosas</surname><given-names>Artūras</given-names></name><email xlink:href="arturas@neurotechnology.com">arturas@neurotechnology.com</email><xref ref-type="aff" rid="j_infor592_aff_001"/><xref ref-type="corresp" rid="cor1">∗</xref><bio>
<p><bold>A. Nakvosas</bold> was born in Lithuania in 1986. He received a bachelor’s degree in 2009 and a master’s degree in 2012 from Šiauliai University. Since 2013, he has been working in R&amp;D at Neurotechnology. His work spans machine learning and biometric systems, including fingerprint, iris, and facial recognition, and has been recognized in NIST evaluations. In recent years, his research has expanded to natural language processing, with a focus on speech-to-text, text-to-speech, and large language models. His interests include deep learning, transformer architectures, signal processing, and software engineering.</p></bio>
</contrib>
<contrib contrib-type="author">
<name><surname>Daniušis</surname><given-names>Povilas</given-names></name><email xlink:href="povilasd@neurotechnology.com">povilasd@neurotechnology.com</email><xref ref-type="aff" rid="j_infor592_aff_001"/><bio>
<p><bold>P. Daniušis</bold> was born in Lithuania in 1983. He received a bachelor’s degree (mathematics) from Šiauliai University in 2005, a master’s degree (mathematics) from Vilnius University in 2007, and a PhD (computer science) from Vilnius University in 2012. He has been working at Neurotechnology since 2010. His research interests include AI, artificial neural networks, adaptive robotics, causal inference, and statistical dependence estimation.</p></bio>
</contrib>
<contrib contrib-type="author">
<name><surname>Mulevičius</surname><given-names>Vytas</given-names></name><email xlink:href="vytas.mulevicius@neurotechnology.com">vytas.mulevicius@neurotechnology.com</email><xref ref-type="aff" rid="j_infor592_aff_001"/><bio>
<p><bold>V. Mulevičius</bold> (born in 1997) earned his bachelor’s degree in computer science from the University of Birmingham in 2020. During his studies, he developed a strong interest in artificial intelligence and natural language processing (NLP), which led him to join the NLP team at Neurotechnology in 2018. Since then, he has been actively involved in the development of language technologies, contributing to various projects ranging from speech recognition and text analysis to the creation of large-scale language models.</p></bio>
</contrib>
<aff id="j_infor592_aff_001"><institution>Neurotechnology</institution>, Laisvės av. 125A, LT-06118, Vilnius, <country>Lithuania</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>∗</label>Corresponding author.</corresp>
</author-notes>
<pub-date pub-type="ppub"><year>2025</year></pub-date><pub-date pub-type="epub"><day>25</day><month>4</month><year>2025</year></pub-date><volume>36</volume><issue>2</issue><fpage>385</fpage><lpage>406</lpage><history><date date-type="received"><month>9</month><year>2024</year></date><date date-type="accepted"><month>4</month><year>2025</year></date></history>
<permissions><copyright-statement>© 2025 Vilnius University</copyright-statement><copyright-year>2025</copyright-year>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/4.0/">
<license-p>Open access article under the <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">CC BY</ext-link> license.</license-p></license></permissions>
<abstract>
<p>In this paper, we focus on the problem of whether efficient Lithuanian large language models (LLMs) can be derived from Llama2 LLMs, which lack Lithuanian-specific components. Although the Llama2 architecture has previously been used successfully to derive various regional LLMs, we propose and describe the first open Llama2 LLMs for the Lithuanian language (7 and 13 billion parameter versions), an accompanying question/answer (Q/A) dataset, and translations of popular language understanding benchmarks (Arc, Belebele, Hellaswag, MMLU, TruthfulQA, and Winogrande), which contribute to the standardisation of Lithuanian LLM evaluation. We empirically evaluate the proposed models by investigating their perplexity and their performance on the translated language understanding benchmarks. The perplexity experiments show that perplexity decreases consistently during pretraining, reflecting enhanced next-token prediction capabilities. Benchmarking the proposed LLMs on language understanding tasks reveals that high-quality pretraining datasets may be essential to achieve models that perform efficiently on these benchmarks. Comparison of the proposed LLMs with the latest open multilingual LLMs shows that our 13 billion parameter model ranks 4th of 8 models on tasks such as Arc, Hellaswag, and Winogrande, but is generally outperformed on other tasks. These benchmarks allow us to hypothesise that more efficient Lithuanian language models can be derived from recent LLMs in the future. The complete realisations of the LLMs and other contributed components are available in the accompanying open repository <uri>https://huggingface.co/neurotechnology</uri>.</p>
</abstract>
<kwd-group>
<label>Key words</label>
<kwd>Llama2</kwd>
<kwd>Regional LLMs</kwd>
<kwd>LLMs for the Lithuanian language</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="j_infor592_s_001">
<label>1</label>
<title>Introduction</title>
<p>Large language models (LLMs), relying on the Transformer architecture proposed by Vaswani <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor592_ref_044">2017</xref>), have shown remarkable effectiveness in many natural language processing (NLP) tasks (Minaee <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor592_ref_028">2024</xref>; Naveed <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor592_ref_029">2024</xref>). This has primarily been fuelled by increasingly large model parameterisations and training datasets, which are deemed essential according to neural scaling laws (Hernandez <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor592_ref_016">2022</xref>). On the other hand, with the consistent advancement of computational linguistics and NLP, open LLMs such as Llama2 (Touvron <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor592_ref_042">2023</xref>), Mistral (Jiang <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor592_ref_019">2023</xref>), Mixtral (Jiang <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor592_ref_020">2024</xref>), and Falcon (Almazrouei <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor592_ref_001">2023</xref>) were released. The performance characteristics of these open models are comparable with those of their commercial counterparts. Such LLMs usually require massive datasets and considerable computational resources. For example, the pretraining of the Llama2 family was carried out on a 2 trillion token set and required 3,311,616 GPU hours (Touvron <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor592_ref_042">2023</xref>). In addition to direct applications, these models can be further trained for various downstream problems (Minaee <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor592_ref_028">2024</xref>), including regional language modelling.</p>
<p>For this application, it is important to note that open LLMs are usually trained with largely English texts (e.g. Touvron <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor592_ref_042">2023</xref>) indicate that <inline-formula id="j_infor592_ineq_001"><alternatives><mml:math>
<mml:mo mathvariant="normal">&gt;</mml:mo>
<mml:mn>89</mml:mn>
<mml:mi mathvariant="normal">%</mml:mi></mml:math><tex-math><![CDATA[$\gt 89\% $]]></tex-math></alternatives></inline-formula> of the dataset, which was used to pretrain Llama2 consisted of English texts), resulting in a lack of performance for less common languages. Although commercial LLMs usually better support underrepresented languages, as a rule, they are exposed only via APIs, which do not provide access to the model’s parameters or its intermediate representations. Since there are <inline-formula id="j_infor592_ineq_002"><alternatives><mml:math>
<mml:mo stretchy="false">≈</mml:mo>
<mml:mn>380</mml:mn></mml:math><tex-math><![CDATA[$\approx 380$]]></tex-math></alternatives></inline-formula> non-English languages with at least 1 million speakers (Eberhard <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor592_ref_007">2021</xref>), open regional LLMs constitute an important research direction, and there have been multiple recent attempts to build efficient open LLMs tailored to various regional languages (see Section <xref rid="j_infor592_s_002">2</xref>, Table <xref rid="j_infor592_tab_001">1</xref>).</p>
<p>Regional LLM training is a challenging task not only computationally but also from the perspective of training data, which should reflect the rich structure of the language of interest, local cultural nuances, and domain-specific knowledge across multiple areas. Although massive multilingual datasets such as CulturaX (Nguyen <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor592_ref_031">2023</xref>) partially address this, collecting representative datasets in regional languages remains an important challenge.</p>
<p>Open LLMs are also potentially useful for NLP research, as their internal mechanisms are fully transparent. There are also related applications outside the scope of NLP. For example, successful regional LLMs can significantly impact areas such as education, public services, healthcare, and cultural preservation.</p>
<p>This article describes Neurotechnology’s<xref ref-type="fn" rid="j_infor592_fn_001">1</xref><fn id="j_infor592_fn_001"><label><sup>1</sup></label>
<p>Neurotechnology (<ext-link ext-link-type="uri" xlink:href="http://www.neurotechnology.com">http://www.neurotechnology.com</ext-link>) is a Lithuanian company, specialising in artificial intelligence, biometrics, computer vision, and deep neural networks.</p></fn> contribution to regional LLM research, consisting of</p>
<list>
<list-item id="j_infor592_li_001">
<label>•</label>
<p>Llama2-based 7 and 13 billion parameter LLMs for the Lithuanian language, and their empirical evaluation;</p>
</list-item>
<list-item id="j_infor592_li_002">
<label>•</label>
<p>A new dataset, consisting of 13,848 Q/A pairs primarily about Lithuania and Lithuanian history (in the Lithuanian language);</p>
</list-item>
<list-item id="j_infor592_li_003">
<label>•</label>
<p>Translations of popular LLM benchmarks to the Lithuanian language;</p>
</list-item>
<list-item id="j_infor592_li_004">
<label>•</label>
<p>Open repository, containing all the mentioned components.</p>
</list-item>
</list>
<p>In this article, we investigate whether efficient Lithuanian LLMs can be derived from Llama2 LLMs (which do not include a Lithuanian component, according to Touvron <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor592_ref_042">2023</xref>). In our opinion, for this research the Llama2 architecture is potentially advantageous over other similar open LLMs without Lithuanian language support (e.g. Mistral), since it allows experimentation with different model sizes, and its 13 billion parameter version nearly matches the performance of Mistral, as shown by Jiang <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor592_ref_019">2023</xref>).</p>
<p>We structure our paper by starting with a short review of the related work in Section <xref rid="j_infor592_s_002">2</xref>. Section <xref rid="j_infor592_s_003">3</xref> describes the proposed LLMs and other contributed components, and Section <xref rid="j_infor592_s_004">4</xref> is devoted to an empirical evaluation. Finally, the conclusive Section <xref rid="j_infor592_s_008">5</xref> summarises the research conducted from different perspectives.</p>
</sec>
<sec id="j_infor592_s_002">
<label>2</label>
<title>Related Work</title>
<p><bold>Llama2 model.</bold> Transformer-based Llama2 is available in different parameter sizes (e.g. 7, 13, and 70 billion parameters). The model is first pretrained on a 2 trillion token set, collected from public sources, using a self-supervised autoregressive approach with cross-entropy loss. Afterwards, it is fine-tuned using publicly available instruction datasets, augmented with human-annotated data, and Reinforcement Learning from Human Feedback (RLHF) methodologies (Touvron <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor592_ref_042">2023</xref>).</p>
<p>The model supports a maximum context length of 4096 tokens. According to benchmarks, Llama2 generally performs on par with various open alternatives (e.g. Falcon (Almazrouei <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor592_ref_001">2023</xref>), Mistral (Jiang <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor592_ref_019">2023</xref>), and Mixtral (Jiang <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor592_ref_020">2024</xref>)), which may also be advantageous in specific scenarios. For example, Falcon is recognized for its strong performance at higher parameter counts, and Mistral/Mixtral are generally lighter-weight models that emphasize efficiency and specialized use cases. Compared to these models, Llama2 aims to maintain a balance between robust performance and scalability. As is common with large foundational models, it can be successfully tuned for various downstream tasks, including regional language modelling.</p>
<p><bold>LLMs for regional languages.</bold> Table <xref rid="j_infor592_tab_001">1</xref> summarises LLMs tailored for common European languages, reflecting the recent contributions from the research and engineering community working in this direction. We include only regional LLMs that meet the following criteria:</p>
<list>
<list-item id="j_infor592_li_005">
<label>•</label>
<p>The model should be published in an open repository (e.g. Hugging Face<xref ref-type="fn" rid="j_infor592_fn_002">2</xref><fn id="j_infor592_fn_002"><label><sup>2</sup></label>
<p><ext-link ext-link-type="uri" xlink:href="https://huggingface.co/">https://huggingface.co/</ext-link></p></fn>),</p>
</list-item>
<list-item id="j_infor592_li_006">
<label>•</label>
<p>It should contain at least a minimal description (architecture, training data, and other details).</p>
</list-item>
</list>
<p>According to Table <xref rid="j_infor592_tab_001">1</xref>, open LLMs have been released for the majority of common European languages. Table <xref rid="j_infor592_tab_001">1</xref> shows that Llama2 and Mistral are the leading architectures for open regional European LLMs, and that 7 billion parameter models are the most common. Table <xref rid="j_infor592_tab_001">1</xref> also reveals that full-parameter training is conducted in the majority of cases (19 of 20), rather than a parameter-efficient fine-tuning (PEFT) based approach. However, in some instances (2 of 20 cases) regional LLMs were trained using PEFT methods such as LoRA (Hu <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor592_ref_017">2022</xref>), which may result in less accurate models than full-parameter training, albeit at lower computational cost. In addition, quite often only the model itself is published (11 of 20 cases), without an accompanying citable document (e.g. a technical report or peer-reviewed publication) or training and evaluation datasets. In our opinion, the lack of accompanying scientific documentation limits the potential usefulness of the released regional LLMs in various important aspects, including their reproducibility, empirical performance assessment, and connection to existing related results.</p>
<p><bold>Multilingual LLMs that support the Lithuanian language.</bold> Another way of achieving LLMs with regional language support is to train models for multiple languages simultaneously. Although this approach requires far more computational and data resources than training models for a single language (for instance, EuroLLM was pretrained using 256 Nvidia H100 GPUs and a 4 trillion token set, as indicated by Martins <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor592_ref_026">2024</xref>), recent open LLMs that support the Lithuanian language are multilingual (e.g. Llama3.X (Grattafiori <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor592_ref_014">2024</xref>), Gemma2 (Riviere <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor592_ref_037">2024</xref>), and EuroLLM). Although these LLMs perform quite similarly on various benchmarks, there are applications in which some of them are advantageous over their counterparts. For example, Gemma2 is potentially more suitable for general knowledge and reasoning tasks, Llama3.1 is efficient in coding and complex problem-solving tasks, and EuroLLM is optimised for European languages. All these multilingual models with Lithuanian language support were published during the later stages of our research or after it, and currently represent the state of the art (SOTA) in open LLMs.</p>
<table-wrap id="j_infor592_tab_001">
<label>Table 1</label>
<caption>
<p>Open LLM models for regional European languages. The F/P column denotes whether the model was full-parameter trained (F), or trained via PEFT (P), and Doc. column shows whether the corresponding model has an accompanying publication.</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Language and reference</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Architecture</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Size</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">F/P</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Doc.</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left">Bulgarian (INSAIT, <xref ref-type="bibr" rid="j_infor592_ref_018">2024</xref>)</td>
<td style="vertical-align: top; text-align: left">Mistral</td>
<td style="vertical-align: top; text-align: left">7B</td>
<td style="vertical-align: top; text-align: left">F</td>
<td style="vertical-align: top; text-align: left">No</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Danish (Mabeck, <xref ref-type="bibr" rid="j_infor592_ref_025">2024</xref>)</td>
<td style="vertical-align: top; text-align: left">Mistral</td>
<td style="vertical-align: top; text-align: left">7B</td>
<td style="vertical-align: top; text-align: left">F</td>
<td style="vertical-align: top; text-align: left">No</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Dutch (Rijgersberg, <xref ref-type="bibr" rid="j_infor592_ref_036">2024</xref>)</td>
<td style="vertical-align: top; text-align: left">Mistral</td>
<td style="vertical-align: top; text-align: left">7B</td>
<td style="vertical-align: top; text-align: left">F</td>
<td style="vertical-align: top; text-align: left">No</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">French-English (Faysse <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor592_ref_009">2024</xref>)</td>
<td style="vertical-align: top; text-align: left">Llama</td>
<td style="vertical-align: top; text-align: left">1.3B</td>
<td style="vertical-align: top; text-align: left">F</td>
<td style="vertical-align: top; text-align: left">Yes</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">German (Plüster and Schuhmann, <xref ref-type="bibr" rid="j_infor592_ref_034">2024</xref>)</td>
<td style="vertical-align: top; text-align: left">Llama2</td>
<td style="vertical-align: top; text-align: left">7B,13B</td>
<td style="vertical-align: top; text-align: left">F</td>
<td style="vertical-align: top; text-align: left">No</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Greek (SPAHE, <xref ref-type="bibr" rid="j_infor592_ref_040">2024</xref>)</td>
<td style="vertical-align: top; text-align: left">Mistral</td>
<td style="vertical-align: top; text-align: left">7B</td>
<td style="vertical-align: top; text-align: left">F</td>
<td style="vertical-align: top; text-align: left">No</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Hungarian-English (Csaki <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor592_ref_005">2024</xref>)</td>
<td style="vertical-align: top; text-align: left">Llama2</td>
<td style="vertical-align: top; text-align: left">7B</td>
<td style="vertical-align: top; text-align: left">F</td>
<td style="vertical-align: top; text-align: left">Yes</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Finnish and other (LumiOpen, <xref ref-type="bibr" rid="j_infor592_ref_024">2024</xref>)</td>
<td style="vertical-align: top; text-align: left">Llama2</td>
<td style="vertical-align: top; text-align: left">7B–33B</td>
<td style="vertical-align: top; text-align: left">F</td>
<td style="vertical-align: top; text-align: left">No</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Icelandic (Snæbjarnarson <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor592_ref_039">2022</xref>)</td>
<td style="vertical-align: top; text-align: left">RoBERTa</td>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">F</td>
<td style="vertical-align: top; text-align: left">Yes</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Italian (Bacciu <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor592_ref_002">2023</xref>)</td>
<td style="vertical-align: top; text-align: left">Llama2</td>
<td style="vertical-align: top; text-align: left">7B,13B</td>
<td style="vertical-align: top; text-align: left">P</td>
<td style="vertical-align: top; text-align: left">Yes</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Lithuanian (<bold>Ours</bold>)</td>
<td style="vertical-align: top; text-align: left">Llama2</td>
<td style="vertical-align: top; text-align: left">7B,13B</td>
<td style="vertical-align: top; text-align: left">F</td>
<td style="vertical-align: top; text-align: left">Yes</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Norwegian (Norallm, <xref ref-type="bibr" rid="j_infor592_ref_032">2024</xref>)</td>
<td style="vertical-align: top; text-align: left">Mistral</td>
<td style="vertical-align: top; text-align: left">7B</td>
<td style="vertical-align: top; text-align: left">F</td>
<td style="vertical-align: top; text-align: left">No</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Serbian, Bosnian, Croatian (Gordić, <xref ref-type="bibr" rid="j_infor592_ref_013">2024</xref>)</td>
<td style="vertical-align: top; text-align: left">Mistral</td>
<td style="vertical-align: top; text-align: left">7B</td>
<td style="vertical-align: top; text-align: left">F</td>
<td style="vertical-align: top; text-align: left">No</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Spanish (Projecte AINA, <xref ref-type="bibr" rid="j_infor592_ref_035">2024</xref>)</td>
<td style="vertical-align: top; text-align: left">Falcon</td>
<td style="vertical-align: top; text-align: left">7B</td>
<td style="vertical-align: top; text-align: left">F</td>
<td style="vertical-align: top; text-align: left">No</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Swedish (Ekgren <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor592_ref_008">2023</xref>)</td>
<td style="vertical-align: top; text-align: left">GPT-SW3</td>
<td style="vertical-align: top; text-align: left">126M–40B</td>
<td style="vertical-align: top; text-align: left">F</td>
<td style="vertical-align: top; text-align: left">Yes</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Slovenian (Ulčar and Robnik-Šikonja, <xref ref-type="bibr" rid="j_infor592_ref_043">2021</xref>)</td>
<td style="vertical-align: top; text-align: left">RoBERTa</td>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">F</td>
<td style="vertical-align: top; text-align: left">Yes</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Polish (Speakleash, <xref ref-type="bibr" rid="j_infor592_ref_041">2024</xref>)</td>
<td style="vertical-align: top; text-align: left">Mistral</td>
<td style="vertical-align: top; text-align: left">7B</td>
<td style="vertical-align: top; text-align: left">F</td>
<td style="vertical-align: top; text-align: left">No</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Ukrainian (Boros <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor592_ref_003">2024</xref>)</td>
<td style="vertical-align: top; text-align: left">Mistral</td>
<td style="vertical-align: top; text-align: left">7B</td>
<td style="vertical-align: top; text-align: left">F</td>
<td style="vertical-align: top; text-align: left">Yes</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Portuguese (Garcia <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor592_ref_011">2024</xref>)</td>
<td style="vertical-align: top; text-align: left">Phi-2B</td>
<td style="vertical-align: top; text-align: left">1.3B–7B</td>
<td style="vertical-align: top; text-align: left">P</td>
<td style="vertical-align: top; text-align: left">Yes</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">Romanian (Masala <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor592_ref_027">2024</xref>)</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">Llama2</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">7B</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">F</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">Yes</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="j_infor592_s_003">
<label>3</label>
<title>Proposed Open LLMs and Accompanying Components</title>
<p><bold>Proposed open LLMs and their training details.</bold> We trained the proposed LLMs (including tokenizers) from Llama2-7B and Llama2-13B, respectively (Table <xref rid="j_infor592_tab_002">2</xref>).</p>
<table-wrap id="j_infor592_tab_002">
<label>Table 2</label>
<caption>
<p>Overview of Llama-based LLMs.</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Model name</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Description</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left">Llama2-7B</td>
<td style="vertical-align: top; text-align: left">A second-generation Llama foundational language model with 7 billion parameters (no Lithuanian language support).<xref ref-type="fn" rid="j_infor592_fn_003">3</xref></td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">LT-Llama2-7B</td>
<td style="vertical-align: top; text-align: left">Proposed Lithuanian LLM derived from the Llama2-7B model, as described in Section <xref rid="j_infor592_s_003">3</xref>.</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Llama2-13B</td>
<td style="vertical-align: top; text-align: left">A second-generation Llama foundational language model with 13 billion parameters (no Lithuanian language support).<xref ref-type="fn" rid="j_infor592_fn_004">4</xref></td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">LT-Llama2-13B</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">Proposed Lithuanian LLM derived from the Llama2-13B model, as described in Section <xref rid="j_infor592_s_003">3</xref>.</td>
</tr>
</tbody>
</table>
</table-wrap>
<p><fn id="j_infor592_fn_003"><label><sup>3</sup></label>
<p><uri>https://huggingface.co/meta-llama/Llama-2-7b</uri></p></fn></p>
<p><fn id="j_infor592_fn_004"><label><sup>4</sup></label>
<p><uri>https://huggingface.co/meta-llama/Llama-2-13b</uri></p></fn></p>
<p>The training follows a standard two-step approach, consisting of autoregressive pretraining and supervised fine-tuning, which is depicted schematically in Fig. <xref rid="j_infor592_fig_001">1</xref>.</p>
<p><bold>Autoregressive pretraining</bold> was performed on the Lithuanian component of the CulturaX dataset (Nguyen <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor592_ref_031">2023</xref>). It is the most computationally intensive step (Table <xref rid="j_infor592_tab_003">3</xref>), and corresponds to the integration of the Lithuanian language into the model. During this step, the cross-entropy loss for the next-token prediction task was minimised (hence, no labelled data are required for pretraining). The complete set of the model’s parameters was optimised (i.e. no PEFT was used). Figure <xref rid="j_infor592_fig_002">2</xref> shows the loss during the pretraining process. From it we see that, although loss minimisation tends to saturate towards the end, one may hypothesise that learning would continue for more than one epoch.</p>
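The next-token objective described above can be illustrated with a minimal sketch (plain Python over toy logits; the function name and vocabulary are ours, not the training code used for the models):

```python
import math

def next_token_cross_entropy(logits, token_ids):
    # Autoregressive loss: the logits at position t predict token t+1.
    # `logits` is a list of score rows (one per position, one score per
    # vocabulary entry); `token_ids` is the tokenised sequence.
    losses = []
    for t in range(len(token_ids) - 1):
        row = logits[t]
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        losses.append(log_z - row[token_ids[t + 1]])  # -log p(next token)
    return sum(losses) / len(losses)
```

Minimising this average over the corpus is what drives perplexity (the exponential of this loss) down during pretraining.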
<p>Figure <xref rid="j_infor592_fig_007">6</xref> (Appendix <xref rid="j_infor592_app_001">A</xref>) shows the source distribution of the Lithuanian component of the CulturaX dataset. From it we see that the dataset is large in quantity but is collected mainly from common websites. Figure <xref rid="j_infor592_fig_008">7</xref> (Appendix <xref rid="j_infor592_app_001">A</xref>) shows the distribution of record lengths in tokens. To speed up pretraining, we reduced the context length to 2048 tokens (reflected in the peak near 2048 in Fig. <xref rid="j_infor592_fig_008">7</xref>, Appendix <xref rid="j_infor592_app_001">A</xref>). See Table <xref rid="j_infor592_tab_003">3</xref> for more details on the pretraining process.</p>
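Reducing the context length amounts to slicing each tokenised record into fixed-size training examples; a minimal sketch (our own illustration of the idea, not the actual preprocessing code):

```python
def chunk_tokens(token_ids, context_length=2048):
    # Split a tokenised record into fixed-size training examples.
    # The final, shorter remainder is kept, which is why lengths below
    # the context length also appear in the token-length distribution.
    return [token_ids[i:i + context_length]
            for i in range(0, len(token_ids), context_length)]
```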
<p><bold>Supervised fine-tuning (SFT)</bold> explicitly guides the pretrained model toward task-specific outputs using labelled data. It is much less computationally intensive, since the model already has the Lithuanian language integrated (Table <xref rid="j_infor592_tab_003">3</xref>). We conducted SFT using the Alpaca dataset (Dubois <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor592_ref_006">2024</xref>), translated into Lithuanian using ChatGPT (gpt-4-1106-preview), and the dataset of Neurotechnology (<xref ref-type="bibr" rid="j_infor592_ref_030">2024</xref>). SFT tunes the LLMs to process formatted prompts <monospace>"[INST] «SYS» {system_level_instruction} «/SYS»{instruction}[/INST]"</monospace>, where the parameter <monospace>system_level_instruction</monospace> sets the desired behaviour constraints (e.g. tone, response style), and the parameter <monospace>instruction</monospace> specifies the task (see the caption of Table <xref rid="j_infor592_tab_009">9</xref> (Appendix <xref rid="j_infor592_app_001">A</xref>) for an example). SFT was conducted with the same parameters as in Table <xref rid="j_infor592_tab_003">3</xref>, except that the learning rate was set to 0.00001 and the context length was restored to 4096.</p>
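Assembling the SFT prompt from the two parameters can be sketched as follows (the helper name is ours; the guillemet markers are reproduced exactly as printed above):

```python
def build_prompt(system_level_instruction, instruction):
    # Format a single-turn SFT prompt in the template described in the text:
    # the system-level instruction is wrapped in «SYS» … «/SYS» markers,
    # and the whole prompt in [INST] … [/INST].
    return (f"[INST] «SYS» {system_level_instruction} «/SYS»"
            f"{instruction}[/INST]")
```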
<p>The hyperparameters in Table <xref rid="j_infor592_tab_003">3</xref>, such as the learning rate, warmup ratio, and weight decay, were selected according to the guidelines of Touvron <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor592_ref_042">2023</xref>), with slight adjustments to ensure faster and more stable loss minimisation. During pretraining we also observed exploding gradients, which we mitigated by tuning the number of gradient accumulation steps (see Table <xref rid="j_infor592_tab_003">3</xref>).</p>
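<p>The interplay between per-device batch size and gradient accumulation steps in Table <xref rid="j_infor592_tab_003">3</xref> can be seen through the standard effective batch size identity (a sketch, not code from the paper):</p>

```python
def effective_batch_size(per_device_batch: int,
                         grad_accum_steps: int,
                         num_gpus: int) -> int:
    """Effective (global) batch size: gradients are accumulated over
    several per-device batches on each GPU before an optimiser step."""
    return per_device_batch * grad_accum_steps * num_gpus

# Values from Table 3, under the 8x H100 GPU setup:
bs_7b = effective_batch_size(8, 2, 8)   # Llama2-7B configuration
bs_13b = effective_batch_size(4, 4, 8)  # Llama2-13B configuration
```

<p>Notably, both configurations yield the same effective batch size of 128: the accumulation steps could be tuned against exploding gradients while keeping the global batch constant.</p>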
<p>Table <xref rid="j_infor592_tab_010">10</xref> (Appendix <xref rid="j_infor592_app_001">A</xref>) provides text generation examples (pretrained models), and Table <xref rid="j_infor592_tab_009">9</xref> (Appendix <xref rid="j_infor592_app_001">A</xref>) provides examples of answers to questions (pretrained and fine-tuned models). Unless stated otherwise, all experiments and benchmarks with the proposed LLMs used the pretrained-only models, in line with common practice. The download links for the proposed LLMs are provided in Table <xref rid="j_infor592_tab_008">8</xref> (Appendix <xref rid="j_infor592_app_001">A</xref>).</p>
<fig id="j_infor592_fig_001">
<label>Fig. 1</label>
<caption>
<p>Overview of the two-step process for creating LT-Llama2-7B/LT-Llama2-13B.</p>
</caption>
<graphic xlink:href="infor592_g001.jpg"/>
</fig>
<table-wrap id="j_infor592_tab_003">
<label>Table 3</label>
<caption>
<p>Hyperparameters and other details.</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Learning parameter</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Llama2-7B</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Llama2-13B</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left">Number of epochs</td>
<td style="vertical-align: top; text-align: left">1</td>
<td style="vertical-align: top; text-align: left">1</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Learning rate</td>
<td style="vertical-align: top; text-align: left">0.0002</td>
<td style="vertical-align: top; text-align: left">0.00004</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Warmup ratio</td>
<td style="vertical-align: top; text-align: left">0.05</td>
<td style="vertical-align: top; text-align: left">0.05</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Weight decay</td>
<td style="vertical-align: top; text-align: left">0.07</td>
<td style="vertical-align: top; text-align: left">0.05</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Per-device batch size</td>
<td style="vertical-align: top; text-align: left">8</td>
<td style="vertical-align: top; text-align: left">4</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Gradient accumulation steps</td>
<td style="vertical-align: top; text-align: left">2</td>
<td style="vertical-align: top; text-align: left">4</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Duration of pretraining in hours for a single H100 GPU</td>
<td style="vertical-align: top; text-align: left">1722.0</td>
<td style="vertical-align: top; text-align: left">2980.5</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Duration of fine-tuning in hours for a single H100 GPU</td>
<td style="vertical-align: top; text-align: left"><inline-formula id="j_infor592_ineq_003"><alternatives><mml:math>
<mml:mo mathvariant="normal">&lt;</mml:mo>
<mml:mn>1</mml:mn></mml:math><tex-math><![CDATA[$\lt 1$]]></tex-math></alternatives></inline-formula></td>
<td style="vertical-align: top; text-align: left"><inline-formula id="j_infor592_ineq_004"><alternatives><mml:math>
<mml:mo mathvariant="normal">&lt;</mml:mo>
<mml:mn>1</mml:mn></mml:math><tex-math><![CDATA[$\lt 1$]]></tex-math></alternatives></inline-formula></td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Total number of tokens</td>
<td colspan="2" style="vertical-align: top; text-align: center">14761219995</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Records in dataset</td>
<td colspan="2" style="vertical-align: top; text-align: center">13339785</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Mean number of tokens per record</td>
<td colspan="2" style="vertical-align: top; text-align: center">1106.5560</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Standard deviation of tokens per record</td>
<td colspan="2" style="vertical-align: top; text-align: center">697.0089</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Optimiser</td>
<td colspan="2" style="vertical-align: top; text-align: center">AdamW</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">Hardware</td>
<td colspan="2" style="vertical-align: top; text-align: center; border-bottom: solid thin">8xH100 GPUs</td>
</tr>
</tbody>
</table>
</table-wrap>
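<p>As a quick consistency check (a sketch, not code from the paper), the mean number of tokens per record in Table <xref rid="j_infor592_tab_003">3</xref> follows directly from the reported totals:</p>

```python
# Dataset statistics reported in Table 3
total_tokens = 14_761_219_995
num_records = 13_339_785

# The mean number of tokens per record is the ratio of the two totals
mean_tokens = total_tokens / num_records  # ≈ 1106.556, as reported
```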
<p><bold>Proposed open Q/A dataset.</bold> This dataset was constructed from ChatGPT (OpenAI <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor592_ref_033">2024</xref>) summarisations of a subset of Lithuanian Wikipedia using the procedure described below. First, the Lithuanian Wikipedia was downloaded and the titles of its pages were filtered with the following prompt: <monospace>"I will provide a list of titles in Lithuanian language. From the list provide me the titles without any explanation which are directly or indirectly related with Lithuania except fauna and flora. List: {list}"</monospace>, where the variable <monospace>list</monospace> represents the list of titles of all pages. After this filtering and a manual check, the resulting list of Lithuanian Wikipedia pages, each represented by the pair (<monospace>title</monospace>, <monospace>text</monospace>), was transformed into Q/A pairs using the prompt returned by Algorithm <xref rid="j_infor592_fig_003">1</xref>.</p>
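<p>A minimal sketch of the title-filtering step described above. The prompt string is quoted from the text; the function name and the line-per-title serialisation of the <monospace>list</monospace> variable are assumptions, since the paper does not specify them:</p>

```python
# The filtering prompt quoted in the text, with {list} as the fill-in slot
FILTER_PROMPT = (
    "I will provide a list of titles in Lithuanian language. "
    "From the list provide me the titles without any explanation "
    "which are directly or indirectly related with Lithuania "
    "except fauna and flora. List: {list}"
)

def build_filter_prompt(titles):
    """Serialise the page titles into the {list} slot, one per line
    (an assumed serialisation; the paper does not specify it)."""
    return FILTER_PROMPT.format(list="\n".join(titles))

# Hypothetical input titles:
prompt = build_filter_prompt(["Vilnius", "Žalgirio mūšis"])
```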
<fig id="j_infor592_fig_002">
<label>Fig. 2</label>
<caption>
<p>Losses (<italic>y</italic>-axis) vs training steps (<italic>x</italic>-axis) during the model’s pretraining.</p>
</caption>
<graphic xlink:href="infor592_g002.jpg"/>
</fig>
<fig id="j_infor592_fig_003">
<label>Algorithm 1</label>
<caption>
<p>Generate prompt for Q/A summarisation</p>
</caption>
<graphic xlink:href="infor592_g003.jpg"/>
</fig>
<p>The proposed dataset consists of 13,848 such pairs and covers various facts about Lithuania and Lithuanian history. Note that it was not used in the pretraining process. Table <xref rid="j_infor592_tab_004">4</xref> presents a set of examples from the proposed Q/A dataset, which can be accessed through the download links provided in Table <xref rid="j_infor592_tab_008">8</xref> (Appendix <xref rid="j_infor592_app_001">A</xref>).</p>
<p><bold>Proposed open translations of language understanding benchmarks</bold>. The Language Model Evaluation Harness (LMEH; Gao <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor592_ref_010">2023</xref>) is a collection of language understanding benchmarks created for the evaluation of LLMs across a wide range of tasks. LMEH includes the following popular LLM evaluation benchmarks:</p>
<table-wrap id="j_infor592_tab_004">
<label>Table 4</label>
<caption>
<p>Examples from the accompanying Q/A dataset.</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Question</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Answer</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left">Koks yra Vilniaus miesto statusas Lietuvoje?</td>
<td style="vertical-align: top; text-align: left">Vilnius yra Lietuvos sostinė.</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Kur yra Gedimino pilis?</td>
<td style="vertical-align: top; text-align: left">Gedimino pilis yra Vilniuje, ant Gedimino kalno.</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Kas buvo vadinamas „Lito tėvu“?</td>
<td style="vertical-align: top; text-align: left">Vladas Jurgutis buvo vadinamas „Lito tėvu“, nes jam buvo patikėta spręsti visus naujos valiutos įvedimo niuansus.</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Kokios upės teka per Vilnių?</td>
<td style="vertical-align: top; text-align: left">Per Vilnių teka Neris ir Vilnia.</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Kada buvo įkurtas Vilniaus universitetas?</td>
<td style="vertical-align: top; text-align: left">Vilniaus universitetas buvo įkurtas 1579 metais, Vilniuje, po Lietuvos didžiojo kunigaikščio Stepono Batoro privilegijos suteikimo jėzuitų ordino kolegijai.</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">Kada ir kur įvyko Žalgirio mūšis?</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">Žalgirio mūšis įvyko 1410 m. liepos 15 d. netoli Tanenbergo ir Griunvaldo (Žalgirio) kaimelių, dabartinės Lenkijos teritorijoje, į pietvakarius nuo Olštyno.</td>
</tr>
</tbody>
</table>
</table-wrap>
<list>
<list-item id="j_infor592_li_007">
<label>•</label>
<p>The Arc benchmark (Lai <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor592_ref_023">2023</xref>) consists of multiple-choice science questions at school level.</p>
</list-item>
<list-item id="j_infor592_li_008">
<label>•</label>
<p>The GSM8K benchmark (Cobbe <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor592_ref_004">2021</xref>) consists of linguistically diverse mathematical problems.</p>
</list-item>
<list-item id="j_infor592_li_009">
<label>•</label>
<p>The Hellaswag benchmark (Zellers <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor592_ref_045">2019</xref>) is a common-sense inference challenge dataset.</p>
</list-item>
<list-item id="j_infor592_li_010">
<label>•</label>
<p>The massive multitask language understanding (MMLU) benchmark (Hendrycks <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor592_ref_015">2021</xref>) covers tasks from a diverse set of academic disciplines and is designed to measure the accuracy of the model in a multitask setting.</p>
</list-item>
<list-item id="j_infor592_li_011">
<label>•</label>
<p>The TruthfulQA benchmark (Lai <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor592_ref_023">2023</xref>) is designed to measure whether an LLM is truthful in generating answers to questions spanning different categories (health, law, finance, and politics).</p>
</list-item>
<list-item id="j_infor592_li_012">
<label>•</label>
<p>The Winogrande benchmark (Sakaguchi <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor592_ref_038">2019</xref>) is a set of pronoun resolution problems originally designed to be unsolvable for statistical models that are based on selectional preferences or word associations.</p>
</list-item>
</list>
<p>These benchmarks produce prompts consisting of a question and answer options, and evaluate the accuracy of the LLM’s responses. Accuracy can be measured conveniently because the structured prompt asks the LLM to select an option (e.g. "a", "b" or "c"). We translated the LMEH benchmarks into Lithuanian using GPT-4. The download links are provided in Table <xref rid="j_infor592_tab_008">8</xref> (Appendix <xref rid="j_infor592_app_001">A</xref>).</p>
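<p>The final accuracy computation over such structured prompts reduces to exact matching of the selected option letters. A minimal sketch with illustrative names (LMEH itself selects each option by comparing log-likelihoods of the answer continuations before this step):</p>

```python
def choice_accuracy(predictions, references):
    """Fraction of items where the model's selected option letter
    matches the reference answer (case-insensitive exact match)."""
    assert len(predictions) == len(references)
    hits = sum(p.strip().lower() == r.strip().lower()
               for p, r in zip(predictions, references))
    return hits / len(references)

# Hypothetical option letters: 3 of 4 predictions match the references
acc = choice_accuracy(["a", "c", "b", "a"], ["a", "b", "b", "a"])
```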
</sec>
<sec id="j_infor592_s_004">
<label>4</label>
<title>Empirical Evaluation</title>
<sec id="j_infor592_s_005">
<label>4.1</label>
<title>Perplexity During Pretraining</title>
<p>We analysed the LLMs by examining their perplexity (measured on the proposed Q/A dataset), which is defined as 
<disp-formula id="j_infor592_eq_001">
<label>(1)</label><alternatives><mml:math display="block">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd>
<mml:mi mathvariant="italic">P</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:mi mathvariant="italic">W</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mo movablelimits="false">exp</mml:mo>
<mml:mo mathvariant="normal" fence="true" maxsize="2.45em" minsize="2.45em">(</mml:mo>
<mml:mo>−</mml:mo><mml:mstyle displaystyle="true">
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">N</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mstyle>
<mml:munderover accentunder="false" accent="false">
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:mo largeop="true" movablelimits="false">∑</mml:mo></mml:mstyle>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">N</mml:mi>
</mml:mrow>
</mml:munderover>
<mml:mo movablelimits="false">log</mml:mo>
<mml:mi mathvariant="italic">p</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo stretchy="false">∣</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo mathvariant="normal">&lt;</mml:mo>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
<mml:mo mathvariant="normal" fence="true" maxsize="2.45em" minsize="2.45em">)</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable></mml:math><tex-math><![CDATA[\[ P(W)=\exp \Bigg(-\frac{1}{N}{\sum \limits_{i=1}^{N}}\log p({w_{i}}\mid {w_{\lt i}})\Bigg),\]]]></tex-math></alternatives>
</disp-formula> 
where</p>
<list>
<list-item id="j_infor592_li_013">
<label>•</label>
<p><inline-formula id="j_infor592_ineq_005"><alternatives><mml:math>
<mml:mi mathvariant="italic">W</mml:mi>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">N</mml:mi>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[$W={w_{1}},\dots ,{w_{N}}$]]></tex-math></alternatives></inline-formula> is the sequence of tokens,</p>
</list-item>
<list-item id="j_infor592_li_014">
<label>•</label>
<p><inline-formula id="j_infor592_ineq_006"><alternatives><mml:math>
<mml:mi mathvariant="italic">p</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo stretchy="false">∣</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo mathvariant="normal">&lt;</mml:mo>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math><tex-math><![CDATA[$p({w_{i}}\mid {w_{\lt i}})$]]></tex-math></alternatives></inline-formula> is the conditional probability of the token <inline-formula id="j_infor592_ineq_007"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${w_{i}}$]]></tex-math></alternatives></inline-formula> given all the previous tokens <inline-formula id="j_infor592_ineq_008"><alternatives><mml:math>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo mathvariant="normal">&lt;</mml:mo>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub></mml:math><tex-math><![CDATA[${w_{\lt i}}$]]></tex-math></alternatives></inline-formula> (if <inline-formula id="j_infor592_ineq_009"><alternatives><mml:math>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn></mml:math><tex-math><![CDATA[$i=1$]]></tex-math></alternatives></inline-formula>, probability <inline-formula id="j_infor592_ineq_010"><alternatives><mml:math>
<mml:mi mathvariant="italic">p</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo stretchy="false">∣</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo mathvariant="normal">&lt;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math><tex-math><![CDATA[$p({w_{1}}\mid {w_{\lt 1}})$]]></tex-math></alternatives></inline-formula> is defined as <inline-formula id="j_infor592_ineq_011"><alternatives><mml:math>
<mml:mi mathvariant="italic">p</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math><tex-math><![CDATA[$p({w_{1}})$]]></tex-math></alternatives></inline-formula>).</p>
</list-item>
</list>
<p>The perplexity can be interpreted as the model’s ability to predict the next token, given the previous ones. From the definition (Eq. (<xref rid="j_infor592_eq_001">1</xref>)), lower perplexity values indicate better performance, and for any input sequence <italic>W</italic>, <inline-formula id="j_infor592_ineq_012"><alternatives><mml:math>
<mml:mi mathvariant="italic">P</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:mi mathvariant="italic">W</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
<mml:mo>⩾</mml:mo>
<mml:mn>1</mml:mn></mml:math><tex-math><![CDATA[$P(W)\geqslant 1$]]></tex-math></alternatives></inline-formula>. The choice of input perplexity was motivated by Gonen <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor592_ref_012">2023</xref>), who reveal that, for a wide range of tasks, the lower the perplexity of a prompt, the better the prompt can perform the task.</p>
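<p>Eq. (<xref rid="j_infor592_eq_001">1</xref>) can be computed directly from per-token log-probabilities. The sketch below (not code from the paper) also illustrates the lower bound of 1 and the behaviour of a uniform model:</p>

```python
import math

def perplexity(token_logprobs):
    """Eq. (1): exponential of the negative mean of log p(w_i | w_<i)."""
    n = len(token_logprobs)
    return math.exp(-sum(token_logprobs) / n)

# A model that is certain of every token (log p = 0) attains the lower
# bound P(W) = 1; a uniform model over a vocabulary of size V assigns
# log p = -log V to every token, so its perplexity is exactly V.
ppl_uniform = perplexity([-math.log(32000)] * 10)  # 32000 = Llama2 vocab size
```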
<p>We investigated the association between the average perplexity (averaged over all Q/A concatenations from the proposed Q/A dataset) and the percentage of the CulturaX Lithuanian component exposed to the model during pretraining. We conducted this experiment with both the LT-Llama2-7B and LT-Llama2-13B models, measuring perplexity every <inline-formula id="j_infor592_ineq_013"><alternatives><mml:math>
<mml:mn>10</mml:mn>
<mml:mi mathvariant="normal">%</mml:mi></mml:math><tex-math><![CDATA[$10\% $]]></tex-math></alternatives></inline-formula> of the total number of iterations during a pretraining epoch. Figure <xref rid="j_infor592_fig_004">3</xref> reveals that perplexity tends to decrease as additional pretraining data is included, although increasing saturation is visible towards the end in both cases. The initial and final perplexities in Table <xref rid="j_infor592_tab_005">5</xref> reflect the integration of the Lithuanian language component into the proposed Llama2 models. Note that the proposed Q/A dataset was not used in the pretraining.</p>
<fig id="j_infor592_fig_004">
<label>Fig. 3</label>
<caption>
<p>Percentage of the Lithuanian component of the CulturaX dataset used in the pretraining (<italic>x</italic>-axis) vs. corresponding average perplexity (<italic>y</italic>-axis).</p>
</caption>
<graphic xlink:href="infor592_g004.jpg"/>
</fig>
<table-wrap id="j_infor592_tab_005">
<label>Table 5</label>
<caption>
<p>Average perplexities before (Llama2-7B and Llama2-13B) and after (LT-Llama2-7B and LT-Llama2-13B) pretraining.</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Model</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Average perplexity</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left">Llama2-7B</td>
<td style="vertical-align: top; text-align: left">17.4613</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">LT-Llama2-7B</td>
<td style="vertical-align: top; text-align: left">3.8096</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Llama2-13B</td>
<td style="vertical-align: top; text-align: left">13.8849</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">LT-Llama2-13B</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">3.4520</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="j_infor592_s_006">
<label>4.2</label>
<title>Language Understanding During Pretraining</title>
<p>We evaluated the proposed open LLMs with the proposed open translations of LMEH benchmarks, using the same scheme as in perplexity experiments.</p>
<p>Figures <xref rid="j_infor592_fig_005">4</xref> and <xref rid="j_infor592_fig_006">5</xref> showcase the accuracies for a sequence of checkpoints, which correspond to the percentage of the pretraining data from CulturaX Lithuanian component, starting with <inline-formula id="j_infor592_ineq_014"><alternatives><mml:math>
<mml:mn>0</mml:mn>
<mml:mi mathvariant="normal">%</mml:mi></mml:math><tex-math><![CDATA[$0\% $]]></tex-math></alternatives></inline-formula> (which corresponds to the initial Llama2-7B), with the step of <inline-formula id="j_infor592_ineq_015"><alternatives><mml:math>
<mml:mn>10</mml:mn>
<mml:mi mathvariant="normal">%</mml:mi></mml:math><tex-math><![CDATA[$10\% $]]></tex-math></alternatives></inline-formula>. Similarly, Fig. <xref rid="j_infor592_fig_009">8</xref> (Appendix <xref rid="j_infor592_app_001">A</xref>) and Fig. <xref rid="j_infor592_fig_010">9</xref> (Appendix <xref rid="j_infor592_app_001">A</xref>) provide information about individual benchmarks from the MMLU set.</p>
<p>Although for some tasks (e.g. <monospace>Arc</monospace>, <monospace>Hellaswag</monospace>, <monospace>Winogrande</monospace>) we see consistent improvement throughout the entire pretraining process, these benchmarks surprisingly reveal that in most MMLU cases (see Fig. <xref rid="j_infor592_fig_009">8</xref> (Appendix <xref rid="j_infor592_app_001">A</xref>) and Fig. <xref rid="j_infor592_fig_010">9</xref> (Appendix <xref rid="j_infor592_app_001">A</xref>)), there is no improvement over the initial model. We hypothesise that this is because the Lithuanian component of CulturaX is almost exclusively collected through web crawling of common websites (see Fig. <xref rid="j_infor592_fig_007">6</xref> (Appendix <xref rid="j_infor592_app_001">A</xref>)), which does not include data relevant to those specific tasks. Therefore, extending the regional components of CulturaX with high-quality data may improve LLMs tailored for the corresponding regional languages.</p>
<fig id="j_infor592_fig_005">
<label>Fig. 4</label>
<caption>
<p>Accuracies (<italic>y</italic>-axis) of LMEH benchmarks for LT-Llama2-7B model, pretrained with different proportions of Lithuanian component of CulturaX dataset (<italic>x</italic>-axis). The MMLU benchmarks are summarised in mmlu_lt.</p>
</caption>
<graphic xlink:href="infor592_g005.jpg"/>
</fig>
<fig id="j_infor592_fig_006">
<label>Fig. 5</label>
<caption>
<p>Accuracies (<italic>y</italic>-axis) of LMEH benchmarks for LT-Llama2-13B model, pretrained with different proportions of Lithuanian component of CulturaX dataset (<italic>x</italic>-axis). The MMLU benchmarks are summarised in mmlu_lt.</p>
</caption>
<graphic xlink:href="infor592_g006.jpg"/>
</fig>
</sec>
<sec id="j_infor592_s_007">
<label>4.3</label>
<title>Comparison with Recent Multilingual LLMs that Support the Lithuanian Language</title>
<p><bold>Language understanding benchmarks.</bold> We compared the proposed LLMs and the latest multilingual models that support the Lithuanian language (Gemma2, EuroLLM, Llama3.1, and Llama3.2) on the translated LMEH benchmarks. We provide these evaluations in Table <xref rid="j_infor592_tab_006">6</xref>. Table <xref rid="j_infor592_tab_007">7</xref> summarises Table <xref rid="j_infor592_tab_006">6</xref> by showing the rankings of the proposed LT-Llama2-7B and LT-Llama2-13B LLMs in these benchmarks. It shows that in 3 of 6 benchmarks (<monospace>Arc</monospace>, <monospace>Hellaswag</monospace>, and <monospace>Winogrande</monospace>), our LT-Llama2-13B model ranked 4th of 8, although the more recent SOTA open LLMs generally performed better than our models.</p>
<table-wrap id="j_infor592_tab_006">
<label>Table 6</label>
<caption>
<p>Accuracies of LLMs in LMEH benchmarks.</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Model</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">MMLU</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Arc</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Winogrande</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">TruthfulQA</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Hellaswag</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Belebele</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left">Gemma2-27B</td>
<td style="vertical-align: top; text-align: left">64.82</td>
<td style="vertical-align: top; text-align: left">77.4</td>
<td style="vertical-align: top; text-align: left">66.77</td>
<td style="vertical-align: top; text-align: left">42.06</td>
<td style="vertical-align: top; text-align: left">50.82</td>
<td style="vertical-align: top; text-align: left">89.22</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Gemma2-9B</td>
<td style="vertical-align: top; text-align: left">60.09</td>
<td style="vertical-align: top; text-align: left">68.31</td>
<td style="vertical-align: top; text-align: left">65.15</td>
<td style="vertical-align: top; text-align: left">39.69</td>
<td style="vertical-align: top; text-align: left">45.32</td>
<td style="vertical-align: top; text-align: left">86.78</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">EuroLLM-9B</td>
<td style="vertical-align: top; text-align: left">51.95</td>
<td style="vertical-align: top; text-align: left">71.55</td>
<td style="vertical-align: top; text-align: left">64.17</td>
<td style="vertical-align: top; text-align: left">42.13</td>
<td style="vertical-align: top; text-align: left">46.32</td>
<td style="vertical-align: top; text-align: left">69.44</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Llama3.1-8B</td>
<td style="vertical-align: top; text-align: left">44.86</td>
<td style="vertical-align: top; text-align: left">48.65</td>
<td style="vertical-align: top; text-align: left">54.22</td>
<td style="vertical-align: top; text-align: left">37.61</td>
<td style="vertical-align: top; text-align: left">35.19</td>
<td style="vertical-align: top; text-align: left">67.56</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Gemma2-2B</td>
<td style="vertical-align: top; text-align: left">35.84</td>
<td style="vertical-align: top; text-align: left">45.45</td>
<td style="vertical-align: top; text-align: left">51.85</td>
<td style="vertical-align: top; text-align: left">54.78</td>
<td style="vertical-align: top; text-align: left">34.8</td>
<td style="vertical-align: top; text-align: left">52.44</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Llama3.2-3B</td>
<td style="vertical-align: top; text-align: left">36.41</td>
<td style="vertical-align: top; text-align: left">39.39</td>
<td style="vertical-align: top; text-align: left">51.85</td>
<td style="vertical-align: top; text-align: left">38.87</td>
<td style="vertical-align: top; text-align: left">31.51</td>
<td style="vertical-align: top; text-align: left">46.22</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">LT-Llama2-13B</td>
<td style="vertical-align: top; text-align: left">26.44</td>
<td style="vertical-align: top; text-align: left">54.5</td>
<td style="vertical-align: top; text-align: left">61.72</td>
<td style="vertical-align: top; text-align: left">35.23</td>
<td style="vertical-align: top; text-align: left">40.61</td>
<td style="vertical-align: top; text-align: left">27.67</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">LT-Llama2-7B</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">26.01</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">43.18</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">53.67</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">41.38</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">33.17</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">27.23</td>
</tr>
</tbody>
</table>
</table-wrap>
<p><bold>Quality of text generation.</bold> Kapočiūtė-Dzikienė <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor592_ref_021">2025</xref>) provide an empirical evaluation of recent LLMs (GPT-4o, Llama3.1, Gemma2, and ours). According to their findings, our LT-Llama2-13B model outperformed these competitors in benchmarks of text generation quality in the Lithuanian language, achieving an error rate of <inline-formula id="j_infor592_ineq_016"><alternatives><mml:math>
<mml:mn>0.98</mml:mn>
<mml:mi mathvariant="normal">%</mml:mi></mml:math><tex-math><![CDATA[$0.98\% $]]></tex-math></alternatives></inline-formula>, and the closest competitor (GPT-4o) achieved an error rate of <inline-formula id="j_infor592_ineq_017"><alternatives><mml:math>
<mml:mn>3.44</mml:mn>
<mml:mi mathvariant="normal">%</mml:mi></mml:math><tex-math><![CDATA[$3.44\% $]]></tex-math></alternatives></inline-formula>. However, in the benchmark measuring answer accuracy, the other models were more accurate than ours.</p>
<table-wrap id="j_infor592_tab_007">
<label>Table 7</label>
<caption>
<p>Rankings of the proposed LLMs in LMEH benchmarks (1 means that the model was the most accurate, and 8 means that it was the least accurate).</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Model</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">MMLU</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Arc</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Winogrande</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">TruthfulQA</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Hellaswag</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Belebele</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left">LT-Llama2-13B</td>
<td style="vertical-align: top; text-align: left">7</td>
<td style="vertical-align: top; text-align: left">4</td>
<td style="vertical-align: top; text-align: left">4</td>
<td style="vertical-align: top; text-align: left">8</td>
<td style="vertical-align: top; text-align: left">4</td>
<td style="vertical-align: top; text-align: left">7</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">LT-Llama2-7B</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">8</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">7</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">6</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">4</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">7</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">8</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
</sec>
<sec id="j_infor592_s_008">
<label>5</label>
<title>Conclusions</title>
<p>We presented the first Llama2-based open LLMs tailored specifically to the Lithuanian language, the accompanying Q/A dataset, and the translated LMEH benchmarks, which contribute to standardising the evaluation of Lithuanian language models.</p>
<p>We also provided an overview of the existing LLMs for common European languages. It shows that most regional models follow the Llama2 or Mistral architecture. In addition, some authors do not train the full parameter set but instead rely on PEFT approaches, which are less computationally demanding yet potentially weaker in resulting performance. On the other hand, because PEFT methods largely preserve the original parameter structure, they may be beneficial for obtaining more efficient regional LLMs from the perspective of language-understanding benchmarks. Our findings also reveal a lack of scientific documentation for the published open regional LLMs.</p>
<p>We evaluated the proposed LLMs based on perplexity and translated LMEH benchmarks. During the pretraining epoch, we evaluated average perplexities (measured on an independent dataset) every <inline-formula id="j_infor592_ineq_018"><alternatives><mml:math>
<mml:mn>10</mml:mn>
<mml:mi mathvariant="normal">%</mml:mi></mml:math><tex-math><![CDATA[$10\% $]]></tex-math></alternatives></inline-formula> of the training iterations. These measurements show that perplexity decreases consistently during pretraining, reflecting improved next-token prediction. The initial and final perplexities (17.4613 versus 3.8096 for LT-Llama2-7B and 13.8849 versus 3.4520 for LT-Llama2-13B) demonstrate the integration of the Lithuanian language component into the proposed Llama2 models. Using the same scheme, we also evaluated our models on the translated LMEH set, which includes a conceptually diverse collection of language-model benchmarks. The results of these experiments hint that the Lithuanian component of CulturaX may not be sufficiently rich for modern LLM architectures. Although we answer positively the question of whether efficient Lithuanian LLMs (which were non-existent at the beginning and during most of this research) can be obtained from Llama2 LLMs lacking Lithuanian components, the latest open multilingual models (Llama3.1, Llama3.2, Gemma2, and EuroLLM) already have a strong Lithuanian component. According to our benchmarks, these open SOTA LLMs generally performed better than our models; however, the proposed LT-Llama2-13B ranked mid-range (<inline-formula id="j_infor592_ineq_019"><alternatives><mml:math>
<mml:mn>4</mml:mn>
<mml:mo mathvariant="normal" stretchy="false">/</mml:mo>
<mml:mn>8</mml:mn></mml:math><tex-math><![CDATA[$4/8$]]></tex-math></alternatives></inline-formula>) in half of the LMEH benchmarks. This also leads to the hypothesis that deriving Lithuanian LLMs from these recent models may yield more efficient Lithuanian LLMs. In our opinion, the good performance of our model in the external benchmark by Kapočiūtė-Dzikienė <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor592_ref_021">2025</xref>) may stem from the fact that it was trained on a single language, whereas the other LLMs were multilingual.</p>
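The perplexity figures quoted above follow the standard definition (the exponential of the mean per-token cross-entropy). The sketch below is a minimal illustration with synthetic loss values, not the authors' evaluation code; it shows how checkpoint perplexities relate to the underlying losses.

```python
import math

def perplexity(avg_nll_losses):
    """Perplexity = exp(mean per-token negative log-likelihood, in nats).

    `avg_nll_losses` holds the mean cross-entropy of each evaluation
    batch; lower perplexity means better next-token prediction.
    """
    mean_nll = sum(avg_nll_losses) / len(avg_nll_losses)
    return math.exp(mean_nll)

# Checkpoints evaluated every 10% of pretraining yield a decreasing
# sequence; for instance, a mean loss near 2.86 nats corresponds to a
# perplexity near the initial 17.46 reported for LT-Llama2-7B.
print(round(perplexity([2.86]), 2))
print(round(perplexity([1.34]), 2))
```

Because the mapping from loss to perplexity is a simple exponential, small late-stage loss improvements translate into small perplexity gains, which matches the flattening curves typically observed towards the end of pretraining.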
<p>In the context of regional LLMs, the proposed models open up further research perspectives not only for NLP but also for other directions, since LLM representations are potentially useful in various scenarios (e.g. sentiment analysis (Zhang <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor592_ref_046">2024</xref>), robotics (Kim <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor592_ref_022">2024</xref>)). The main limitations of our contribution are related to the rapid progress of LLM research, which leads to the continuous emergence of more advanced models. In addition, we used automatically translated and generated data in the contributed components, which may introduce translation and generation artifacts. To achieve stable loss minimisation during pretraining, we faced and solved several challenges related to hyperparameter selection (learning rates, batch size, gradient accumulation steps). Our future work will include fully trained small language models tailored for Baltic languages and English.</p>
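Among the hyperparameters mentioned above, gradient accumulation is the least self-explanatory. The sketch below uses hypothetical values, not the paper's actual training configuration; it illustrates how accumulating micro-batch gradients emulates a larger effective batch under a fixed memory budget.

```python
# Gradient accumulation: gradients of several micro-batches are summed
# (scaled) before a single optimiser step, emulating a batch larger than
# fits in GPU memory. All values below are hypothetical.
per_device_batch = 4       # sequences per GPU per forward/backward pass
accumulation_steps = 8     # micro-batches accumulated per optimiser step
num_gpus = 8

effective_batch = per_device_batch * accumulation_steps * num_gpus

def accumulate(micro_grads, steps):
    """Sum micro-batch gradients scaled by 1/steps, so the result equals
    the gradient of one large batch. A 'gradient' here is just a float."""
    total = 0.0
    for g in micro_grads:
        total += g / steps
    return total

print(effective_batch)           # sequences per optimiser step
print(accumulate([1.0] * 8, 8))  # equals the mean micro-gradient
```

Changing `accumulation_steps` shifts the effective batch size without affecting per-step memory, which is why it interacts with the learning-rate choice during stable loss minimisation.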
</sec>
</body>
<back>
<app-group>
<app id="j_infor592_app_001"><label>A</label>
<title>Appendix</title>
<table-wrap id="j_infor592_tab_008">
<label>Table 8</label>
<caption>
<p>Download links for proposed LLMs and data.</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">URL</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Description</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left"><ext-link ext-link-type="uri" xlink:href="https://tinyurl.com/3vrjt5u3">https://tinyurl.com/3vrjt5u3</ext-link></td>
<td style="vertical-align: top; text-align: left">Proposed LLM LT-Llama2-7B (pretrained version).</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"><ext-link ext-link-type="uri" xlink:href="https://tinyurl.com/236mab8b">https://tinyurl.com/236mab8b</ext-link></td>
<td style="vertical-align: top; text-align: left">Proposed LLM LT-Llama2-7B (pretrained and fine-tuned version).</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"><ext-link ext-link-type="uri" xlink:href="https://tinyurl.com/bdzcae84">https://tinyurl.com/bdzcae84</ext-link></td>
<td style="vertical-align: top; text-align: left">Proposed LLM LT-Llama2-13B (pretrained version).</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"><ext-link ext-link-type="uri" xlink:href="https://tinyurl.com/2wr9npfh">https://tinyurl.com/2wr9npfh</ext-link></td>
<td style="vertical-align: top; text-align: left">Proposed LLM LT-Llama2-13B (pretrained and fine-tuned version).</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"><ext-link ext-link-type="uri" xlink:href="https://tinyurl.com/5y88x7ym">https://tinyurl.com/5y88x7ym</ext-link></td>
<td style="vertical-align: top; text-align: left">Proposed open Q/A dataset.</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"><ext-link ext-link-type="uri" xlink:href="https://tinyurl.com/3srtmv46">https://tinyurl.com/3srtmv46</ext-link></td>
<td style="vertical-align: top; text-align: left">LT-Arc is the Lithuanian translation of Arc dataset (Lai <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor592_ref_023">2023</xref>), which consists of a set of genuine grade-school level, multiple-choice science questions assembled to encourage research in advanced question-answering.</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"><ext-link ext-link-type="uri" xlink:href="https://tinyurl.com/3s2khf7f">https://tinyurl.com/3s2khf7f</ext-link></td>
<td style="vertical-align: top; text-align: left">LT-GSM8K is a Lithuanian translation of GSM8K dataset (Cobbe <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor592_ref_004">2021</xref>) that consists of linguistically diverse mathematical problems.</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"><ext-link ext-link-type="uri" xlink:href="https://tinyurl.com/bdzcayrw">https://tinyurl.com/bdzcayrw</ext-link></td>
<td style="vertical-align: top; text-align: left">LT-Hellaswag is a Lithuanian translation of Hellaswag benchmark (Zellers <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor592_ref_045">2019</xref>), consisting of a common sense inference challenge dataset.</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"><ext-link ext-link-type="uri" xlink:href="https://tinyurl.com/38w2m94c">https://tinyurl.com/38w2m94c</ext-link></td>
<td style="vertical-align: top; text-align: left">LT-MMLU is a Lithuanian translation of MMLU benchmark (Hendrycks <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor592_ref_015">2021</xref>) that measures a text model’s multitask accuracy. The test covers 57 tasks including elementary mathematics, history, computer science, law, and more.</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"><ext-link ext-link-type="uri" xlink:href="https://tinyurl.com/5e53mmjh">https://tinyurl.com/5e53mmjh</ext-link></td>
<td style="vertical-align: top; text-align: left">LT-Truthful-qa is a Lithuanian translation of Truthful-qa benchmark (Lai <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor592_ref_023">2023</xref>) to measure whether an LLM is truthful in generating answers to questions that span different categories (health, law, finance and politics).</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"><ext-link ext-link-type="uri" xlink:href="https://tinyurl.com/2s5wcatx">https://tinyurl.com/2s5wcatx</ext-link></td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">LT-Winogrande is a Lithuanian translation of Winogrande dataset (Sakaguchi <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor592_ref_038">2019</xref>), which consists of expert-crafted pronoun resolution problems originally designed to be unsolvable for statistical models that rely on selectional preferences or word associations.</td>
</tr>
</tbody>
</table>
</table-wrap> 
<table-wrap id="j_infor592_tab_009">
<label>Table 9</label>
<caption>
<p>Examples of answers from the proposed LLMs (pretrained and fine-tuned versions) to the given prompts. The prompt format: <monospace>[INST] «SYS» Pateik lakonišką, bet išsamų atsakymą į klausimą. «/SYS» Question given in the first column of this table [/INST]</monospace>.</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Question</td>
<td style="vertical-align: top; text-align: justify; border-top: solid thin; border-bottom: solid thin">LT-Llama2-7B</td>
<td style="vertical-align: top; text-align: justify; border-top: solid thin; border-bottom: solid thin">LT-Llama2-13B</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"><bold>Kuo užsiima bendrovė Neurotechnology?</bold></td>
<td style="vertical-align: top; text-align: justify; border-bottom: solid thin">Bendrovė Neurotechnology yra įmonė, kuri specializuojasi neuroninių tinklų ir dirbtinio intelekto srityse, teikianti paslaugas įvairioms pramonės šakoms ir organizacijoms.</td>
<td style="vertical-align: top; text-align: justify; border-bottom: solid thin">Bendrovė Neurotechnology yra dirbtinio intelekto ir kompiuterinės regos technologijų kūrėja, teikianti sprendimus robotikai, kompiuterinei regai, mobiliosioms aplikacijoms ir kitoms pramonės šakoms.</td>
</tr>
</tbody><tbody>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"><bold>Kokių svarbiausių tarptautinių organizacijų narė yra Lietuva?</bold></td>
<td style="vertical-align: top; text-align: justify; border-bottom: solid thin">Lietuva yra Europos Sąjungos (ES) narė nuo 2004 m. ir NATO narė nuo 2004 m.</td>
<td style="vertical-align: top; text-align: justify; border-bottom: solid thin">Lietuva yra Jungtinių Tautų, Europos Sąjungos, NATO, Ekonominio bendradarbiavimo ir plėtros organizacijos, Pasaulio prekybos organizacijos ir kitų tarptautinių organizacijų narė.</td>
</tr>
</tbody><tbody>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"><bold>Ką žinai apie lietuvių kalbą?</bold></td>
<td style="vertical-align: top; text-align: justify; border-bottom: solid thin">Atsiprašau, bet aš negaliu pateikti konkrečios informacijos apie lietuvių kalbą, nes neturiu prieigos prie jūsų pateiktos informacijos. Jei norite sužinoti apie lietuvių kalbą, galite kreiptis į patikimus šaltinius, pavyzdžiui, Lietuvių kalbos institutą arba Lietuvių kalbos žodyną.</td>
<td style="vertical-align: top; text-align: justify; border-bottom: solid thin">Lietuvių kalba yra viena iš dviejų išlikusių gyvų gyvų baltų kalbų, priklausanti indoeuropiečių kalbų šeimai. Ji yra oficiali Lietuvos Respublikos kalba ir turi apie 3,5 mln. kalbančiųjų. Lietuvių kalba turi turtingą žodyną, įskaitant daugiau nei 5000 skirtingų žodžių, ir yra žinoma dėl savo archajiškumo bei išlaikytų archajiškų formų. Lietuvių kalba taip pat yra svarbi Lietuvos kultūros ir tapatybės dalis, ir ji yra įtraukta į UNESCO pasaulio paveldo sąrašą.</td>
</tr>
</tbody><tbody>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"><bold>Koks yra 50 km/h važiuojančio automobilio greitis?</bold></td>
<td style="vertical-align: top; text-align: justify; border-bottom: solid thin">50 km/h yra maksimalus greitis, kurį automobilis gali pasiekti važiuodamas 100 km/h greičiu. Tai yra maksimalus greitis, kurį automobilis gali pasiekti, kai jis važiuoja 100 km/h greičiu.</td>
<td style="vertical-align: top; text-align: justify; border-bottom: solid thin">50 km/h važiuojantis automobilis yra 50 km per valandą.</td>
</tr>
</tbody>
</table>
</table-wrap>
<fig id="j_infor592_fig_007">
<label>Fig. 6</label>
<caption>
<p>Source distribution of the Lithuanian component of the CulturaX dataset.</p>
</caption>
<graphic xlink:href="infor592_g007.jpg"/>
</fig> 
<fig id="j_infor592_fig_008">
<label>Fig. 7</label>
<caption>
<p>Distribution of the record length of the Lithuanian component of the CulturaX dataset (in tokens), which was used in pretraining of the proposed LLMs.</p>
</caption>
<graphic xlink:href="infor592_g008.jpg"/>
</fig> 
<fig id="j_infor592_fig_009">
<label>Fig. 8</label>
<caption>
<p>Accuracies (<italic>y</italic>-axis) of individual MMLU benchmarks for LT-Llama2-7B model, pretrained with different proportions of Lithuanian component of CulturaX dataset (<italic>x</italic>-axis).</p>
</caption>
<graphic xlink:href="infor592_g009.jpg"/>
</fig> 
<fig id="j_infor592_fig_010">
<label>Fig. 9</label>
<caption>
<p>Accuracies (<italic>y</italic>-axis) of individual MMLU benchmarks for LT-Llama2-13B model, pretrained with different proportions of Lithuanian component of CulturaX dataset (<italic>x</italic>-axis).</p>
</caption>
<graphic xlink:href="infor592_g010.jpg"/>
</fig> 
<table-wrap id="j_infor592_tab_010">
<label>Table 10</label>
<caption>
<p>Examples of text generation with the proposed LLMs (pretrained versions). Prompts are indicated in gray text.</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: justify; border-top: solid thin; border-bottom: solid thin"><bold>LT-Llama2-7B</bold></td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: justify">Kartą gyveno senelis ir senelė. Senelis buvo labai geras žmogus, tai jis visada pasiūlydavo kokį nors darbelį, kad tik jį padarytų. Senelė buvo labai gera, tai ji visada viską išbandydavo, kad tik jos vaikai būtų sveiki. Vieną dieną senelis išėjo į mišką, o senelė pasiėmė savo vaikus ir išėjo. Senelis sako: "O kodėl aš</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: justify">Jungtinės Valstijos pirmadienį išreiškė susirūpinimą dėl Vengrijos ministro pirmininko Viktoro Orbano susitikimo su Kinijos prezidentu Xi Jinpingu Pekine aptarti karo Ukrainoje. "Mes atidžiai stebime padėtį Ukrainoje ir raginame Vengriją gerbti tarptautinę teisę ir laikytis savo įsipareigojimų", – sakoma JAV valstybės sekretoriaus Antony Blinkeno pareiškime. JAV valstybės sekretorius pabrėžė, kad Jungtinės Valstijos "palaiko Ukrainos teritorinį vientisumą ir suverenitetą". "Mes raginame Vengriją gerbti tarptautinę teisę...</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: justify">Jau saulelė vėl atkopdama budino svietą. Ir žiemos šaltos triūsus pargriaudama juokės. Šalčių pramonės vėjai, Šaltųjų karų pelenai, Tamsos dulkės, Sniego gniūžtės, Tylos siena, Kurią, rodos, kadaise sukūrė žmonės. O kadaise – šaltis, Sniegas, Šaltis, Šaltis, Šaltis, Šaltis, Šaltis, Šaltis, Šaltis, Šaltis...</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: justify; border-bottom: solid thin">Broliai, seserys, imkiat mani ir skaitykiat, Ir tatai skaitydami permanykiat. Mokslo šito tėvai jūsų trokšdavo turėti, Ale to negalėjo nė vienu būdu gauti. Ir tatai skaitydami permanykiat. Mokslo šito tėvai jūsų trokšdavo turėti, Ale to negalėjo nė vienu būdu gauti. Broliai, seserys, imkiat mani ir skaitykiat, Ir tatai skaitydami permanykiat. Mokslo šito tėvai jūsų trokšdavo turėti, Ale to negalėjo nė vienu būdu gauti. Ir tatai skaitydami...</td>
</tr>
</tbody>
</table>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: justify; border-top: solid thin; border-bottom: solid thin"><bold>LT-Llama2-13B</bold></td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: justify">Kartą gyveno senelis ir senelė. Jie turėjo vieną vaiką. Vaikas buvo labai smalsus. Jis klausinėjo savo tėvų, kodėl jie turi tiek daug vaikų. Kartą senelis sugalvojo nuvykti į kitą miestą. Jis norėjo nusipirkti daugiau maisto ir kitų reikalingų daiktų. Senelis išsiruošė į kelionę...</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: justify">Jungtinės Valstijos pirmadienį išreiškė susirūpinimą dėl Vengrijos ministro pirmininko Viktoro Orbano susitikimo su Kinijos prezidentu Xi Jinpingu Pekine aptarti karo Ukrainoje. JAV prezidento Joe Bideno atstovas spaudai Ridas Bouchardas sakė, kad susitikimas įvyks kitą savaitę, bet nepatikslino, ar jame dalyvaus JAV prezidentas. V. Orbanas ir Xi Jinpingas susitiks aptarti karo Ukrainoje, sakė Vengrijos užsienio reikalų ministras Peteris Szijjarto. Kinijos valstybinė naujienų agentūra „Xinhua“ pranešė, kad per susitikimą bus aptarta padėtis Ukrainoje, bet nenurodė, ar V. Orbanas vyks į Pekiną</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: justify">Jau saulelė vėl atkopdama budino svietą. Ir žiemos šaltos triūsus pargriaudama juokės.Šalčių pramonės darbuotojai, atsidėkodami už tą jų šventę, ėmė švęsti visi kartu. Ir šventė jie septynias dienas. Paskui prasidėjo kalėdinės atostogos. O tada prasidėjo tikros linksmybės. Ir šventės tęsėsi ištisus metus. Kaip gi šventė? Ogi šventė visi kartu. Ir kiekvienas kartu. Ir visi kartu. Ir kiekvienas kartu. Ir visi kartu</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: justify; border-bottom: solid thin">Broliai, seserys, imkiat mani ir skaitykiat, Ir tatai skaitydami permanykiat. Mokslo šito tėvai jūsų trokšdavo turėti, Ale to negalėjo nė vienu būdu gauti. Ale dabar tatai jums atsivėrė. Jūs esate gavę tai, kas anksčiau buvo paslėpta nuo jūsų akių; taip pat ir tatai, kas buvo paslėpta nuo tėvų akių, dabar yra jums atskleista. O kadangi šitas mokslas yra jums atskleistas, tai dabar jūs, broliai, seserys, imkitės to, kad skaitytumėtės jį dieną naktį, kad tiktai jūsų širdys būtų atvertos, kad tiktai</td>
</tr>
</tbody>
</table>
</table-wrap>
</app></app-group>
<ack id="j_infor592_ack_001">
<title>Acknowledgements</title>
<p>This research was funded by Neurotechnology. We are grateful to Neurotechnology for providing resources and support for this research. We thank Rasa Kundrotaitė and Greta Tikužytė for editing the English language, and Ignas Mataitis and other colleagues for useful remarks and discussions. We also extend our thanks to the anonymous reviewers for their valuable feedback.</p></ack>
<ref-list id="j_infor592_reflist_001">
<title>References</title>
<ref id="j_infor592_ref_001">
<mixed-citation publication-type="other"><string-name><surname>Almazrouei</surname>, <given-names>E.</given-names></string-name>, <string-name><surname>Alobeidli</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Alshamsi</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Cappelli</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Cojocaru</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Debbah</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Goffinet</surname>, <given-names>É.</given-names></string-name>, <string-name><surname>Hesslow</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Launay</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Malartic</surname>, <given-names>Q.</given-names></string-name>, <string-name><surname>Mazzotta</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Noune</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Pannier</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Penedo</surname>, <given-names>G.</given-names></string-name> (2023). <italic>The Falcon Series of Open Language Models</italic>. <uri>https://arxiv.org/abs/2311.16867</uri>.</mixed-citation>
</ref>
<ref id="j_infor592_ref_002">
<mixed-citation publication-type="other"><string-name><surname>Bacciu</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Trappolini</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Santilliand</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Rodolà</surname>, <given-names>E.</given-names></string-name>, <string-name><surname>Silvestri</surname>, <given-names>F.</given-names></string-name> (2023). <italic>Fauno: The Italian Large Language Model that will Leave you Senza Parole</italic>! <uri>https://arxiv.org/abs/2306.14457</uri>.</mixed-citation>
</ref>
<ref id="j_infor592_ref_003">
<mixed-citation publication-type="chapter"><string-name><surname>Boros</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Chivereanu</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Dumitrescu</surname>, <given-names>S.D.</given-names></string-name>, <string-name><surname>Purcaru</surname>, <given-names>O.</given-names></string-name> (<year>2024</year>). <chapter-title>Fine-tuning and retrieval augmented generation for question answering using affordable large language models</chapter-title>. In: <source>Proceedings of the Third Ukrainian Natural Language Processing Workshop, LREC-COLING</source>. <publisher-name>European Language Resources Association</publisher-name>, <publisher-loc>Torino, Italy</publisher-loc>.</mixed-citation>
</ref>
<ref id="j_infor592_ref_004">
<mixed-citation publication-type="other"><string-name><surname>Cobbe</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Kosaraju</surname>, <given-names>V.</given-names></string-name>, <string-name><surname>Bavarian</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Chen</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Jun</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Kaiser</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Plappert</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Tworek</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Hilton</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Nakano</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Hesse</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Schulman</surname>, <given-names>J.</given-names></string-name> (2021). <italic>Training Verifiers to Solve Math Word Problems</italic>. <uri>https://arxiv.org/abs/2110.14168</uri>.</mixed-citation>
</ref>
<ref id="j_infor592_ref_005">
<mixed-citation publication-type="other"><string-name><surname>Csaki</surname>, <given-names>Z.</given-names></string-name>, <string-name><surname>Li</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Li</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Xu</surname>, <given-names>Q.</given-names></string-name>, <string-name><surname>Pawakapan</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Zhang</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Du</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Zhao</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Hu</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Thakker</surname>, <given-names>U.</given-names></string-name> (2024). <italic>SambaLingo: Teaching Large Language Models New Languages</italic>. <uri>https://arxiv.org/abs/2404.05829</uri>.</mixed-citation>
</ref>
<ref id="j_infor592_ref_006">
<mixed-citation publication-type="other"><string-name><surname>Dubois</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Li</surname>, <given-names>X.</given-names></string-name>, <string-name><surname>Taori</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Zhang</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Gulrajani</surname>, <given-names>I.</given-names></string-name>, <string-name><surname>Ba</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Guestrin</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Liang</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Hashimoto</surname>, <given-names>T.B.</given-names></string-name> (2024). <italic>AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback</italic>. <uri>https://arxiv.org/abs/2305.14387</uri>.</mixed-citation>
</ref>
<ref id="j_infor592_ref_007">
<mixed-citation publication-type="book"><string-name><surname>Eberhard</surname>, <given-names>D.M.</given-names></string-name>, <string-name><surname>Simons</surname>, <given-names>G.F.</given-names></string-name>, <string-name><surname>Fennig</surname>, <given-names>C.D.</given-names></string-name> (<year>2021</year>). <source>Ethnologue: Languages of the World</source>, <edition>24th</edition> ed. <publisher-name>SIL International</publisher-name>. <comment>Accessed: 2024-01-04</comment>. <uri>https://www.ethnologue.com/</uri>.</mixed-citation>
</ref>
<ref id="j_infor592_ref_008">
<mixed-citation publication-type="other"><string-name><surname>Ekgren</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Gyllensten</surname>, <given-names>A.C.</given-names></string-name>, <string-name><surname>Stollenwerk</surname>, <given-names>F.</given-names></string-name>, <string-name><surname>Öhman</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Isbister</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Gogoulou</surname>, <given-names>E.</given-names></string-name>, <string-name><surname>Carlsson</surname>, <given-names>F.</given-names></string-name>, <string-name><surname>Heiman</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Casademont</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Sahlgren</surname>, <given-names>M.</given-names></string-name> (2023). <italic>GPT-SW3: An Autoregressive Language Model for the Nordic Languages</italic>. <uri>https://arxiv.org/abs/2305.12987</uri>.</mixed-citation>
</ref>
<ref id="j_infor592_ref_009">
<mixed-citation publication-type="other"><string-name><surname>Faysse</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Fernandes</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Guerreiro</surname>, <given-names>N.M.</given-names></string-name>, <string-name><surname>Loison</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Alves</surname>, <given-names>D.M.</given-names></string-name>, <string-name><surname>Corro</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Boizard</surname>, <given-names>N.</given-names></string-name>, <string-name><surname>Alves</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Rei</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Martins</surname>, <given-names>P.H.</given-names></string-name>, <string-name><surname>Bigata Casademunt</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Yvon</surname>, <given-names>F.</given-names></string-name>, <string-name><surname>Martins</surname>, <given-names>A.F.T.</given-names></string-name>, <string-name><surname>Viaud</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Hudelot</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Colombo</surname>, <given-names>P.</given-names></string-name> (2024). <italic>CroissantLLM: A Truly Bilingual French-English Language Model</italic>. <uri>https://arxiv.org/abs/2402.00786</uri>.</mixed-citation>
</ref>
<ref id="j_infor592_ref_010">
<mixed-citation publication-type="other"><string-name><surname>Gao</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Tow</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Abbasi</surname>, <given-names>B.</given-names></string-name>, <etal>et al.</etal> (2023). A framework for few-shot language model evaluation. Zenodo. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.5281/zenodo.10256836" xlink:type="simple">https://doi.org/10.5281/zenodo.10256836</ext-link>. <uri>https://zenodo.org/records/10256836</uri>.</mixed-citation>
</ref>
<ref id="j_infor592_ref_011">
<mixed-citation publication-type="other"><string-name><surname>Garcia</surname>, <given-names>G.L.</given-names></string-name>, <string-name><surname>Paiola</surname>, <given-names>P.H.</given-names></string-name>, <string-name><surname>Morelli</surname>, <given-names>L.H.</given-names></string-name>, <string-name><surname>Candido</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Cândido</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Jodas</surname>, <given-names>D.S.</given-names></string-name>, <string-name><surname>Afonso</surname>, <given-names>L.C.S.</given-names></string-name>, <string-name><surname>Rizzo Guilherme</surname>, <given-names>I.</given-names></string-name>, <string-name><surname>Penteado</surname>, <given-names>B.E.</given-names></string-name>, <string-name><surname>Papa</surname>, <given-names>J.P.</given-names></string-name> (2024). <italic>Introducing Bode: A Fine-Tuned Large Language Model for Portuguese Prompt-Based Task</italic>. <uri>https://arxiv.org/abs/2401.02909</uri>.</mixed-citation>
</ref>
<ref id="j_infor592_ref_012">
<mixed-citation publication-type="chapter"><string-name><surname>Gonen</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Iyer</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Blevins</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Smith</surname>, <given-names>N.</given-names></string-name>, <string-name><surname>Zettlemoyer</surname>, <given-names>L.</given-names></string-name> (<year>2023</year>). <chapter-title>Demystifying prompts in language models via perplexity estimation</chapter-title>. In: <string-name><surname>Bouamor</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Pino</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Bali</surname>, <given-names>K.</given-names></string-name> (Eds.), <source>Findings of the Association for Computational Linguistics: EMNLP 2023</source>. <publisher-name>Association for Computational Linguistics</publisher-name>, <publisher-loc>Singapore</publisher-loc>, pp. <fpage>10136</fpage>–<lpage>10148</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.18653/v1/2023.findings-emnlp.679" xlink:type="simple">https://doi.org/10.18653/v1/2023.findings-emnlp.679</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor592_ref_013">
<mixed-citation publication-type="other"><string-name><surname>Gordić</surname>, <given-names>A.</given-names></string-name> (2024). YugoGPT Model. <uri>https://huggingface.co/gordicaleksa/YugoGPT</uri>. Accessed: 2024-07-15.</mixed-citation>
</ref>
<ref id="j_infor592_ref_014">
<mixed-citation publication-type="other"><string-name><surname>Grattafiori</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Dubey</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Jauhri</surname>, <given-names>A.</given-names></string-name>, <etal>et al.</etal> (2024). <italic>The Llama 3 Herd of Models</italic>. <uri>https://arxiv.org/abs/2407.21783</uri>.</mixed-citation>
</ref>
<ref id="j_infor592_ref_015">
<mixed-citation publication-type="other"><string-name><surname>Hendrycks</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Burns</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Basart</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Zou</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Mazeika</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Song</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Steinhardt</surname>, <given-names>J.</given-names></string-name> (2021). <italic>Measuring Massive Multitask Language Understanding</italic>. <uri>https://arxiv.org/abs/2009.03300</uri>.</mixed-citation>
</ref>
<ref id="j_infor592_ref_016">
<mixed-citation publication-type="other"><string-name><surname>Hernandez</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Brown</surname>, <given-names>T.B.</given-names></string-name>, <string-name><surname>Conerly</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>DasSarma</surname>, <given-names>N.</given-names></string-name>, <string-name><surname>Drain</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>El-Showk</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Elhage</surname>, <given-names>N.</given-names></string-name>, <string-name><surname>Hatfield-Dodds</surname>, <given-names>Z.</given-names></string-name>, <string-name><surname>Henighan</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Hume</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Johnston</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Mann</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Olah</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Olsson</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Amodei</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Joseph</surname>, <given-names>N.</given-names></string-name>, <string-name><surname>Kaplan</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>McCandlish</surname>, <given-names>S.</given-names></string-name> (2022). <italic>Scaling Laws and Interpretability of Learning from Repeated Data</italic>. <uri>https://arxiv.org/abs/2205.10487</uri>.</mixed-citation>
</ref>
<ref id="j_infor592_ref_017">
<mixed-citation publication-type="chapter"><string-name><surname>Hu</surname>, <given-names>E.J.</given-names></string-name>, <string-name><surname>Shen</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Wallis</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Allen-Zhu</surname>, <given-names>Z.</given-names></string-name>, <string-name><surname>Li</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Wang</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Wang</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Chen</surname>, <given-names>W.</given-names></string-name> (<year>2022</year>). <chapter-title>LoRA: low-rank adaptation of large language models</chapter-title>. In: <source>International Conference on Learning Representations</source>. <uri>https://openreview.net/forum?id=nZeVKeeFYf9</uri>.</mixed-citation>
</ref>
<ref id="j_infor592_ref_018">
<mixed-citation publication-type="other"><string-name><surname>INSAIT</surname></string-name> (2024). BgGPT-7B-Instruct-v0.1 model. <ext-link ext-link-type="uri" xlink:href="https://huggingface.co/INSAIT-Institute/BgGPT-7B-Instruct-v0.1">https://huggingface.co/INSAIT-Institute/BgGPT-7B-Instruct-v0.1</ext-link>. Accessed: 2024-07-17.</mixed-citation>
</ref>
<ref id="j_infor592_ref_019">
<mixed-citation publication-type="other"><string-name><surname>Jiang</surname>, <given-names>A.Q.</given-names></string-name>, <string-name><surname>Sablayrolles</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Mensch</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Bamford</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Singh Chaplot</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>de las Casas</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Bressand</surname>, <given-names>F.</given-names></string-name>, <string-name><surname>Lengyel</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Lample</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Saulnier</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Lavaud</surname>, <given-names>L.R.</given-names></string-name>, <string-name><surname>Lachaux</surname>, <given-names>M.-A.</given-names></string-name>, <string-name><surname>Stock</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Le Scao</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Lavril</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Wang</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Lacroix</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>El Sayed</surname>, <given-names>W.</given-names></string-name> (2023). <italic>Mistral 7B</italic>. <uri>https://arxiv.org/abs/2310.06825</uri>.</mixed-citation>
</ref>
<ref id="j_infor592_ref_020">
<mixed-citation publication-type="other"><string-name><surname>Jiang</surname>, <given-names>A.Q.</given-names></string-name>, <string-name><surname>Sablayrolles</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Roux</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Mensch</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Savary</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Bamford</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Chaplot</surname>, <given-names>D.S.</given-names></string-name>, <string-name><surname>de las Casas</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Hanna</surname>, <given-names>E.B.</given-names></string-name>, <string-name><surname>Bressand</surname>, <given-names>F.</given-names></string-name>, <string-name><surname>Lengyel</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Bour</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Lample</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Lavaud</surname>, <given-names>L.R.</given-names></string-name>, <string-name><surname>Saulnier</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Lachaux</surname>, <given-names>M.-A.</given-names></string-name>, <string-name><surname>Stock</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Subramanian</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Yang</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Antoniak</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Le Scao</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Gervet</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Lavril</surname>, 
<given-names>T.</given-names></string-name>, <string-name><surname>Wang</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Lacroix</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>El Sayed</surname>, <given-names>W.</given-names></string-name> (2024). <italic>Mixtral of Experts</italic>. <uri>https://arxiv.org/abs/2401.04088</uri>.</mixed-citation>
</ref>
<ref id="j_infor592_ref_021">
<mixed-citation publication-type="other"><string-name><surname>Kapočiūtė-Dzikienė</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Bergmanis</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Pinnis</surname>, <given-names>M.</given-names></string-name> (2025). <italic>Localizing AI: Evaluating Open-Weight Language Models for Languages of Baltic States</italic>. <uri>https://arxiv.org/abs/2501.03952</uri>.</mixed-citation>
</ref>
<ref id="j_infor592_ref_022">
<mixed-citation publication-type="other"><string-name><surname>Kim</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Kim</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Choi</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Park</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Oh</surname>, <given-names>N.</given-names></string-name>, <string-name><surname>Park</surname>, <given-names>D.</given-names></string-name> (2024). <italic>A Survey on Integration of Large Language Models with Intelligent Robots</italic>. <uri>https://arxiv.org/abs/2404.09228</uri>.</mixed-citation>
</ref>
<ref id="j_infor592_ref_023">
<mixed-citation publication-type="other"><string-name><surname>Lai</surname>, <given-names>V.D.</given-names></string-name>, <string-name><surname>Nguyen</surname>, <given-names>C.V.</given-names></string-name>, <string-name><surname>Ngo</surname>, <given-names>N.T.</given-names></string-name>, <string-name><surname>Nguyen</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Dernoncourt</surname>, <given-names>F.</given-names></string-name>, <string-name><surname>Rossi</surname>, <given-names>R.A.</given-names></string-name>, <string-name><surname>Nguyen</surname>, <given-names>T.H.</given-names></string-name> (2023). <italic>Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback</italic>. <uri>https://arxiv.org/abs/2307.16039</uri>.</mixed-citation>
</ref>
<ref id="j_infor592_ref_024">
<mixed-citation publication-type="other"><string-name><surname>LumiOpen</surname></string-name> (2024). Viking 13B Model. <uri>https://huggingface.co/LumiOpen/Viking-13B</uri>. Accessed: 2024-07-15.</mixed-citation>
</ref>
<ref id="j_infor592_ref_025">
<mixed-citation publication-type="other"><string-name><surname>Mabeck</surname></string-name> (2024). Heidrun-Mistral-7B-chat model. <uri>https://huggingface.co/Mabeck/Heidrun-Mistral-7B-chat</uri>. Accessed: 2024-07-20.</mixed-citation>
</ref>
<ref id="j_infor592_ref_026">
<mixed-citation publication-type="other"><string-name><surname>Martins</surname>, <given-names>P.H.</given-names></string-name>, <string-name><surname>Fernandes</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Alves</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Guerreiro</surname>, <given-names>N.M.</given-names></string-name>, <string-name><surname>Rei</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Alves</surname>, <given-names>D.M.</given-names></string-name>, <string-name><surname>Pombal</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Farajian</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Faysse</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Klimaszewski</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Colombo</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Haddow</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>de Souza</surname>, <given-names>J.G.C.</given-names></string-name>, <string-name><surname>Birch</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Martins</surname>, <given-names>A.F.T.</given-names></string-name> (2024). <italic>EuroLLM: Multilingual Language Models for Europe</italic>. <uri>https://arxiv.org/abs/2409.16235</uri>.</mixed-citation>
</ref>
<ref id="j_infor592_ref_027">
<mixed-citation publication-type="other"><string-name><surname>Masala</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Ilie-Ablachim</surname>, <given-names>D.C.</given-names></string-name>, <string-name><surname>Corlatescu</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Zavelca</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Leordeanu</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Velicu</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Popescu</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Dascalu</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Rebedea</surname>, <given-names>T.</given-names></string-name> (2024). <italic>OpenLLM-Ro – Technical Report on Open-source Romanian LLMs</italic>. <uri>https://arxiv.org/abs/2405.07703</uri>.</mixed-citation>
</ref>
<ref id="j_infor592_ref_028">
<mixed-citation publication-type="other"><string-name><surname>Minaee</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Mikolov</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Nikzad</surname>, <given-names>N.</given-names></string-name>, <string-name><surname>Chenaghlu</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Socher</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Amatriain</surname>, <given-names>X.</given-names></string-name>, <string-name><surname>Gao</surname>, <given-names>J.</given-names></string-name> (2024). <italic>Large Language Models: A Survey</italic>. <uri>https://arxiv.org/abs/2402.06196</uri>.</mixed-citation>
</ref>
<ref id="j_infor592_ref_029">
<mixed-citation publication-type="other"><string-name><surname>Naveed</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Khan</surname>, <given-names>A.U.</given-names></string-name>, <string-name><surname>Qiu</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Saqib</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Anwar</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Usman</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Akhtar</surname>, <given-names>N.</given-names></string-name>, <string-name><surname>Barnes</surname>, <given-names>N.</given-names></string-name>, <string-name><surname>Mian</surname>, <given-names>A.</given-names></string-name> (2024). <italic>A Comprehensive Overview of Large Language Models</italic>. <uri>https://arxiv.org/abs/2307.06435</uri>.</mixed-citation>
</ref>
<ref id="j_infor592_ref_030">
<mixed-citation publication-type="other"><string-name><surname>Neurotechnology</surname></string-name> (2024). Lt-QA-V1 dataset. <ext-link ext-link-type="uri" xlink:href="https://huggingface.co/datasets/neurotechnology/lithuanian-qa-v1">https://huggingface.co/datasets/neurotechnology/lithuanian-qa-v1</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor592_ref_031">
<mixed-citation publication-type="other"><string-name><surname>Nguyen</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Nguyen</surname>, <given-names>C.V.</given-names></string-name>, <string-name><surname>Lai</surname>, <given-names>V.D.</given-names></string-name>, <string-name><surname>Man</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Trung Ngo</surname>, <given-names>N.</given-names></string-name>, <string-name><surname>Dernoncourt</surname>, <given-names>F.</given-names></string-name>, <string-name><surname>Rossi</surname>, <given-names>R.A.</given-names></string-name>, <string-name><surname>Nguyen</surname>, <given-names>T.H.</given-names></string-name> (2023). <italic>CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages</italic>. <uri>https://arxiv.org/abs/2309.09400</uri>.</mixed-citation>
</ref>
<ref id="j_infor592_ref_032">
<mixed-citation publication-type="other"><string-name><surname>Norallm</surname></string-name> (2024). Normistral-7B-Warm. <uri>https://huggingface.co/norallm/normistral-7b-warm</uri>. Accessed: 2024-07-20.</mixed-citation>
</ref>
<ref id="j_infor592_ref_033">
<mixed-citation publication-type="other"><string-name><surname>OpenAI</surname></string-name>, <string-name><surname>Achiam</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Adler</surname>, <given-names>S.</given-names></string-name>, <etal>et al.</etal> (2024). <italic>GPT-4 Technical Report</italic>. <uri>https://arxiv.org/abs/2303.08774</uri>.</mixed-citation>
</ref>
<ref id="j_infor592_ref_034">
<mixed-citation publication-type="other"><string-name><surname>Plüster</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Schuhmann</surname>, <given-names>C.</given-names></string-name> (2024). <italic>LAION LeoLM: Linguistically Enhanced Open Language Model</italic>. <uri>https://huggingface.co/LeoLM</uri>. Accessed: 2024-07-17.</mixed-citation>
</ref>
<ref id="j_infor592_ref_035">
<mixed-citation publication-type="other"><string-name><surname>Projecte AINA</surname></string-name> (2024). Aguila 7B Model. <uri>https://huggingface.co/projecte-aina/aguila-7b</uri>. Accessed: 2024-07-15.</mixed-citation>
</ref>
<ref id="j_infor592_ref_036">
<mixed-citation publication-type="other"><string-name><surname>Rijgersberg</surname>, <given-names>L.</given-names></string-name> (2024). GEITje. <uri>https://github.com/Rijgersberg/GEITje</uri>. Accessed: 2024-07-17.</mixed-citation>
</ref>
<ref id="j_infor592_ref_037">
<mixed-citation publication-type="other"><string-name><surname>Riviere</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Pathak</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Sessa</surname>, <given-names>P.G.</given-names></string-name>, <etal>et al.</etal> (2024). <italic>Gemma 2: Improving Open Language Models at a Practical Size</italic>. <uri>https://arxiv.org/abs/2408.00118</uri>.</mixed-citation>
</ref>
<ref id="j_infor592_ref_038">
<mixed-citation publication-type="other"><string-name><surname>Sakaguchi</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Bras</surname>, <given-names>R.L.</given-names></string-name>, <string-name><surname>Bhagavatula</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Choi</surname>, <given-names>Y.</given-names></string-name> (2019). <italic>WinoGrande: An Adversarial Winograd Schema Challenge at Scale</italic>. <uri>https://arxiv.org/abs/1907.10641</uri>.</mixed-citation>
</ref>
<ref id="j_infor592_ref_039">
<mixed-citation publication-type="other"><string-name><surname>Snæbjarnarson</surname>, <given-names>V.</given-names></string-name>, <string-name><surname>Símonarson</surname>, <given-names>H.B.</given-names></string-name>, <string-name><surname>Ragnarsson</surname>, <given-names>P.O.</given-names></string-name>, <string-name><surname>Ingólfsdóttir</surname>, <given-names>S.L.</given-names></string-name>, <string-name><surname>Jónsson</surname>, <given-names>H.P.</given-names></string-name>, <string-name><surname>Þorsteinsson</surname>, <given-names>V.</given-names></string-name>, <string-name><surname>Einarsson</surname>, <given-names>H.</given-names></string-name> (2022). <italic>A Warm Start and a Clean Crawled Corpus – A Recipe for Good Language Models</italic>. <uri>https://arxiv.org/abs/2201.05601</uri>.</mixed-citation>
</ref>
<ref id="j_infor592_ref_040">
<mixed-citation publication-type="other"><string-name><surname>SPAHE</surname></string-name> (2024). Meltemi-7B-Instruct-v1-GGUF. <ext-link ext-link-type="uri" xlink:href="https://huggingface.co/SPAHE/Meltemi-7B-Instruct-v1-GGUF">https://huggingface.co/SPAHE/Meltemi-7B-Instruct-v1-GGUF</ext-link>. Accessed: 2024-07-17.</mixed-citation>
</ref>
<ref id="j_infor592_ref_041">
<mixed-citation publication-type="other"><string-name><surname>Speakleash</surname></string-name> (2024). Bielik-7B-Instruct-v0.1. <uri>https://huggingface.co/speakleash/Bielik-7B-Instruct-v0.1</uri>. Accessed: 2024-07-20.</mixed-citation>
</ref>
<ref id="j_infor592_ref_042">
<mixed-citation publication-type="other"><string-name><surname>Touvron</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Martin</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Stone</surname>, <given-names>K.</given-names></string-name>, <etal>et al.</etal> (2023). <italic>Llama 2: Open Foundation and Fine-Tuned Chat Models</italic>. <uri>https://arxiv.org/abs/2307.09288</uri>.</mixed-citation>
</ref>
<ref id="j_infor592_ref_043">
<mixed-citation publication-type="chapter"><string-name><surname>Ulčar</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Robnik-Šikonja</surname>, <given-names>M.</given-names></string-name> (<year>2021</year>). <chapter-title>SloBERTa: Slovene Monolingual Large Pretrained Masked Language Model</chapter-title>. In: <source>24th International Multiconference Information Society 2021, Volume C. Data Mining and Data Warehouses</source>. <publisher-loc>Ljubljana</publisher-loc>.</mixed-citation>
</ref>
<ref id="j_infor592_ref_044">
<mixed-citation publication-type="chapter"><string-name><surname>Vaswani</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Shazeer</surname>, <given-names>N.</given-names></string-name>, <string-name><surname>Parmar</surname>, <given-names>N.</given-names></string-name>, <string-name><surname>Uszkoreit</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Jones</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Gomez</surname>, <given-names>A.N.</given-names></string-name>, <string-name><surname>Kaiser</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Polosukhin</surname>, <given-names>I.</given-names></string-name> (<year>2017</year>). <chapter-title>Attention Is All You Need</chapter-title>. In: <string-name><surname>Guyon</surname>, <given-names>I.</given-names></string-name>, <string-name><surname>Luxburg</surname>, <given-names>U.V.</given-names></string-name>, <string-name><surname>Bengio</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Wallach</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Fergus</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Vishwanathan</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Garnett</surname>, <given-names>R.</given-names></string-name> (Eds.), <source>Advances in Neural Information Processing Systems</source>, Vol. <volume>30</volume>. <publisher-name>Curran Associates, Inc.</publisher-name></mixed-citation>
</ref>
<ref id="j_infor592_ref_045">
<mixed-citation publication-type="chapter"><string-name><surname>Zellers</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Holtzman</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Bisk</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Farhadi</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Choi</surname>, <given-names>Y.</given-names></string-name> (<year>2019</year>). <chapter-title>HellaSwag: Can a Machine Really Finish Your Sentence?</chapter-title> In: <source>Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics</source>.</mixed-citation>
</ref>
<ref id="j_infor592_ref_046">
<mixed-citation publication-type="chapter"><string-name><surname>Zhang</surname>, <given-names>W.</given-names></string-name>, <string-name><surname>Deng</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Liu</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Pan</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Bing</surname>, <given-names>L.</given-names></string-name> (<year>2024</year>). <chapter-title>Sentiment analysis in the Era of Large Language Models: a reality check</chapter-title>. In: <string-name><surname>Duh</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Gomez</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Bethard</surname>, <given-names>S.</given-names></string-name> (Eds.), <source>Findings of the Association for Computational Linguistics: NAACL 2024</source>. <publisher-name>Association for Computational Linguistics</publisher-name>, <publisher-loc>Mexico City, Mexico</publisher-loc>, pp. <fpage>3881</fpage>–<lpage>3906</lpage>. <uri>https://aclanthology.org/2024.findings-naacl.246</uri>.</mixed-citation>
</ref>
</ref-list>
</back>
</article>
