<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
	<front>
		<journal-meta>
			<journal-id journal-id-type="publisher-id">INFORMATICA</journal-id>
			<journal-title-group>
				<journal-title>Informatica</journal-title>
			</journal-title-group>
			<issn pub-type="epub">0868-4952</issn>
			<issn pub-type="ppub">0868-4952</issn>
			
			<publisher>
				<publisher-name>VU</publisher-name>
				
			</publisher>
			</journal-meta>
		<article-meta>
			<article-id pub-id-type="publisher-id">INFO1039</article-id><article-id pub-id-type="doi">10.15388/Informatica.2014.29</article-id>
			<article-categories>
				<subj-group subj-group-type="heading"><subject>Article</subject></subj-group>
			</article-categories>
			<title-group>
				<article-title>Building Text Corpus for Unit Selection Synthesis</article-title>
			</title-group>
			<contrib-group>
				<contrib contrib-type="Author">
				<name>
					<surname>Kasparaitis</surname>
					<given-names>Pijus</given-names>
				</name><email xlink:href="mailto:pkasparaitis@yahoo.com">pkasparaitis@yahoo.com</email>
				<xref ref-type="aff" rid="j_INFORMATICA_aff_000"/><xref ref-type="corresp" rid="thanks1">*</xref></contrib>
				<contrib contrib-type="Author">
				<name>
					<surname>Anbinderis</surname>
					<given-names>Tomas</given-names>
				</name><email xlink:href="mailto:tomas@anbinderis.lt">tomas@anbinderis.lt</email>
				<xref ref-type="aff" rid="j_INFORMATICA_aff_000"/></contrib>
				<aff id="j_INFORMATICA_aff_000">
					Department of Computer Science II, Faculty of Mathematics and Informatics, Vilnius University, Naugarduko 24, LT-03225 Vilnius, Lithuania</aff>
				</contrib-group>
			<author-notes>
				<corresp id="thanks1">
					<label>*</label>Corresponding author.
				</corresp>
				</author-notes>
			<pub-date pub-type="epub"><day>01</day><month>01</month><year>2014</year></pub-date><volume>25</volume><issue>4</issue><fpage>551</fpage><lpage>562</lpage>
			<history>
				<date date-type="received"><day>01</day><month>02</month><year>2012</year></date>
				<date date-type="accepted"><day>01</day><month>10</month><year>2014</year></date>
				</history>
			<permissions>
				<copyright-statement>Vilnius University</copyright-statement>
				<copyright-year>2014</copyright-year>
			</permissions>
			<abstract>
				<label>Abstract</label>
				<p>This paper addresses the construction of a text corpus for unit selection text-to-speech synthesis. During synthesis, target and concatenation costs are calculated, usually from the prosodic and acoustic features of sounds. If the cost calculation is moved to the phonological level, unit selection synthesis can be simulated without any real recordings, since text transcriptions are then sufficient. We propose using the cost calculated while simulating the synthesis of test data as a measure of text corpus quality. The corpus is built with a greedy algorithm that maximizes coverage of certain phonetic units. Corpora optimized to cover phonetic units of different size and weight are evaluated.</p>
				</abstract>
			<kwd-group>
				<label>Keywords</label>
				<kwd>text-to-speech synthesis</kwd>
				<kwd>unit selection</kwd>
				<kwd>greedy algorithm</kwd>
			</kwd-group>
			</article-meta>
		</front>
	</article>
