<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
	<front>
		<journal-meta>
			<journal-id journal-id-type="publisher-id">INFORMATICA</journal-id>
			<journal-title-group>
				<journal-title>Informatica</journal-title>
			</journal-title-group>
			<issn pub-type="epub">0868-4952</issn>
			<issn pub-type="ppub">0868-4952</issn>
			<publisher>
				<publisher-name>VU</publisher-name>
			</publisher>
		</journal-meta>
		<article-meta>
			<article-id pub-id-type="publisher-id">inf18302</article-id>
			<article-id pub-id-type="doi">10.15388/Informatica.2007.181</article-id>
			<article-categories>
				<subj-group subj-group-type="heading">
					<subject>Research article</subject>
				</subj-group>
			</article-categories>
			<title-group>
				<article-title>Assessment of Classification Models with Small Amounts of Data</article-title>
			</title-group>
			<contrib-group>
				<contrib contrib-type="Author">
					<name>
						<surname>Brumen</surname>
						<given-names>Boštjan</given-names>
					</name>
					<email xlink:href="mailto:bostjan.brumen@uni-mb.si">bostjan.brumen@uni-mb.si</email>
					<xref ref-type="aff" rid="j_INFORMATICA_aff_000"/>
				</contrib>
				<contrib contrib-type="Author">
					<name>
						<surname>Jurič</surname>
						<given-names>Matjaž B.</given-names>
					</name>
					<xref ref-type="aff" rid="j_INFORMATICA_aff_000"/>
				</contrib>
				<contrib contrib-type="Author">
					<name>
						<surname>Welzer</surname>
						<given-names>Tatjana</given-names>
					</name>
					<xref ref-type="aff" rid="j_INFORMATICA_aff_000"/>
				</contrib>
				<contrib contrib-type="Author">
					<name>
						<surname>Rozman</surname>
						<given-names>Ivan</given-names>
					</name>
					<xref ref-type="aff" rid="j_INFORMATICA_aff_000"/>
				</contrib>
				<contrib contrib-type="Author">
					<name>
						<surname>Jaakkola</surname>
						<given-names>Hannu</given-names>
					</name>
					<xref ref-type="aff" rid="j_INFORMATICA_aff_001"/>
				</contrib>
				<contrib contrib-type="Author">
					<name>
						<surname>Papadopoulos</surname>
						<given-names>Apostolos</given-names>
					</name>
					<xref ref-type="aff" rid="j_INFORMATICA_aff_002"/>
				</contrib>
				<aff id="j_INFORMATICA_aff_000">Faculty of Electrical Engineering and Computer Science, University of Maribor, Smetanova 17, SI-2000 Maribor, Slovenia</aff>
				<aff id="j_INFORMATICA_aff_001">Tampere University of Technology, Pori, PO BOX 300, FI-28101 Pori, Finland</aff>
				<aff id="j_INFORMATICA_aff_002">Department of Informatics, Aristotle University, PO BOX 451, Thessaloniki, GR-54124, Greece</aff>
			</contrib-group>
			<pub-date pub-type="epub">
				<day>01</day>
				<month>01</month>
				<year>2007</year>
			</pub-date>
			<volume>18</volume>
			<issue>3</issue>
			<fpage>343</fpage>
			<lpage>362</lpage>
			<history>
				<date date-type="received">
					<day>01</day>
					<month>10</month>
					<year>2006</year>
				</date>
			</history>
			<abstract>
				<p>One of the tasks of data mining is classification, which provides a mapping from attributes (observations) to pre-specified classes. Classification models are built from underlying data. In principle, models built with more data yield better results. However, the relationship between the amount of available data and performance is not well understood, except that the accuracy of a classification model shows diminishing improvements as the data size grows. In this paper, we present an approach for the early assessment of extracted knowledge (classification models) in terms of performance (accuracy), based on the amount of data used. The assessment is based on observing performance on smaller sample sizes. The solution is formally defined, and experiments show the correctness and utility of the approach.</p>
			</abstract>
			<kwd-group>
				<label>Keywords</label>
				<kwd>assessment</kwd>
				<kwd>classification</kwd>
				<kwd>accuracy</kwd>
				<kwd>learning curve</kwd>
				<kwd>sampling</kwd>
			</kwd-group>
		</article-meta>
	</front>
</article>