<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
	<front>
		<journal-meta>
			<journal-id journal-id-type="publisher-id">INFORMATICA</journal-id>
			<journal-title-group>
				<journal-title>Informatica</journal-title>
			</journal-title-group>
			<issn pub-type="epub">0868-4952</issn>
			<issn pub-type="ppub">0868-4952</issn>
			<publisher>
				<publisher-name>VU</publisher-name>
			</publisher>
		</journal-meta>
		<article-meta>
			<article-id pub-id-type="publisher-id">inf20203</article-id>
			<article-id pub-id-type="doi">10.15388/Informatica.2009.245</article-id>
			<article-categories>
				<subj-group subj-group-type="heading">
					<subject>Research article</subject>
				</subj-group>
			</article-categories>
			<title-group>
				<article-title>On a Minimal Spanning Tree Approach in the Cluster Validation Problem</article-title>
			</title-group>
			<contrib-group>
				<contrib contrib-type="Author">
					<name>
						<surname>Barzily</surname>
						<given-names>Zeev</given-names>
					</name>
					<email xlink:href="mailto:zbarzily@braude.ac.il">zbarzily@braude.ac.il</email>
					<xref ref-type="aff" rid="j_INFORMATICA_aff_000"/>
				</contrib>
				<contrib contrib-type="Author">
					<name>
						<surname>Volkovich</surname>
						<given-names>Zeev</given-names>
					</name>
					<email xlink:href="mailto:vlvolkov@braude.ac.il">vlvolkov@braude.ac.il</email>
					<xref ref-type="aff" rid="j_INFORMATICA_aff_000"/>
				</contrib>
				<contrib contrib-type="Author">
					<name>
						<surname>Akteke-Öztürk</surname>
						<given-names>Başak</given-names>
					</name>
					<email xlink:href="mailto:bozturk@metu.edu.tr">bozturk@metu.edu.tr</email>
					<xref ref-type="aff" rid="j_INFORMATICA_aff_001"/>
				</contrib>
				<contrib contrib-type="Author">
					<name>
						<surname>Weber</surname>
						<given-names>Gerhard-Wilhelm</given-names>
					</name>
					<email xlink:href="mailto:gweber@metu.edu.tr">gweber@metu.edu.tr</email>
					<xref ref-type="aff" rid="j_INFORMATICA_aff_001"/>
					<xref ref-type="aff" rid="j_INFORMATICA_aff_002"/>
				</contrib>
				<aff id="j_INFORMATICA_aff_000">ORT Braude College of Engineering, 21982 Karmiel, Israel</aff>
				<aff id="j_INFORMATICA_aff_001">Institute of Applied Mathematics, Middle East Technical University, 06531 Ankara, Turkey</aff>
				<aff id="j_INFORMATICA_aff_002">Faculty of Economics, Business and Law, University of Siegen, Hölderlinstrasse 3, 57076 Siegen, Germany</aff>
			</contrib-group>
			<pub-date pub-type="epub">
				<day>01</day>
				<month>01</month>
				<year>2009</year>
			</pub-date>
			<volume>20</volume>
			<issue>2</issue>
			<fpage>187</fpage>
			<lpage>202</lpage>
			<history>
				<date date-type="received">
					<day>01</day>
					<month>08</month>
					<year>2008</year>
				</date>
				<date date-type="accepted">
					<day>01</day>
					<month>12</month>
					<year>2008</year>
				</date>
			</history>
			<abstract>
				<p>In this paper, a method for the study of cluster stability is proposed. We draw pairs of samples from the data according to two sampling distributions. The first distribution corresponds to the high-density zones of the data-element distribution and is thus associated with the cluster cores. The second, associated with the cluster margins, is related to the low-density zones. The samples are clustered and the two resulting partitions are compared. The partitions are considered consistent if the obtained clusters are similar. The resemblance is measured by the total number of edges in the clusters' minimal spanning trees that connect points from different samples. We use the Friedman and Rafsky two-sample test statistic. Under the homogeneity hypothesis, this statistic is normally distributed. Thus, the true number of clusters can be expected to correspond to the empirical distribution of the statistic that is closest to normal. Numerical experiments demonstrate the ability of the approach to detect the true number of clusters.</p>
			</abstract>
			<kwd-group>
				<label>Keywords</label>
				<kwd>clustering</kwd>
				<kwd>cluster validation</kwd>
				<kwd>minimal spanning tree</kwd>
				<kwd>two sample test</kwd>
			</kwd-group>
		</article-meta>
	</front>
</article>