<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
	<front>
		<journal-meta>
			<journal-id journal-id-type="publisher-id">INFORMATICA</journal-id>
			<journal-title-group>
				<journal-title>Informatica</journal-title>
			</journal-title-group>
			<issn pub-type="epub">0868-4952</issn>
			<issn pub-type="ppub">0868-4952</issn>
			<publisher>
				<publisher-name>VU</publisher-name>
			</publisher>
		</journal-meta>
		<article-meta>
			<article-id pub-id-type="publisher-id">inf17405</article-id>
			<article-id pub-id-type="doi">10.15388/Informatica.2006.153</article-id>
			<article-categories>
				<subj-group subj-group-type="heading">
					<subject>Research article</subject>
				</subj-group>
			</article-categories>
			<title-group>
				<article-title>Efficient Adaptive Algorithms for Transposing Small and Large Matrices on Symmetric Multiprocessors</article-title>
			</title-group>
			<contrib-group>
				<contrib contrib-type="Author">
					<name>
						<surname>Na'mneh</surname>
						<given-names>Rami Al</given-names>
					</name>
					<xref ref-type="aff" rid="j_INFORMATICA_aff_000"/>
				</contrib>
				<contrib contrib-type="Author">
					<name>
						<surname>Pan</surname>
						<given-names>W. David</given-names>
					</name>
					<email xlink:href="mailto:dwpan@ece.uah.edu">dwpan@ece.uah.edu</email>
					<xref ref-type="aff" rid="j_INFORMATICA_aff_000"/>
				</contrib>
				<contrib contrib-type="Author">
					<name>
						<surname>Yoo</surname>
						<given-names>Seong-Moo</given-names>
					</name>
					<xref ref-type="aff" rid="j_INFORMATICA_aff_000"/>
				</contrib>
				<aff id="j_INFORMATICA_aff_000">Department of Electrical and Computer Engineering, University of Alabama in Huntsville, 301 Sparkman Drive, Huntsville, Alabama 35899, USA</aff>
			</contrib-group>
			<pub-date pub-type="epub">
				<day>01</day>
				<month>01</month>
				<year>2006</year>
			</pub-date>
			<volume>17</volume>
			<issue>4</issue>
			<fpage>535</fpage>
			<lpage>550</lpage>
			<history>
				<date date-type="received">
					<day>01</day>
					<month>11</month>
					<year>2005</year>
				</date>
			</history>
			<abstract>
				<p>Matrix transposition in parallel systems typically involves costly all-to-all communication. In this paper, we provide a comparative characterization of several efficient algorithms for transposing small and large matrices on the popular symmetric multiprocessor (SMP) architecture, which carries a relatively low communication cost owing to its large aggregate bandwidth and low-latency inter-process communication. We analyze the cost of sending and receiving data and the memory requirements of these matrix-transpose algorithms. We then propose an adaptive algorithm that can minimize the overhead of the matrix transpose operation given parameters such as the data size, the number of processors, the start-up time, and the effective communication bandwidth.</p>
			</abstract>
			<kwd-group>
				<label>Keywords</label>
				<kwd>matrix transpose</kwd>
				<kwd>SMP</kwd>
				<kwd>MPI</kwd>
				<kwd>all-to-all communication</kwd>
			</kwd-group>
		</article-meta>
	</front>
</article>