<?xml version="1.0" encoding="utf-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">INFORMATICA</journal-id>
<journal-title-group><journal-title>Informatica</journal-title></journal-title-group>
<issn pub-type="epub">1822-8844</issn>
<issn pub-type="ppub">0868-4952</issn>
<issn-l>0868-4952</issn-l>
<publisher>
<publisher-name>Vilnius University</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">INFO1229</article-id>
<article-id pub-id-type="doi">10.15388/Informatica.2019.223</article-id>
<article-categories><subj-group subj-group-type="heading">
<subject>Research Article</subject></subj-group></article-categories>
<title-group>
<article-title>Multi-Pose Face Recognition Using Pairwise Supervised Dictionary Learning</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Farahani</surname><given-names>Ali</given-names></name><email xlink:href="cpt.mazi@gmail.com">cpt.mazi@gmail.com</email><xref ref-type="aff" rid="j_info1229_aff_001"/><bio>
<p><bold>A. Farahani</bold> received his BSc in software engineering from Arak University, Arak, Iran, in 2014. He then pursued his MSc in artificial intelligence at Shahid Bahonar University of Kerman, Iran, and received his master's degree in 2017 under the supervision of Dr Hadis Mohseni. His research interests include pattern recognition, supervised and unsupervised learning methods, and their applications in image processing.</p></bio>
</contrib>
<contrib contrib-type="author">
<name><surname>Mohseni</surname><given-names>Hadis</given-names></name><email xlink:href="hmohseni@uk.ac.ir">hmohseni@uk.ac.ir</email><xref ref-type="aff" rid="j_info1229_aff_001"/><xref ref-type="corresp" rid="cor1">∗</xref><bio>
<p><bold>H. Mohseni</bold> received her BSc in hardware engineering from Sharif University of Technology (SUT), Tehran, Iran, in 2004. She then continued with an MSc in artificial intelligence at SUT, working on medical image processing, and received her degree in 2007. She pursued her PhD in artificial intelligence at SUT, working on multi-pose face recognition, and received her PhD degree in 2013 under the supervision of Prof. Shohreh Kasaei. She is now an assistant professor at Shahid Bahonar University of Kerman, and her research interests include pattern recognition, image and video processing, medical image processing and deep learning.</p></bio>
</contrib>
<aff id="j_info1229_aff_001">Department of Computer Engineering, <institution>Shahid Bahonar University of Kerman</institution>, Pazhouhesh Square, Kerman, <country>Iran</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>∗</label>Corresponding author.</corresp>
</author-notes>
<pub-date pub-type="ppub"><year>2019</year></pub-date>
<pub-date pub-type="epub"><day>1</day><month>1</month><year>2019</year></pub-date><volume>30</volume><issue>4</issue><fpage>647</fpage><lpage>670</lpage>
<history>
<date date-type="received"><month>3</month><year>2018</year></date>
<date date-type="accepted"><month>4</month><year>2019</year></date>
</history>
<permissions><copyright-statement>© 2019 Vilnius University</copyright-statement><copyright-year>2019</copyright-year>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/4.0/">
<license-p>Open access article under the <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">CC BY</ext-link> license.</license-p></license></permissions>
<abstract>
<p>A major challenge in face recognition is handling large pose variations. Here, we propose to tackle this challenge with a three-step sparse representation based method: estimating the pose of an unseen non-frontal face image, generating its virtual frontal view using learned view-dependent dictionaries, and classifying the generated frontal view. It is assumed that, for a specific identity, the representation coefficients over the view dictionary are invariant to pose, and the view-dependent frontal view generation transformations are learned via pairwise supervised dictionary learning. Experiments conducted on the FERET and CMU-PIE face databases demonstrate the efficacy of the proposed method.</p>
</abstract>
<kwd-group>
<label>Key words</label>
<kwd>face recognition</kwd>
<kwd>multi-pose</kwd>
<kwd>sparse representation</kwd>
<kwd>supervised dictionary learning</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="j_info1229_s_001">
<label>1</label>
<title>Introduction</title>
<p>Face recognition is one of the most important biometric techniques, with clear advantages over other biometrics: it is non-intrusive, natural and passive, whereas techniques such as fingerprint and iris recognition require cooperative subjects. To enjoy this non-intrusive nature, a system should be able to recognize a face in an uncontrolled environment and an arbitrary situation without the notice of the subject (Zhang and Gao, <xref ref-type="bibr" rid="j_info1229_ref_032">2009</xref>). This generality of environment and situation brings serious challenges to face recognition techniques. Evaluations of state-of-the-art face recognition techniques conducted during the past several years, such as the FERET evaluation and FAT 2004 (Phillips <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1229_ref_024">2000</xref>; Messer <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1229_ref_019">2004</xref>), have confirmed that current face recognition systems face several major challenges, namely variations in age, pose, illumination, expression, size, etc. Face occlusion by hair, sunglasses, make-up, etc. can also hinder face recognition. Although most current face recognition techniques work well under constrained conditions, they fail in uncontrolled cases (e.g. outdoors with uncooperative subjects), since they are sensitive to the aforementioned variations in face images.</p>
<p>One interesting fact about face images is that the changes caused by pose variation in images of one identity may be larger than those caused by identity variation in a fixed pose. Face images of different identities in the same pose resemble each other, and the differences between them are subtle (Nastar and Mitschke, <xref ref-type="bibr" rid="j_info1229_ref_022">1998</xref>). This illustrates the difficulty of multi-pose face recognition, which is a bottleneck for most current face recognition technologies. Therefore, among all facial variations, this paper concentrates on the pose variation challenge. However, as the experimental results will show, the proposed method can also tolerate, to some extent, illumination variation in face images.</p>
<p>The rest of the paper is organized as follows. Section <xref rid="j_info1229_s_002">2</xref> reviews related work on face recognition. Section <xref rid="j_info1229_s_003">3</xref> reviews related concepts of sparse representation based classification. The proposed method for multi-pose face recognition is presented in Section <xref rid="j_info1229_s_004">4</xref>. Extensive experiments are carried out in Section <xref rid="j_info1229_s_009">5</xref>, and the results are compared with those of notable algorithms from the literature. Finally, we conclude the paper in Section <xref rid="j_info1229_s_014">6</xref>.</p>
</sec>
<sec id="j_info1229_s_002">
<label>2</label>
<title>Related Works</title>
<p>As mentioned in the Introduction, one major problem in multi-view face recognition is that variation in pose may cause changes in the face image that are larger than those caused by variation in identity. Many methods perform well when training and testing face images are captured under similar conditions and poses but, due to the mentioned difficulties, fewer methods have been proposed for recognizing faces in arbitrary poses.</p>
<p>Multi-pose face recognition approaches proposed in the literature can be grouped into two main categories as follows (Zhang <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1229_ref_031">2013</xref>):</p>
<list>
<list-item id="j_info1229_li_001">
<label>1.</label>
<p>Multi-view approaches that expand the training and/or testing set to encompass more face poses and thus form a relatively more robust feature set.</p>
</list-item>
<list-item id="j_info1229_li_002">
<label>2.</label>
<p>Invariant approaches that apply a particular transformation to eliminate the variation caused by pose change, or to reduce its adverse effect on the final recognition.</p>
</list-item>
</list>
<p>Among the methods of the first category, view-based methods are widely used (Murase and Nayar, <xref ref-type="bibr" rid="j_info1229_ref_020">1993</xref>; Pentl <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1229_ref_023">1994</xref>; McKenna <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1229_ref_018">1996</xref>; Zhou <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1229_ref_036">2001</xref>). For instance, the view-based Eigenface method was proposed to extend Eigenface to handle the pose problem. One disadvantage of view-based methods is that they usually require multiple face images of each subject in different poses, which is infeasible in real-world applications.</p>
<p>Gross <italic>et al</italic>. proposed the Eigen Light Field (ELF) method (Gross <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1229_ref_012">2004</xref>) to tackle the pose problem. ELF first estimates the light fields of the subject's head from the input images. Then, the test and gallery images are matched by comparing their ELF coefficients. Compared to view-based methods, ELF needs an extra independent training set (different from the gallery) that contains multiple images in varying poses for each subject. In the recognition stage, however, a face can be recognized even if the subject has only one image in the gallery. Providing additional images in different poses brings more depth information about the human face structure, and consequently results in better-reconstructed models than those built from a single gallery image. However, this method imposes restrictive data collection requirements, because many existing face databases contain only a few (even single) gallery images (Zhou <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1229_ref_036">2001</xref>).</p>
<p>Pentl <italic>et al.</italic> (<xref ref-type="bibr" rid="j_info1229_ref_023">1994</xref>) proposed the tied factor analysis (TFA) model to describe pose variation in face images, and achieved state-of-the-art face recognition performance under large pose variations. They assumed that each identity can be described by an identity vector, and that images of a single identity in different poses can be generated from this identity vector by an identity-independent (but pose-dependent) linear transformation. The identity vectors and the parameters of the linear transformation are estimated from a set of training images in different known poses using the EM algorithm. In effect, TFA searches for transformations that yield pose-independent feature extraction. However, as only linear transformations are considered for computational feasibility, TFA cannot properly describe pose variations of 2D-mapped face images, which are inherently non-linear (Zhang and Gao, <xref ref-type="bibr" rid="j_info1229_ref_032">2009</xref>).</p>
<p>From the second category of multi-view face recognition approaches, one can mention the methods that generate virtual views. In these methods, all face images are normalized to a pre-defined pose (e.g. the frontal pose), or the gallery is expanded to cover large pose variations by generating virtual views. As changing pose causes variations that are closely related to the 3D structure of the face, it is natural to build a 3D model from the 2D input face image (Chai <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1229_ref_006">2007</xref>). For instance, multi-level quadratic variation minimization (MQVM) (Zhang <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1229_ref_033">2008</xref>) uses two gallery images, a frontal view and a side view, to reconstruct a 3D human face for recognition. One of the most successful methods for 3D face model recovery is the 3D Morphable Model (3DMM) (Blanz and Vetter, <xref ref-type="bibr" rid="j_info1229_ref_005">2003</xref>). In this method, PCA is used to model prior knowledge of face shape and texture. Any unseen face can then be modelled by a linear combination of the prototypes, in which the corresponding shape and texture are expressed by the exemplar faces. The specific 3D face can be recovered from one or more images by optimizing the shape, texture and mapping parameters through an analysis-by-synthesis strategy. However, 3DMM is too time-consuming for most real-world applications. To reduce the complexity, Jiang <italic>et al.</italic> (<xref ref-type="bibr" rid="j_info1229_ref_013">2005</xref>) proposed a simplified version of 3DMM that reconstructs the specific 3D face from a single frontal view. Their method relies on automatic detection of facial features on the frontal view, which are used to build more efficient personalized 3D face models.</p>
<p>Unlike 3D model-based approaches, learning-based approaches generally try to learn how to estimate a virtual view directly in 2D space. In Lee and Kim (<xref ref-type="bibr" rid="j_info1229_ref_015">2006</xref>), a method is proposed to generate frontal view face images using a linear transformation in feature space. Features are extracted from non-frontal face images using kernel PCA, and then a transformation from the non-frontal face image to its corresponding frontal view is applied. The transformation is obtained by a least-squared-error learning process. As another example of learning-based methods, the Active Appearance Model (AAM) (Cootes <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1229_ref_007">2001</xref>) fits an input face image to a pre-learned face model, which consists of separate shape and appearance models. Beymer (<xref ref-type="bibr" rid="j_info1229_ref_004">1994</xref>) proposed parallel deformation to generate virtual views covering a set of possible poses from a single example view using feature-based 2D warping. In this method, a 2D transformation from a standard pose to a target pose is learned on a prototype face. To synthesize a virtual view of a gallery face in the target pose, the real view in the standard pose is parallel-deformed according to the learned transformation.</p>
<fig id="j_info1229_fig_001">
<label>Fig. 1</label>
<caption>
<p>Flowchart of the proposed method.</p>
</caption>
<graphic xlink:href="info1229_g001.jpg"/>
</fig>
<p>Another class of methods in the second category of multi-view face recognition approaches is the subspace-based methods, such as Belhumeur <italic>et al.</italic> (<xref ref-type="bibr" rid="j_info1229_ref_003">1997</xref>). These methods seek the most representative subspace for dimension reduction and feature extraction. The Fisherface approach explicitly provides discrimination among classes when multiple training samples per class are available. Through the training process, the ratio of the between-class difference to the within-class difference is maximized to find a basis of vectors that best discriminates the classes. In Modular PCA (MPCA) (Gottumukkal and Asari, <xref ref-type="bibr" rid="j_info1229_ref_011">2004</xref>), face images are divided into smaller regions and PCA is applied to each region. Since some local facial features of a subject do not change under pose variation, MPCA is expected to cope with pose variation.</p>
<p>The nearest subspace (NS) method, also from the second category, generalizes the nearest neighbour (NN) method in the sense that it classifies the test sample based on the best linear representation of all training samples in each class. The sparse representation based classification (SRC) method (Wright <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1229_ref_029">2009</xref>) is a further generalization of NS that represents the test sample using training samples selected from all samples, both within and across different classes.</p>
<p>In this paper, we propose a novel multi-pose face recognition method by formulating virtual frontal view generation via sparse representation as a prediction problem. The proposed method lies in the second category of multi-pose face recognition approaches. Figure <xref rid="j_info1229_fig_001">1</xref> shows a flowchart of the proposed method. As shown in Fig. <xref rid="j_info1229_fig_001">1</xref>, the proposed method includes three steps:</p>
<list>
<list-item id="j_info1229_li_003">
<label>1.</label>
<p>Pose estimation step: many experiments have shown that knowing the pose of an unseen face image is helpful in recognizing its identity. Contrary to most face recognition methods, which assume prior knowledge about the pose, the proposed method estimates the pose of the face image using the sparse representation idea. The proposed pose estimation is based on the assumption that the sparse representation coefficients of different identities in the same pose are closer than the representation coefficients of images of the same identity in different poses. Therefore, by sparsely representing the unseen face image over the training images of a given pose and repeating this for all poses, one can estimate the pose by minimizing the reconstruction error across poses.</p>
</list-item>
<list-item id="j_info1229_li_004">
<label>2.</label>
<p>Virtual frontal view generation step: according to the pose estimated in the previous step, a non-linear mapping is applied to the non-frontal face image to generate its virtual frontal view, which is then used for classification. The mapping is based on sparse representation and the supervised dictionary learning concept. In fact, this step learns view-dependent dictionaries that are used to generate the virtual frontal image from a specific view.</p>
</list-item>
<list-item id="j_info1229_li_005">
<label>3.</label>
<p>Classification (recognition) step: the unseen face image is recognized based on its generated virtual frontal view and an SRC-based classifier.</p>
</list-item>
</list>
<p>Therefore, all three steps of the proposed method are based on sparse representation, which can be considered an advantage of the method. Extensive experiments have been conducted on the CMU-PIE (Sim <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1229_ref_027">2002</xref>) and FERET (Phillips <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1229_ref_024">2000</xref>) face databases to evaluate the efficacy of the proposed method. Section <xref rid="j_info1229_s_003">3</xref> explains the basic concepts of sparse representation, the main component of all steps of the proposed method.</p>
</sec>
<sec id="j_info1229_s_003">
<label>3</label>
<title>Sparse Representation</title>
<p>Sparse Coding or Sparse Representation (SR) is a powerful tool in high-dimensional signal processing that has shown strong performance in computer vision applications, especially face recognition (Wright <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1229_ref_028">2010</xref>). It uses a dictionary of basis functions (atoms) such that each input signal can be approximated by a linear combination of just a sparse subset of atoms. Note that, based on sparse representation theory, similar signals in the same class are expected to be approximated by similar subsets of atoms.</p>
<p>Suppose that there are <italic>N</italic> training samples from <italic>C</italic> different classes that are arranged in matrix <inline-formula id="j_info1229_ineq_001"><alternatives>
<mml:math><mml:mi mathvariant="italic">A</mml:mi><mml:mo>=</mml:mo><mml:mo fence="true" stretchy="false">[</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">a</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">a</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:mo>…</mml:mo><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">a</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">N</mml:mi></mml:mrow></mml:msub><mml:mo fence="true" stretchy="false">]</mml:mo><mml:mo stretchy="false">∈</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="italic">R</mml:mi></mml:mrow><mml:mrow><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mi mathvariant="italic">d</mml:mi><mml:mo>×</mml:mo><mml:mi mathvariant="italic">N</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:mrow></mml:msup></mml:math>
<tex-math><![CDATA[$A=[{a_{1}},{a_{2}},\dots ,{a_{N}}]\in {R^{(d\times N)}}$]]></tex-math></alternatives></inline-formula> where each sample has <italic>d</italic> features and the label vector <inline-formula id="j_info1229_ineq_002"><alternatives>
<mml:math><mml:mi mathvariant="italic">L</mml:mi><mml:mo>=</mml:mo><mml:mo fence="true" stretchy="false">[</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">l</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">l</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:mo>…</mml:mo><mml:mo mathvariant="normal">,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">l</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">N</mml:mi></mml:mrow></mml:msub><mml:mo fence="true" stretchy="false">]</mml:mo></mml:math>
<tex-math><![CDATA[$L=[{l_{1}},{l_{2}},\dots ,{l_{N}}]$]]></tex-math></alternatives></inline-formula>, <inline-formula id="j_info1229_ineq_003"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">l</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">∈</mml:mo><mml:mo fence="true" stretchy="false">[</mml:mo><mml:mn>1</mml:mn><mml:mo mathvariant="normal">,</mml:mo><mml:mo>…</mml:mo><mml:mo mathvariant="normal">,</mml:mo><mml:mi mathvariant="italic">C</mml:mi><mml:mo fence="true" stretchy="false">]</mml:mo></mml:math>
<tex-math><![CDATA[${l_{i}}\in [1,\dots ,C]$]]></tex-math></alternatives></inline-formula> stores the label of samples. The sparse representation of test sample <inline-formula id="j_info1229_ineq_004"><alternatives>
<mml:math><mml:mi mathvariant="italic">y</mml:mi><mml:mo stretchy="false">∈</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="italic">R</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">d</mml:mi></mml:mrow></mml:msup></mml:math>
<tex-math><![CDATA[$y\in {R^{d}}$]]></tex-math></alternatives></inline-formula> over training samples <italic>A</italic> can be obtained by solving Eq. (<xref rid="j_info1229_eq_001">1</xref>) (Elad, <xref ref-type="bibr" rid="j_info1229_ref_009">2010</xref>). 
<disp-formula id="j_info1229_eq_001">
<label>(1)</label><alternatives>
<mml:math display="block"><mml:mtable displaystyle="true"><mml:mtr><mml:mtd><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mo movablelimits="false">arg</mml:mo><mml:munder><mml:mrow><mml:mo movablelimits="false">min</mml:mo></mml:mrow><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow></mml:munder><mml:mo stretchy="false">‖</mml:mo><mml:mi mathvariant="italic">y</mml:mi><mml:mo>−</mml:mo><mml:mi mathvariant="italic">A</mml:mi><mml:mi mathvariant="italic">x</mml:mi><mml:msubsup><mml:mrow><mml:mo stretchy="false">‖</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:mi mathvariant="italic">λ</mml:mi><mml:mo stretchy="false">‖</mml:mo><mml:mi mathvariant="italic">x</mml:mi><mml:msub><mml:mrow><mml:mo stretchy="false">‖</mml:mo></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math>
<tex-math><![CDATA[\[ \hat{x}=\arg \underset{x}{\min }\| y-Ax{\| _{2}^{2}}+\lambda \| x{\| _{1}},\]]]></tex-math></alternatives>
</disp-formula> 
where <inline-formula id="j_info1229_ineq_005"><alternatives>
<mml:math><mml:mo stretchy="false">‖</mml:mo><mml:mi mathvariant="italic">x</mml:mi><mml:msub><mml:mrow><mml:mo stretchy="false">‖</mml:mo></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[$\| x{\| _{1}}$]]></tex-math></alternatives></inline-formula> is the <inline-formula id="j_info1229_ineq_006"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">l</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${l_{1}}$]]></tex-math></alternatives></inline-formula>-norm of <italic>x</italic> and it is a measure of sparsity and <inline-formula id="j_info1229_ineq_007"><alternatives>
<mml:math><mml:mo stretchy="false">‖</mml:mo><mml:mi mathvariant="italic">x</mml:mi><mml:msub><mml:mrow><mml:mo stretchy="false">‖</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[$\| x{\| _{2}}$]]></tex-math></alternatives></inline-formula> is the <inline-formula id="j_info1229_ineq_008"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">l</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${l_{2}}$]]></tex-math></alternatives></inline-formula>-norm of <italic>x</italic>. <italic>λ</italic> is the Lagrangian coefficient that balances the sparsity of <italic>x</italic> against the reconstruction error of the first term. Eq. (<xref rid="j_info1229_eq_001">1</xref>) is a convex relaxation of the non-convex, NP-hard problem that uses the <inline-formula id="j_info1229_ineq_009"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">l</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${l_{0}}$]]></tex-math></alternatives></inline-formula>-norm instead of <inline-formula id="j_info1229_ineq_010"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">l</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${l_{1}}$]]></tex-math></alternatives></inline-formula>-norm.</p>
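As an illustration, the relaxed problem in Eq. (1) can be solved by the classical iterative shrinkage-thresholding algorithm (ISTA). The sketch below is a generic solver, not the implementation used in this paper; the function name `sparse_code_ista` and the toy dimensions are ours.

```python
import numpy as np

def sparse_code_ista(A, y, lam=0.01, n_iter=3000):
    """Approximately solve Eq. (1): min_x ||y - Ax||_2^2 + lam*||x||_1 (ISTA)."""
    # Step size 1/L, where L is a Lipschitz constant of the gradient of the
    # squared-error term: L = 2 * sigma_max(A)^2.
    L = 2.0 * np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ x - y)   # gradient of ||y - Ax||_2^2
        z = x - grad / L                 # gradient descent step
        # Soft-thresholding: proximal operator of (lam/L)*||.||_1.
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return x

# Toy example: y is a sparse combination of 3 atoms from a 20x50 dictionary.
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 50))
A /= np.linalg.norm(A, axis=0)           # unit-norm atoms, as usual for SR
x_true = np.zeros(50)
x_true[[5, 20, 40]] = [1.0, -0.8, 0.6]
y = A @ x_true
x_hat = sparse_code_ista(A, y)
```

The recovered `x_hat` is sparse and reconstructs `y` with small residual, illustrating why the l1 penalty is a workable surrogate for the l0-norm.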
<p>If sparse representation is used for classification (SRC), the class label of the test sample <italic>y</italic> can be obtained based on the minimum reconstruction error criterion as follows: 
<disp-formula id="j_info1229_eq_002">
<label>(2)</label><alternatives>
<mml:math display="block"><mml:mtable displaystyle="true"><mml:mtr><mml:mtd><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">c</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mo movablelimits="false">arg</mml:mo><mml:munder><mml:mrow><mml:mo movablelimits="false">min</mml:mo></mml:mrow><mml:mrow><mml:mi mathvariant="italic">c</mml:mi></mml:mrow></mml:munder><mml:msub><mml:mrow><mml:mo fence="true" maxsize="1.19em" minsize="1.19em" stretchy="true">‖</mml:mo><mml:mi mathvariant="italic">y</mml:mi><mml:mo>−</mml:mo><mml:mi mathvariant="italic">A</mml:mi><mml:msub><mml:mrow><mml:mi mathvariant="italic">Z</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">c</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mi mathvariant="italic">x</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mo fence="true" maxsize="1.19em" minsize="1.19em" stretchy="true">‖</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math>
<tex-math><![CDATA[\[ \hat{c}=\arg \underset{c}{\min }{\big\| y-A{Z_{c}}(x)\big\| _{2}},\]]]></tex-math></alternatives>
</disp-formula> 
where <inline-formula id="j_info1229_ineq_011"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">Z</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">c</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mi mathvariant="italic">x</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mo>:</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="normal">ℜ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">N</mml:mi></mml:mrow></mml:msup><mml:mo stretchy="false">→</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="normal">ℜ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">N</mml:mi></mml:mrow></mml:msup></mml:math>
<tex-math><![CDATA[${Z_{c}}(x):{\mathrm{\Re }^{N}}\to {\mathrm{\Re }^{N}}$]]></tex-math></alternatives></inline-formula> is a selection operator that selects coefficients associated with class <italic>c</italic> from vector <italic>x</italic> and sets other coefficients to zero.</p>
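The decision rule of Eq. (2) can be sketched in a few lines; this is a minimal illustration of the class-wise residual computation, assuming the sparse code `x_hat` has already been obtained, with hypothetical names of our own choosing.

```python
import numpy as np

def src_classify(A, labels, x_hat, y):
    """Eq. (2): assign y to the class whose own coefficients best reconstruct it."""
    best_c, best_err = None, np.inf
    for c in np.unique(labels):
        x_c = np.where(labels == c, x_hat, 0.0)  # selection operator Z_c(x)
        err = np.linalg.norm(y - A @ x_c)        # class-wise reconstruction error
        if err < best_err:
            best_c, best_err = c, err
    return best_c

# Toy example: 4 atoms, two per class; y lies on the class-0 atoms.
A = np.eye(4)
labels = np.array([0, 0, 1, 1])
y = np.array([1.0, 1.0, 0.0, 0.0])
x_hat = np.array([1.0, 1.0, 0.2, 0.0])
predicted = src_classify(A, labels, x_hat, y)
```

Here class 0 reconstructs `y` exactly while class 1 cannot, so the minimum-residual rule selects class 0.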
<p>As already mentioned, a dictionary is a set of basis data (atoms) over which the sparse representation is obtained. Dictionary atoms can be chosen from raw training data or from pre-constructed dictionaries, such as undecimated wavelets, contourlets, curvelets, and more. Although pre-constructed dictionaries yield fast transforms, they are usually limited to sparsifying the signals they are designed to represent. Alternatively, one can use learning methods to obtain a tunable dictionary in which each atom is generated by controlling some parameters. This is called Supervised Dictionary Learning (SDL), and learned dictionaries are expected to adapt to different input samples (Elad, <xref ref-type="bibr" rid="j_info1229_ref_009">2010</xref>). The proposed method uses both pre-constructed dictionaries and supervised dictionary learning so as to have a proper dictionary in each step and the best overall performance.</p>
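To make the contrast with pre-constructed bases concrete, the sketch below learns a dictionary from data with scikit-learn. Note that this is generic (unsupervised) dictionary learning, not the pairwise supervised scheme proposed in this paper; the toy data sizes are arbitrary.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

# Toy data: 100 signals of dimension 20 (rows are samples, as scikit-learn expects).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))

# Learn a 15-atom dictionary adapted to the training signals, as opposed to a
# fixed pre-constructed basis such as wavelets or curvelets.
dl = DictionaryLearning(n_components=15, alpha=1.0, max_iter=20, random_state=0)
codes = dl.fit_transform(X)   # sparse codes of the training signals
D = dl.components_            # learned dictionary atoms
```

Each row of `D` is one learned atom, and each row of `codes` is the sparse representation of the corresponding training signal over those atoms.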
<p>The next section explains the proposed method and how the sparse representation concept can be extended to perform well on multi-pose face recognition.</p>
</sec>
<sec id="j_info1229_s_004">
<label>4</label>
<title>Proposed Method</title>
<p>In this section, a sparse representation based multi-pose face recognition method is proposed which consists of three main steps:</p>
<list>
<list-item id="j_info1229_li_006">
<label>1.</label>
<p>Estimating the pose of a given face image based on SRC.</p>
</list-item>
<list-item id="j_info1229_li_007">
<label>2.</label>
<p>Generating a virtual frontal view of the given face image according to the estimated pose and the learned SR-based non-frontal to frontal view mapping.</p>
</list-item>
<list-item id="j_info1229_li_008">
<label>3.</label>
<p>Recognizing the face image using the generated frontal view and SRC.</p>
</list-item>
</list>
<p>The most important step in the proposed method is generating a virtual frontal view of a non-frontal face image. Since the proposed method learns view-dependent transformations to map a non-frontal face image to the frontal one, choosing the proper transformation requires knowing the pose of the face image. Therefore, the first step of the proposed method is devoted to pose estimation. According to the estimated pose, a non-frontal to frontal view mapping is applied to generate a virtual frontal face image. With the virtual frontal view in hand, SRC is used for face recognition. The following subsections explain these three steps in more detail.</p>
<sec id="j_info1229_s_005">
<label>4.1</label>
<title>Pose Estimation Based on SRC</title>
<p>As mentioned earlier, prior knowledge of the pose of a face is essential information in many face recognition techniques. It is often beneficial if the pose angle of the input face image can be estimated before recognition, as in modular PCA (MPCA) (Pentland <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1229_ref_023">1994</xref>) and eigen light-fields (Gross <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1229_ref_012">2004</xref>). There have been many efforts toward automatic pose estimation in the literature. As the focus of this paper is on face recognition rather than pose estimation, the interested reader is referred to Murphy-Chutorian and Trivedi (<xref ref-type="bibr" rid="j_info1229_ref_021">2009</xref>) and Ding and Tao (<xref ref-type="bibr" rid="j_info1229_ref_008">2016</xref>) for good surveys on face pose estimation.</p>
<p>It is obvious that two face images of different identities in the same pose are visually more similar than two face images of one identity in different poses. This observation can serve as a clue for pose estimation: a face image in a specific pose can be approximated by a linear combination of face images of other subjects in the same pose. The proposed pose estimation method is based on the assumption that the sparse representation coefficients of different identities in the same pose are closer than the representation coefficients of images of the same identity in different poses. Therefore, by computing the sparse representation of an unseen face image over the training images of each pose in turn, one can estimate the pose by minimizing the reconstruction error over the different poses. A similar idea has been used in Yu and Liu (<xref ref-type="bibr" rid="j_info1229_ref_030">2014</xref>), where it is assumed that a face image in a specific pose cannot be approximated well by a combination of face images in other poses.</p>
<p>Suppose there are <italic>P</italic> classes of different poses <inline-formula id="j_info1229_ineq_012"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${A_{p}}$]]></tex-math></alternatives></inline-formula>, <inline-formula id="j_info1229_ineq_013"><alternatives>
<mml:math><mml:mi mathvariant="italic">p</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo mathvariant="normal">,</mml:mo><mml:mo>…</mml:mo><mml:mo mathvariant="normal">,</mml:mo><mml:mi mathvariant="italic">P</mml:mi></mml:math>
<tex-math><![CDATA[$p=1,\dots ,P$]]></tex-math></alternatives></inline-formula>. The <italic>p</italic>-th class <inline-formula id="j_info1229_ineq_014"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mo fence="true" stretchy="false">[</mml:mo><mml:msubsup><mml:mrow><mml:mi mathvariant="italic">a</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msubsup><mml:mo mathvariant="normal">,</mml:mo><mml:msubsup><mml:mrow><mml:mi mathvariant="italic">a</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo mathvariant="normal">,</mml:mo><mml:mo>…</mml:mo><mml:mo mathvariant="normal">,</mml:mo><mml:msubsup><mml:mrow><mml:mi mathvariant="italic">a</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="italic">n</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msubsup><mml:mo fence="true" stretchy="false">]</mml:mo><mml:mo stretchy="false">∈</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="normal">ℜ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">d</mml:mi><mml:mo>×</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">n</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msup></mml:math>
<tex-math><![CDATA[${A_{p}}=[{a_{p}^{1}},{a_{p}^{2}},\dots ,{a_{p}^{{n_{p}}}}]\in {\mathrm{\Re }^{d\times {n_{p}}}}$]]></tex-math></alternatives></inline-formula> is called the view dictionary of pose <italic>p</italic>; it contains <inline-formula id="j_info1229_ineq_015"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">n</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${n_{p}}$]]></tex-math></alternatives></inline-formula> training face images from different identities in this pose and <italic>d</italic> is the dimension of each face image. Based on SR theory, every unseen face image <inline-formula id="j_info1229_ineq_016"><alternatives>
<mml:math><mml:mi mathvariant="italic">y</mml:mi><mml:mo stretchy="false">∈</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="normal">ℜ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">d</mml:mi></mml:mrow></mml:msup></mml:math>
<tex-math><![CDATA[$y\in {\mathrm{\Re }^{d}}$]]></tex-math></alternatives></inline-formula> is expected to be expressed as a sparse representation of images in matrix <inline-formula id="j_info1229_ineq_017"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${A_{p}}$]]></tex-math></alternatives></inline-formula> in a particular pose. The sparse coefficients of image <italic>y</italic> over the view dictionary <inline-formula id="j_info1229_ineq_018"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${A_{p}}$]]></tex-math></alternatives></inline-formula> is the <inline-formula id="j_info1229_ineq_019"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\hat{x}_{p}}$]]></tex-math></alternatives></inline-formula> vector that can be obtained as follows: 
<disp-formula id="j_info1229_eq_003">
<label>(3)</label><alternatives>
<mml:math display="block"><mml:mtable displaystyle="true"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mo movablelimits="false">arg</mml:mo><mml:munder><mml:mrow><mml:mo movablelimits="false">min</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:munder><mml:mo stretchy="false">‖</mml:mo><mml:mi mathvariant="italic">y</mml:mi><mml:mo>−</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub><mml:msubsup><mml:mrow><mml:mo stretchy="false">‖</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:mi mathvariant="italic">λ</mml:mi><mml:mo stretchy="false">‖</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mo stretchy="false">‖</mml:mo></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math>
<tex-math><![CDATA[\[ {\hat{x}_{p}}=\arg \underset{{x_{p}}}{\min }\| y-{A_{p}}{x_{p}}{\| _{2}^{2}}+\lambda \| {x_{p}}{\| _{1}},\]]]></tex-math></alternatives>
</disp-formula> 
where <italic>λ</italic> is the regularization parameter as before. Therefore, face image <italic>y</italic> is reconstructed based on different view dictionaries. The view dictionary that reconstructs the face image with minimum error determines the pose of the face image <italic>y</italic>. In other words, the pose of face image <italic>y</italic> is estimated based on minimizing the reconstruction error among all view dictionaries: 
<disp-formula id="j_info1229_eq_004">
<label>(4)</label><alternatives>
<mml:math display="block"><mml:mtable displaystyle="true"><mml:mtr><mml:mtd><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mo movablelimits="false">arg</mml:mo><mml:munder><mml:mrow><mml:mo movablelimits="false">min</mml:mo></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:munder><mml:mo stretchy="false">‖</mml:mo><mml:mi mathvariant="italic">y</mml:mi><mml:mo>−</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mo stretchy="false">‖</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo><mml:mspace width="1em"/><mml:mi mathvariant="italic">p</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo mathvariant="normal">,</mml:mo><mml:mo movablelimits="false">…</mml:mo><mml:mo mathvariant="normal">,</mml:mo><mml:mi mathvariant="italic">P</mml:mi><mml:mo mathvariant="normal">,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math>
<tex-math><![CDATA[\[ \hat{p}=\arg \underset{p}{\min }\| y-{A_{p}}{\hat{x}_{p}}{\| _{2}},\hspace{1em}p=1,\dots ,P,\]]]></tex-math></alternatives>
</disp-formula> 
where <inline-formula id="j_info1229_ineq_020"><alternatives>
<mml:math><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover></mml:math>
<tex-math><![CDATA[$\hat{p}$]]></tex-math></alternatives></inline-formula> is the estimated pose. In other words, <inline-formula id="j_info1229_ineq_021"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${A_{\hat{p}}}$]]></tex-math></alternatives></inline-formula> is the view dictionary that best reconstructs the input face image as a linear combination of its atoms. The proposed pose estimation algorithm is summarized in Algorithm <xref rid="j_info1229_fig_002">1</xref>. Figure <xref rid="j_info1229_fig_003">2</xref> shows an example of the proposed pose estimation method with seven different poses (seven view dictionaries).</p>
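The pose estimation procedure above can be sketched compactly, assuming the view dictionaries are given as column-wise NumPy arrays and using scikit-learn's Lasso as a stand-in for the l1 solver of Eq. (3) (the paper does not prescribe a specific solver):

```python
import numpy as np
from sklearn.linear_model import Lasso

def estimate_pose(y, view_dicts, lam=0.01):
    """Return the index of the view dictionary that reconstructs the
    face image y with minimum l2 error (Eqs. (3)-(4))."""
    errors = []
    for A_p in view_dicts:                 # one d x n_p matrix per pose
        coder = Lasso(alpha=lam, fit_intercept=False, max_iter=5000)
        coder.fit(A_p, y)                  # Eq. (3): sparse code of y over A_p
        x_hat = coder.coef_
        errors.append(np.linalg.norm(y - A_p @ x_hat))  # Eq. (4): residual
    return int(np.argmin(errors))
```

Note that scikit-learn's Lasso scales the data-fit term by the number of rows, so its <monospace>alpha</monospace> corresponds to the paper's λ only up to a constant factor.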
<fig id="j_info1229_fig_002">
<label>Algorithm 1</label>
<caption>
<p>Sparse representation based pose estimation.</p>
</caption>
<graphic xlink:href="info1229_g002.jpg"/>
</fig>
<fig id="j_info1229_fig_003">
<label>Fig. 2</label>
<caption>
<p>An example of pose estimation based on sparse representation. (a) The input face image reconstructed over seven different view dictionaries <inline-formula id="j_info1229_ineq_022"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${A_{p}}$]]></tex-math></alternatives></inline-formula>, <inline-formula id="j_info1229_ineq_023"><alternatives>
<mml:math><mml:mi mathvariant="italic">p</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo mathvariant="normal">,</mml:mo><mml:mo>…</mml:mo><mml:mo mathvariant="normal">,</mml:mo><mml:mn>7</mml:mn></mml:math>
<tex-math><![CDATA[$p=1,\dots ,7$]]></tex-math></alternatives></inline-formula> and (b) the reconstruction error for each pose. The reconstruction error is minimal for the 7-th pose, so the input face image is assigned to this pose.</p>
</caption>
<graphic xlink:href="info1229_g003.jpg"/>
</fig>
<p>Figure <xref rid="j_info1229_fig_003">2</xref>(a) shows the input face image on the top and, below it, the seven face images reconstructed with respect to the seven view dictionaries. As can be seen, the image reconstructed from the last dictionary <inline-formula id="j_info1229_ineq_024"><alternatives>
<mml:math><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow><mml:mrow><mml:mn>7</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math>
<tex-math><![CDATA[$({A_{7}})$]]></tex-math></alternatives></inline-formula> is the most similar to the input face image, and the reconstruction error plot in Fig. <xref rid="j_info1229_fig_003">2</xref>(b) confirms this observation. Thus, the input face image is assigned to the last pose.</p>
<p>The proposed pose estimation method has some advantages over many other pose estimation methods. First, there is no assumption on the number of training face images in each pose, so view dictionaries can have different numbers of atoms. Also, no feature selection or 3D face model is required, so neither image registration nor heavy computation is involved. However, the main shortcoming of the proposed pose estimation method is its accuracy drop for small pose intervals, which is discussed further in Section <xref rid="j_info1229_s_011">5.2</xref>.</p>
</sec>
<sec id="j_info1229_s_006">
<label>4.2</label>
<title>Virtual Frontal View Generation</title>
<p>In many face recognition methods, a key step toward multi-pose face recognition is pose normalization, i.e. virtual frontal view generation. Compared to a non-frontal face image, a frontal face image contains the most facial details, which are beneficial for face recognition. To compensate for the loss of details in non-frontal views, one can try to generate a virtual frontal view from a non-frontal one. In this paper, this task is formulated as a general prediction framework that learns an identity-independent mapping from each non-frontal view to the frontal view. The purpose of this mapping is to estimate a frontal face image <inline-formula id="j_info1229_ineq_025"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">b</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo stretchy="false">∈</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="normal">ℜ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">d</mml:mi></mml:mrow></mml:msup></mml:math>
<tex-math><![CDATA[${\hat{b}_{1}}\in {\mathrm{\Re }^{d}}$]]></tex-math></alternatives></inline-formula> given its non-frontal face image <inline-formula id="j_info1229_ineq_026"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">b</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">∈</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="normal">ℜ</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">d</mml:mi></mml:mrow></mml:msup></mml:math>
<tex-math><![CDATA[${b_{p}}\in {\mathrm{\Re }^{d}}$]]></tex-math></alternatives></inline-formula> in pose <italic>p</italic>. Modelling virtual frontal view generation as a linear mapping gives: 
<disp-formula id="j_info1229_eq_005">
<label>(5)</label><alternatives>
<mml:math display="block"><mml:mtable displaystyle="true"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi mathvariant="italic">b</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">V</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">b</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">W</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi mathvariant="italic">b</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math>
<tex-math><![CDATA[\[ {b_{1}}={V_{p}}({b_{p}})={W_{p}}{b_{p}},\]]]></tex-math></alternatives>
</disp-formula> 
where <inline-formula id="j_info1229_ineq_027"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">V</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mo>.</mml:mo><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math>
<tex-math><![CDATA[${V_{p}}(.)$]]></tex-math></alternatives></inline-formula> is the linear mapping function and <inline-formula id="j_info1229_ineq_028"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">W</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${W_{p}}$]]></tex-math></alternatives></inline-formula> is the linear mapping matrix for pose <italic>p</italic>. Linear mapping function <inline-formula id="j_info1229_ineq_029"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">V</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mo>.</mml:mo><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math>
<tex-math><![CDATA[${V_{p}}(.)$]]></tex-math></alternatives></inline-formula> can be obtained via a learning process. GLR and LLR (Chai <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1229_ref_006">2007</xref>) formulate general least-squares problems and use regression-based methods to find a good mapping. Another idea for finding the mapping function is introduced in LSRR (Zhang <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1229_ref_031">2013</xref>), which assumes that the face images of one identity observed from different views share the same sparse representation coefficients over the different view dictionaries. In other words, suppose <inline-formula id="j_info1229_ineq_030"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">f</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${f_{1}}$]]></tex-math></alternatives></inline-formula> and <inline-formula id="j_info1229_ineq_031"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">f</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${f_{2}}$]]></tex-math></alternatives></inline-formula> are two face images of one identity in poses <inline-formula id="j_info1229_ineq_032"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${p_{1}}$]]></tex-math></alternatives></inline-formula> and <inline-formula id="j_info1229_ineq_033"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${p_{2}}$]]></tex-math></alternatives></inline-formula>, respectively. The sparse representation coefficients of these two face images over the view-dependent dictionaries related to <inline-formula id="j_info1229_ineq_034"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${p_{1}}$]]></tex-math></alternatives></inline-formula> and <inline-formula id="j_info1229_ineq_035"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${p_{2}}$]]></tex-math></alternatives></inline-formula> poses are assumed to be similar. Therefore, if the sparse representation coefficients of a non-frontal face image of an identity are available over its view dictionary, they can be used with the frontal view dictionary to generate the virtual frontal view of that identity. Consequently, considering the face images of the <italic>i</italic>-th identity, we have the following set of equations: 
<disp-formula id="j_info1229_eq_006">
<label>(6)</label><alternatives>
<mml:math display="block"><mml:mtable displaystyle="true"><mml:mtr><mml:mtd><mml:mfenced separators="" open="{" close=""><mml:mrow><mml:mtable equalrows="false" equalcolumns="false" columnalign="left"><mml:mtr><mml:mtd class="array"><mml:msubsup><mml:mrow><mml:mi mathvariant="italic">b</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:msup><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msup><mml:mo>+</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">e</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd class="array"><mml:mspace width="1em"/><mml:mo>⋮</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd class="array"><mml:msubsup><mml:mrow><mml:mi mathvariant="italic">b</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub><mml:msup><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msup><mml:mo>+</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">e</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd class="array"><mml:mspace width="1em"/><mml:mo>⋮</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd class="array"><mml:msubsup><mml:mrow><mml:mi mathvariant="italic">b</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">P</mml:mi></mml:mrow><mml:mrow><mml:mi 
mathvariant="italic">i</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">P</mml:mi></mml:mrow></mml:msub><mml:msup><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msup><mml:mo>+</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">e</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">P</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mfenced></mml:mtd></mml:mtr></mml:mtable></mml:math>
<tex-math><![CDATA[\[ \left\{\begin{array}{l}{b_{1}^{i}}={A_{1}}{x^{i}}+{e_{1}}\\ {} \hspace{1em}\vdots \\ {} {b_{p}^{i}}={A_{p}}{x^{i}}+{e_{p}}\\ {} \hspace{1em}\vdots \\ {} {b_{P}^{i}}={A_{P}}{x^{i}}+{e_{P}},\end{array}\right.\]]]></tex-math></alternatives>
</disp-formula> 
where <inline-formula id="j_info1229_ineq_036"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${A_{p}}$]]></tex-math></alternatives></inline-formula> is the view dictionary of pose <italic>p</italic>, <inline-formula id="j_info1229_ineq_037"><alternatives>
<mml:math><mml:msubsup><mml:mrow><mml:mi mathvariant="italic">b</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msubsup></mml:math>
<tex-math><![CDATA[${b_{p}^{i}}$]]></tex-math></alternatives></inline-formula> is the face image of identity <italic>i</italic> in pose <italic>p</italic> and <inline-formula id="j_info1229_ineq_038"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">e</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${e_{p}}$]]></tex-math></alternatives></inline-formula> is the reconstruction error in pose <italic>p</italic>. The sparse representation coefficients <inline-formula id="j_info1229_ineq_039"><alternatives>
<mml:math><mml:msup><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msup></mml:math>
<tex-math><![CDATA[${x^{i}}$]]></tex-math></alternatives></inline-formula> are shared among all the <italic>P</italic> views of the <italic>i</italic>-th identity. These equations say that the face image from pose <italic>p</italic> can be generated from sparse representation coefficients <inline-formula id="j_info1229_ineq_040"><alternatives>
<mml:math><mml:msup><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msup></mml:math>
<tex-math><![CDATA[${x^{i}}$]]></tex-math></alternatives></inline-formula> with the corresponding view dictionary <inline-formula id="j_info1229_ineq_041"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${A_{p}}$]]></tex-math></alternatives></inline-formula>. Therefore, the key to virtual view generation lies in recovering the sparse representation coefficients <inline-formula id="j_info1229_ineq_042"><alternatives>
<mml:math><mml:msup><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msup></mml:math>
<tex-math><![CDATA[${x^{i}}$]]></tex-math></alternatives></inline-formula>. The idea of sharing the sparse representation coefficients among different poses is reminiscent of the idea used in Prince <italic>et al.</italic> (<xref ref-type="bibr" rid="j_info1229_ref_025">2008</xref>), where the authors assumed a face manifold and an identity (latent) space and argued that the representation of each identity does not vary with pose. As another example, one can mention the work of Sharma <italic>et al.</italic> (<xref ref-type="bibr" rid="j_info1229_ref_026">2012</xref>), which aims to find sets of projection directions for different poses such that the projected images of the same identity in different poses are maximally correlated in the latent space.</p>
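The shared-coefficient mapping described above, which codes a non-frontal image over its own view dictionary and then decodes with the frontal dictionary, can be sketched as follows, again using scikit-learn's Lasso as an assumed l1 solver; <monospace>A_p</monospace> and <monospace>A_1</monospace> are the pose-<italic>p</italic> and frontal view dictionaries as column-wise arrays:

```python
import numpy as np
from sklearn.linear_model import Lasso

def virtual_frontal_view(b_p, A_p, A_1, lam=0.01):
    """Generate a virtual frontal view of the non-frontal face b_p:
    sparse-code b_p over its view dictionary A_p, then decode the
    same coefficients with the frontal view dictionary A_1."""
    coder = Lasso(alpha=lam, fit_intercept=False, max_iter=5000)
    coder.fit(A_p, b_p)          # sparse code assumed shared across poses
    return A_1 @ coder.coef_     # virtual frontal image
```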
<p>Based on the discussion above, given training samples arranged in different view dictionaries <inline-formula id="j_info1229_ineq_043"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${A_{p}}$]]></tex-math></alternatives></inline-formula> <inline-formula id="j_info1229_ineq_044"><alternatives>
<mml:math><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mi mathvariant="italic">p</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo mathvariant="normal">,</mml:mo><mml:mo>…</mml:mo><mml:mo mathvariant="normal">,</mml:mo><mml:mi mathvariant="italic">P</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math>
<tex-math><![CDATA[$(p=1,\dots ,P)$]]></tex-math></alternatives></inline-formula>, the non-frontal to frontal mapping function for input face images in pose <italic>p</italic> (<inline-formula id="j_info1229_ineq_045"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">b</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${b_{p}}$]]></tex-math></alternatives></inline-formula>s) can be obtained by first finding the sparse representation coefficients of <inline-formula id="j_info1229_ineq_046"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">b</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${b_{p}}$]]></tex-math></alternatives></inline-formula> over the view dictionary of pose <italic>p</italic>, and then combining these coefficients with the frontal view dictionary <inline-formula id="j_info1229_ineq_047"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${A_{1}}$]]></tex-math></alternatives></inline-formula> as follows: <disp-formula-group id="j_info1229_dg_001">
<disp-formula id="j_info1229_eq_007">
<label>(7)</label><alternatives>
<mml:math display="block"><mml:mtable displaystyle="true" columnalign="right"><mml:mtr><mml:mtd class="align-odd"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mo movablelimits="false">arg</mml:mo><mml:munder><mml:mrow><mml:mo movablelimits="false">min</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:munder><mml:mo stretchy="false">‖</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">b</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub><mml:mo>−</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub><mml:msubsup><mml:mrow><mml:mo stretchy="false">‖</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:mi mathvariant="italic">λ</mml:mi><mml:mo stretchy="false">‖</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mo stretchy="false">‖</mml:mo></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math>
<tex-math><![CDATA[\[ {\hat{x}_{p}}=\arg \underset{{x_{p}}}{\min }\| {b_{p}}-{A_{p}}{x_{p}}{\| _{2}^{2}}+\lambda \| {x_{p}}{\| _{1}},\]]]></tex-math></alternatives>
</disp-formula>
<disp-formula id="j_info1229_eq_008">
<label>(8)</label><alternatives>
<mml:math display="block"><mml:mtable displaystyle="true" columnalign="right"><mml:mtr><mml:mtd class="align-odd"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">b</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">V</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">b</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math>
<tex-math><![CDATA[\[ {\hat{b}_{1}}={V_{p}}({b_{p}})={A_{1}}{\hat{x}_{p}},\]]]></tex-math></alternatives>
</disp-formula>
</disp-formula-group> where <inline-formula id="j_info1229_ineq_048"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\hat{x}_{p}}$]]></tex-math></alternatives></inline-formula> is the optimal vector of sparse representation coefficients of <inline-formula id="j_info1229_ineq_049"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">b</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${b_{p}}$]]></tex-math></alternatives></inline-formula> over view dictionary <inline-formula id="j_info1229_ineq_050"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${A_{p}}$]]></tex-math></alternatives></inline-formula>, <italic>λ</italic> is the regularization parameter mentioned earlier, and <inline-formula id="j_info1229_ineq_051"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">b</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\hat{b}_{1}}$]]></tex-math></alternatives></inline-formula> is the virtual frontal view corresponding to the non-frontal view <inline-formula id="j_info1229_ineq_052"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">b</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${b_{p}}$]]></tex-math></alternatives></inline-formula>.</p>
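<p>As a concrete illustration of Eqs. (7) and (8), the following minimal Python sketch solves the sparse coding problem with ISTA (one common solver; the text does not prescribe a particular algorithm) and then maps a non-frontal view to its virtual frontal view. All dictionaries, sizes, and data below are random, hypothetical stand-ins rather than learned dictionaries.</p>

```python
import numpy as np

def ista_lasso(A, b, lam=0.01, n_iter=500):
    """Solve min_x ||b - A x||_2^2 + lam ||x||_1 (Eq. (7)) by ISTA."""
    L = 2 * np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = x - 2 * A.T @ (A @ x - b) / L      # gradient step on the data term
        x = np.sign(x) * np.maximum(np.abs(x) - lam / L, 0.0)  # soft threshold
    return x

rng = np.random.default_rng(0)
d, K = 64, 32                                  # toy dimensions (hypothetical)
A_p = rng.standard_normal((d, K))              # view dictionary of pose p
A_1 = rng.standard_normal((d, K))              # frontal view dictionary
x_true = np.zeros(K)
x_true[:5] = rng.standard_normal(5)            # a genuinely sparse code
b_p = A_p @ x_true                             # synthetic non-frontal view

x_hat = ista_lasso(A_p, b_p)                   # Eq. (7): sparse code of b_p over A_p
b1_virtual = A_1 @ x_hat                       # Eq. (8): V_p(b_p) = A_1 x_hat
```

<p>Because both view dictionaries are assumed to share the same sparse code, the coefficients recovered from the non-frontal view directly synthesize the frontal image from the frontal dictionary.</p>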
<p>It is worth noting that the mapping in Eq. (<xref rid="j_info1229_eq_008">8</xref>) depends on two factors: the view dictionaries and the sparse representation coefficients. Since the sparse representation coefficients are obtained by solving an optimization problem over a view dictionary, the choice of view dictionaries plays an important role in the accuracy of the mapping. These dictionaries can simply be formed from the training images of each pose, or they can be learned more effectively through a dictionary learning process. Because the training images are tied to specific identities, using them directly as view dictionary atoms may fail to generate a face image for a new identity. In other words, identity-independent view dictionaries are expected to generalize better to face images of new identities. Therefore, a key step in increasing the accuracy of the proposed method, both in obtaining the sparse representation coefficients and in generating the virtual frontal view, is to learn desirable identity-independent view dictionaries. The next subsection explains a supervised dictionary learning process to learn <inline-formula id="j_info1229_ineq_053"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${A_{p}}$]]></tex-math></alternatives></inline-formula>s that are as effective as possible.</p>
</sec>
<sec id="j_info1229_s_007">
<label>4.3</label>
<title>Supervised View Dictionary Learning</title>
<fig id="j_info1229_fig_004">
<label>Fig. 3</label>
<caption>
<p>Example of sparse dictionary learning. <inline-formula id="j_info1229_ineq_054"><alternatives>
<mml:math><mml:mi mathvariant="italic">B</mml:mi><mml:mo stretchy="false">∈</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="normal">ℜ</mml:mi></mml:mrow><mml:mrow><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mi mathvariant="italic">d</mml:mi><mml:mi mathvariant="italic">P</mml:mi><mml:mo>×</mml:mo><mml:mi mathvariant="italic">I</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:mrow></mml:msup></mml:math>
<tex-math><![CDATA[$B\in {\mathrm{\Re }^{(dP\times I)}}$]]></tex-math></alternatives></inline-formula> contains the training face images in <italic>P</italic> different poses for <italic>I</italic> identities, <inline-formula id="j_info1229_ineq_055"><alternatives>
<mml:math><mml:mi mathvariant="italic">A</mml:mi><mml:mo stretchy="false">∈</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="normal">ℜ</mml:mi></mml:mrow><mml:mrow><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mi mathvariant="italic">d</mml:mi><mml:mi mathvariant="italic">P</mml:mi><mml:mo>×</mml:mo><mml:mi mathvariant="italic">K</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:mrow></mml:msup></mml:math>
<tex-math><![CDATA[$A\in {\mathrm{\Re }^{(dP\times K)}}$]]></tex-math></alternatives></inline-formula> is the dictionary, comprising <italic>P</italic> view dictionaries, each of which can be extracted by separating the <italic>d</italic> rows related to its pose, and <inline-formula id="j_info1229_ineq_056"><alternatives>
<mml:math><mml:mi mathvariant="italic">X</mml:mi><mml:mo stretchy="false">∈</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="normal">ℜ</mml:mi></mml:mrow><mml:mrow><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mi mathvariant="italic">K</mml:mi><mml:mo>×</mml:mo><mml:mi mathvariant="italic">I</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:mrow></mml:msup></mml:math>
<tex-math><![CDATA[$X\in {\mathrm{\Re }^{(K\times I)}}$]]></tex-math></alternatives></inline-formula> is the sparse representation matrix.</p>
</caption>
<graphic xlink:href="info1229_g004.jpg"/>
</fig>
<p>Suppose that <inline-formula id="j_info1229_ineq_057"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">b</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${b_{p}}$]]></tex-math></alternatives></inline-formula> and <inline-formula id="j_info1229_ineq_058"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">b</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${b_{1}}$]]></tex-math></alternatives></inline-formula> are two face images of one identity in non-frontal pose <italic>p</italic> and frontal pose 1, respectively, <inline-formula id="j_info1229_ineq_059"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\hat{x}_{p}}$]]></tex-math></alternatives></inline-formula> is the sparse representation of <inline-formula id="j_info1229_ineq_060"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">b</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${b_{p}}$]]></tex-math></alternatives></inline-formula> over view dictionary <inline-formula id="j_info1229_ineq_061"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${A_{p}}$]]></tex-math></alternatives></inline-formula>, and <inline-formula id="j_info1229_ineq_062"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\hat{x}_{1}}$]]></tex-math></alternatives></inline-formula> is the sparse representation of <inline-formula id="j_info1229_ineq_063"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">b</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${b_{1}}$]]></tex-math></alternatives></inline-formula> over view dictionary <inline-formula id="j_info1229_ineq_064"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${A_{1}}$]]></tex-math></alternatives></inline-formula>. As mentioned in the previous section, view dictionaries <inline-formula id="j_info1229_ineq_065"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${A_{p}}$]]></tex-math></alternatives></inline-formula> and <inline-formula id="j_info1229_ineq_066"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${A_{1}}$]]></tex-math></alternatives></inline-formula> are called desirable if the sparse representations <inline-formula id="j_info1229_ineq_067"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\hat{x}_{p}}$]]></tex-math></alternatives></inline-formula> and <inline-formula id="j_info1229_ineq_068"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\hat{x}_{1}}$]]></tex-math></alternatives></inline-formula> are the same, or at least sufficiently close. So the aim of this subsection is to learn view dictionaries under which face images of one identity in different poses share similar sparse coefficients. This is achieved via a supervised view dictionary learning process. By concatenating the <italic>P</italic> equations in Eq. (<xref rid="j_info1229_eq_006">6</xref>), while omitting the identity index <italic>i</italic> for simplicity, we have: 
<disp-formula id="j_info1229_eq_009">
<label>(9)</label><alternatives>
<mml:math display="block"><mml:mtable displaystyle="true"><mml:mtr><mml:mtd><mml:mtable equalrows="false" equalcolumns="false" columnalign="left"><mml:mtr><mml:mtd class="array"><mml:mfenced separators="" open="(" close=")"><mml:mrow><mml:mtable equalrows="false" equalcolumns="false" columnalign="center"><mml:mtr><mml:mtd class="array"><mml:msub><mml:mrow><mml:mi mathvariant="italic">b</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd class="array"><mml:mo>⋮</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd class="array"><mml:msub><mml:mrow><mml:mi mathvariant="italic">b</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd class="array"><mml:mo>⋮</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd class="array"><mml:msub><mml:mrow><mml:mi mathvariant="italic">b</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">P</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mfenced><mml:mo>=</mml:mo><mml:mfenced separators="" open="(" close=")"><mml:mrow><mml:mtable equalrows="false" equalcolumns="false" columnalign="center"><mml:mtr><mml:mtd class="array"><mml:msub><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd class="array"><mml:mo>⋮</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd class="array"><mml:msub><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd class="array"><mml:mo>⋮</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd class="array"><mml:msub><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">P</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mfenced><mml:mi mathvariant="italic">x</mml:mi><mml:mo>+</mml:mo><mml:mfenced 
separators="" open="(" close=")"><mml:mrow><mml:mtable equalrows="false" equalcolumns="false" columnalign="center"><mml:mtr><mml:mtd class="array"><mml:msub><mml:mrow><mml:mi mathvariant="italic">e</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd class="array"><mml:mo>⋮</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd class="array"><mml:msub><mml:mrow><mml:mi mathvariant="italic">e</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd class="array"><mml:mo>⋮</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd class="array"><mml:msub><mml:mrow><mml:mi mathvariant="italic">e</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">P</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mfenced><mml:mo stretchy="false">→</mml:mo><mml:mi mathvariant="italic">B</mml:mi><mml:mo>=</mml:mo><mml:mi mathvariant="italic">A</mml:mi><mml:mi mathvariant="italic">x</mml:mi><mml:mo>+</mml:mo><mml:mi mathvariant="italic">e</mml:mi></mml:mtd></mml:mtr><mml:mtr><mml:mtd class="array"><mml:mtext>subject to</mml:mtext><mml:mspace width="2.5pt"/><mml:mo stretchy="false">‖</mml:mo><mml:mi mathvariant="italic">x</mml:mi><mml:msub><mml:mrow><mml:mo stretchy="false">‖</mml:mo></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>⩽</mml:mo><mml:mi mathvariant="italic">C</mml:mi><mml:mo mathvariant="normal">,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:mtd></mml:mtr></mml:mtable></mml:math>
<tex-math><![CDATA[\[ \begin{array}{l}\left(\begin{array}{c}{b_{1}}\\ {} \vdots \\ {} {b_{p}}\\ {} \vdots \\ {} {b_{P}}\end{array}\right)=\left(\begin{array}{c}{A_{1}}\\ {} \vdots \\ {} {A_{p}}\\ {} \vdots \\ {} {A_{P}}\end{array}\right)x+\left(\begin{array}{c}{e_{1}}\\ {} \vdots \\ {} {e_{p}}\\ {} \vdots \\ {} {e_{P}}\end{array}\right)\to B=Ax+e\\ {} \text{subject to}\hspace{2.5pt}\| x{\| _{1}}\leqslant C,\end{array}\]]]></tex-math></alternatives>
</disp-formula> 
which states that the <italic>P</italic> views, when concatenated together, should have the same sparse representation with respect to the concatenated view dictionary. Given the training dataset <inline-formula id="j_info1229_ineq_069"><alternatives>
<mml:math><mml:msubsup><mml:mrow><mml:mo fence="true" stretchy="false">{</mml:mo><mml:msubsup><mml:mrow><mml:mi mathvariant="italic">b</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msubsup><mml:mo fence="true" stretchy="false">}</mml:mo></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo mathvariant="normal">,</mml:mo><mml:mo>…</mml:mo><mml:mo mathvariant="normal">,</mml:mo><mml:mi mathvariant="italic">P</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo mathvariant="normal">,</mml:mo><mml:mo>…</mml:mo><mml:mo mathvariant="normal">,</mml:mo><mml:mi mathvariant="italic">I</mml:mi></mml:mrow></mml:msubsup></mml:math>
<tex-math><![CDATA[${\{{b_{p}^{i}}\}_{p=1,\dots ,P}^{i=1,\dots ,I}}$]]></tex-math></alternatives></inline-formula>, where <italic>i</italic> indexes identities and <italic>p</italic> indexes poses, the training set is rearranged by concatenating the <italic>P</italic> views of each identity into the <inline-formula id="j_info1229_ineq_070"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mo fence="true" stretchy="false">{</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="italic">b</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msup><mml:mo fence="true" stretchy="false">}</mml:mo></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo mathvariant="normal">,</mml:mo><mml:mo>…</mml:mo><mml:mo mathvariant="normal">,</mml:mo><mml:mi mathvariant="italic">I</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\{{b^{i}}\}_{i=1,\dots ,I}}$]]></tex-math></alternatives></inline-formula> vectors, where <inline-formula id="j_info1229_ineq_071"><alternatives>
<mml:math><mml:msup><mml:mrow><mml:mi mathvariant="italic">b</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mrow><mml:mo fence="true" stretchy="false">[</mml:mo><mml:msubsup><mml:mrow><mml:mi mathvariant="italic">b</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msubsup><mml:msubsup><mml:mrow><mml:mi mathvariant="italic">b</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msubsup><mml:mo>⋯</mml:mo><mml:msubsup><mml:mrow><mml:mi mathvariant="italic">b</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">P</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msubsup><mml:mo fence="true" stretchy="false">]</mml:mo></mml:mrow><mml:mrow><mml:mi mathvariant="italic">T</mml:mi></mml:mrow></mml:msup><mml:mo stretchy="false">∈</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="normal">ℜ</mml:mi></mml:mrow><mml:mrow><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mi mathvariant="italic">d</mml:mi><mml:mi mathvariant="italic">P</mml:mi><mml:mo>×</mml:mo><mml:mn>1</mml:mn><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:mrow></mml:msup></mml:math>
<tex-math><![CDATA[${b^{i}}={[{b_{1}^{i}}{b_{2}^{i}}\cdots {b_{P}^{i}}]^{T}}\in {\mathrm{\Re }^{(dP\times 1)}}$]]></tex-math></alternatives></inline-formula> and <inline-formula id="j_info1229_ineq_072"><alternatives>
<mml:math><mml:mi mathvariant="italic">B</mml:mi><mml:mo stretchy="false">∈</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="normal">ℜ</mml:mi></mml:mrow><mml:mrow><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mi mathvariant="italic">d</mml:mi><mml:mi mathvariant="italic">P</mml:mi><mml:mo>×</mml:mo><mml:mi mathvariant="italic">I</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:mrow></mml:msup></mml:math>
<tex-math><![CDATA[$B\in {\mathrm{\Re }^{(dP\times I)}}$]]></tex-math></alternatives></inline-formula> is the matrix made from concatenating face images of <italic>I</italic> different identities. Now, the view dictionaries can be learned via the following minimization problem: 
<disp-formula id="j_info1229_eq_010">
<label>(10)</label><alternatives>
<mml:math display="block"><mml:mtable displaystyle="true"><mml:mtr><mml:mtd><mml:mo fence="true" stretchy="false">⟨</mml:mo><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover><mml:mo mathvariant="normal">,</mml:mo><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">X</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover><mml:mo fence="true" stretchy="false">⟩</mml:mo><mml:mo>=</mml:mo><mml:mo movablelimits="false">arg</mml:mo><mml:munder><mml:mrow><mml:mo movablelimits="false">min</mml:mo></mml:mrow><mml:mrow><mml:mi mathvariant="italic">A</mml:mi><mml:mo mathvariant="normal">,</mml:mo><mml:mi mathvariant="italic">X</mml:mi></mml:mrow></mml:munder>
<mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mstyle displaystyle="true"><mml:mo largeop="true" movablelimits="false">∑</mml:mo></mml:mstyle></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi mathvariant="italic">I</mml:mi></mml:mrow></mml:munderover><mml:msubsup><mml:mrow><mml:mo fence="true" maxsize="1.19em" minsize="1.19em" stretchy="true">‖</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="italic">b</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msup><mml:mo>−</mml:mo><mml:mi mathvariant="italic">A</mml:mi><mml:msup><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msup><mml:mo fence="true" maxsize="1.19em" minsize="1.19em" stretchy="true">‖</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:mi mathvariant="italic">λ</mml:mi><mml:msub><mml:mrow><mml:mo fence="true" maxsize="1.19em" minsize="1.19em" stretchy="true">‖</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msup><mml:mo fence="true" maxsize="1.19em" minsize="1.19em" stretchy="true">‖</mml:mo></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo mathvariant="normal">,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math>
<tex-math><![CDATA[\[ \langle \hat{A},\hat{X}\rangle =\arg \underset{A,X}{\min }{\sum \limits_{i=1}^{I}}{\big\| {b^{i}}-A{x^{i}}\big\| _{2}^{2}}+\lambda {\big\| {x^{i}}\big\| _{1}},\]]]></tex-math></alternatives>
</disp-formula> 
where <inline-formula id="j_info1229_ineq_073"><alternatives>
<mml:math><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:msup><mml:mrow><mml:mo fence="true" stretchy="false">[</mml:mo><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi mathvariant="italic">T</mml:mi></mml:mrow></mml:msubsup><mml:mo mathvariant="normal">,</mml:mo><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mi mathvariant="italic">T</mml:mi></mml:mrow></mml:msubsup><mml:mo mathvariant="normal">,</mml:mo><mml:mo>…</mml:mo><mml:mo mathvariant="normal">,</mml:mo><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi mathvariant="italic">P</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">T</mml:mi></mml:mrow></mml:msubsup><mml:mo fence="true" stretchy="false">]</mml:mo></mml:mrow><mml:mrow><mml:mi mathvariant="italic">T</mml:mi></mml:mrow></mml:msup><mml:mo stretchy="false">∈</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="normal">ℜ</mml:mi></mml:mrow><mml:mrow><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mi mathvariant="italic">d</mml:mi><mml:mi mathvariant="italic">P</mml:mi><mml:mo>×</mml:mo><mml:mi mathvariant="italic">K</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:mrow></mml:msup></mml:math>
<tex-math><![CDATA[$\hat{A}={[{\hat{A}_{1}^{T}},{\hat{A}_{2}^{T}},\dots ,{\hat{A}_{P}^{T}}]^{T}}\in {\mathrm{\Re }^{(dP\times K)}}$]]></tex-math></alternatives></inline-formula> is the learned dictionary and <inline-formula id="j_info1229_ineq_074"><alternatives>
<mml:math><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">X</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mo fence="true" stretchy="false">[</mml:mo><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo mathvariant="normal">,</mml:mo><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo mathvariant="normal">,</mml:mo><mml:mo>…</mml:mo><mml:mo mathvariant="normal">,</mml:mo><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi mathvariant="italic">I</mml:mi></mml:mrow></mml:msup><mml:mo fence="true" stretchy="false">]</mml:mo><mml:mo stretchy="false">∈</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="normal">ℜ</mml:mi></mml:mrow><mml:mrow><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mi mathvariant="italic">K</mml:mi><mml:mo>×</mml:mo><mml:mi mathvariant="italic">I</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:mrow></mml:msup></mml:math>
<tex-math><![CDATA[$\hat{X}=[{\hat{x}^{1}},{\hat{x}^{2}},\dots ,{\hat{x}^{I}}]\in {\mathrm{\Re }^{(K\times I)}}$]]></tex-math></alternatives></inline-formula> is the sparse coefficient matrix whose column <italic>i</italic> is the sparse representation vector of the training sample <inline-formula id="j_info1229_ineq_075"><alternatives>
<mml:math><mml:msup><mml:mrow><mml:mi mathvariant="italic">b</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msup></mml:math>
<tex-math><![CDATA[${b^{i}}$]]></tex-math></alternatives></inline-formula> and <italic>K</italic> is the dictionary size. Eq. (<xref rid="j_info1229_eq_010">10</xref>) aims to jointly find the proper sparse representation coefficients and the dictionary. It describes the face images of the <italic>i</italic>-th identity (<inline-formula id="j_info1229_ineq_076"><alternatives>
<mml:math><mml:msup><mml:mrow><mml:mi mathvariant="italic">b</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msup></mml:math>
<tex-math><![CDATA[${b^{i}}$]]></tex-math></alternatives></inline-formula>) as the sparsest representation <inline-formula id="j_info1229_ineq_077"><alternatives>
<mml:math><mml:msup><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msup></mml:math>
<tex-math><![CDATA[${x^{i}}$]]></tex-math></alternatives></inline-formula> over dictionary <italic>A</italic>. After <inline-formula id="j_info1229_ineq_078"><alternatives>
<mml:math><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover></mml:math>
<tex-math><![CDATA[$\hat{A}$]]></tex-math></alternatives></inline-formula> is learned, the view dictionaries <inline-formula id="j_info1229_ineq_079"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mo fence="true" stretchy="false">{</mml:mo><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub><mml:mo fence="true" stretchy="false">}</mml:mo></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo mathvariant="normal">,</mml:mo><mml:mo>…</mml:mo><mml:mo mathvariant="normal">,</mml:mo><mml:mi mathvariant="italic">P</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\{{\hat{A}_{p}}\}_{p=1,\dots ,P}}$]]></tex-math></alternatives></inline-formula> are obtained by splitting <inline-formula id="j_info1229_ineq_080"><alternatives>
<mml:math><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover></mml:math>
<tex-math><![CDATA[$\hat{A}$]]></tex-math></alternatives></inline-formula> into <italic>P</italic> parts, i.e. the view dictionary of pose <italic>p</italic> (<inline-formula id="j_info1229_ineq_081"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\hat{A}_{p}}$]]></tex-math></alternatives></inline-formula>) is obtained by separating the <italic>d</italic> rows of <inline-formula id="j_info1229_ineq_082"><alternatives>
<mml:math><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover></mml:math>
<tex-math><![CDATA[$\hat{A}$]]></tex-math></alternatives></inline-formula> corresponding to the <italic>p</italic>-th pose (rows <inline-formula id="j_info1229_ineq_083"><alternatives>
<mml:math><mml:mi mathvariant="italic">p</mml:mi><mml:mi mathvariant="italic">d</mml:mi><mml:mo>−</mml:mo><mml:mi mathvariant="italic">d</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:math>
<tex-math><![CDATA[$pd-d+1$]]></tex-math></alternatives></inline-formula> to <inline-formula id="j_info1229_ineq_084"><alternatives>
<mml:math><mml:mi mathvariant="italic">p</mml:mi><mml:mi mathvariant="italic">d</mml:mi></mml:math>
<tex-math><![CDATA[$pd$]]></tex-math></alternatives></inline-formula>). Figure <xref rid="j_info1229_fig_004">3</xref> demonstrates these matrices and the dictionary learning process visually.</p>
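<p>A minimal sketch of this joint learning and splitting procedure, using plain alternating minimization (ISTA for the sparse coding step and a least-squares dictionary update with atom renormalization). The exact optimizer is not fixed by Eq. (10), and all sizes below are hypothetical toy dimensions.</p>

```python
import numpy as np

def learn_joint_dictionary(B, K, lam=0.1, n_outer=10, n_inner=50):
    """Approximate Eq. (10): alternate sparse coding of X and updates of A.
    B is the (dP x I) matrix of stacked training views."""
    dP, I = B.shape
    rng = np.random.default_rng(0)
    A = rng.standard_normal((dP, K))
    A /= np.linalg.norm(A, axis=0)                       # unit-norm atoms
    X = np.zeros((K, I))
    for _ in range(n_outer):
        L = 2 * np.linalg.norm(A, 2) ** 2 + 1e-12
        for _ in range(n_inner):                         # ISTA on all columns at once
            X = X - 2 * A.T @ (A @ X - B) / L
            X = np.sign(X) * np.maximum(np.abs(X) - lam / L, 0.0)
        A = B @ np.linalg.pinv(X)                        # least-squares dictionary update
        A /= np.linalg.norm(A, axis=0) + 1e-12           # renormalize (guards dead atoms)
    return A, X

d, P, I, K = 20, 3, 40, 25                               # toy sizes (hypothetical)
rng = np.random.default_rng(1)
B = rng.standard_normal((d * P, I))                      # stacked views b^i as columns
A_hat, X_hat = learn_joint_dictionary(B, K)
# split A_hat into view dictionaries: rows (p-1)d+1 .. pd (0-based slice below)
A_views = [A_hat[p * d:(p + 1) * d, :] for p in range(P)]
```

<p>The final list comprehension performs exactly the row-wise split described above: each view dictionary keeps the <italic>d</italic> rows of the learned dictionary that belong to its pose.</p>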
<p>In order to choose the dictionary size <italic>K</italic> properly, it is worth recalling some points about dictionary characteristics. A dictionary <inline-formula id="j_info1229_ineq_085"><alternatives>
<mml:math><mml:mi mathvariant="italic">A</mml:mi><mml:mo stretchy="false">∈</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="normal">ℜ</mml:mi></mml:mrow><mml:mrow><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mi mathvariant="italic">d</mml:mi><mml:mi mathvariant="italic">P</mml:mi><mml:mo>×</mml:mo><mml:mi mathvariant="italic">K</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:mrow></mml:msup></mml:math>
<tex-math><![CDATA[$A\in {\mathrm{\Re }^{(dP\times K)}}$]]></tex-math></alternatives></inline-formula> is considered undercomplete if <inline-formula id="j_info1229_ineq_086"><alternatives>
<mml:math><mml:mi mathvariant="italic">K</mml:mi><mml:mo mathvariant="normal">&lt;</mml:mo><mml:mi mathvariant="italic">d</mml:mi><mml:mi mathvariant="italic">P</mml:mi></mml:math>
<tex-math><![CDATA[$K<dP$]]></tex-math></alternatives></inline-formula> or overcomplete if <inline-formula id="j_info1229_ineq_087"><alternatives>
<mml:math><mml:mi mathvariant="italic">K</mml:mi><mml:mo mathvariant="normal">&gt;</mml:mo><mml:mi mathvariant="italic">d</mml:mi><mml:mi mathvariant="italic">P</mml:mi></mml:math>
<tex-math><![CDATA[$K>dP$]]></tex-math></alternatives></inline-formula>. When (<inline-formula id="j_info1229_ineq_088"><alternatives>
<mml:math><mml:mi mathvariant="italic">K</mml:mi><mml:mo>=</mml:mo><mml:mi mathvariant="italic">d</mml:mi><mml:mi mathvariant="italic">P</mml:mi></mml:math>
<tex-math><![CDATA[$K=dP$]]></tex-math></alternatives></inline-formula>), the dictionary is complete. From a representational point of view, a complete dictionary offers no improvement and is therefore not considered. Undercomplete dictionaries are closely related to dimensionality reduction; principal component analysis is a well-known example, in which the dictionary atoms must be orthogonal. However, this orthogonality constraint limits the choice of atoms, which is the main disadvantage of undercomplete dictionaries. Overcomplete dictionaries, on the other hand, are free of the orthogonality constraint and therefore allow more flexible dictionaries and a richer data representation (Elad, <xref ref-type="bibr" rid="j_info1229_ref_009">2010</xref>).</p>
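The undercomplete/complete/overcomplete distinction above depends only on the dictionary's shape. A minimal illustration in Python (the function name is hypothetical, for exposition only):

```python
import numpy as np

def dictionary_kind(A):
    """Classify a dictionary by comparing its atom count K to the signal dimension."""
    dim, K = A.shape  # each column is one atom of length dim (= dP in the text)
    if K < dim:
        return "undercomplete"
    if K > dim:
        return "overcomplete"
    return "complete"

# d = 1000 pixels, P = 10 poses, K = 1000 atoms -> fewer atoms than rows
A = np.zeros((1000 * 10, 1000))
print(dictionary_kind(A))  # prints "undercomplete"
```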
<p>Although all view dictionaries can be learned simultaneously using Eq. (<xref rid="j_info1229_eq_010">10</xref>), the learning process becomes impractical for large dictionary sizes or high-dimensional data. Consider a situation where each identity has images in 10 poses and each image has about 1000 pixels (a small 30×35 face image), so each column of the dictionary has about 10000 entries. If the dictionary size is set to 1000 atoms, the dictionary will be of size 10000×1000. Computation on a dictionary of this size is impractical because of memory and computational limitations. To overcome this problem, pairwise dictionary learning is proposed in this paper, where each view dictionary is learned separately. In other words, in order to learn the view dictionary for pose <italic>p</italic>, the training matrix will be <inline-formula id="j_info1229_ineq_089"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">B</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">∈</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="normal">ℜ</mml:mi></mml:mrow><mml:mrow><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mn>2</mml:mn><mml:mi mathvariant="italic">d</mml:mi><mml:mo>×</mml:mo><mml:mi mathvariant="italic">I</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:mrow></mml:msup></mml:math>
<tex-math><![CDATA[${B_{p}}\in {\mathrm{\Re }^{(2d\times I)}}$]]></tex-math></alternatives></inline-formula> where each column of <inline-formula id="j_info1229_ineq_090"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">B</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${B_{p}}$]]></tex-math></alternatives></inline-formula> is <inline-formula id="j_info1229_ineq_091"><alternatives>
<mml:math><mml:msup><mml:mrow><mml:mo fence="true" stretchy="false">[</mml:mo><mml:msubsup><mml:mrow><mml:mi mathvariant="italic">b</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msubsup><mml:msubsup><mml:mrow><mml:mi mathvariant="italic">b</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msubsup><mml:mo fence="true" stretchy="false">]</mml:mo></mml:mrow><mml:mrow><mml:mi mathvariant="italic">T</mml:mi></mml:mrow></mml:msup><mml:mo stretchy="false">∈</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="normal">ℜ</mml:mi></mml:mrow><mml:mrow><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mn>2</mml:mn><mml:mi mathvariant="italic">d</mml:mi><mml:mo>×</mml:mo><mml:mn>1</mml:mn><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:mrow></mml:msup></mml:math>
<tex-math><![CDATA[${[{b_{p}^{i}}{b_{1}^{i}}]^{T}}\in {\mathrm{\Re }^{(2d\times 1)}}$]]></tex-math></alternatives></inline-formula> (face images of frontal pose and images of non-frontal pose <italic>p</italic>). In this case, the optimization in Eq. (<xref rid="j_info1229_eq_010">10</xref>) results in <inline-formula id="j_info1229_ineq_092"><alternatives>
<mml:math><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover><mml:mo stretchy="false">∈</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="normal">ℜ</mml:mi></mml:mrow><mml:mrow><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mn>2</mml:mn><mml:mi mathvariant="italic">d</mml:mi><mml:mo>×</mml:mo><mml:mi mathvariant="italic">K</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:mrow></mml:msup></mml:math>
<tex-math><![CDATA[$\hat{A}\in {\mathrm{\Re }^{(2d\times K)}}$]]></tex-math></alternatives></inline-formula> where the first <italic>d</italic> rows of <inline-formula id="j_info1229_ineq_093"><alternatives>
<mml:math><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover></mml:math>
<tex-math><![CDATA[$\hat{A}$]]></tex-math></alternatives></inline-formula> can be considered as the learned view dictionary for pose <italic>p</italic>. It should be noted that the view dictionaries are not learned simultaneously in pairwise dictionary learning. However, since the frontal training images are the same when learning every view dictionary, it is expected that describing face images of one identity in different poses by these view dictionaries yields similar sparse representation coefficients. Figure <xref rid="j_info1229_fig_005">4</xref> shows the effect of dictionary learning on the sparse representation of three face images of one identity in different poses. The first column shows the sparse coefficients when the dictionary atoms are simply the training data in different poses, while the second column shows the sparse coefficients obtained with the learned dictionary. As expected, the representation coefficients of the three images in the second column are more similar than those in the first column. This observation confirms the effect of dictionary learning on unifying the sparse representation coefficients of face images of one identity in different poses. The figure also shows that the coefficients become sparser after dictionary learning, which is another of its aims. Returning to the dictionary learning process in Eq. 
(<xref rid="j_info1229_eq_010">10</xref>), several dictionary learning methods have been proposed to date; they can be divided into two groups: 1) unsupervised dictionary learning methods such as MOD (Engan <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1229_ref_010">1999</xref>) and K-SVD (Aharon <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1229_ref_001">2006</xref>), and 2) supervised dictionary learning methods such as SDL (Mairal <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1229_ref_017">2009</xref>) and LCKSVD (Jiang <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1229_ref_014">2013</xref>). The K-SVD method was introduced to efficiently learn an overcomplete dictionary and has been successfully applied to image restoration and image compression. K-SVD focuses on the representational power of the learned dictionary but does not consider its discrimination capability. LCKSVD (Jiang <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1229_ref_014">2013</xref>) is a supervised extension of K-SVD that uses the supervised information (labels) of the training samples to learn a compact and discriminative dictionary. As LCKSVD has proved to be a successful supervised dictionary learning method, it is used here for learning the view dictionaries. The objective function of LCKSVD is as follows: 
<disp-formula id="j_info1229_eq_011">
<label>(11)</label><alternatives>
<mml:math display="block"><mml:mtable displaystyle="true"><mml:mtr><mml:mtd><mml:mo fence="true" stretchy="false">⟨</mml:mo><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover><mml:mo mathvariant="normal">,</mml:mo><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">X</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover><mml:mo mathvariant="normal">,</mml:mo><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">T</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover><mml:mo fence="true" stretchy="false">⟩</mml:mo><mml:mo>=</mml:mo><mml:mo movablelimits="false">arg</mml:mo><mml:munder><mml:mrow><mml:mo movablelimits="false">min</mml:mo></mml:mrow><mml:mrow><mml:mi mathvariant="italic">A</mml:mi><mml:mo mathvariant="normal">,</mml:mo><mml:mi mathvariant="italic">X</mml:mi><mml:mo mathvariant="normal">,</mml:mo><mml:mi mathvariant="italic">T</mml:mi></mml:mrow></mml:munder><mml:mo stretchy="false">‖</mml:mo><mml:mi mathvariant="italic">B</mml:mi><mml:mo>−</mml:mo><mml:mi mathvariant="italic">A</mml:mi><mml:mi mathvariant="italic">X</mml:mi><mml:msubsup><mml:mrow><mml:mo stretchy="false">‖</mml:mo></mml:mrow><mml:mrow><mml:mi mathvariant="italic">F</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:mi mathvariant="italic">α</mml:mi><mml:mo stretchy="false">‖</mml:mo><mml:mi mathvariant="italic">Q</mml:mi><mml:mo>−</mml:mo><mml:mi mathvariant="italic">T</mml:mi><mml:mi mathvariant="italic">X</mml:mi><mml:msubsup><mml:mrow><mml:mo stretchy="false">‖</mml:mo></mml:mrow><mml:mrow><mml:mi mathvariant="italic">F</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mspace width="1em"/><mml:mtext>s.t.</mml:mtext><mml:mspace width="2.5pt"/><mml:mo>∀</mml:mo><mml:mi mathvariant="italic">i</mml:mi><mml:mo mathvariant="normal">,</mml:mo><mml:mspace width="2.5pt"/><mml:mo 
stretchy="false">‖</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">i</mml:mi></mml:mrow></mml:msup><mml:msub><mml:mrow><mml:mo stretchy="false">‖</mml:mo></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>⩽</mml:mo><mml:mi mathvariant="italic">C</mml:mi><mml:mo mathvariant="normal">,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math>
<tex-math><![CDATA[\[ \langle \hat{A},\hat{X},\hat{T}\rangle =\arg \underset{A,X,T}{\min }\| B-AX{\| _{F}^{2}}+\alpha \| Q-TX{\| _{F}^{2}}\hspace{1em}\text{s.t.}\hspace{2.5pt}\forall i,\hspace{2.5pt}\| {x^{i}}{\| _{0}}\leqslant C,\]]]></tex-math></alternatives>
</disp-formula> 
where <inline-formula id="j_info1229_ineq_094"><alternatives>
<mml:math><mml:mo stretchy="false">‖</mml:mo><mml:mo>.</mml:mo><mml:msub><mml:mrow><mml:mo stretchy="false">‖</mml:mo></mml:mrow><mml:mrow><mml:mi mathvariant="italic">F</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[$\| .{\| _{F}}$]]></tex-math></alternatives></inline-formula> is the Frobenius norm and <italic>B</italic>, <italic>A</italic> and <italic>X</italic> are training data, dictionary and sparse coefficients matrices, respectively. <inline-formula id="j_info1229_ineq_095"><alternatives>
<mml:math><mml:mi mathvariant="italic">Q</mml:mi><mml:mo>=</mml:mo><mml:mo fence="true" stretchy="false">[</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="italic">q</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo mathvariant="normal">,</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="italic">q</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo mathvariant="normal">,</mml:mo><mml:mo>…</mml:mo><mml:mo mathvariant="normal">,</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="italic">q</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">I</mml:mi></mml:mrow></mml:msup><mml:mo fence="true" stretchy="false">]</mml:mo><mml:mo stretchy="false">∈</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="normal">ℜ</mml:mi></mml:mrow><mml:mrow><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mi mathvariant="italic">K</mml:mi><mml:mo>×</mml:mo><mml:mi mathvariant="italic">I</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:mrow></mml:msup></mml:math>
<tex-math><![CDATA[$Q=[{q^{1}},{q^{2}},\dots ,{q^{I}}]\in {\mathrm{\Re }^{(K\times I)}}$]]></tex-math></alternatives></inline-formula> is the matrix of discriminative sparse codes of the training samples in <italic>B</italic>, initialized from the labels of the training samples and the desired labels of the dictionary atoms. For example, if the <italic>i</italic>-th atom of the dictionary has the same label as the <italic>j</italic>-th training sample, then <inline-formula id="j_info1229_ineq_096"><alternatives>
<mml:math><mml:mi mathvariant="italic">Q</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mi mathvariant="italic">i</mml:mi><mml:mo mathvariant="normal">,</mml:mo><mml:mi mathvariant="italic">j</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:math>
<tex-math><![CDATA[$Q(i,j)=1$]]></tex-math></alternatives></inline-formula>, else <inline-formula id="j_info1229_ineq_097"><alternatives>
<mml:math><mml:mi mathvariant="italic">Q</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mi mathvariant="italic">i</mml:mi><mml:mo mathvariant="normal">,</mml:mo><mml:mi mathvariant="italic">j</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:math>
<tex-math><![CDATA[$Q(i,j)=0$]]></tex-math></alternatives></inline-formula>. <italic>T</italic> is a linear transformation matrix and the term <inline-formula id="j_info1229_ineq_098"><alternatives>
<mml:math><mml:mo stretchy="false">‖</mml:mo><mml:mi mathvariant="italic">Q</mml:mi><mml:mo>−</mml:mo><mml:mi mathvariant="italic">T</mml:mi><mml:mi mathvariant="italic">X</mml:mi><mml:msubsup><mml:mrow><mml:mo stretchy="false">‖</mml:mo></mml:mrow><mml:mrow><mml:mi mathvariant="italic">F</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:math>
<tex-math><![CDATA[$\| Q-TX{\| _{F}^{2}}$]]></tex-math></alternatives></inline-formula> forces the sparse coefficients <italic>X</italic> to approximate the discriminative sparse codes <italic>Q</italic>, which encourages samples from the same class to have very similar sparse representations. The first and second terms of Eq. (<xref rid="j_info1229_eq_011">11</xref>) are the reconstruction error and the discrimination power, respectively, where <italic>α</italic> controls the trade-off between them. The implementation of LCKSVD released by its authors is used in this paper for solving Eq. (<xref rid="j_info1229_eq_011">11</xref>). For more details on LCKSVD, we refer the interested reader to Jiang <italic>et al.</italic> (<xref ref-type="bibr" rid="j_info1229_ref_014">2013</xref>).</p>
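The construction of <italic>Q</italic> described above, where an entry is 1 exactly when an atom and a training sample share a class label, can be written in a few lines. A minimal sketch (the function name is hypothetical):

```python
import numpy as np

def build_Q(atom_labels, sample_labels):
    """Discriminative sparse codes: Q[i, j] = 1 iff atom i and sample j share a label."""
    atom_labels = np.asarray(atom_labels)      # length K (desired labels of atoms)
    sample_labels = np.asarray(sample_labels)  # length I (labels of training samples)
    return (atom_labels[:, None] == sample_labels[None, :]).astype(float)

# 4 atoms (two per class) and 3 training samples
Q = build_Q([0, 0, 1, 1], [0, 1, 0])
print(Q.shape)  # prints (4, 3)
```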
<fig id="j_info1229_fig_005">
<label>Fig. 4</label>
<caption>
<p>Effect of dictionary learning on the sparse representations of three different-pose images of one identity. The first and second columns show the sparse representation of the face images using the training samples and the learned dictionary, respectively. As expected, the representation coefficients in the second column are more similar across poses and sparser within each pose.</p>
</caption>
<graphic xlink:href="info1229_g005.jpg"/>
</fig>
<p>After obtaining view dictionaries, virtual frontal view generation can be done by first estimating the pose <inline-formula id="j_info1229_ineq_099"><alternatives>
<mml:math><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover></mml:math>
<tex-math><![CDATA[$\hat{p}$]]></tex-math></alternatives></inline-formula> of the input face image <italic>y</italic> by Algorithm <xref rid="j_info1229_fig_002">1</xref>, then finding <inline-formula id="j_info1229_ineq_100"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\hat{x}_{\hat{p}}}$]]></tex-math></alternatives></inline-formula> as the sparse representation of <italic>y</italic> over the learned view dictionary of the estimated pose. Finally, the virtual frontal view of <italic>y</italic> is generated by multiplying the sparse representation <inline-formula id="j_info1229_ineq_101"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">x</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\hat{x}_{\hat{p}}}$]]></tex-math></alternatives></inline-formula> to the learned view dictionary of frontal pose <inline-formula id="j_info1229_ineq_102"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\hat{A}_{1}}$]]></tex-math></alternatives></inline-formula>. The algorithm of virtual view generation is summarized in Algorithm <xref rid="j_info1229_fig_006">2</xref> and an example of virtual view generation is shown in Fig. <xref rid="j_info1229_fig_007">5</xref>.</p>
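The generation step just described (sparse-code the input over the view dictionary of its estimated pose, then multiply the coefficients by the frontal dictionary) can be sketched as follows. This is a simplified illustration, with a textbook Orthogonal Matching Pursuit standing in for whichever sparse coder is actually used, and all names hypothetical:

```python
import numpy as np

def omp(A, y, sparsity):
    """Textbook Orthogonal Matching Pursuit: greedy sparse coding of y over A."""
    residual, support = y.astype(float), []
    for _ in range(sparsity):
        # pick the atom most correlated with the current residual
        support.append(int(np.argmax(np.abs(A.T @ residual))))
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x

def virtual_frontal_view(y_p, A_p, A_1, sparsity):
    """Code the pose-p image over its view dictionary, reconstruct with the frontal one."""
    return A_1 @ omp(A_p, y_p, sparsity)

# Toy example: d = 8 "pixels", K = 6 atoms per view dictionary
rng = np.random.default_rng(0)
A_p = rng.standard_normal((8, 6))   # learned view dictionary for pose p
A_1 = rng.standard_normal((8, 6))   # learned frontal view dictionary
y_p = A_p @ np.array([0.0, 2.0, 0.0, -1.0, 0.0, 0.0])  # 2-sparse over A_p
y_front = virtual_frontal_view(y_p, A_p, A_1, sparsity=2)
```

Because the two dictionaries are learned jointly from paired columns, the same coefficient vector indexes corresponding atoms in both, which is what makes the final multiplication by the frontal dictionary meaningful.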
<fig id="j_info1229_fig_006">
<label>Algorithm 2</label>
<caption>
<p>Virtual View Generation</p>
</caption>
<graphic xlink:href="info1229_g006.jpg"/>
</fig>
<fig id="j_info1229_fig_007">
<label>Fig. 5</label>
<caption>
<p>Virtual frontal view generation. (a) non-frontal input face image <inline-formula id="j_info1229_ineq_103"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">y</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${y_{p}}$]]></tex-math></alternatives></inline-formula>, (b) learned view dictionary for pose <italic>p</italic> (<inline-formula id="j_info1229_ineq_104"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\hat{A}_{p}}$]]></tex-math></alternatives></inline-formula>), (c) sparse representation of <inline-formula id="j_info1229_ineq_105"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">y</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${y_{p}}$]]></tex-math></alternatives></inline-formula> over <inline-formula id="j_info1229_ineq_106"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">A</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi mathvariant="italic">p</mml:mi></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\hat{A}_{p}}$]]></tex-math></alternatives></inline-formula>, (d) generated virtual frontal view <inline-formula id="j_info1229_ineq_107"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="italic">y</mml:mi></mml:mrow><mml:mo stretchy="false">ˆ</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${\hat{y}_{1}}$]]></tex-math></alternatives></inline-formula>, (e) actual frontal view of input face image <inline-formula id="j_info1229_ineq_108"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">y</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${y_{1}}$]]></tex-math></alternatives></inline-formula>.</p>
</caption>
<graphic xlink:href="info1229_g007.jpg"/>
</fig>
</sec>
<sec id="j_info1229_s_008">
<label>4.4</label>
<title>Multi-Pose Face Recognition</title>
<p>The previous subsection explained the proposed virtual frontal view generation algorithm, which is based on supervised dictionary learning. This subsection completes the previous steps by recognizing the generated frontal view. As mentioned earlier, the SRC method has shown superior performance in frontal-view face recognition; therefore, SRC is used as the classifier in the recognition step. An overall view of the three steps of the proposed method is given in Algorithm <xref rid="j_info1229_fig_008">3</xref>.</p>
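The SRC decision rule, representing a probe over the gallery and assigning it to the class whose coefficients reconstruct it best, can be sketched as follows. This is a simplified illustration in which an ordinary least-squares fit stands in for the ℓ1 sparse coding of real SRC; all names are hypothetical:

```python
import numpy as np

def src_classify(y, D, labels):
    """SRC-style decision: represent y over the gallery matrix D (one training
    image per column), then pick the class whose coefficients alone give the
    smallest reconstruction residual."""
    labels = np.asarray(labels)
    x, *_ = np.linalg.lstsq(D, y, rcond=None)  # stand-in for l1 sparse coding
    best_class, best_err = None, np.inf
    for c in np.unique(labels):
        x_c = np.where(labels == c, x, 0.0)    # keep only class-c coefficients
        err = np.linalg.norm(y - D @ x_c)      # class-wise residual
        if err < best_err:
            best_class, best_err = int(c), err
    return best_class

# Toy gallery: 4 training images of dimension 4, two identities
D = np.eye(4)
labels = [0, 0, 1, 1]
print(src_classify(D[:, 3], D, labels))  # prints 1
```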
<fig id="j_info1229_fig_008">
<label>Algorithm 3</label>
<caption>
<p>Multi-Pose Face Recognition with Supervised Dictionary Learning (MPSDL)</p>
</caption>
<graphic xlink:href="info1229_g008.jpg"/>
</fig>
</sec>
</sec>
<sec id="j_info1229_s_009">
<label>5</label>
<title>Experimental Results</title>
<p>The proposed method is evaluated on the CMU-PIE (Sim <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1229_ref_027">2002</xref>) and FERET (Phillips <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1229_ref_024">2000</xref>) face databases, and three experiments are carried out to show its effectiveness. Section <xref rid="j_info1229_s_010">5.1</xref> describes the databases and how the face images are prepared for the experiments. In Section <xref rid="j_info1229_s_011">5.2</xref>, the performance of the proposed pose estimation method is measured under both small and large pose variations. Section <xref rid="j_info1229_s_012">5.3</xref> considers the virtual view generation step and compares the generated frontal faces with those of similar view generation methods such as GLR and SRR. Finally, in Section <xref rid="j_info1229_s_013">5.4</xref>, the accuracy of the proposed multi-pose face recognition is evaluated based on the virtual frontal views.</p>
<fig id="j_info1229_fig_009">
<label>Fig. 6</label>
<caption>
<p>Different poses of a subject in CMU-PIE face database (Sim <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1229_ref_027">2002</xref>).</p>
</caption>
<graphic xlink:href="info1229_g009.jpg"/>
</fig>
<sec id="j_info1229_s_010">
<label>5.1</label>
<title>Databases for Evaluations</title>
<p>The CMU-PIE and FERET databases contain a large number of face images under different illumination conditions, viewpoints and expressions. The CMU-PIE database has 68 identities imaged under 13 different poses, 43 different illumination conditions and 4 different expressions. Figure <xref rid="j_info1229_fig_009">6</xref> shows the pose variation in the CMU-PIE database; the images span −90° to +90° in yaw with a ±22.5° interval and about ±20° in pitch. The FERET database contains more than 1000 identities in different conditions. Of these, 200 subjects have images in all 9 pose variations within ±60° in yaw (0° in pitch); specifically, the poses are ±60°, ±40°, ±25°, ±15° and the frontal pose 0°. Figure <xref rid="j_info1229_fig_010">7</xref> shows the pose variation in the FERET database. In our experiments, 5 poses of the CMU-PIE database (90°, 67.5°, 45°, 22.5° and 0°) are used with 16 different illumination conditions and neutral expression. From the FERET database, face images of poses 60°, 40°, 15° and 0° are selected with neutral expression and illumination. For pre-processing, all images are cropped manually (such that the eye and mouth levels are fixed), resized to <inline-formula id="j_info1229_ineq_109"><alternatives>
<mml:math><mml:mn>28</mml:mn><mml:mo>×</mml:mo><mml:mn>28</mml:mn></mml:math>
<tex-math><![CDATA[$28\times 28$]]></tex-math></alternatives></inline-formula> pixels, and histogram equalization is performed on them.</p>
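Of the pre-processing steps above (manual crop, resize to 28×28, histogram equalization), only the equalization is non-trivial. A minimal NumPy sketch of that step, not the authors' exact implementation:

```python
import numpy as np

def hist_equalize(img):
    """Histogram equalization of an 8-bit grayscale image."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    denom = max(int(cdf.max() - cdf.min()), 1)  # guard against flat images
    lut = np.round(255.0 * (cdf - cdf.min()) / denom).astype(np.uint8)
    return lut[img]  # map each pixel through the lookup table

# Toy 4x4 "face crop"; real images are first cropped and resized to 28x28
img = np.array([[ 10,  10,  10, 200],
                [ 10,  50,  50, 200],
                [ 10,  50, 120, 200],
                [ 10,  50, 120, 255]], dtype=np.uint8)
eq = hist_equalize(img)
print(eq.max())  # prints 255
```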
<fig id="j_info1229_fig_010">
<label>Fig. 7</label>
<caption>
<p>Different images from FERET database (Phillips <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1229_ref_024">2000</xref>).</p>
</caption>
<graphic xlink:href="info1229_g010.jpg"/>
</fig>
</sec>
<sec id="j_info1229_s_011">
<label>5.2</label>
<title>Pose Estimation</title>
<p>As mentioned in Section <xref rid="j_info1229_s_004">4</xref>, the pose of a non-frontal face image must be known before its virtual frontal view can be generated, since the proper view dictionary has to be selected. Table <xref rid="j_info1229_tab_001">1</xref> and Table <xref rid="j_info1229_tab_002">2</xref> show the accuracy of the pose estimation algorithm for 5 poses (22.5° pose interval) and 3 poses (45° pose interval) of the CMU-PIE database. For each pose, the face images of 10 of the 68 identities are randomly selected to construct the view dictionary; the remaining 58 images per pose are used for evaluation. The accuracies reported in these two tables are averaged over several runs.</p>
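The sparse-representation-based pose estimation evaluated here (Algorithm 1 in Section 4) amounts to picking the view dictionary that reconstructs the probe best. A simplified sketch of that idea, in which a least-squares fit stands in for the sparse coding step and all names are hypothetical:

```python
import numpy as np

def estimate_pose(y, view_dicts):
    """Return the index of the view dictionary that reconstructs y with the
    smallest residual; a least-squares fit stands in for sparse coding."""
    residuals = []
    for A in view_dicts:
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        residuals.append(np.linalg.norm(y - A @ coef))
    return int(np.argmin(residuals))

# Three candidate poses, each with a 5-atom dictionary over 30-dim images
rng = np.random.default_rng(1)
dicts = [rng.standard_normal((30, 5)) for _ in range(3)]
y = dicts[2] @ rng.standard_normal(5)  # y lies in the span of the pose-2 atoms
print(estimate_pose(y, dicts))  # prints 2
```

This also makes the Table 1 vs. Table 2 trend plausible: the further apart the poses, the more distinct the dictionaries and their residuals.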
<table-wrap id="j_info1229_tab_001">
<label>Table 1</label>
<caption>
<p>Average accuracy of pose estimation for 5 poses (22.5° pose interval) on CMU-PIE.</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Pose</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">0° </td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">22.5° </td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">45° </td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">67.5° </td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">90° </td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Average</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">Accuracy (%)</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">96.5</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">90.6</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">89.3</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">91.1</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">92.7</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">92.0</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="j_info1229_tab_002">
<label>Table 2</label>
<caption>
<p>Average accuracy of pose estimation for 3 poses with larger pose interval (45° pose interval) on CMU-PIE.</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Pose</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">0° </td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">45° </td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">90° </td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Average</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">Accuracy (%)</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">99.12</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">98.25</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">98.77</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">98.6</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Both Tables <xref rid="j_info1229_tab_001">1</xref> and <xref rid="j_info1229_tab_002">2</xref> show high pose estimation accuracy (above 90%), which is acceptable for many applications. Comparing the results of the two tables shows that the accuracy of pose estimation increases as the difference between adjacent poses (the pose interval) increases. This is because, at large pose intervals, the dictionary atoms of different poses are more distinct and adjacent poses are more discriminable. Although other pose estimation methods may achieve higher accuracy at small pose intervals, the proposed method is simple, fast, and accurate enough for the purposes of virtual view generation and face recognition. Moreover, it is based on sparse representation, which also underlies the other two steps of the proposed method.</p>
</sec>
<sec id="j_info1229_s_012">
<label>5.3</label>
<title>Virtual Frontal View Generation</title>
<p>The following experiment assesses the frontal view generation step of the proposed algorithm. Figure <xref rid="j_info1229_fig_011">8</xref> shows the virtual frontal views generated by different methods for the 22.5°, 45° and 90° poses. As the figure shows, compared to the frontal faces generated by the GLR, SRR or LSRR methods, the faces generated by the proposed method (MPSDL) are visually more similar to the ground-truth images. In fact, the faces generated by GLR lack detail, and at pose 90° the generated faces of different identities look similar and contain artifacts. The faces generated by SRR have fewer artifacts and are visually acceptable, but they are over-smoothed and details are lost. The LSRR method is similar to SRR but operates locally on small patches of an image, so its generated faces have fewer artifacts but are locally smoothed, and it requires many overlapping patches to generate detailed images. In contrast, the proposed MPSDL method generates visually acceptable frontal faces that are similar to the ground-truth images and retain enough detail for identity recognition.</p>
<fig id="j_info1229_fig_011">
<label>Fig. 8</label>
<caption>
<p>Virtual view generation of different methods for different poses.</p>
</caption>
<graphic xlink:href="info1229_g011.jpg"/>
</fig>
<p>For evaluating the different view generation methods, a 10-fold cross-validation strategy is used on the CMU-PIE database. In each fold, 61 identities in 16 illuminations are selected for dictionary learning and the remaining 7 identities in 16 illuminations are used for evaluation. Table <xref rid="j_info1229_tab_003">3</xref> shows the Mean Square Error (MSE) between the generated frontal views and the ground truths in three poses. The results reported in this table are based on 16 different illuminations of each pose. The chosen parameter values are given in the table caption, where <inline-formula id="j_info1229_ineq_110"><alternatives>
<mml:math><mml:mi mathvariant="italic">σ</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mi mathvariant="italic">B</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math>
<tex-math><![CDATA[$\sigma (B)$]]></tex-math></alternatives></inline-formula> is the standard deviation of matrix <italic>B</italic>. As can be seen from Table <xref rid="j_info1229_tab_003">3</xref>, for large pose variations (45° and 90°), the MSE between generated virtual frontal views and the ground-truths is smaller in the proposed method compared to GLR and SRR methods. The reason is that, compared to GLR, the proposed method has no assumption on linear transformation between different poses which is not correct for large pose angles. Also, compared to SRR, the proposed method is based on supervised dictionary learning which uses the discrimination in data and is expected to generate faces with more details. Since the proposed method is a holistic approach, LSRR and LLR methods are not included in this comparison, because of their locality manner. Obtained results in Table <xref rid="j_info1229_tab_003">3</xref> clearly show that the proposed method can generate accurate virtual views even in large pose variations which can be considered as a desired property of the proposed method. As expected, MSE of different methods is more similar in small pose angles where transformation between poses is nearly linear.</p>
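As a minimal illustration, the MSE criterion reported in Table 3 can be computed as follows (a NumPy sketch; the function names are ours, and images are assumed to be arrays of normalized pixel intensities):

```python
import numpy as np

def frontal_view_mse(generated, ground_truth):
    """Mean square error between a generated frontal view and its
    ground truth, both given as arrays of pixel intensities."""
    generated = np.asarray(generated, dtype=float)
    ground_truth = np.asarray(ground_truth, dtype=float)
    return float(np.mean((generated - ground_truth) ** 2))

def average_mse(pairs):
    """Average MSE over (generated, ground_truth) pairs, as reported
    per pose (over 16 illuminations) in Table 3."""
    return float(np.mean([frontal_view_mse(g, t) for g, t in pairs]))
```

In the reported experiments this average would be taken over the held-out identities and illuminations of each cross-validation fold.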
<p>It should be noted that the proposed method aims to generate discriminative virtual frontal views rather than visually pleasing ones. Nevertheless, the results in Table <xref rid="j_info1229_tab_003">3</xref> show that, in terms of MSE, the proposed method generates more accurate virtual frontal views at large pose angles than the other methods.</p>
<table-wrap id="j_info1229_tab_003">
<label>Table 3</label>
<caption>
<p>MSE of virtual frontal view generation of different methods (<inline-formula id="j_info1229_ineq_111"><alternatives>
<mml:math><mml:mi mathvariant="italic">K</mml:mi><mml:mo>=</mml:mo><mml:mn>200</mml:mn></mml:math>
<tex-math><![CDATA[$K=200$]]></tex-math></alternatives></inline-formula>, <inline-formula id="j_info1229_ineq_112"><alternatives>
<mml:math><mml:mi mathvariant="italic">α</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:math>
<tex-math><![CDATA[$\alpha =1$]]></tex-math></alternatives></inline-formula>, <inline-formula id="j_info1229_ineq_113"><alternatives>
<mml:math><mml:mi mathvariant="italic">λ</mml:mi><mml:mo>=</mml:mo><mml:mi mathvariant="italic">σ</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mi mathvariant="italic">B</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math>
<tex-math><![CDATA[$\lambda =\sigma (B)$]]></tex-math></alternatives></inline-formula>).</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Pose</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">22.5° </td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">45° </td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">90° </td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Average</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left">GLR (Chai <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1229_ref_006">2007</xref>)</td>
<td style="vertical-align: top; text-align: left">0.3228</td>
<td style="vertical-align: top; text-align: left">0.4725</td>
<td style="vertical-align: top; text-align: left">0.4943</td>
<td style="vertical-align: top; text-align: left">0.4298</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">SRR (Zhang <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1229_ref_031">2013</xref>)</td>
<td style="vertical-align: top; text-align: left"><bold>0.3125</bold></td>
<td style="vertical-align: top; text-align: left">0.3890</td>
<td style="vertical-align: top; text-align: left">0.4427</td>
<td style="vertical-align: top; text-align: left">0.3814</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">MPSDL (proposed)</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">0.3243</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"><bold>0.3858</bold></td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"><bold>0.4004</bold></td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"><bold>0.3701</bold></td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="j_info1229_s_013">
<label>5.4</label>
<title>Multi-Pose Face Recognition</title>
<p>In this section, we evaluate the performance of the proposed multi-pose face recognition method. Virtual view generation can be considered a preprocessing step that is independent of classification; therefore, face recognition based on the generated frontal views can be performed with various kinds of classifiers. Since the SRC method has been successful in the frontal face recognition task (Wright <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1229_ref_029">2009</xref>), it is used in this paper as the classifier for the generated virtual views. The performance of the proposed method is compared to GLR, SRR, and LSRR, which are all based on virtual view generation. 10-fold cross-validation is used to evaluate each method on the CMU-PIE and FERET databases. For the CMU-PIE database, each fold contains 7 identities in 5 poses and 16 different illumination conditions. For the FERET database, each fold contains 20 identities in 4 poses under neutral illumination and expression conditions. Using this setting, the recognition accuracy of the different methods on CMU-PIE and FERET is shown in Table <xref rid="j_info1229_tab_004">4</xref> and Table <xref rid="j_info1229_tab_005">5</xref>, respectively. In these tables, the pose angle of the test images is assumed known, and the pose estimation of Algorithm <xref rid="j_info1229_fig_002">1</xref> is not used. In both tables, the dictionary size is set to 200 for the SRR, LSRR, and MPSDL methods; for LSRR, images are partitioned into 16 patches.</p>
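The SRC classification step described above can be sketched as follows. This is a hedged illustration: the function names are ours, and `ista_l1` (iterative soft-thresholding) is a simple stand-in for whichever l1 solver the original SRC implementation uses.

```python
import numpy as np

def ista_l1(D, y, lam=0.01, n_iter=200):
    """Solve min_x 0.5*||y - D x||^2 + lam*||x||_1 by iterative
    soft-thresholding (ISTA), a basic l1 sparse coder."""
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        x = x + D.T @ (y - D @ x) / L          # gradient step
        x = np.sign(x) * np.maximum(np.abs(x) - lam / L, 0.0)  # shrink
    return x

def src_classify(D, labels, y, lam=0.01):
    """Sparse representation-based classification: code the probe y
    over the gallery D, then pick the class whose coefficients alone
    give the smallest reconstruction residual."""
    x = ista_l1(D, y, lam)
    classes = np.unique(labels)
    residuals = []
    for c in classes:
        xc = np.where(labels == c, x, 0.0)     # keep only class-c atoms
        residuals.append(np.linalg.norm(y - D @ xc))
    return classes[int(np.argmin(residuals))]
```

Here `D` holds gallery images (or generated frontal views) as columns and `labels` gives the identity of each column.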
<table-wrap id="j_info1229_tab_004">
<label>Table 4</label>
<caption>
<p>Recognition accuracy (%) of different methods (with known pose angle) on CMU-PIE database for 4 poses (<inline-formula id="j_info1229_ineq_114"><alternatives>
<mml:math><mml:mi mathvariant="italic">K</mml:mi><mml:mo>=</mml:mo><mml:mn>200</mml:mn></mml:math>
<tex-math><![CDATA[$K=200$]]></tex-math></alternatives></inline-formula>, <inline-formula id="j_info1229_ineq_115"><alternatives>
<mml:math><mml:mi mathvariant="italic">α</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:math>
<tex-math><![CDATA[$\alpha =1$]]></tex-math></alternatives></inline-formula>, <inline-formula id="j_info1229_ineq_116"><alternatives>
<mml:math><mml:mi mathvariant="italic">λ</mml:mi><mml:mo>=</mml:mo><mml:mi mathvariant="italic">σ</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mi mathvariant="italic">B</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math>
<tex-math><![CDATA[$\lambda =\sigma (B)$]]></tex-math></alternatives></inline-formula>).</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Pose</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">22.5° </td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">45° </td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">67.5° </td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">90° </td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Average</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left">Raw images</td>
<td style="vertical-align: top; text-align: left">70.3</td>
<td style="vertical-align: top; text-align: left">38.7</td>
<td style="vertical-align: top; text-align: left">21.0</td>
<td style="vertical-align: top; text-align: left">15.5</td>
<td style="vertical-align: top; text-align: left">36.3</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">GLR (Chai <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1229_ref_006">2007</xref>)</td>
<td style="vertical-align: top; text-align: left">87.9</td>
<td style="vertical-align: top; text-align: left">42.9</td>
<td style="vertical-align: top; text-align: left">35.7</td>
<td style="vertical-align: top; text-align: left">32.8</td>
<td style="vertical-align: top; text-align: left">49.8</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">SRR (Zhang <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1229_ref_031">2013</xref>)</td>
<td style="vertical-align: top; text-align: left">90.1</td>
<td style="vertical-align: top; text-align: left">65.6</td>
<td style="vertical-align: top; text-align: left">43.4</td>
<td style="vertical-align: top; text-align: left">40.7</td>
<td style="vertical-align: top; text-align: left">59.9</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">LSRR (Zhang <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1229_ref_031">2013</xref>)</td>
<td style="vertical-align: top; text-align: left"><bold>91.2</bold></td>
<td style="vertical-align: top; text-align: left">74.9</td>
<td style="vertical-align: top; text-align: left">49.5</td>
<td style="vertical-align: top; text-align: left">48.9</td>
<td style="vertical-align: top; text-align: left">66.1</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">MPSDL (Proposed)</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">90.9</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"><bold>76.2</bold></td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"><bold>55.8</bold></td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"><bold>52.1</bold></td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"><bold>68.7</bold></td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="j_info1229_tab_005">
<label>Table 5</label>
<caption>
<p>Recognition accuracy (%) of different methods (with known pose angle) on FERET database for 3 poses (<inline-formula id="j_info1229_ineq_117"><alternatives>
<mml:math><mml:mi mathvariant="italic">K</mml:mi><mml:mo>=</mml:mo><mml:mn>200</mml:mn></mml:math>
<tex-math><![CDATA[$K=200$]]></tex-math></alternatives></inline-formula>, <inline-formula id="j_info1229_ineq_118"><alternatives>
<mml:math><mml:mi mathvariant="italic">α</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:math>
<tex-math><![CDATA[$\alpha =1$]]></tex-math></alternatives></inline-formula>, <inline-formula id="j_info1229_ineq_119"><alternatives>
<mml:math><mml:mi mathvariant="italic">λ</mml:mi><mml:mo>=</mml:mo><mml:mi mathvariant="italic">σ</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mi mathvariant="italic">B</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math>
<tex-math><![CDATA[$\lambda =\sigma (B)$]]></tex-math></alternatives></inline-formula>).</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Pose</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">15° </td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">45° </td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">60° </td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Average</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left">Raw Images</td>
<td style="vertical-align: top; text-align: left">89.5</td>
<td style="vertical-align: top; text-align: left">27.1</td>
<td style="vertical-align: top; text-align: left">35.2</td>
<td style="vertical-align: top; text-align: left">50.6</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">GLR (Chai <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1229_ref_006">2007</xref>)</td>
<td style="vertical-align: top; text-align: left">86.7</td>
<td style="vertical-align: top; text-align: left">33.7</td>
<td style="vertical-align: top; text-align: left">31.5</td>
<td style="vertical-align: top; text-align: left">50.7</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">SRR (Zhang <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1229_ref_031">2013</xref>)</td>
<td style="vertical-align: top; text-align: left">91.9</td>
<td style="vertical-align: top; text-align: left">52.6</td>
<td style="vertical-align: top; text-align: left">43.0</td>
<td style="vertical-align: top; text-align: left">62.5</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">LSRR (Zhang <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1229_ref_031">2013</xref>)</td>
<td style="vertical-align: top; text-align: left"><bold>94.6</bold></td>
<td style="vertical-align: top; text-align: left">57.5</td>
<td style="vertical-align: top; text-align: left">45.1</td>
<td style="vertical-align: top; text-align: left">65.7</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">MPSDL (proposed)</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">93.7</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"><bold>62.2</bold></td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"><bold>50.7</bold></td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"><bold>68.9</bold></td>
</tr>
</tbody>
</table>
</table-wrap>
<p>As the results in Tables <xref rid="j_info1229_tab_004">4</xref> and <xref rid="j_info1229_tab_005">5</xref> show, the recognition accuracy with raw images, i.e. without virtual view generation, decreases rapidly as the pose angle increases. The GLR method improves on raw images, but its performance is not satisfactory for large pose angles because of the linear assumption in its virtual view generation. SRR generally performs better than GLR at large pose angles. By using local patches, LSRR improves on SRR, but both methods appear to suffer from their unsupervised dictionary learning. The proposed MPSDL method outperforms the other methods and is more robust to large pose variations.</p>
<p>To investigate the impact of the automatic pose estimation of step 1 on recognition accuracy, Table <xref rid="j_info1229_tab_006">6</xref> reports the results of the proposed multi-pose face recognition on the FERET database when Algorithm <xref rid="j_info1229_fig_002">1</xref> is used for pose estimation. As expected, the results are very close to those of Table <xref rid="j_info1229_tab_005">5</xref>, which confirms the high accuracy of Algorithm <xref rid="j_info1229_fig_002">1</xref> for pose estimation. Therefore, the effect of pose estimation error on recognition accuracy is negligible.</p>
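The pose estimation step can be sketched as assigning the probe to the pose whose dictionary yields the smallest reconstruction residual. In the hypothetical sketch below, a plain least-squares coder stands in for the sparse (l1) coder actually used in Algorithm 1, and the function name is ours:

```python
import numpy as np

def estimate_pose(pose_dicts, y):
    """Return the pose whose dictionary reconstructs the probe y with
    the smallest residual. A least-squares coder replaces the sparse
    coder here purely for brevity."""
    residuals = {}
    for pose, D in pose_dicts.items():
        # code y over this pose's dictionary and measure the residual
        x, *_ = np.linalg.lstsq(D, y, rcond=None)
        residuals[pose] = float(np.linalg.norm(y - D @ x))
    return min(residuals, key=residuals.get)
```

`pose_dicts` maps each pose angle to its learned dictionary (atoms as columns); the probe is vectorized the same way as the training images.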
<p>The effect of dictionary size on recognition accuracy has been investigated through experiments with different dictionary sizes. Note that for raw images and the GLR method, which are not based on dictionary learning, the dictionary size refers to the number of samples in the training set. Table <xref rid="j_info1229_tab_007">7</xref> and Fig. <xref rid="j_info1229_fig_012">9</xref> show the recognition accuracy of the different methods under different dictionary sizes for the 22.5° pose of the CMU-PIE database. For small dictionary sizes, the LSRR method outperforms the others because it operates on small patches.</p>
<table-wrap id="j_info1229_tab_006">
<label>Table 6</label>
<caption>
<p>Recognition accuracy (%) of different methods with pose estimation phase on FERET database for 3 poses (<inline-formula id="j_info1229_ineq_120"><alternatives>
<mml:math><mml:mi mathvariant="italic">K</mml:mi><mml:mo>=</mml:mo><mml:mn>200</mml:mn></mml:math>
<tex-math><![CDATA[$K=200$]]></tex-math></alternatives></inline-formula>, <inline-formula id="j_info1229_ineq_121"><alternatives>
<mml:math><mml:mi mathvariant="italic">α</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:math>
<tex-math><![CDATA[$\alpha =1$]]></tex-math></alternatives></inline-formula>, <inline-formula id="j_info1229_ineq_122"><alternatives>
<mml:math><mml:mi mathvariant="italic">λ</mml:mi><mml:mo>=</mml:mo><mml:mi mathvariant="italic">σ</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mi mathvariant="italic">B</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math>
<tex-math><![CDATA[$\lambda =\sigma (B)$]]></tex-math></alternatives></inline-formula>).</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Pose</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">15° </td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">45° </td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">60° </td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Average</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left">Raw Images</td>
<td style="vertical-align: top; text-align: left">88.9</td>
<td style="vertical-align: top; text-align: left">24.8</td>
<td style="vertical-align: top; text-align: left">35.1</td>
<td style="vertical-align: top; text-align: left">49.6</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">GLR (Chai <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1229_ref_006">2007</xref>)</td>
<td style="vertical-align: top; text-align: left">89.0</td>
<td style="vertical-align: top; text-align: left">31.6</td>
<td style="vertical-align: top; text-align: left">30.9</td>
<td style="vertical-align: top; text-align: left">50.5</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">SRR (Zhang <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1229_ref_031">2013</xref>)</td>
<td style="vertical-align: top; text-align: left">91.6</td>
<td style="vertical-align: top; text-align: left">52.5</td>
<td style="vertical-align: top; text-align: left">43.6</td>
<td style="vertical-align: top; text-align: left">62.5</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">LSRR (Zhang <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1229_ref_031">2013</xref>)</td>
<td style="vertical-align: top; text-align: left"><bold>94.1</bold></td>
<td style="vertical-align: top; text-align: left">57.2</td>
<td style="vertical-align: top; text-align: left">42.4</td>
<td style="vertical-align: top; text-align: left">64.5</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">MPSDL (proposed)</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">93.4</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"><bold>60.4</bold></td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"><bold>49.2</bold></td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"><bold>67.7</bold></td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="j_info1229_tab_007">
<label>Table 7</label>
<caption>
<p>Recognition accuracy (%) for different dictionary sizes (<inline-formula id="j_info1229_ineq_123"><alternatives>
<mml:math><mml:mi mathvariant="italic">α</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:math>
<tex-math><![CDATA[$\alpha =1$]]></tex-math></alternatives></inline-formula>, <inline-formula id="j_info1229_ineq_124"><alternatives>
<mml:math><mml:mi mathvariant="italic">λ</mml:mi><mml:mo>=</mml:mo><mml:mi mathvariant="italic">σ</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mi mathvariant="italic">B</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math>
<tex-math><![CDATA[$\lambda =\sigma (B)$]]></tex-math></alternatives></inline-formula>) for 22.5° pose in CMU-PIE.</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Dictionary size</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">50</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">100</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">200</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">300</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">500</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left">Raw images</td>
<td style="vertical-align: top; text-align: left">70.2</td>
<td style="vertical-align: top; text-align: left">72.3</td>
<td style="vertical-align: top; text-align: left">70.0</td>
<td style="vertical-align: top; text-align: left">72.5</td>
<td style="vertical-align: top; text-align: left">72.2</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">GLR (Chai <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1229_ref_006">2007</xref>)</td>
<td style="vertical-align: top; text-align: left">59.1</td>
<td style="vertical-align: top; text-align: left">68.9</td>
<td style="vertical-align: top; text-align: left">86.9</td>
<td style="vertical-align: top; text-align: left">53.7</td>
<td style="vertical-align: top; text-align: left">44.1</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">SRR (Zhang <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1229_ref_031">2013</xref>)</td>
<td style="vertical-align: top; text-align: left">76.2</td>
<td style="vertical-align: top; text-align: left">77.4</td>
<td style="vertical-align: top; text-align: left">90.1</td>
<td style="vertical-align: top; text-align: left">79.6</td>
<td style="vertical-align: top; text-align: left">88.7</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">LSRR (Zhang <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1229_ref_031">2013</xref>)</td>
<td style="vertical-align: top; text-align: left"><bold>80.0</bold></td>
<td style="vertical-align: top; text-align: left"><bold>80.9</bold></td>
<td style="vertical-align: top; text-align: left"><bold>91.2</bold></td>
<td style="vertical-align: top; text-align: left">82.6</td>
<td style="vertical-align: top; text-align: left">93.1</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">MPSDL (proposed)</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">75.8</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">78.7</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">90.9</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"><bold>92.1</bold></td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"><bold>94.5</bold></td>
</tr>
</tbody>
</table>
</table-wrap>
<fig id="j_info1229_fig_012">
<label>Fig. 9</label>
<caption>
<p>Comparison of the face recognition accuracy with different dictionary sizes. MPSDL overtakes the other methods at larger dictionary sizes.</p>
</caption>
<graphic xlink:href="info1229_g012.jpg"/>
</fig>
<p>As the dictionary size increases, the face recognition accuracy of all methods in Table <xref rid="j_info1229_tab_007">7</xref> (except GLR) increases, but the performance of MPSDL increases more rapidly. For large dictionary sizes, the accuracy of MPSDL surpasses that of the other methods, owing to its use of supervised dictionary learning. In contrast, GLR performs poorly for large dictionary sizes because regression-based methods overfit when the number of training samples is large. Note, however, that huge databases and large dictionaries can raise computational problems, namely large memory requirements and long processing times. These can be mitigated with high-performance hardware and by applying a quantization or clustering method prior to dictionary learning.</p>
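The quantization/clustering idea mentioned above could look like the following sketch (our own toy k-means with a deterministic initialization; a real pipeline would likely use an off-the-shelf implementation), which shrinks the training set before dictionary learning:

```python
import numpy as np

def reduce_training_set(X, k, n_iter=50):
    """Replace the n training vectors (rows of X) by k cluster
    centroids via a tiny k-means, so dictionary learning sees a much
    smaller input. Deterministic init: the first k rows."""
    C = X[:k].astype(float).copy()
    for _ in range(n_iter):
        # assign each sample to its nearest centroid
        labels = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old centroid if a cluster empties
                C[j] = X[labels == j].mean(axis=0)
    return C
```

The k centroids then serve as the (much smaller) training set passed to the dictionary learning stage.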
<p>Although the proposed method performs better than similar methods, it does not overtake the state-of-the-art methods that are not based on sparse representation and 2D virtual view generation. For instance, Table <xref rid="j_info1229_tab_008">8</xref> compares the proposed method with some well-known and state-of-the-art multi-pose face recognition methods on the CMU-PIE database. As this table shows, 3D reconstruction based methods such as 3DMM (Blanz and Vetter, <xref ref-type="bibr" rid="j_info1229_ref_005">2003</xref>) and Probabilistic Geometry Assisted FR (Liu and Chen, <xref ref-type="bibr" rid="j_info1229_ref_016">2005</xref>) recognize face images with higher accuracy, because they exploit a 3D face model by aligning 2D images with either a generic or an identity-specific 3D model, which is computationally expensive. TFA (Prince <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1229_ref_025">2008</xref>) is a 2D method that benefits from a latent-variable viewpoint for face recognition, but it requires two face images (one frontal and one non-frontal) in its gallery, whereas the proposed method requires only one frontal face image. Therefore, the recognition accuracy of the proposed method for multi-pose face recognition is compelling from a computational point of view and when only one frontal image of each person is available. However, based on the ideas and results in Zhang <italic>et al.</italic> (<xref ref-type="bibr" rid="j_info1229_ref_034">2015</xref>) and Zhao <italic>et al.</italic> (<xref ref-type="bibr" rid="j_info1229_ref_035">2016</xref>), the proposed method could be extended by using the mixed norm <inline-formula id="j_info1229_ineq_125"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">l</mml:mi></mml:mrow><mml:mrow><mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo><mml:mi mathvariant="italic">p</mml:mi><mml:mo mathvariant="normal">,</mml:mo><mml:mi mathvariant="italic">q</mml:mi><mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${l_{(p,q)}}$]]></tex-math></alternatives></inline-formula> instead of <inline-formula id="j_info1229_ineq_126"><alternatives>
<mml:math><mml:msub><mml:mrow><mml:mi mathvariant="italic">l</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math>
<tex-math><![CDATA[${l_{1}}$]]></tex-math></alternatives></inline-formula> norm, which may decrease the frontal face generation error while increasing the face recognition accuracy.</p>
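For reference, the mixed norm mentioned above is commonly defined as follows (notation ours; the cited works may differ in conventions):

```latex
\|X\|_{p,q} = \bigg( \sum_{j} \bigg( \sum_{i} |x_{ij}|^{p} \bigg)^{q/p} \bigg)^{1/q}
```

Setting $p=q=1$ recovers the usual $l_{1}$ penalty used in this paper, while choices such as $l_{2,1}$ promote structured (e.g. row-wise) sparsity in the coefficient matrix.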
<table-wrap id="j_info1229_tab_008">
<label>Table 8</label>
<caption>
<p>Comparison of proposed method and some popular methods on CMU-PIE.</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Method</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Training images</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Accuracy (%)</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left">3DMM (Blanz and Vetter, <xref ref-type="bibr" rid="j_info1229_ref_005">2003</xref>)</td>
<td style="vertical-align: top; text-align: left">0°, 16°, 60°. 22 illuminations</td>
<td style="vertical-align: top; text-align: left">92.1</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">PGA FR (Liu and Chen, <xref ref-type="bibr" rid="j_info1229_ref_016">2005</xref>)</td>
<td style="vertical-align: top; text-align: left">0°, 15°, 30°, 45°, 60° </td>
<td style="vertical-align: top; text-align: left">86.0</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">TFA (Prince <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1229_ref_025">2008</xref>)</td>
<td style="vertical-align: top; text-align: left">0°, 16°, 60° </td>
<td style="vertical-align: top; text-align: left">91.0</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">LBP (Ahonen <italic>et al.</italic>, <xref ref-type="bibr" rid="j_info1229_ref_002">2006</xref>)</td>
<td style="vertical-align: top; text-align: left">0°, 30°, 60° </td>
<td style="vertical-align: top; text-align: left">74.2</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">MPSDL (Proposed)</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">0°, 22°, 45°, 60°. 16 illuminations.</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">81.7</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
</sec>
<sec id="j_info1229_s_014">
<label>6</label>
<title>Conclusions</title>
<p>In this paper, we proposed a multi-pose face recognition method based on sparse representation and supervised dictionary learning. The proposed method generates virtual frontal views from non-frontal views based on the assumption that the images of an identity observed from different views share the same sparse representation coefficients over all view dictionaries. To increase virtual view generation performance and reconstruction accuracy, supervised dictionary learning is used to generate adapted dictionary atoms. As dictionary learning is usually computationally expensive, the proposed method benefits from pair-wise dictionary learning, which learns each view dictionary separately. As a preprocessing step, the proposed method uses sparse representation based pose estimation, while sparse representation based classification is used for face recognition in the last step; therefore, all steps of the proposed method are based on sparse representation. Experiments carried out on the FERET and CMU-PIE databases show the superior performance of the proposed method compared to similar methods, especially for large pose angles. Compared to state-of-the-art methods, the proposed method achieves acceptable recognition accuracy from a computational point of view while requiring only one frontal image of each subject for recognition. For further work, we suggest extending the proposed method to work locally on small patches of a face image, and investigating how using mixed norms in the objective function can increase the recognition accuracy.</p>
</sec>
</body>
<back>
<ref-list id="j_info1229_reflist_001">
<title>References</title>
<ref id="j_info1229_ref_001">
<mixed-citation publication-type="journal"><string-name><surname>Aharon</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Elad</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Bruckstein</surname>, <given-names>A.</given-names></string-name> (<year>2006</year>). <article-title>K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation</article-title>. <source>IEEE Transactions on Signal Processing</source>, <volume>54</volume>(<issue>11</issue>), <fpage>4311</fpage>–<lpage>4322</lpage>.</mixed-citation>
</ref>
<ref id="j_info1229_ref_002">
<mixed-citation publication-type="journal"><string-name><surname>Ahonen</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Hadid</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Pietikainen</surname>, <given-names>M.</given-names></string-name> (<year>2006</year>). <article-title>Face description with local binary patterns: application to face recognition</article-title>. <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>, <volume>28</volume>(<issue>12</issue>), <fpage>2037</fpage>–<lpage>2041</lpage>.</mixed-citation>
</ref>
<ref id="j_info1229_ref_003">
<mixed-citation publication-type="other"><string-name><surname>Belhumeur</surname>, <given-names>P.N.</given-names></string-name>, <string-name><surname>Hespanha</surname>, <given-names>J.P.</given-names></string-name>, <string-name><surname>Kriegman</surname>, <given-names>D.J.</given-names></string-name> (<year>1997</year>). <italic>Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection</italic>. Technical report, Yale University, New Haven, United States of America.</mixed-citation>
</ref>
<ref id="j_info1229_ref_004">
<mixed-citation publication-type="chapter"><string-name><surname>Beymer</surname>, <given-names>D.</given-names></string-name> (<year>1994</year>). <chapter-title>Face recognition under varying pose</chapter-title>. In: <source>CVPR</source>, Vol. <volume>94</volume>, p. <fpage>137</fpage>, <comment>Citeseer</comment>.</mixed-citation>
</ref>
<ref id="j_info1229_ref_005">
<mixed-citation publication-type="journal"><string-name><surname>Blanz</surname>, <given-names>V.</given-names></string-name>, <string-name><surname>Vetter</surname>, <given-names>T.</given-names></string-name> (<year>2003</year>). <article-title>Face recognition based on fitting a 3d morphable model</article-title>. <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>, <volume>25</volume>(<issue>9</issue>), <fpage>1063</fpage>–<lpage>1074</lpage>.</mixed-citation>
</ref>
<ref id="j_info1229_ref_006">
<mixed-citation publication-type="journal"><string-name><surname>Chai</surname>, <given-names>X.</given-names></string-name>, <string-name><surname>Shan</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Chen</surname>, <given-names>X.</given-names></string-name>, <string-name><surname>Gao</surname>, <given-names>W.</given-names></string-name> (<year>2007</year>). <article-title>Locally linear regression for pose-invariant face recognition</article-title>. <source>IEEE Transactions on Image Processing</source>, <volume>16</volume>(<issue>7</issue>), <fpage>1716</fpage>–<lpage>1725</lpage>.</mixed-citation>
</ref>
<ref id="j_info1229_ref_007">
<mixed-citation publication-type="journal"><string-name><surname>Cootes</surname>, <given-names>T.F.</given-names></string-name>, <string-name><surname>Edwards</surname>, <given-names>G.J.</given-names></string-name>, <string-name><surname>Taylor</surname>, <given-names>C.J.</given-names></string-name> (<year>2001</year>). <article-title>Active appearance models</article-title>. <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>, <volume>23</volume>(<issue>6</issue>), <fpage>681</fpage>–<lpage>685</lpage>.</mixed-citation>
</ref>
<ref id="j_info1229_ref_008">
<mixed-citation publication-type="journal"><string-name><surname>Ding</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Tao</surname>, <given-names>D.</given-names></string-name> (<year>2016</year>). <article-title>A comprehensive survey on pose-invariant face recognition</article-title>. <source>ACM Transactions on Intelligent Systems and Technology (TIST)</source>, <volume>7</volume>(<issue>3</issue>), <fpage>37</fpage>.</mixed-citation>
</ref>
<ref id="j_info1229_ref_009">
<mixed-citation publication-type="chapter"><string-name><surname>Elad</surname>, <given-names>M.</given-names></string-name> (<year>2010</year>). <chapter-title>From exact to approximate solutions</chapter-title>. In: <source>Sparse and Redundant Representations</source>. <publisher-name>Springer</publisher-name>, pp. <fpage>79</fpage>–<lpage>109</lpage>.</mixed-citation>
</ref>
<ref id="j_info1229_ref_010">
<mixed-citation publication-type="chapter"><string-name><surname>Engan</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Aase</surname>, <given-names>S.O.</given-names></string-name>, <string-name><surname>Husoy</surname>, <given-names>J.H.</given-names></string-name> (<year>1999</year>). <chapter-title>Method of optimal directions for frame design</chapter-title>. In: <source>1999 IEEE International Conference on Acoustics, Speech, and Signal Processing 1999. Proceedings</source>, Vol. <volume>5</volume>. <publisher-name>IEEE</publisher-name>, pp. <fpage>2443</fpage>–<lpage>2446</lpage>.</mixed-citation>
</ref>
<ref id="j_info1229_ref_011">
<mixed-citation publication-type="journal"><string-name><surname>Gottumukkal</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Asari</surname>, <given-names>V.K.</given-names></string-name> (<year>2004</year>). <article-title>An improved face recognition technique based on modular pca approach</article-title>. <source>Pattern Recognition Letters</source>, <volume>25</volume>(<issue>4</issue>), <fpage>429</fpage>–<lpage>436</lpage>.</mixed-citation>
</ref>
<ref id="j_info1229_ref_012">
<mixed-citation publication-type="journal"><string-name><surname>Gross</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Matthews</surname>, <given-names>I.</given-names></string-name>, <string-name><surname>Baker</surname>, <given-names>S.</given-names></string-name> (<year>2004</year>). <article-title>Appearance-based face recognition and light-fields</article-title>. <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>, <volume>26</volume>(<issue>4</issue>), <fpage>449</fpage>–<lpage>465</lpage>.</mixed-citation>
</ref>
<ref id="j_info1229_ref_013">
<mixed-citation publication-type="journal"><string-name><surname>Jiang</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Hu</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Yan</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Zhang</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Zhang</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Gao</surname>, <given-names>W.</given-names></string-name> (<year>2005</year>). <article-title>Efficient 3d reconstruction for face recognition</article-title>. <source>Pattern Recognition</source>, <volume>38</volume>(<issue>6</issue>), <fpage>787</fpage>–<lpage>798</lpage>.</mixed-citation>
</ref>
<ref id="j_info1229_ref_014">
<mixed-citation publication-type="journal"><string-name><surname>Jiang</surname>, <given-names>Z.</given-names></string-name>, <string-name><surname>Lin</surname>, <given-names>Z.</given-names></string-name>, <string-name><surname>Davis</surname>, <given-names>L.S.</given-names></string-name> (<year>2013</year>). <article-title>Label consistent k-SVD: learning a discriminative dictionary for recognition</article-title>. <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>, <volume>35</volume>(<issue>11</issue>), <fpage>2651</fpage>–<lpage>2664</lpage>.</mixed-citation>
</ref>
<ref id="j_info1229_ref_015">
<mixed-citation publication-type="journal"><string-name><surname>Lee</surname>, <given-names>H.-S.</given-names></string-name>, <string-name><surname>Kim</surname>, <given-names>D.</given-names></string-name> (<year>2006</year>). <article-title>Generating frontal view face image for pose invariant face recognition</article-title>. <source>Pattern Recognition Letters</source>, <volume>27</volume>(<issue>7</issue>), <fpage>747</fpage>–<lpage>754</lpage>.</mixed-citation>
</ref>
<ref id="j_info1229_ref_016">
<mixed-citation publication-type="chapter"><string-name><surname>Liu</surname>, <given-names>X.</given-names></string-name>, <string-name><surname>Chen</surname>, <given-names>T.</given-names></string-name> (<year>2005</year>). <chapter-title>Pose-robust face recognition using geometry assisted probabilistic modeling</chapter-title>. In: <source>IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005. CVPR 2005</source>, Vol. <volume>1</volume>. <publisher-name>IEEE</publisher-name>, pp. <fpage>502</fpage>–<lpage>509</lpage>.</mixed-citation>
</ref>
<ref id="j_info1229_ref_017">
<mixed-citation publication-type="other"><string-name><surname>Mairal</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Ponce</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Sapiro</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Zisserman</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Bach</surname>, <given-names>F.R.</given-names></string-name> (<year>2009</year>). Supervised dictionary learning. <italic>Advances in Neural Information Processing Systems</italic>, 1033–1040.</mixed-citation>
</ref>
<ref id="j_info1229_ref_018">
<mixed-citation publication-type="chapter"><string-name><surname>McKenna</surname>, <given-names>S.J.</given-names></string-name>, <string-name><surname>Gong</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Collins</surname>, <given-names>J.J.</given-names></string-name> (<year>1996</year>). <chapter-title>Face tracking and pose representation</chapter-title>. In: <source>BMVC</source>, pp. <fpage>1</fpage>–<lpage>10</lpage>. <comment>Citeseer</comment>.</mixed-citation>
</ref>
<ref id="j_info1229_ref_019">
<mixed-citation publication-type="chapter"><string-name><surname>Messer</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Kittler</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Sadeghi</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Hamouz</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Kostin</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Cardinaux</surname>, <given-names>F.</given-names></string-name>, <string-name><surname>Marcel</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Bengio</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Sanderson</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Poh</surname>, <given-names>N.</given-names></string-name> <etal>et al.</etal> (<year>2004</year>). <chapter-title>Face authentication test on the banca database</chapter-title>. In: <source>Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004</source>, Vol. <volume>4</volume>. <publisher-name>IEEE</publisher-name>, pp. <fpage>523</fpage>–<lpage>532</lpage>.</mixed-citation>
</ref>
<ref id="j_info1229_ref_020">
<mixed-citation publication-type="chapter"><string-name><surname>Murase</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Nayar</surname>, <given-names>S.K.</given-names></string-name> (<year>1993</year>). <chapter-title>Learning and recognition of 3d objects from appearance</chapter-title>. In: <source>IEEE Qualitative Vision Workshop</source>. <publisher-name>CVPR</publisher-name>, <publisher-loc>New York</publisher-loc>.</mixed-citation>
</ref>
<ref id="j_info1229_ref_021">
<mixed-citation publication-type="journal"><string-name><surname>Murphy-Chutorian</surname>, <given-names>E.</given-names></string-name>, <string-name><surname>Trivedi</surname>, <given-names>M.M.</given-names></string-name> (<year>2009</year>). <article-title>Head pose estimation in computer vision: a survey</article-title>. <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>, <volume>31</volume>(<issue>4</issue>), <fpage>607</fpage>–<lpage>626</lpage>.</mixed-citation>
</ref>
<ref id="j_info1229_ref_022">
<mixed-citation publication-type="chapter"><string-name><surname>Nastar</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Mitschke</surname>, <given-names>M.</given-names></string-name> (<year>1998</year>). <chapter-title>Real-time face recognition using feature combination</chapter-title>. In: <source>Third IEEE International Conference on Automatic Face and Gesture Recognition, 1998. Proceedings</source>. <publisher-name>IEEE</publisher-name>, pp. <fpage>312</fpage>–<lpage>317</lpage>.</mixed-citation>
</ref>
<ref id="j_info1229_ref_023">
<mixed-citation publication-type="chapter"><string-name><surname>Pentland</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Moghaddam</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Starner</surname>, <given-names>T.</given-names></string-name> (<year>1994</year>). <chapter-title>View-based and modular eigenspaces for face recognition</chapter-title>. In: <source>Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR)</source>.</mixed-citation>
</ref>
<ref id="j_info1229_ref_024">
<mixed-citation publication-type="journal"><string-name><surname>Phillips</surname>, <given-names>P.J.</given-names></string-name>, <string-name><surname>Moon</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Rizvi</surname>, <given-names>S.A.</given-names></string-name>, <string-name><surname>Rauss</surname>, <given-names>P.J.</given-names></string-name> (<year>2000</year>). <article-title>The feret evaluation methodology for face-recognition algorithms</article-title>. <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>, <volume>22</volume>(<issue>10</issue>), <fpage>1090</fpage>–<lpage>1104</lpage>.</mixed-citation>
</ref>
<ref id="j_info1229_ref_025">
<mixed-citation publication-type="journal"><string-name><surname>Prince</surname>, <given-names>S.J.D.</given-names></string-name>, <string-name><surname>Elder</surname>, <given-names>J.H.</given-names></string-name>, <string-name><surname>Warrell</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Felisberti</surname>, <given-names>F.M.</given-names></string-name> (<year>2008</year>). <article-title>Tied factor analysis for face recognition across large pose differences</article-title>. <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>, <volume>30</volume>(<issue>6</issue>), <fpage>970</fpage>–<lpage>984</lpage>.</mixed-citation>
</ref>
<ref id="j_info1229_ref_026">
<mixed-citation publication-type="journal"><string-name><surname>Sharma</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Al Haj</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Choi</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Davis</surname>, <given-names>L.S.</given-names></string-name>, <string-name><surname>Jacobs</surname>, <given-names>D.W.</given-names></string-name> (<year>2012</year>). <article-title>Robust pose invariant face recognition using coupled latent space discriminant analysis</article-title>. <source>Computer Vision and Image Understanding</source>, <volume>116</volume>(<issue>11</issue>), <fpage>1095</fpage>–<lpage>1110</lpage>.</mixed-citation>
</ref>
<ref id="j_info1229_ref_027">
<mixed-citation publication-type="chapter"><string-name><surname>Sim</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Baker</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Bsat</surname>, <given-names>M.</given-names></string-name> (<year>2002</year>). <chapter-title>The cmu pose, illumination, and expression (pie) database</chapter-title>. In: <source>Fifth IEEE International Conference on Automatic Face and Gesture Recognition, 2002. Proceedings</source>. <publisher-name>IEEE</publisher-name>, pp. <fpage>53</fpage>–<lpage>58</lpage>.</mixed-citation>
</ref>
<ref id="j_info1229_ref_028">
<mixed-citation publication-type="journal"><string-name><surname>Wright</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Ma</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Mairal</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Sapiro</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Huang</surname>, <given-names>T.S.</given-names></string-name>, <string-name><surname>Yan</surname>, <given-names>S.</given-names></string-name> (<year>2010</year>). <article-title>Sparse representation for computer vision and pattern recognition</article-title>. <source>Proceedings of the IEEE</source>, <volume>98</volume>(<issue>6</issue>), <fpage>1031</fpage>–<lpage>1044</lpage>.</mixed-citation>
</ref>
<ref id="j_info1229_ref_029">
<mixed-citation publication-type="journal"><string-name><surname>Wright</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Yang</surname>, <given-names>A.Y.</given-names></string-name>, <string-name><surname>Ganesh</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Shankar Sastry</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Ma</surname>, <given-names>Y.</given-names></string-name> (<year>2009</year>). <article-title>Robust face recognition via sparse representation</article-title>. <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>, <volume>31</volume>(<issue>2</issue>), <fpage>210</fpage>–<lpage>227</lpage>.</mixed-citation>
</ref>
<ref id="j_info1229_ref_030">
<mixed-citation publication-type="chapter"><string-name><surname>Yu</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Liu</surname>, <given-names>H.</given-names></string-name> (<year>2014</year>). <chapter-title>Facial pose estimation via dense and sparse representation</chapter-title>. In: <source>2014 IEEE Symposium on Robotic Intelligence In Informationally Structured Space (RiiSS)</source>. <publisher-name>IEEE</publisher-name>, pp. <fpage>1</fpage>–<lpage>6</lpage>.</mixed-citation>
</ref>
<ref id="j_info1229_ref_031">
<mixed-citation publication-type="journal"><string-name><surname>Zhang</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Zhang</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Huang</surname>, <given-names>T.S.</given-names></string-name> (<year>2013</year>). <article-title>Pose-robust face recognition via sparse representation</article-title>. <source>Pattern Recognition</source>, <volume>46</volume>(<issue>5</issue>), <fpage>1511</fpage>–<lpage>1521</lpage>.</mixed-citation>
</ref>
<ref id="j_info1229_ref_032">
<mixed-citation publication-type="journal"><string-name><surname>Zhang</surname>, <given-names>X.</given-names></string-name>, <string-name><surname>Gao</surname>, <given-names>Y.</given-names></string-name> (<year>2009</year>). <article-title>Face recognition across pose: a review</article-title>. <source>Pattern Recognition</source>, <volume>42</volume>(<issue>11</issue>), <fpage>2876</fpage>–<lpage>2896</lpage>.</mixed-citation>
</ref>
<ref id="j_info1229_ref_033">
<mixed-citation publication-type="journal"><string-name><surname>Zhang</surname>, <given-names>X.</given-names></string-name>, <string-name><surname>Gao</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Leung</surname>, <given-names>M.K.H.</given-names></string-name> (<year>2008</year>). <article-title>Recognizing rotated faces from frontal and side views: an approach toward effective use of mugshot databases</article-title>. <source>IEEE Transactions on Information Forensics and Security</source>, <volume>3</volume>(<issue>4</issue>), <fpage>684</fpage>–<lpage>697</lpage>.</mixed-citation>
</ref>
<ref id="j_info1229_ref_034">
<mixed-citation publication-type="journal"><string-name><surname>Zhang</surname>, <given-names>X.</given-names></string-name>, <string-name><surname>Pham</surname>, <given-names>D.-S.</given-names></string-name>, <string-name><surname>Venkatesh</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Liu</surname>, <given-names>W.</given-names></string-name>, <string-name><surname>Phung</surname>, <given-names>D.</given-names></string-name> (<year>2015</year>). <article-title>Mixed-norm sparse representation for multi view face recognition</article-title>. <source>Pattern Recognition</source>, <volume>48</volume>(<issue>9</issue>), <fpage>2935</fpage>–<lpage>2946</lpage>.</mixed-citation>
</ref>
<ref id="j_info1229_ref_035">
<mixed-citation publication-type="journal"><string-name><surname>Zhao</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Zhang</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Yin</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Sun</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Hu</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Piao</surname>, <given-names>X.</given-names></string-name>, <string-name><surname>Wu</surname>, <given-names>Q.</given-names></string-name> (<year>2016</year>). <article-title>Fisher discrimination-based l2,1-norm sparse representation for face recognition</article-title>. <source>The Visual Computer</source>, <volume>32</volume>(<issue>9</issue>), <fpage>1165</fpage>–<lpage>1178</lpage>.</mixed-citation>
</ref>
<ref id="j_info1229_ref_036">
<mixed-citation publication-type="journal"><string-name><surname>Zhou</surname>, <given-names>Z.</given-names></string-name>, <string-name><surname>Fu</surname>, <given-names>J.H.</given-names></string-name>, <string-name><surname>Zhang</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Chen</surname>, <given-names>Z.</given-names></string-name> (<year>2001</year>). <article-title>Neural network ensemble based view invariant face recognition</article-title>. <source>Journal of Computer Research and Development</source>, <volume>38</volume>(<issue>9</issue>), <fpage>1061</fpage>–<lpage>1065</lpage>.</mixed-citation>
</ref>
</ref-list>
</back>
</article>