<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">INFORMATICA</journal-id>
<journal-title-group><journal-title>Informatica</journal-title></journal-title-group>
<issn pub-type="epub">1822-8844</issn><issn pub-type="ppub">0868-4952</issn><issn-l>0868-4952</issn-l>
<publisher>
<publisher-name>Vilnius University</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">INFOR454</article-id>
<article-id pub-id-type="doi">10.15388/21-INFOR454</article-id>
<article-categories><subj-group subj-group-type="heading">
<subject>Research Article</subject></subj-group></article-categories>
<title-group>
<article-title>A Systematic Mapping Study on Analysis of Code Repositories</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Sayago-Heredia</surname><given-names>Jaime</given-names></name><email xlink:href="jaime.sayago@pucese.edu.ec">jaime.sayago@pucese.edu.ec</email><xref ref-type="aff" rid="j_infor454_aff_001">1</xref><xref ref-type="corresp" rid="cor1">∗</xref><bio>
<p><bold>J. Sayago-Heredia</bold> is a PhD student at the University of Castilla-La Mancha (UCLM), Spain. His research interests include software engineering. He is a professor at the School of Systems and Computing of the Pontificia Universidad Católica del Ecuador, Sede Esmeraldas. Contact him at jaime.sayago@pucese.edu.ec.</p></bio>
</contrib>
<contrib contrib-type="author">
<name><surname>Pérez-Castillo</surname><given-names>Ricardo</given-names></name><email xlink:href="ricardo.pdelcastillo@uclm.es">ricardo.pdelcastillo@uclm.es</email><xref ref-type="aff" rid="j_infor454_aff_002">2</xref><bio>
<p><bold>R. Perez-Castillo</bold> is a researcher at the Information Technologies and Systems Institute, University of Castilla-La Mancha (UCLM), Spain. His research interests include architecture-driven modernization, model-driven development, business-process archaeology, and enterprise architecture. Perez-Castillo received a PhD in computer science from UCLM. Contact him at ricardo.pdelcastillo@uclm.es.</p></bio>
</contrib>
<contrib contrib-type="author">
<name><surname>Piattini</surname><given-names>Mario</given-names></name><email xlink:href="mario.piattini@uclm.es">mario.piattini@uclm.es</email><xref ref-type="aff" rid="j_infor454_aff_002">2</xref><bio>
<p><bold>M. Piattini</bold> is the director of the Alarcos Research Group and a full professor at the University of Castilla-La Mancha, Spain. His research interests include software and data quality, information-systems audit and security, and IT governance. Piattini received a PhD in computer science from Madrid Technical University, Spain. Contact him at mario.piattini@uclm.es.</p></bio>
</contrib>
<aff id="j_infor454_aff_001"><label>1</label><institution>Pontificia Universidad Católica del Ecuador</institution>, Sede Esmeraldas, Espejo y subida a Santa Cruz Casilla 08-01-0065, <country>Ecuador</country></aff>
<aff id="j_infor454_aff_002"><label>2</label>Information Technology &amp; Systems Institute, <institution>University of Castilla-La Mancha</institution>, Paseo de la Universidad, 4, 13071, Ciudad Real, <country>Spain</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>∗</label>Corresponding author.</corresp>
</author-notes>
<pub-date pub-type="ppub"><year>2021</year></pub-date><pub-date pub-type="epub"><day>2</day><month>6</month><year>2021</year></pub-date><volume>32</volume><issue>3</issue><fpage>619</fpage><lpage>660</lpage><history><date date-type="received"><month>10</month><year>2020</year></date><date date-type="accepted"><month>5</month><year>2021</year></date></history>
<permissions><copyright-statement>© 2021 Vilnius University</copyright-statement><copyright-year>2021</copyright-year>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/4.0/">
<license-p>Open access article under the <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">CC BY</ext-link> license.</license-p></license></permissions>
<abstract>
<p>Code repositories contain valuable information, which can be extracted, processed and synthesized into useful knowledge. This knowledge enables developers to improve maintenance, increase code quality and understand software evolution, among other insights. Research in this field has been carried out in recent years. This paper presents a systematic mapping study to find, evaluate and investigate the mechanisms, methods and techniques used to analyse information from code repositories in order to understand the evolution of software. Through this mapping study, we have identified the main information used as input for the analysis of code repositories (commit data and source code), as well as the most common methods and techniques of analysis (empirical/experimental and automatic). We believe the conducted research is useful for developers working on software development projects and seeking to improve maintenance and understand the evolution of software through the use and analysis of code repositories.</p>
</abstract>
<kwd-group>
<label>Key words</label>
<kwd>code repository analysis</kwd>
<kwd>repository mining</kwd>
<kwd>code repository</kwd>
<kwd>GitHub</kwd>
<kwd>systematic mapping study</kwd>
</kwd-group>
<funding-group><funding-statement>This study has been partially funded by the G3SOFT (SBPLY/17/180501/000150), GEMA (SBPLY/17/180501/000293) and SOS (SBPLY/17/180501/000364) projects funded by the ‘Dirección General de Universidades, Investigación e Innovación – Consejería de Educación, Cultura y Deportes; Gobierno de Castilla-La Mancha’. This work is also a part of the projects BIZDEVOPS-Global (RTI2018-098309-B-C31) and ECLIPSE (RTI2018-094283-B-C31) funded by Ministerio de Economía, Industria y Competitividad (MINECO) &amp; Fondo Europeo de Desarrollo Regional (FEDER).</funding-statement></funding-group>
</article-meta>
</front>
<body>
<sec id="j_infor454_s_001">
<label>1</label>
<title>Introduction</title>
<p>Software engineering researchers have sought to optimize software development by analysing software repositories, especially code repositories. Code repositories contain important information about software systems and projects that can be analysed and processed (Hassan, <xref ref-type="bibr" rid="j_infor454_ref_047">2008</xref>). This information can be extracted, processed and synthesized into output information that allows developers to improve maintenance, increase code quality and understand software evolution. For some years now, software engineering researchers have been working on extracting this information to support the evolution of software systems, improve software design and reuse, and empirically validate new ideas and techniques (Amann <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_006">2015</xref>).</p>
<p>Carrying out these studies is a real challenge for researchers, since the different artifacts contained in code repositories are complex to analyse. Despite this challenge, analysing code repositories can provide solutions to problems that arise in a software development project, such as defects, effort estimation, cloning and evolutionary patterns. Understanding these issues, along with other parameters and metrics obtained from the repository, can decrease maintenance costs and increase the quality of the software.</p>
<p>The objective of this Systematic Mapping Study (SMS) is to find, evaluate and investigate the mechanisms, methods and techniques used for the analysis of information from code repositories that allow the understanding of the evolution of software and of research in this area. The primary studies for our research were taken from the main digital databases. The process of searching, analysing, and filtering the literature on code repositories was carried out through the rigorous protocols and methodologies described in Section <xref rid="j_infor454_s_014">4</xref>. We selected 236 documents out of a total of 3755 documents published between 2012 and 2019. This selected period (seven years) is a reasonable time period to avoid the selection of outdated, general or extensive works (Cosentino <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_020">2017</xref>; Tahir <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_089">2013</xref>), and also to avoid studies that merely reflect short-lived peaks of interest (De Farias <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_023">2016</xref>). The selected studies allowed us to learn about the conducted research in this field and to answer the six research questions we posed.</p>
<p>This study reveals some trends in the current use of software coding evolution and the massive use of code repositories as a platform for software development. We believe this research is useful for developers who are working on software development projects and who seek to improve maintenance and understand the evolution of software through the use and analysis of code repositories. These repositories include the source code and information about the development process, which can be analysed and used by both developers and project managers.</p>
<p>An important contribution is that we have defined a taxonomy which was divided according to the input, method and output of the studies and which is a part of our research. Through this mapping study, we have identified the main information inputs used in the analysis of code repositories, as well as the use of a wide variety of tools and methods for processing the information extracted from the code repository. Specifically, most of the studies focus on the use of empirical and other experimental analyses used in other research fields such as artificial intelligence, although there are plenty of other analysis methods employed. The study allows us to understand how the analysis of code repositories has evolved over the last decade. The scientific community has been constantly investigating the potential benefits of code repository analysis for a decade to understand the evolution of software, along with the possibility of validating techniques and tools (Amann <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_006">2015</xref>). It allows us to identify areas where researchers need to go deeper and find new lines of future research.</p>
<p>The rest of this paper is structured as follows: Section <xref rid="j_infor454_s_002">2</xref> provides a brief background on the definition and evolution of code repositories. Section <xref rid="j_infor454_s_005">3</xref> details the research methodology. Section <xref rid="j_infor454_s_014">4</xref> then describes the systematic mapping method applied in this study. Section <xref rid="j_infor454_s_021">5</xref> presents the results of the systematic mapping. Section <xref rid="j_infor454_s_030">6</xref> discusses the main results of the study and analyses them. Finally, Section <xref rid="j_infor454_s_030">6</xref> presents the conclusions of this study.</p>
</sec>
<sec id="j_infor454_s_002">
<label>2</label>
<title>Background</title>
<p>This section describes the state of the art in code repositories, presenting the most important concepts and the evolution of this knowledge area. In addition, it reviews related Systematic Literature Reviews (SLR), mapping studies and surveys.</p>
<sec id="j_infor454_s_003">
<label>2.1</label>
<title>Code Repository Analysis</title>
<p>One important task in this discipline is software comprehension, since software must be sufficiently understood before it can be properly modified and evolved. In fact, some authors argue that around 60% of software engineering effort is about software comprehension (Cornelissen <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_019">2009</xref>). Researchers in this area use different methods, artifacts and tools to analyse the source code and extract relevant knowledge (Chen <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_018">2016</xref>; Chahal and Saini, <xref ref-type="bibr" rid="j_infor454_ref_016">2016</xref>). Analysing and understanding software is complicated, as is handling the different versions of the software and other information from software development projects. To mitigate this problem, there are systems for controlling those versions, as well as servers, code repositories and other software artifacts in general.</p>
<list>
<list-item id="j_infor454_li_001">
<label>•</label>
<p><bold>Version Control Systems (VCS).</bold> A Version Control System (VCS) is a tool that organizes the source code of software systems. VCSs are used to store and build all the different versions of the source code (Ball <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_010">1997</xref>). In general, a VCS manages the development of an evolving object (Zolkifli <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_096">2018</xref>), recording every change made by software developers. In the process of building software, developers make changes to portions of the source code, artifacts, and the structure of the software. This process is difficult to organize and document as the software becomes large and complex. A VCS therefore allows developers to manage and control the development, maintainability and evolution of software (Costa and Murta, <xref ref-type="bibr" rid="j_infor454_ref_021">2013</xref>).</p>
</list-item>
<list-item id="j_infor454_li_002">
<label>•</label>
<p><bold>Software repositories.</bold> Systems that store project data, e.g. issue control systems and version control systems, are known as software repositories (Falessi and Reichel, <xref ref-type="bibr" rid="j_infor454_ref_031">2015</xref>). Software repositories are virtual spaces where development teams generate collaborative artifacts from the activities of a development process (Arora and Garg, <xref ref-type="bibr" rid="j_infor454_ref_007">2018</xref>; Güemes-Peña <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_043">2018</xref>; Ozbas-Caglayan and Dogru, <xref ref-type="bibr" rid="j_infor454_ref_079">2013</xref>). Software repositories contain a large amount of historical software data that can include valuable information on the source code, defects, and other issues such as new features (De Farias <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_023">2016</xref>). Moreover, many types of data can be extracted from repositories, studied, and used to make changes as needed (Siddiqui and Ahmad, <xref ref-type="bibr" rid="j_infor454_ref_087">2018</xref>). Owing to open source, the number of these repositories and their uses has increased rapidly over the last decade (Amann <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_006">2015</xref>; Costa and Murta, <xref ref-type="bibr" rid="j_infor454_ref_021">2013</xref>; Wijesiriwardana and Wimalaratne, <xref ref-type="bibr" rid="j_infor454_ref_093">2018</xref>). Such repositories are used to discover useful knowledge about the development, maintenance and evolution of software (Chaturvedi <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_017">2013</xref>; Farias <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_032">2015</xref>). It is important to distinguish the different kinds of software repositories. 
Hassan (<xref ref-type="bibr" rid="j_infor454_ref_047">2008</xref>) describes various examples of software repositories, such as historical repositories, run-time repositories and code repositories. Our research mainly focuses on code repositories.</p>
</list-item>
<list-item id="j_infor454_li_003">
<label>•</label>
<p><bold>Code repositories</bold>. Code repositories are maintained by collecting source code from a large number of heterogeneous projects (Siddiqui and Ahmad, <xref ref-type="bibr" rid="j_infor454_ref_087">2018</xref>). Code repositories like SourceForge, GitHub, GitLab, Bitbucket and Google Code contain a lot of information (Güemes-Peña <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_043">2018</xref>). These platforms offer services that go beyond simple hosting and version control of the software (Joy <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_054">2018</xref>). Source code repositories have therefore attracted considerable interest from many researchers (Lee <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_068">2013</xref>).</p>
</list-item>
</list>
<p>These kinds of systems have been adopted by the industry and are used by a significant number of open source projects (Joy <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_054">2018</xref>). Thereby, such systems have become an important source of technical and social information about software development that is used to identify conventions, patterns, artifacts, etc. made by software development teams to understand and improve the quality of software (Del Carpio, <xref ref-type="bibr" rid="j_infor454_ref_024">2017</xref>). However, the repository platforms only allow searches on projects, so they do not allow any analysis or value-added information to support the decision-making process (Hidalgo Suarez <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_049">2018</xref>). Researchers are interested in analysing these code repositories for information on different software issues (e.g. quality, defects, effort estimation, cloning, evolutionary patterns, etc.). Analysing code repositories is a difficult task that requires certain knowledge on how to access, gather, aggregate and analyse the vast amount of data in code repositories (Dyer <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_029">2015</xref>). Our research focuses on performing an SMS to know what kind of research has been done on the analysis of code repositories and to know what research areas have not been covered yet.</p>
</sec>
<sec id="j_infor454_s_004">
<label>2.2</label>
<title>Related Work</title>
<p>This section describes some secondary studies (e.g. SMS, SLR and surveys) about the analysis of code repositories. To the best of our knowledge, in the relevant literature there are few SLR or SMS studies that tackle analysis of code repositories. We can find some works whose aim is to provide the state of the art in the field of code repository analysis.</p>
<p>In this line, De Farias <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor454_ref_023">2016</xref>), Siddiqui and Ahmad (<xref ref-type="bibr" rid="j_infor454_ref_087">2018</xref>) and Costa and Murta (<xref ref-type="bibr" rid="j_infor454_ref_021">2013</xref>) present reviews that investigate the different approaches to Mining Software Repositories (MSR), showing that they are used for many purposes, mainly for understanding defects, analysing the contribution and behaviour of developers, and understanding the evolution of software. In addition, the authors strive to discover the problems encountered during the development of software projects and the role of mining software repositories in solving those problems. A comparative study of data mining tools and techniques for extracting software repositories is also presented, one of these tools being VCS. These results can help practitioners and researchers to better understand and overcome version control system problems, and to devise more effective solutions to improve version control in a distributed environment. Zolkifli <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor454_ref_096">2018</xref>) discuss the background and related work on VCS that has been studied by researchers. The purpose of their paper is to convey the knowledge and ideas that have been established in VCS. It is also important to understand the approaches to VCS, as different approaches will affect the software development process differently. Kagdi <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor454_ref_056">2007</xref>) present a study on approaches to MSR that includes sources such as information stored in VCS, requirements/bug-tracking systems, and communication archives. The study provides a taxonomy of software repositories in the context of the evolution of software, which supports the development of tools, methods and processes to understand the evolution of software. 
In addition, Demeyer <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor454_ref_025">2013</xref>) provide an analysis of the MSR conference. Their paper reports on the technologies that were obsolete or emerging, and on the research methods in use, at the time the study was conducted (2013). In conclusion, the research focuses on the change and evolution of software, along with a few studies for the industry. The study already mentions code repositories and their importance as a source of data for software analysis.</p>
<p>This work focuses on the concepts presented in Section <xref rid="j_infor454_s_003">2.1</xref>. The research efforts (Costa and Murta, <xref ref-type="bibr" rid="j_infor454_ref_021">2013</xref>; De Farias <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_023">2016</xref>; Siddiqui and Ahmad, <xref ref-type="bibr" rid="j_infor454_ref_087">2018</xref>; Zolkifli <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_096">2018</xref>) remain at a coarse-grained level, offering a generalized taxonomy of the different types of information analysed in software repositories, along with the respective tools and techniques for information extraction. This gives a general view of the different code repositories, but it does not provide details of the information that is obtained from these software repositories: for example, what the resulting information is used for, what problems it solves, and other questions linked to the maintenance and evolution of the software.</p>
<p>Other studies, such as Amann <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor454_ref_006">2015</xref>) and Güemes-Peña <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor454_ref_043">2018</xref>), show that the objectives pursued with software repositories are mainly productivity objectives, such as identifying the impacts of change and making development more effective. Other objectives are to support quality assurance, for example, by finding and predicting errors, or by detecting code clones and calculating the testing effort. Management objectives, such as estimating change effort, understanding human factors, or understanding processes, are also pursued, but in far fewer studies. In addition, in their research on the use of software repositories, they have identified the most relevant problems in the software development community: software degradation, dependencies between changes, error prediction and developer interaction. They pointed out that repositories record large volumes of data, although standard data collection tools are not available to extract specific data from the repositories. Most of the data sets came from open source projects with few contributions from industry participants.</p>
<p>In general, those studies highlight the challenge for researchers of analysing code repositories, as they need to deal with various software engineering artifacts, data sources, consider the human factor as a primary component, understand the areas of research and identify their current objectives, gaps and deficiencies, as well as to understand how to better evaluate their purposes and results.</p>
<p>GitHub is the main tool for software repositories with 79 million repositories (Borges and Tulio Valente, <xref ref-type="bibr" rid="j_infor454_ref_014">2018</xref>). Cosentino <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor454_ref_020">2017</xref>) and Kalliamvakou <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor454_ref_058">2016</xref>) analysed it through a systematic mapping of software development with GitHub, in which most of the work was focused on the interaction around the tasks related to coding and project communities. Some concerns were also identified about the reliability of these results because, in general, the studies used small data sets and poor sampling techniques, employed a narrow range of methodologies and/or were difficult to understand. They also documented the results of an empirical study aimed at understanding the characteristics of software repositories such as GitHub; their results indicate that while GitHub is a rich source of data on software development, the use of GitHub for research purposes should take into account several potential hazards. Some of these hazards relate to repository activity; quantitative studies should also be complemented by qualitative data. There are gaps in the data that may jeopardize the conclusions of any rigorous study. This software repository is a unique resource and continues to grow at a rapid rate; its users are finding innovative ways to use it and it will continue to be an attractive source for research in software engineering.</p>
<p>Therefore, in this section we can observe that SLRs, SMS, survey of the literature and text mining obtain information from MSR conferences or focus directly on the analysis of software repositories, with the purpose of knowing the software development process and understanding the evolution of software (Table <xref rid="j_infor454_tab_001">1</xref>). However, those studies do not analyse in detail other subsets that are a part of software repositories, such as code repositories, which require a greater emphasis of studies in terms of information obtained, tools, techniques, methodologies or information derived from these analyses, which will be identified in this study.</p>
<p>Consequently, in order to understand and identify the information obtained, the tools and techniques used, and the research topics on software repositories that remain to be covered, we perform an SMS that provides a complete view from different perspectives: it does not merely follow a systematic process of document selection and data extraction, but also offers a complete analysis that validates the different approaches and proposals of various researchers.</p>
<table-wrap id="j_infor454_tab_001">
<label>Table 1</label>
<caption>
<p>Summary of the related work.</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Paper</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Study type</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Objective</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Extracted Info</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Purpose</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left">De Farias <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor454_ref_023">2016</xref>), Siddiqui and Ahmad (<xref ref-type="bibr" rid="j_infor454_ref_087">2018</xref>), Costa and Murta (<xref ref-type="bibr" rid="j_infor454_ref_021">2013</xref>)</td>
<td style="vertical-align: top; text-align: left">SMS</td>
<td style="vertical-align: top; text-align: left">Understand the defects, analyse the contribution and behaviour of developers, and understand the evolution of software</td>
<td style="vertical-align: top; text-align: left">Software repositories, MSR</td>
<td style="vertical-align: top; text-align: left">Software evolution</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Zolkifli <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor454_ref_096">2018</xref>), Kagdi <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor454_ref_056">2007</xref>), Demeyer <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor454_ref_025">2013</xref>)</td>
<td style="vertical-align: top; text-align: left">Systematic literature review, survey of the literature, text mining</td>
<td style="vertical-align: top; text-align: left">Understand approaches to VCS, taxonomy of software repositories, analysis of obsolete or emerging technologies and current research methods</td>
<td style="vertical-align: top; text-align: left">MSR, VCS, MSR conference</td>
<td style="vertical-align: top; text-align: left">Software development process, understanding the evolution of software, the change and evolution of software</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Amann <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor454_ref_006">2015</xref>), Güemes-Peña <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor454_ref_043">2018</xref>)</td>
<td style="vertical-align: top; text-align: left">Systematic literature review</td>
<td style="vertical-align: top; text-align: left">Identification of change impact, maintainability, software quality, developer effort and bug prediction</td>
<td style="vertical-align: top; text-align: left">MSR conference</td>
<td style="vertical-align: top; text-align: left">Data mining, machine learning, software process</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">Borges and Tulio Valente (<xref ref-type="bibr" rid="j_infor454_ref_014">2018</xref>), Cosentino <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor454_ref_020">2017</xref>), Kalliamvakou <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor454_ref_058">2016</xref>)</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">SMS</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">Coding and project communities, characteristics of software repositories</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">GitHub</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">Software repository analysis</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
</sec>
<sec id="j_infor454_s_005">
<label>3</label>
<title>Research Methodology</title>
<p>Based on the problem identified in the previous section, we prepared the main research question as follows:</p>
<list>
<list-item id="j_infor454_li_004">
<label>RQ.</label>
<p>What are the state-of-the-art techniques and methods for the analysis of information from code repositories?</p>
</list-item>
</list>
<p>An SMS is a secondary study that aims to classify and thematically analyse previous research (Kitchenham, <xref ref-type="bibr" rid="j_infor454_ref_061">2007</xref>; Petersen <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_083">2008</xref>). It is related to a broader secondary study, the systematic literature review (SLR), which aims to gather and evaluate all research results on a selected research topic (de Almeida Biolchini <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_005">2007</xref>; Kitchenham <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_062">2009</xref>). There are several SLR methodologies, e.g. PRISMA and PRISMA-P 2015 (Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols 2015) (Shamseer <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_086">2015</xref>), which can be considered a superior option, although they have weaknesses (Haddaway <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_045">2018</xref>).</p>
<p>SMS usually use more general search terms, and aim to classify and structure the research field, whereas the aim of SLR is to summarise and conclusively evaluate the research results. Kitchenham (<xref ref-type="bibr" rid="j_infor454_ref_061">2007</xref>) also discuss the applications and state that SMS may be particularly suitable if only a few literature reviews have been conducted on the selected topic, and an overview of the field is sought.</p>
<p>Regardless of the selection, both approaches can be used to identify research gaps in the current state of research, but SMS is usually more applicable if the problem or topic is more generic (Kasurinen and Knutas, <xref ref-type="bibr" rid="j_infor454_ref_059">2018</xref>). In addition, SMS can analyse what kind of studies have been conducted in the field, and what are their methods and results (Bailey <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_009">2007</xref>). In Fig. <xref rid="j_infor454_fig_001">1</xref> we present the systematic mapping process proposed by Petersen <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor454_ref_083">2008</xref>) for the field of software engineering.</p>
<fig id="j_infor454_fig_001">
<label>Fig. 1</label>
<caption>
<p>Results obtained from the search and selection process.</p>
</caption>
<graphic xlink:href="infor454_g001.jpg"/>
</fig>
<p>The goal of our SMS is to discover and evaluate the methods and techniques used for the analysis of code repository information, in order to understand the evolution of this research area of software engineering.</p>
<p>We performed the SMS following the formal procedures defined by Petersen <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor454_ref_084">2015</xref>, <xref ref-type="bibr" rid="j_infor454_ref_083">2008</xref>) and Kitchenham <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor454_ref_064">2011</xref>). The steps of the standard SMS process that make up the definition phase are presented in Section <xref rid="j_infor454_s_006">3.1</xref>, while Section <xref rid="j_infor454_s_013">3.2</xref> describes the execution phase.</p>
<sec id="j_infor454_s_006">
<label>3.1</label>
<title>Definition Phase</title>
<p>In this phase, we define the following activities for the SMS: research questions, search process, study selection procedure, quality assessment, data extraction and taxonomy, and collection methods.</p>
<sec id="j_infor454_s_007">
<label>3.1.1</label>
<title>Research Questions</title>
<p>The main research question (RQ), presented above together with the main goal of our SMS, seeks to discover and evaluate recently published studies, available in different digital libraries, on the methods and techniques used for the analysis of information from code repositories. Because this question has a wide scope, we divide it into more specific research questions. Table <xref rid="j_infor454_tab_002">2</xref> shows these research questions, together with their motivation. ‘<italic>The question definition is mostly based on the grounded theory methodology which involves the construction of theory through systematic gathering and analysis of data</italic>’ (Stol <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_088">2016</xref>).</p>
<table-wrap id="j_infor454_tab_002">
<label>Table 2</label>
<caption>
<p>Research questions.</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Research questions</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Motivation</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left">RQ1: What kind of information is taken as input for the analysis of code repositories?</td>
<td style="vertical-align: top; text-align: left">To know what kind of information is analysed, together with its characteristics or approaches in code repositories.</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">RQ2: What techniques or methods are used for analysing code repositories?</td>
<td style="vertical-align: top; text-align: left">To determine which are the main techniques and methods to obtain information from code repositories.</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">RQ3: What information is extracted (directly) or derived (indirectly) as a result of the analysis of code repositories?</td>
<td style="vertical-align: top; text-align: left">To analyse what information is extracted or derived through the analysis of code repositories.</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">RQ4: What kind of research has proliferated in this field?</td>
<td style="vertical-align: top; text-align: left">To establish the type of research that is most frequent in this area, e.g. solution proposal, applied research, research evaluation, etc., in order to know the maturity of the area and identify gaps.</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">RQ5: Are both academia and industry interested in this field?</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">To analyse the degree of interest of industry in this field through its participation in research work.</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>RQ1 focuses on the input, that is, the information taken for the analysis of the code repository. RQ2 then examines the method or technique used to analyse that information. Finally, RQ3 investigates the purpose and output of the analysis, i.e. the information extracted or derived from it. In addition, questions RQ4 and RQ5 describe the characteristics of the study. RQ4 delimits the type of research, for example, whether it is applied research or proposes a solution; RQ5 determines the affiliation of the researchers who conducted the study, for example, whether they are from academia or industry. Having formulated the RQs of our study (see Table <xref rid="j_infor454_tab_002">2</xref>), we describe in the following subsections the search process, study selection procedure, quality assessment, data extraction and taxonomy and collection methods.</p>
</sec>
<sec id="j_infor454_s_008">
<label>3.1.2</label>
<title>Search Process</title>
<p>In a systematic mapping, an important step is to define the search process for primary studies. These studies are identified by searching scientific bibliographic databases or by browsing specific well-known journals and conferences in the area. In our systematic mapping, we search for primary studies in five digital scientific databases that are considered relevant to software engineering and are recommended by Kuhrmann <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor454_ref_065">2017</xref>): Scopus, IEEE Xplore, ACM Digital Library, ScienceDirect and ISI Web of Science. The use of these libraries allows us to find the largest number of primary studies related to the research questions.</p>
<table-wrap id="j_infor454_tab_003">
<label>Table 3</label>
<caption>
<p>Main terms and synonyms or alternative terms.</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Main terms</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">AND expression division</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Alternative terms</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left">Code repository</td>
<td style="vertical-align: top; text-align: left">Conceptual synonyms</td>
<td style="vertical-align: top; text-align: left">software repository, version control systems</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">Technological synonyms</td>
<td style="vertical-align: top; text-align: left">Git</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">SVN</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Analysis</td>
<td style="vertical-align: top; text-align: left">Synonyms</td>
<td style="vertical-align: top; text-align: left">Mining</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">Inspection</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"/>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"/>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">Exploring</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>After selecting the scientific libraries for the search, the next step is to create the search string. We define two main terms, “Code repository” and “Analysis”, to cover the terms (input and output) that are identified in the black box system (Pérez-Castillo <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_081">2019</xref>). In addition, we include the term version control system in the search string because it is closely related to the main term and is part of a code repository (Dias de Moura <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_026">2014</xref>). Similarly, as technological synonyms, we include the terms Git and SVN because of their wide adoption and use as the most popular tools (Just <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_055">2016</xref>). With the main terms defined, we chose to specify some synonyms and alternative terms (see Table <xref rid="j_infor454_tab_003">3</xref>). To link the main defined terms we use AND, and to link the alternative terms we use OR. By testing the search string, we found that this combination of terms retrieves the most studies that fall within our approach (e.g. GitHub, GitLab, StackOverflow). The generated search string is as follows:</p><graphic xlink:href="infor454_g002.jpg"/>
<p>We search each of the five academic databases using the defined search string. The exception is ISI Web of Science, which does not allow this, so there we apply the search string only to the title, abstract and keywords. The search string was adapted to each digital library. For replication of the study, Appendix <xref rid="j_infor454_app_001">A</xref> shows each library together with the search string in the syntax required by that library. An important point is that the results were filtered to include only studies from 2012 to 2019, in line with the publication-period selection criterion (IC4). This selected period (seven years) is a reasonable time period to avoid the selection of outdated, general or extensive works (Tahir <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_089">2013</xref>; Cosentino <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_020">2017</xref>). It also helps to avoid works from short-lived research trends, that is, works concentrated in a very short period (De Farias <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_023">2016</xref>).</p>
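<p>The AND/OR composition rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the term lists are taken from Table 3, while the names <monospace>TERM_GROUPS</monospace>, <monospace>quote</monospace> and <monospace>build_search_string</monospace>, the quoting of multi-word phrases and the capitalisation of the terms are ours. The string actually submitted to each library is the one reported in Appendix A, whose syntax may differ.</p>
<preformat preformat-type="code">
```python
# Illustrative sketch only: it shows the AND/OR composition rule, not the
# exact search string used in the study (see Appendix A for that).

# Main terms (keys) and their alternative terms (values), taken from Table 3.
TERM_GROUPS = {
    "code repository": ["software repository", "version control systems", "Git", "SVN"],
    "analysis": ["mining", "inspection", "exploring"],
}


def quote(term: str) -> str:
    """Wrap multi-word terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def build_search_string(groups: dict[str, list[str]]) -> str:
    """Join each main term with its alternatives using OR, then join the groups with AND."""
    clauses = []
    for main_term, alternatives in groups.items():
        options = [quote(term) for term in [main_term, *alternatives]]
        clauses.append("(" + " OR ".join(options) + ")")
    return " AND ".join(clauses)


SEARCH_STRING = build_search_string(TERM_GROUPS)
print(SEARCH_STRING)
```
</preformat>
<p>Keeping the terms in a single mapping makes it straightforward to regenerate the string when a synonym is added, and to derive library-specific variants from the same source.</p>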
</sec>
<sec id="j_infor454_s_009">
<label>3.1.3</label>
<title>Selection of Primary Studies Procedure</title>
<p>The results obtained from the search in digital scientific libraries contain studies that contribute to our research and others that are irrelevant, so it is necessary to define both inclusion and exclusion criteria to filter those results. Practices and strategies for the inclusion and exclusion of studies are valuable for systematic reviews (Petersen and Gencel, <xref ref-type="bibr" rid="j_infor454_ref_082">2013</xref>). We defined the following criteria:</p>
<table-wrap id="j_infor454_tab_004">
<label>Table 4</label>
<caption>
<p>Inclusion and exclusion criteria.</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Id</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Criteria</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left">IC1</td>
<td style="vertical-align: top; text-align: left">Peer reviewed paper, for example, proceeding chapters, book chapters, keynote abstracts, call for papers and irrelevant publications.</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">IC2</td>
<td style="vertical-align: top; text-align: left">The study employs some kind of techniques or methods to extract information through the analysis of code repositories.</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">IC3</td>
<td style="vertical-align: top; text-align: left">The study provides some idea or type of application that might be applied for the analysis of code repositories.</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">IC4</td>
<td style="vertical-align: top; text-align: left">The paper was published between 1 January 2012 and 31 August 2019.</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">EC1</td>
<td style="vertical-align: top; text-align: left">The paper is a duplicate.</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">EC2</td>
<td style="vertical-align: top; text-align: left">Non-English articles</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">EC3</td>
<td style="vertical-align: top; text-align: left">The paper is a preliminary investigation that is extended or dealt with in depth in a more recent paper by the same authors that has already been included.</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">EC4</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">The focus of the article is not within the computer science area.</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The inclusion criteria (ICn in Table <xref rid="j_infor454_tab_004">4</xref>) refer first to studies (IC2) that analyse code repositories and extract information from them using techniques and methods, covering what type of information is extracted and what this retrieved information is used for. A further inclusion criterion (IC3) refers to studies that propose innovative ideas or techniques that can be adapted or modified to apply to the analysis of code repositories.</p>
<p>The exclusion criteria (ECn) are common exclusion criteria widely used in SMSs to exclude, for example, duplicate papers, papers written in a language other than English, studies that present lectures and presentations, or papers that present research in subject areas outside the research area. We use the following procedure to select primary studies:</p>
<list>
<list-item id="j_infor454_li_005">
<label>Step 1.</label>
<p>Check that the paper is not a duplicate.</p>
</list-item>
<list-item id="j_infor454_li_006">
<label>Step 2.</label>
<p>Apply the inclusion/exclusion criteria to the studies retrieved with the search string by analysing the title, keywords and abstract of each article for information related to our research topic. We included studies that met at least one of the inclusion criteria (see Table <xref rid="j_infor454_tab_004">4</xref>). In case of doubt, we included the document for further analysis in Step 3.</p>
</list-item>
<list-item id="j_infor454_li_007">
<label>Step 3.</label>
<p>To filter more exhaustively and decide which studies should be excluded or selected, we read the full text of each study and apply the inclusion/exclusion criteria. The first author was responsible for selecting the studies. In this step, selection issues were resolved by agreement among all authors after analysing the full text. This step yields the primary studies that we used for our analysis and that allow us to answer the questions posed.</p>
</list-item>
</list>
<p>After Step 2, we applied an additional procedure to mitigate subjectivity. The remaining authors verified the results of the study selection separately: they took a random sample of the studies from Step 2 and applied the inclusion/exclusion criteria to it. Once the procedure was completed, the researchers checked their agreement on the selection and classification of the selected studies.</p>
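<p>The screening logic of Steps 1 and 2 can be sketched as a simple filter. The sketch below is a hypothetical illustration, not the tooling used in the study: the <monospace>Record</monospace> fields, the title-based duplicate key and the sample records are invented, and we assume that IC1 and IC4 are mandatory while at least one of IC2 or IC3 must hold. EC3 (superseded preliminary papers) and the full-text reading of Step 3 require human judgement and are left out.</p>
<preformat preformat-type="code">
```python
from dataclasses import dataclass
from datetime import date

# Publication window from IC4 in Table 4.
WINDOW_START = date(2012, 1, 1)
WINDOW_END = date(2019, 8, 31)


@dataclass(frozen=True)
class Record:
    """Hypothetical shape of one search result; the field names are assumptions."""
    title: str
    language: str
    published: date
    peer_reviewed: bool
    uses_analysis_technique: bool   # reviewer judgement for IC2
    proposes_applicable_idea: bool  # reviewer judgement for IC3
    in_computer_science: bool = True


def normalise(title: str) -> str:
    """Crude duplicate key: lower-cased title with collapsed whitespace."""
    return " ".join(title.lower().split())


def select(records: list[Record]) -> list[Record]:
    """Apply Step 1 (duplicates) and Step 2 (criteria screening) in order."""
    seen = set()
    kept = []
    for rec in records:
        key = normalise(rec.title)
        if key in seen:                  # EC1: duplicate
            continue
        seen.add(key)
        if rec.language != "English":    # EC2: non-English
            continue
        if not rec.in_computer_science:  # EC4: outside computer science
            continue
        if not rec.peer_reviewed:        # IC1: peer-reviewed paper
            continue
        if not (WINDOW_END >= rec.published >= WINDOW_START):  # IC4: publication window
            continue
        if rec.uses_analysis_technique or rec.proposes_applicable_idea:  # IC2 or IC3
            kept.append(rec)
    return kept


# Made-up sample records, for illustration only.
CANDIDATES = [
    Record("Mining Git histories", "English", date(2015, 5, 1), True, True, False),
    Record("mining  git histories", "English", date(2015, 5, 1), True, True, False),
    Record("Repository inspection notes", "Spanish", date(2016, 3, 1), True, True, False),
    Record("Early repository analysis", "English", date(2010, 1, 1), True, True, False),
]
SELECTED = select(CANDIDATES)
print([rec.title for rec in SELECTED])
```
</preformat>
<p>In this sample only the first record survives: the second is a duplicate, the third is not in English and the fourth falls outside the publication window.</p>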
</sec>
<sec id="j_infor454_s_010">
<label>3.1.4</label>
<title>Quality Assessment</title>
<p>Petersen <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor454_ref_084">2015</xref>) and Kitchenham <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor454_ref_063">2013</xref>) propose criteria for performing a quality assessment of the selected studies in an SMS in software engineering. A study obtains the highest score when its results are clear and replicable, its limitations have been analysed and its presentation is clear. In a similar way, we used these parameters to assess the quality of publications related to code repositories. An instrument with questions and a five-point rating scale was designed to determine the quality of the primary studies.</p>
<p>The instrument contains five subjective closed-ended questions and two objective questions. The assessment scale ranges from 1 to 5 in quantitative terms, i.e. it is based on the Likert scale (Pedreira <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_080">2015</xref>). The possible answers to these questions show the reviewer’s level of agreement: 1 = “Strongly disagree”, 2 = “Disagree”, 3 = “Neither agree nor disagree”, 4 = “Agree”, 5 = “Strongly agree”. To account for subjectivity in the evaluation of the selected papers, group discussion sessions were held with other experts, so that each paper was first assessed independently and the score for each evaluation question was then agreed by consensus.</p>
<p>The quality assessment provided us with guidelines and aspects related to research in the area of code repositories and to the information that is entered, processed, analysed and used in different aspects. Table <xref rid="j_infor454_tab_005">5</xref> presents the questions of the instrument used. AQ1 evaluates whether the primary study takes a systematic approach to the analysis of information from code repositories; AQ2, whether the study presents a result of the analysis of information from code repositories; AQ3, whether the study uses an artifact (method, technique, tool) for the processing of information from code repositories; AQ4, whether the study provides a solution to the problems of software quality, development and evolution; and AQ5, whether the research provides any artifact (method, technique, tool) that can be applied in an industrial environment. AQ6 estimates the number of citations, which we obtained from the various digital scientific databases. For AQ6, we assign a score of ‘1’ to the studies with the fewest citations and a score of ‘5’ to the studies with the most citations. In addition, we standardized the citation counts by dividing the number of citations by the number of years since publication, which avoids penalising recent publications. AQ7 determines whether the conference or journal publishing the study is outstanding or important. To measure this question for conferences, we considered the relevance index from two conference classifications: CORE ERA and Qualis. The conference rankings were standardized with ranges (‘A’, ‘B’, ‘C’) for the first one and (‘A1’, ‘A2’, ‘B1’, ‘B2’, ‘B3’, ‘B4’, ‘B5’) for the second one to finally obtain a calculated average. In the case of journal articles, we rely on the Journal Citation Reports (JCR) quartiles, which are scored in descending order (‘Q1’ = ‘5’, ‘Q2’ = ‘4’, ‘Q3’ = ‘3’, ‘Q4’ = ‘2’, ‘Q5’ = ‘1’), with the lowest score, ‘1’, representing non-indexed journals.</p>
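<p>The two objective scores can be illustrated with a short sketch. The quartile mapping is the one given in the text; everything else is an assumption made for illustration: the function names, the reference year, the sample citation counts, and in particular the linear min-max rule used to place citation rates on the 1 to 5 scale, since the article states only that the least-cited studies receive ‘1’ and the most-cited receive ‘5’.</p>
<preformat preformat-type="code">
```python
# AQ7 for journals: JCR quartile to score, as listed in the text.
# 'Q5' is the article's label for non-indexed journals.
JCR_SCORE = {"Q1": 5, "Q2": 4, "Q3": 3, "Q4": 2, "Q5": 1}


def citations_per_year(citations: int, published_year: int, current_year: int) -> float:
    """Normalise a citation count by the number of years since publication."""
    years = max(current_year - published_year, 1)  # avoid dividing by zero for new papers
    return citations / years


def scale_to_five(values: list[float]) -> list[int]:
    """Map values linearly onto 1..5 (assumed rule: minimum gets 1, maximum gets 5)."""
    low, high = min(values), max(values)
    if high == low:
        return [3 for _ in values]  # assumption: identical rates all get the midpoint
    return [1 + round(4 * (value - low) / (high - low)) for value in values]


# Made-up (citations, publication year) pairs; 2020 is an assumed reference year.
SAMPLE = [(80, 2012), (30, 2016), (5, 2018)]
RATES = [citations_per_year(count, year, 2020) for count, year in SAMPLE]
AQ6_SCORES = scale_to_five(RATES)
print(RATES, AQ6_SCORES)
```
</preformat>
<p>Normalising first matters for the ranking: a 2016 paper with 30 citations (7.5 per year) ends up close to a 2012 paper with 80 citations (10 per year), although its raw count is far lower.</p>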
<table-wrap id="j_infor454_tab_005">
<label>Table 5</label>
<caption>
<p>Quality assessment questions.</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Nr.</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Assessment questions</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Criteria</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left">AQ1</td>
<td style="vertical-align: top; text-align: left">Does the study have a systematic method for obtaining baseline information for code repository analysis?</td>
<td style="vertical-align: top; text-align: left">Defined methods</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">AQ2</td>
<td style="vertical-align: top; text-align: left">Does the study present a result of code repository information analysis?</td>
<td style="vertical-align: top; text-align: left">Data analysis</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">AQ3</td>
<td style="vertical-align: top; text-align: left">Does the study present an artifact (technique, tool, or method) for processing information from code repositories?</td>
<td style="vertical-align: top; text-align: left">Study presentation results</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">AQ4</td>
<td style="vertical-align: top; text-align: left">Does the research show a solution to the problems of software quality, development and evolution?</td>
<td style="vertical-align: top; text-align: left">Study focus</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">AQ5</td>
<td style="vertical-align: top; text-align: left">Does the research provide an artifact (technique, tool or method) that can be applied in industrial environments?</td>
<td style="vertical-align: top; text-align: left">Application</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">AQ6</td>
<td style="vertical-align: top; text-align: left">Do other authors cite the selected study?</td>
<td style="vertical-align: top; text-align: left">Utility</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">AQ7</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">Is the journal or conference that publishes the study important or relevant?</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">Relevance</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="j_infor454_s_011">
<label>3.1.5</label>
<title>Procedure for Data Extraction and Taxonomy</title>
<table-wrap id="j_infor454_tab_006">
<label>Table 6</label>
<caption>
<p>Taxonomy.</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">RQs</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Categories</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left">RQ1</td>
<td style="vertical-align: top; text-align: left">Project Features Info</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">Defects</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">Comments</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">Branches</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">Source Code</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">Informal Information</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">Committers</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">Commit Data</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">Logs</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">Graphs/News Feed</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">Issue</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">Pulls/Pull Request</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">Level of Interest</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">Repository Info</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">RQ2</td>
<td style="vertical-align: top; text-align: left">Automatic Processing</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">Branching Analysis</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">Changes Analysis</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">Commits/Committers Classification</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">Cloning Detection</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">Code Review</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">Commit Analysis</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">Defect/Issues Analysis</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">Developer Behaviour</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">Design Modelling</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">Maintainability Information</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">Metrics/Quality</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">Source Code Improvements</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">Testing Data</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">RQ3</td>
<td style="vertical-align: top; text-align: left">Ad Hoc Algorithms</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">Data Mining</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">Automatic</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">Artificial Intelligence/Machine Learning</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">Qualitative Analyses</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">Heuristic Techniques</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">Empirical/Experimental</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">Statistical Analyses</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">Prediction</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">Reverse Engineering</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">Testing-Based Techniques</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">RQ4</td>
<td style="vertical-align: top; text-align: left">Evaluation Research</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">Proposal of Solution</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">Validation Research</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">Philosophical Papers</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">Opinion Papers</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">Personal Experience Papers</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">RQ5</td>
<td style="vertical-align: top; text-align: left">Industry</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"/>
<td style="vertical-align: top; text-align: left">Academia</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin"/>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">Freelance</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>We obtained clear and systematic information using a data extraction instrument (see Appendix <xref rid="j_infor454_app_002">B</xref>). We defined the possible answers for the research questions posed in the previous sections (see Table <xref rid="j_infor454_tab_002">2</xref>). By extracting the criteria from the selected studies, we obtain homogeneous clusters from which a taxonomy can be built. As a basis for our taxonomy we take the taxonomy provided by Dit <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor454_ref_028">2013</xref>), other similar studies (Kagdi <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_056">2007</xref>; Cavalcanti <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_015">2014</xref>) and our pilot study. Each main category consists of a set of subcategories that share common characteristics and are quantitative in type. For example, the category “Empirical/Experimental” in Table <xref rid="j_infor454_tab_006">6</xref>, which corresponds to RQ3, groups research that employs the following methods and techniques: empirical study, empirical evaluation, controlled experiment, experimental study, case study, empirical analyses and exploratory study. The extraction procedure was tested using the form in a pilot study (see Appendix <xref rid="j_infor454_app_002">B</xref>). The intention of pilot studies is “to evaluate both technical issues, such as the completeness of the forms, and usability issues, such as the clarity of the instructions for use and the order of the questions” (Kitchenham, <xref ref-type="bibr" rid="j_infor454_ref_061">2007</xref>). The authors categorized the items individually. Items on which the authors disagreed were identified and discussed in a round-table session. The set of attributes was extracted and defined by the authors. The characterization of the articles by the authors allows us to verify the quality of the taxonomy, minimising possible bias. In the round-table session, disagreements served as an indicator that our taxonomy and its content needed to be refined. Table <xref rid="j_infor454_tab_006">6</xref> shows in more detail the taxonomy we developed for each research question.</p>
</sec>
<sec id="j_infor454_s_012">
<label>3.1.6</label>
<title>Summary Methods</title>
<p>We summarize the results through both qualitative and quantitative approaches. The qualitative approaches are as follows: 
<list>
<list-item id="j_infor454_li_008">
<label>•</label>
<p>Quality assessment is an important parameter when selecting studies according to research questions.</p>
</list-item>
<list-item id="j_infor454_li_009">
<label>•</label>
<p>We delimit the research questions with a classification and a quality evaluation.</p>
</list-item>
</list> 
The quantitative approaches are as follows:</p>
<list>
<list-item id="j_infor454_li_010">
<label>•</label>
<p>We generate a taxonomy of the selected studies according to each research question (see Table <xref rid="j_infor454_tab_006">6</xref>).</p>
</list-item>
<list-item id="j_infor454_li_011">
<label>•</label>
<p>We made a summary with the total number of articles per country and per year (see Fig. <xref rid="j_infor454_fig_003">3</xref>).</p>
</list-item>
<list-item id="j_infor454_li_012">
<label>•</label>
<p>We prepared a matrix of each primary study distributed in rows containing information on the research questions, proposed taxonomy and quality assessment.</p>
</list-item>
<list-item id="j_infor454_li_013">
<label>•</label>
<p>To summarize the results of the SMS, we generated a bubble chart where the different research questions intersect with the number of the selected primary studies.</p>
</list-item>
</list>
<p>According to Petersen <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor454_ref_083">2008</xref>), a bubble plot “is basically two <inline-formula id="j_infor454_ineq_001"><alternatives><mml:math>
<mml:mi mathvariant="italic">x</mml:mi>
<mml:mo>−</mml:mo>
<mml:mi mathvariant="italic">y</mml:mi></mml:math><tex-math><![CDATA[$x-y$]]></tex-math></alternatives></inline-formula> scatter plots with bubbles at category intersections”. The size of each bubble is proportional to the number of studies that fall into the <inline-formula id="j_infor454_ineq_002"><alternatives><mml:math>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:mi mathvariant="italic">x</mml:mi>
<mml:mo>−</mml:mo>
<mml:mi mathvariant="italic">y</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo></mml:math><tex-math><![CDATA[$(x-y)$]]></tex-math></alternatives></inline-formula> categories of the bubble.</p>
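<p>As a minimal sketch of how such bubble sizes could be derived, the following Python fragment counts the studies at each category intersection and scales the bubble area by that count. The study labels, the category assignments, the function names and the <monospace>area_per_study</monospace> parameter are hypothetical and serve only as an illustration; they are not data or tooling from this SMS.</p>
<preformat>
```python
from collections import Counter

# Hypothetical classification records: (study label, x category, y category).
# The labels and assignments are illustrative only, not data from the SMS.
records = [
    ("study-a", "Commit data", "Empirical/experimental"),
    ("study-b", "Commit data", "Empirical/experimental"),
    ("study-c", "Commit data", "Automatic"),
    ("study-d", "Source code", "Empirical/experimental"),
    ("study-e", "Source code", "Data mining"),
    ("study-f", "Source code", "Data mining"),
    ("study-g", "Source code", "Data mining"),
]


def bubble_counts(rows):
    """Count the studies that fall on each (x, y) category intersection."""
    return Counter((x, y) for _label, x, y in rows)


def bubble_areas(counts, area_per_study=50.0):
    """Make the bubble area proportional to the number of studies."""
    return {pair: area_per_study * n for pair, n in counts.items()}


counts = bubble_counts(records)
areas = bubble_areas(counts)
for (x, y), n in sorted(counts.items()):
    print(f"{x} / {y}: {n} studies, area {areas[(x, y)]:.0f}")
```
</preformat>
<p>The resulting areas could, for instance, be passed as marker sizes to a plotting library; scaling the area rather than the radius keeps the visual weight of a bubble proportional to the number of studies.</p>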
</sec>
</sec>
<sec id="j_infor454_s_013">
<label>3.2</label>
<title>Execution Phase</title>
<p>The execution of our SMS was carried out by three researchers and took nine months to complete. The systematic mapping study schedule began with protocol development and improvement, extraction, and elimination of duplicates. This was followed by study selection based on the analysis of the title, abstract and keywords. Another iteration applied the inclusion and exclusion criteria. Then, all the primary studies selected in this step were downloaded. The final selection was based on the full text, to which we applied the taxonomy, classification and quality assessment. Conflicts were resolved in focus group sessions. Finally, a report of all the steps executed and the activities carried out throughout the study was generated. Figure <xref rid="j_infor454_fig_002">2</xref> summarizes the search and selection process of primary studies and its respective results.</p>
<fig id="j_infor454_fig_002">
<label>Fig. 2</label>
<caption>
<p>Results obtained from the search and selection process.</p>
</caption>
<graphic xlink:href="infor454_g003.jpg"/>
</fig>
<p>Altogether, we obtained 3755 publications as a result of the automatic search in the digital libraries. As a first step, we eliminated duplicates (502 studies), obtaining a total of 3409 studies. Then, applying the exclusion and inclusion criteria, we selected 732 publications. In the last step, which corresponds to reading each study in full, we selected 236 studies. Appendix <xref rid="j_infor454_app_003">C</xref> shows the list of the primary studies we selected. We subsequently conducted the extraction of data, classification and synthesis with these 236 primary studies. The tables obtained from the data extraction, classification and synthesis can be viewed online at https://GitHub.com/jaimepsayago/SMS.</p>
</sec>
</sec>
<sec id="j_infor454_s_014">
<label>4</label>
<title>Results of the Systematic Mapping Study</title>
<p>The search process was carried out by following the criteria and strategies described in the previous section. Figure <xref rid="j_infor454_fig_003">3</xref> and Appendix <xref rid="j_infor454_app_001">A</xref> show a summary of the number of papers obtained in each step of the search process regarding the year in which the primary studies were published and which country their authors were from, respectively.</p>
<p>According to the results shown in Fig. <xref rid="j_infor454_fig_003">3</xref>, the number of primary studies obtained may appear to be large. We considered studies published between 2012 and 2019 for the reasons explained in Section <xref rid="j_infor454_s_009">3.1.3</xref>. The distribution shows an upward trend in the papers retrieved from digital libraries in the most recent years. The first primary studies focusing on code repositories were published at the beginning of the 2000s. The number of studies published in 2012 is on par with that of 2013. The number of studies is much higher in 2014 and 2015. By contrast, the number of primary studies published in 2016 is lower than in the previous years. Nevertheless, there is a spike in the number of articles published in 2017 and 2018. In 2019, the number of studies decreases because of the cut-off in our mapping: the search covered only January to August. Even so, the number of papers for 2019 is high and seems to follow the trend, although the year is incomplete. Overall, we can see a growing interest from researchers in code repositories.</p>
<fig id="j_infor454_fig_003">
<label>Fig. 3</label>
<caption>
<p>Distribution of primary studies by year.</p>
</caption>
<graphic xlink:href="infor454_g004.jpg"/>
</fig>
<fig id="j_infor454_fig_004">
<label>Fig. 4</label>
<caption>
<p>Distribution of primary studies by country.</p>
</caption>
<graphic xlink:href="infor454_g005.jpg"/>
</fig>
<p>Figure <xref rid="j_infor454_fig_004">4</xref> shows the distribution of the studies by country (we consider the affiliation and location of each of the authors). It reveals that most of the selected papers come from the American continent: the USA ranks first (17%), followed by Brazil (10%) and Canada (10%), and then by China (9%), Japan (9%) and India (8%). Beyond these top countries, code repository analysis is widely studied around the world, demonstrating the importance of the topic.</p>
<p>In terms of the type of publication, 60% of the studies were published in conference proceedings and 10% as conference papers (see Fig. <xref rid="j_infor454_fig_005">5</xref>). Journal articles represented only 23% of the total selected primary studies. The remaining 4% correspond to series and 3% to book sections. By analysing these results, we can observe that there are certain efforts to achieve a greater maturity in this field of research. However, it must be taken into account that this is a field that has been intensively researched during the last decade.</p>
<fig id="j_infor454_fig_005">
<label>Fig. 5</label>
<caption>
<p>Types of publications considered in research.</p>
</caption>
<graphic xlink:href="infor454_g006.jpg"/>
</fig>
<p>In this section, an analysis is performed on the 236 primary studies obtained, following the classification criteria and research questions that have been previously outlined (see Tables <xref rid="j_infor454_tab_002">2</xref> and <xref rid="j_infor454_tab_006">6</xref>). The answers to the stated research questions, according to the analysis performed on the selected primary studies, are presented in the following sections.</p>
<sec id="j_infor454_s_015">
<label>4.1</label>
<title>RQ1. What Kind of Information is Taken as Input for the Analysis of Code Repositories?</title>
<p>Table <xref rid="j_infor454_tab_007">7</xref> shows the classification of categories made for the first research question. The selected studies exhibit different artifacts that require some kind of grouping. For this purpose, we have created a taxonomy for code repositories. For this taxonomy, we take as a basis the taxonomy provided in Dit <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor454_ref_028">2013</xref>) as well as other similar studies (Kagdi <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_056">2007</xref>; Cavalcanti <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_015">2014</xref>). The 14 possible inputs considered for RQ1 are grouped and evaluated as shown in Table <xref rid="j_infor454_tab_007">7</xref>.</p>
<p>Table <xref rid="j_infor454_tab_007">7</xref> shows how the selected primary studies are distributed in relation to RQ1, the number of studies for each category and the percentage distribution. We observed that the proposed distribution includes the different components present in a code repository (commits, pulls, branches, etc.). We identified several studies that combine more than one source of data to achieve their analysis of the code repository, but they do not reach a significant number within our research (less than 2%). Commit data is the most recurrent input employed in the selected studies for the analysis of code repositories, with 115 studies in total (34%). This category covers the type of information most used as input, such as commit messages, contributors, commit history, etc. In particular, the most relevant studies of this category focus on the use of repository commits that record changes in the source code made by developers.</p>
<p>Some of the studies in this category take commits as the main information to follow up on issues that may occur in the project or with the developers, and that represent a challenge when carrying out software maintenance. For instance, study 115 (Jarczyk <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_051">2017</xref>) examines the closing of issues (bugs and features), using commits as the main information for the analysis in software projects, which allows a better understanding of the factors that affect the completion rates of issues in software projects. Similarly, in study 130 (Kagdi <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_057">2014</xref>), the authors propose to use commits in repositories that record changes to source code submitted by developers to version control systems. This approach consists of recommending a ranked list of expert developers to assist in the execution of software change requests.</p>
<p>The second most common input, with 90 studies (26%) (see Table <xref rid="j_infor454_tab_007">7</xref>), is “source code”, which represents a huge body of software and related information for researchers who are interested in analysing different properties related to the source code of software. Specifically, source code allows a meaningful understanding of software development artifacts and processes (Dyer <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_029">2015</xref>). For example, study 187 (Negara <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_076">2014</xref>) provides an approach that identifies previously unknown frequent code change patterns from a sequence of fine-grained code changes and that allows understanding the evolution of the code.</p>
<p>Other studies in this category also focus on analysing the repository code to investigate the development process. The study 80 (Finlay <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_034">2014</xref>) takes as main information (input) the code and comments to obtain metrics to relate them to the development effort.</p>
<p>The third most common input is “repository information”, with 23 studies (7%) (see Table <xref rid="j_infor454_tab_007">7</xref>). It mainly groups historical data, dataset repositories and historical code repositories. These studies focus on obtaining the general information of the repository in order to obtain useful knowledge for the development and maintenance of the software.</p>
<p>In this category, we found the study 232 (Wu <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_095">2014</xref>) that takes the information from the code repository to analyse social characteristics of collaboration between developers. The authors focus on demonstrating that code repositories are a part of a broad ecosystem of developer interactions. Another example is the study 191 (Novielli <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_078">2018</xref>) that takes the information from the code repository to analyse the emotions of developers, applying sentiment analysis to the content of communication traces left in collaborative development environments.</p>
<table-wrap id="j_infor454_tab_007">
<label>Table 7</label>
<caption>
<p>Classification of selected papers by the input used (RQ1).</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Input</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Papers</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin"># studies</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">%</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left">Commit data</td>
<td style="vertical-align: top; text-align: left">62, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 185, 189, 203, 223, 225, 226, 227, 229, 230, 233, 235</td>
<td style="vertical-align: top; text-align: left">115</td>
<td style="vertical-align: top; text-align: left">34</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Source code</td>
<td style="vertical-align: top; text-align: left">7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 167, 176, 203, 224, 226, 228, 229, 230, 235, 236</td>
<td style="vertical-align: top; text-align: left">90</td>
<td style="vertical-align: top; text-align: left">26</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Repository info</td>
<td style="vertical-align: top; text-align: left">61, 179, 180, 181, 182, 190, 191, 192, 193, 194, 195, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 232, 234</td>
<td style="vertical-align: top; text-align: left">23</td>
<td style="vertical-align: top; text-align: left">7</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Issue</td>
<td style="vertical-align: top; text-align: left">84, 85, 166, 167, 173, 174, 175, 178, 195, 196, 197, 198, 199, 200, 201, 202, 225, 229, 234</td>
<td style="vertical-align: top; text-align: left">19</td>
<td style="vertical-align: top; text-align: left">6</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Comments</td>
<td style="vertical-align: top; text-align: left">63, 64, 65, 66, 87, 88, 89, 90, 91, 157, 158, 159, 185, 198, 199, 200</td>
<td style="vertical-align: top; text-align: left">16</td>
<td style="vertical-align: top; text-align: left">5</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Branches</td>
<td style="vertical-align: top; text-align: left">1, 2, 3, 147, 148, 149, 165, 177, 185, 186, 187, 188, 189</td>
<td style="vertical-align: top; text-align: left">13</td>
<td style="vertical-align: top; text-align: left">4</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Defects</td>
<td style="vertical-align: top; text-align: left">4, 5, 6, 80, 81, 82, 150, 151, 152, 153, 154, 155, 233</td>
<td style="vertical-align: top; text-align: left">13</td>
<td style="vertical-align: top; text-align: left">4</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Pulls/pull requests</td>
<td style="vertical-align: top; text-align: left">146, 172, 173, 174, 175, 205, 206, 207, 208, 209, 210, 223, 225</td>
<td style="vertical-align: top; text-align: left">13</td>
<td style="vertical-align: top; text-align: left">4</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Committers</td>
<td style="vertical-align: top; text-align: left">83, 102, 153, 154, 155, 160, 161, 162, 163, 164, 165, 233</td>
<td style="vertical-align: top; text-align: left">12</td>
<td style="vertical-align: top; text-align: left">4</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Informal information</td>
<td style="vertical-align: top; text-align: left">86, 156, 168, 169, 170, 171, 203, 204</td>
<td style="vertical-align: top; text-align: left">8</td>
<td style="vertical-align: top; text-align: left">2</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Level of interest</td>
<td style="vertical-align: top; text-align: left">177, 188, 189, 211, 212, 223</td>
<td style="vertical-align: top; text-align: left">6</td>
<td style="vertical-align: top; text-align: left">2</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Project features info</td>
<td style="vertical-align: top; text-align: left">61, 62, 86, 100, 101, 189</td>
<td style="vertical-align: top; text-align: left">6</td>
<td style="vertical-align: top; text-align: left">2</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Logs</td>
<td style="vertical-align: top; text-align: left">6, 144, 145, 183, 184</td>
<td style="vertical-align: top; text-align: left">5</td>
<td style="vertical-align: top; text-align: left">1</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">Graphs/news feed</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">201, 202</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">2</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">1</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The remaining categories are as follows. The next input is the “issues” category with 19 studies (6%) (see Table <xref rid="j_infor454_tab_007">7</xref>); issues are processed in order to resolve a specific problem or question. The fifth input is the “comments” category with 16 studies (5%), which can be seen as an important complementary analysis component in a repository. These five categories are the most common inputs for this classification.</p>
<p>Other studies employ alternative inputs, each used by 12 to 13 studies and representing less than 5% (see Table <xref rid="j_infor454_tab_007">7</xref>). Those categories are “branches”, “defects”, “pulls and pull requests” and “committers”. The studies take characteristics from code repositories as mentioned in Section <xref rid="j_infor454_s_003">2.1</xref>. Study 75 (Elsen, <xref ref-type="bibr" rid="j_infor454_ref_030">2013</xref>) mentions that a potential contributor may participate in the development or maintenance process by submitting a pull request, which will be reviewed for acceptance or rejection by the central development team. It is here that, in addition to the hosting of software repositories, features such as the “defects” of the developers and the “branches” or “forks” of the projects are incorporated into the development process (Liu <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_070">2016</xref>). These actions and interactions of the developers are collected and make analysis within the code repositories possible.</p>
<p>The other five categories, with 27 studies in total, represent no more than 8% (see Table <xref rid="j_infor454_tab_007">7</xref>) of the total studies in RQ1. These studies are directed at features of the code repositories presented in Section <xref rid="j_infor454_s_003">2.1</xref>. The first of these input categories is “informal information”, which focuses on the “chats”, “mails” and “messages” exchanged. “Level of interest” relates to the social components of the project such as “stars”, “follows” and “watches”, which are mechanisms typically offered in public open source repositories such as GitHub. “Project features info” involves aspects like the “size”, “owner”, “weeks” and “contributors”, which are a part of the general features of the code repository. “Logs” and “graphs/news feed” are categories that also contribute as inputs to the analysis of the code repository.</p>
</sec>
<sec id="j_infor454_s_016">
<label>4.2</label>
<title>RQ2. What Techniques or Methods are Used for Analysing Source Code Repository?</title>
<p>The methods or techniques used in the process of analysis of the code repository can be observed in Table <xref rid="j_infor454_tab_008">8</xref> together with the distribution of studies for this question. Results for this question were obtained using the procedure for data extraction and taxonomy provided in Section <xref rid="j_infor454_s_011">3.1.5</xref>. Some papers are present in more than one category, i.e. each of the different methods contributed by a paper was counted. As a result, the percentage column in Table <xref rid="j_infor454_tab_008">8</xref> represents the total.</p>
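<p>As an illustrative sketch, the percentage column of Table <xref rid="j_infor454_tab_008">8</xref> can be reproduced from the reported counts in a few lines of Python. The denominator of 236 primary studies is an assumption inferred from the reported values, and the names used below are hypothetical.</p>
<preformat>
```python
# Number of studies per method category, as reported in Table 8.
counts = {
    "Empirical/experimental": 93,
    "Automatic": 29,
    "Artificial intelligence/machine learning": 29,
    "Statistical analyses": 24,
    "Ad hoc algorithms": 23,
    "Data mining": 15,
    "Qualitative analyses": 9,
    "Prediction": 8,
    "Reverse engineering": 6,
    "Heuristical techniques": 6,
    "Testing-based techniques": 4,
}

# Assumption: percentages are taken relative to the 236 primary studies,
# not relative to the sum of the category counts.
TOTAL_PRIMARY_STUDIES = 236


def percentages(category_counts, total):
    """Return the rounded percentage of `total` for every category."""
    return {name: round(100 * n / total) for name, n in category_counts.items()}


pct = percentages(counts, TOTAL_PRIMARY_STUDIES)
assignments = sum(counts.values())  # exceeds 236: a paper may use several methods

for name, n in counts.items():
    print(f"{name}: {n} studies ({pct[name]}%)")
print(f"category assignments: {assignments}")
```
</preformat>
<p>Under this assumption, the category assignments exceed the number of primary studies because a paper counted in several categories contributes to each of them, so the percentages need not add up to 100.</p>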
<table-wrap id="j_infor454_tab_008">
<label>Table 8</label>
<caption>
<p>Classification of selected papers by concern/topic, number of studies and percentage (RQ2).</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Methods-techniques</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Papers</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin"># studies</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">%</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left">Empirical/experimental</td>
<td style="vertical-align: top; text-align: left">2, 5, 8, 16, 17, 18, 19, 22, 23, 24, 25, 26, 27, 28, 30, 31, 32, 34, 35, 36, 42, 43, 44, 45, 46, 48, 52, 56, 57, 59, 60, 62, 63, 64, 65, 66, 68, 74, 77, 83, 88, 91, 97, 100, 101, 102, 107, 111, 113, 114, 118, 120, 121, 128, 138, 139, 143, 147, 148, 151, 154, 161, 163, 165, 169, 177, 179, 184, 185, 186, 189, 192, 193, 194, 196, 197, 202, 206, 209, 211, 212, 213, 215, 216, 220, 222, 223, 225, 226, 227, 229, 233, 235</td>
<td style="vertical-align: top; text-align: left">93</td>
<td style="vertical-align: top; text-align: left">39</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Automatic</td>
<td style="vertical-align: top; text-align: left">14, 51, 54, 55, 67, 71, 75, 76, 78, 81, 82, 85, 90, 96, 99, 105, 109, 116, 131, 153, 159, 168, 171, 174, 176, 181, 208, 219, 230</td>
<td style="vertical-align: top; text-align: left">29</td>
<td style="vertical-align: top; text-align: left">12</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Artificial intelligence/machine learning</td>
<td style="vertical-align: top; text-align: left">1, 3, 4, 33, 49, 50, 53, 69, 70, 80, 87, 106, 108, 122, 126, 127, 130, 142, 150, 157, 158, 166, 175, 178, 182, 203, 205, 207, 214</td>
<td style="vertical-align: top; text-align: left">29</td>
<td style="vertical-align: top; text-align: left">12</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Statistical analyses</td>
<td style="vertical-align: top; text-align: left">6, 9, 12, 21, 37, 38, 39, 58, 86, 89, 94, 98, 112, 136, 167, 199, 201, 204, 210, 217, 228, 232, 234, 236</td>
<td style="vertical-align: top; text-align: left">24</td>
<td style="vertical-align: top; text-align: left">10</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Ad hoc algorithms</td>
<td style="vertical-align: top; text-align: left">20, 29, 41, 47, 61, 63, 72, 92, 103, 104, 115, 129, 137, 140, 146, 149, 162, 166, 173, 183, 187, 188, 195</td>
<td style="vertical-align: top; text-align: left">23</td>
<td style="vertical-align: top; text-align: left">10</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Data mining</td>
<td style="vertical-align: top; text-align: left">7, 15, 53, 95, 110, 132, 141, 145, 152, 157, 164, 200, 217, 221, 231</td>
<td style="vertical-align: top; text-align: left">15</td>
<td style="vertical-align: top; text-align: left">6</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Qualitative analyses</td>
<td style="vertical-align: top; text-align: left">13, 125, 127, 144, 158, 160, 191, 218, 224</td>
<td style="vertical-align: top; text-align: left">9</td>
<td style="vertical-align: top; text-align: left">4</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Prediction</td>
<td style="vertical-align: top; text-align: left">10, 53, 134, 157, 190, 198, 203, 214</td>
<td style="vertical-align: top; text-align: left">8</td>
<td style="vertical-align: top; text-align: left">3</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Reverse engineering</td>
<td style="vertical-align: top; text-align: left">11, 73, 84, 133, 155, 180</td>
<td style="vertical-align: top; text-align: left">6</td>
<td style="vertical-align: top; text-align: left">3</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Heuristic techniques</td>
<td style="vertical-align: top; text-align: left">93, 119, 123, 124, 156, 172</td>
<td style="vertical-align: top; text-align: left">6</td>
<td style="vertical-align: top; text-align: left">3</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">Testing-based techniques</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">40, 79, 117, 135</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">4</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">2</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The mapping indicates that the most common category is “empirical/experimental” with 93 studies (39%) (see Table <xref rid="j_infor454_tab_008">8</xref>). This type of study provides a systematic, disciplined, quantifiable and controlled way to evaluate information and approaches against other existing ones and to know under which criteria they are better (Genero Bocco <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_041">2014</xref>). These methods include empirical studies, empirical evaluations, experimental studies, empirical analyses, case studies, systematic literature reviews, research strategy, etc.</p>
<p>The second most recurrent methods are tagged as “automatic” with 29 studies (12%) (see Table <xref rid="j_infor454_tab_008">8</xref>). These studies focus on using tool automation techniques to perform a specific task. For example, study 67 (Dias <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_027">2015</xref>) proposes an automatic tool to untangle fine-grained code changes into groups, achieving an average success rate of 91%. Study 176 (Martínez-Torres <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_073">2013</xref>) develops an automatic categorization tool to extract text for the analysis of knowledge sharing activities in projects.</p>
<p>The third category is “artificial intelligence/machine learning” (AI/ML) with 29 studies (12%) (see Table <xref rid="j_infor454_tab_008">8</xref>). These studies use techniques that are relevant today as they have achieved a remarkable momentum that, if properly used, can meet the best expectations in many application sectors across the research field (Barredo Arrieta <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_012">2020</xref>). Due to this importance, we provide a specific classification within this category.</p>
<p>The rapid increase in artifacts using AI/ML techniques demonstrates the inclination of the software engineering community towards this branch. These are not isolated cases or fads (Harman, <xref ref-type="bibr" rid="j_infor454_ref_046">2012</xref>). Nowadays, the nature of software goes hand in hand with human intelligence; this is where AI/ML techniques are becoming a part of software, specifically in the field of code repositories.</p>
<p>The renewed interest in AI/ML techniques, and their growing number, have led to many advances related to this field. Examples include Bayesian statistics (Abdeen <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_002">2015</xref>), Convolutional Neural Networks (Li <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_069">2019</xref>) and Random Forest Classifiers (Maqsood <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_072">2017</xref>). These are used to understand bugs, to predict possible code changes, to find and predict defects in a code repository, or to identify code repository errors. However, many of these approaches to building smarter software have been criticised as being too far from human-level intelligence and therefore likely to be insufficient (Feldt <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_033">2018</xref>). This criticism has drawn attention to the need for less complex algorithms and tools to be integrated into the systems and solutions that are used by organizations.</p>
<table-wrap id="j_infor454_tab_009">
<label>Table 9</label>
<caption>
<p>Components of the category artificial intelligence/machine learning.</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Artificial intelligence/machine learning</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Papers</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin"># studies</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">%</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left">Random Forest Classifier</td>
<td style="vertical-align: top; text-align: left">49, 50, 157, 175</td>
<td style="vertical-align: top; text-align: left">4</td>
<td style="vertical-align: top; text-align: left">14</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Natural Language Processing (NLP)</td>
<td style="vertical-align: top; text-align: left">126, 130, 178</td>
<td style="vertical-align: top; text-align: left">3</td>
<td style="vertical-align: top; text-align: left">10</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Bayesian classifier</td>
<td style="vertical-align: top; text-align: left">3, 205</td>
<td style="vertical-align: top; text-align: left">2</td>
<td style="vertical-align: top; text-align: left">7</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Search-based genetic algorithm</td>
<td style="vertical-align: top; text-align: left">33, 207</td>
<td style="vertical-align: top; text-align: left">2</td>
<td style="vertical-align: top; text-align: left">7</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Latent Dirichlet Allocation (LDA)</td>
<td style="vertical-align: top; text-align: left">87, 106</td>
<td style="vertical-align: top; text-align: left">2</td>
<td style="vertical-align: top; text-align: left">7</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Naive Bayes-based approach</td>
<td style="vertical-align: top; text-align: left">122, 166</td>
<td style="vertical-align: top; text-align: left">2</td>
<td style="vertical-align: top; text-align: left">7</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Artificial Intelligence</td>
<td style="vertical-align: top; text-align: left">4, 70</td>
<td style="vertical-align: top; text-align: left">2</td>
<td style="vertical-align: top; text-align: left">7</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Statistical learner</td>
<td style="vertical-align: top; text-align: left">203, 214</td>
<td style="vertical-align: top; text-align: left">2</td>
<td style="vertical-align: top; text-align: left">7</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Sentiment analysis tools</td>
<td style="vertical-align: top; text-align: left">69</td>
<td style="vertical-align: top; text-align: left">1</td>
<td style="vertical-align: top; text-align: left">3</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Deep model structure (Convolutional Neural Network)</td>
<td style="vertical-align: top; text-align: left">158</td>
<td style="vertical-align: top; text-align: left">1</td>
<td style="vertical-align: top; text-align: left">3</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Rule-based technique</td>
<td style="vertical-align: top; text-align: left">1</td>
<td style="vertical-align: top; text-align: left">1</td>
<td style="vertical-align: top; text-align: left">3</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Semantics-based methodology</td>
<td style="vertical-align: top; text-align: left">150</td>
<td style="vertical-align: top; text-align: left">1</td>
<td style="vertical-align: top; text-align: left">3</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">SGDClassifier</td>
<td style="vertical-align: top; text-align: left">142</td>
<td style="vertical-align: top; text-align: left">1</td>
<td style="vertical-align: top; text-align: left">3</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Machine learning techniques</td>
<td style="vertical-align: top; text-align: left">127</td>
<td style="vertical-align: top; text-align: left">1</td>
<td style="vertical-align: top; text-align: left">3</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Hoeffding tree classification method</td>
<td style="vertical-align: top; text-align: left">80</td>
<td style="vertical-align: top; text-align: left">1</td>
<td style="vertical-align: top; text-align: left">3</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Dynamic topic models</td>
<td style="vertical-align: top; text-align: left">108</td>
<td style="vertical-align: top; text-align: left">1</td>
<td style="vertical-align: top; text-align: left">3</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Naive Bayes classifier</td>
<td style="vertical-align: top; text-align: left">182</td>
<td style="vertical-align: top; text-align: left">1</td>
<td style="vertical-align: top; text-align: left">3</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">Gradient boosting machine</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">157</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">1</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">3</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Table <xref rid="j_infor454_tab_009">9</xref> shows the taxonomic subcategories that we derived, based on Baltrusaitis <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor454_ref_011">2019</xref>) and Agarwal <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor454_ref_004">2019</xref>) for “Machine Learning” and on Gani <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor454_ref_039">2016</xref>) for “Artificial Intelligence”. These techniques and methods aim to build models that can process and relate information from multiple sources. It is worth noting that these techniques are growing in importance and have extraordinary potential.</p>
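<p>As a quick consistency check, the percentage column of Table 9 can be reproduced from the study counts. The following Python sketch assumes that each percentage is the row count divided by the sum of all row counts (29) and rounded to the nearest integer; this denominator is an assumption inferred from the table rather than something the table states, and the variable and function names are illustrative only.</p>
<preformat>
```python
# Study counts per technique, copied from the "# studies" column of Table 9.
table9_counts = {
    "Random Forest Classifier": 4,
    "Natural Language Processing (NLP)": 3,
    "Bayesian classifier": 2,
    "Search-based genetic algorithm": 2,
    "Latent Dirichlet Allocation (LDA)": 2,
    "Naive Bayes-based approach": 2,
    "Artificial Intelligence": 2,
    "Statistical learner": 2,
    "Sentiment analysis tools": 1,
    "Deep model structure (CNN)": 1,
    "Rule-based technique": 1,
    "Semantics-based methodology": 1,
    "SGDClassifier": 1,
    "Machine learning techniques": 1,
    "Hoeffding tree classification method": 1,
    "Dynamic topic models": 1,
    "Naive Bayes classifier": 1,
    "Gradient boosting machine": 1,
}


def percentage_shares(counts):
    """Return each count as a rounded percentage of the column total."""
    total = sum(counts.values())  # assumed denominator: sum of all rows
    return {name: round(100 * n / total) for name, n in counts.items()}


shares = percentage_shares(table9_counts)
for name, share in shares.items():
    print(f"{name}: {share}%")
```
</preformat>
<p>Under this assumption the computed shares (14, 10, 7 and 3) agree with the last column of the table. Because a study can appear in more than one row (for example, study 157), the total counts technique occurrences rather than distinct studies.</p>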
<p>There are examples where AI/ML models are applied to improve software development, specifically in the area of code repositories. For example, study 87 (Fu <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_037">2015</xref>) uses Latent Dirichlet Allocation (LDA) to extract information from the change messages of the repository and classify them automatically. Another example is study 127 (Joblin <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_053">2015</xref>), which proposes a general approach for automatically building developer networks based on source code structure and commit information obtained from a code repository; the approach is applicable to a wide variety of software projects.</p>
<p>Other examples are study 122 (Jiang <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_052">2019</xref>), which uses a random forest classifier and a naive Bayes classifier, and study 3 (Abdeen <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_002">2015</xref>), which uses a Bayesian classifier. Both studies use these classifiers as the main technique to process and analyse the input information and to generate models that predict different aspects of code repositories (change impact or code review, among others).</p>
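<p>To make the role of such classifiers more concrete, the following is a minimal sketch of a multinomial naive Bayes classifier applied to commit messages, written in plain Python. It is not the implementation used in any of the cited studies: the training messages, the labels ("bugfix" and "feature") and all function names are hypothetical, and add-one (Laplace) smoothing is assumed.</p>
<preformat>
```python
import math
from collections import Counter, defaultdict


def tokenize(message):
    """Lower-case a commit message and split it on whitespace."""
    return message.lower().split()


def train(labelled_messages):
    """Collect per-label word counts, label frequencies and the vocabulary."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocabulary = set()
    for message, label in labelled_messages:
        tokens = tokenize(message)
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocabulary.update(tokens)
    return word_counts, label_counts, vocabulary


def classify(message, model):
    """Return the label with the highest log-posterior score."""
    word_counts, label_counts, vocabulary = model
    total_messages = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        # Log prior: share of training messages carrying this label.
        score = math.log(label_counts[label] / total_messages)
        label_total = sum(word_counts[label].values())
        for token in tokenize(message):
            count = word_counts[label][token] + 1  # add-one smoothing
            score += math.log(count / (label_total + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data; real studies mine far larger commit histories.
training_data = [
    ("fix null pointer crash in parser", "bugfix"),
    ("fix wrong index in loop", "bugfix"),
    ("add export to csv feature", "feature"),
    ("add new login page", "feature"),
]
model = train(training_data)
print(classify("fix crash in login", model))  # prints "bugfix"
```
</preformat>
<p>The sketch only illustrates the general idea of learning word frequencies per class and scoring new messages against them; the cited studies work with richer features and much larger data sets.</p>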
<p>The proposed taxonomy aids the understanding of the AI/ML techniques used in code repository analysis.</p>
<p>Continuing with the taxonomy for RQ2, other relevant techniques used in the selected studies are those related to “statistical analyses”, which appear in 24 studies (10%) (see Table <xref rid="j_infor454_tab_008">8</xref>). This category groups different techniques such as “Micro-Productivity Profiles Method”, “Quantitative analysis”, “Regression models”, “Regression tree”, etc. Some examples of the application of these techniques follow. Study 89 (Gamalielsson and Lundell, <xref ref-type="bibr" rid="j_infor454_ref_038">2014</xref>) performs a quantitative analysis of project repository data in order to investigate sustainability in OSS communities, with a detailed analysis of developer communities. The authors of study 86 (Foucault <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_035">2015</xref>) provide a quantitative analysis of developer rotation patterns and their effects, which, together with the activity of external newcomers, negatively affect the quality of the software. Study 38 (Borges and Tulio Valente, <xref ref-type="bibr" rid="j_infor454_ref_014">2018</xref>) provides strong quantitative empirical evidence about the meaning of the number of stars of a code repository and recommends monitoring this metric and using it as a criterion for repository selection.</p>
<p>Next in the classification are “ad hoc algorithms”, which are used in 23 studies (9%) (see Table <xref rid="j_infor454_tab_008">8</xref>). These studies provide specific algorithms, for example, semantic slicing, GumTree, prediction by partial matching, etc. They are applied, for example, to information analysis: study 61 (Datta <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_022">2012</xref>) employs algorithms to determine social collaboration teams. Likewise, study 173 (Malheiros <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_071">2012</xref>) provides an algorithm that analyses change requests and recommends potentially relevant source code to help the developer.</p>
<p>Another category is “data mining”, with 15 studies (6%) (see Table <xref rid="j_infor454_tab_008">8</xref>). Data mining refers to the extraction or “mining” of knowledge from large volumes of data (Grossi <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_042">2017</xref>). The studies in this category rely on techniques such as “Hierarchical agglomerative clustering”, “Information retrieval (IR)”, “Decision Tree”, “C4.5”, “Logistic Regression”, “k-Nearest Neighbour (k-NN)”, etc. to process data from code repositories.</p>
<p>The remaining categories add up to 15% of the studies for RQ2: “qualitative analyses” with 9 studies (4%), “prediction” with 8 studies (3%), “reverse engineering” with 6 studies (2%), “heuristical techniques” with 6 studies (2%) and, finally, “testing-based techniques” with 4 studies (2%) (see Table <xref rid="j_infor454_tab_008">8</xref>).</p>
</sec>
<sec id="j_infor454_s_017">
<label>4.3</label>
<title>RQ3. What Information is Extracted (Directly) or Derived (Indirectly) as a Result of the Analysis of Source Code Repositories?</title>
<p>Regarding RQ3, the main output generated is information related to “developer behaviour”, with 65 studies (26%) (see Table <xref rid="j_infor454_tab_010">10</xref>). Researchers have been motivated by the lack of research on developer-related social processes oriented to management, analysis, maintenance and teamwork (Gamalielsson and Lundell, <xref ref-type="bibr" rid="j_infor454_ref_038">2014</xref>), and this is reflected in the mapping study. The category groups different characteristics of developers that can be extracted from code repositories, for example, developers’ patterns, developers’ sentiment classification, developer contribution analysis, developer social networks, development processes, etc. These outputs are eventually used to improve maintenance and, more generally, to understand the evolution of software.</p>
<p>For example, study 186 (Murgia <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_075">2014</xref>) performs emotion mining on developers’ problem reports. This can be useful to identify and monitor the mood of the development team, which makes it possible to anticipate and resolve potential threats to the team. Another example is study 130 (Kagdi <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_057">2014</xref>), which proposes an approach to recommend a ranked list of expert developers to assist in the implementation of software change requests (e.g. bug reports and feature requests).</p>
<p>The second most recurrent output is “changes analysis” with 35 studies (14%) (see Table <xref rid="j_infor454_tab_010">10</xref>). These studies are interesting since the information extracted from these code changes can be used to predict future defects, analyse who should be assigned a particular task, obtain information on specific projects or measure the impact of the organizational structure on software quality (Herzig <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_048">2016</xref>).</p>
<table-wrap id="j_infor454_tab_010">
<label>Table 10</label>
<caption>
<p>Classification of selected papers by concern/topic, studies and percentage (RQ3).</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Output</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Papers</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin"># studies</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">%</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left">Developer Behaviour</td>
<td style="vertical-align: top; text-align: left">1, 5, 8, 11, 13, 17, 21, 26, 30, 36, 37, 38, 39, 40, 43, 48, 52, 55, 56, 60, 61, 64, 65, 66, 69, 71, 77, 89, 92, 99, 101, 116, 120, 121, 127, 130, 147, 151, 160, 161, 163, 169, 175, 177, 181, 186, 189, 190, 191, 192, 196, 200, 204, 208, 211, 215, 216, 220, 221, 222, 230, 231, 232, 233, 234</td>
<td style="vertical-align: top; text-align: left">65</td>
<td style="vertical-align: top; text-align: left">26</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Changes Analysis</td>
<td style="vertical-align: top; text-align: left">3, 4, 31, 32, 33, 42, 45, 51, 59, 63, 67, 70, 102, 104, 107, 108, 114, 118, 124, 136, 138, 155, 162, 171, 174, 179, 180, 184, 187, 194, 195, 217, 223, 229, 235</td>
<td style="vertical-align: top; text-align: left">35</td>
<td style="vertical-align: top; text-align: left">14</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Metrics/Quality</td>
<td style="vertical-align: top; text-align: left">23, 29, 58, 74, 78, 79, 80, 84, 90, 91, 105, 115, 128, 131, 145, 153, 164, 166, 168, 170, 185, 198, 199, 228</td>
<td style="vertical-align: top; text-align: left">24</td>
<td style="vertical-align: top; text-align: left">10</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Defect/Issue Analysis</td>
<td style="vertical-align: top; text-align: left">10, 12, 15, 16, 27, 34, 41, 44, 84, 97, 113, 119, 134, 150, 167, 197, 198, 203, 205, 207, 224</td>
<td style="vertical-align: top; text-align: left">21</td>
<td style="vertical-align: top; text-align: left">9</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Source Code Improvements</td>
<td style="vertical-align: top; text-align: left">2, 9, 18, 19, 25, 49, 54, 63, 67, 72, 73, 95, 102, 110, 125, 133, 141, 164, 165, 202, 212</td>
<td style="vertical-align: top; text-align: left">21</td>
<td style="vertical-align: top; text-align: left">9</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Commits/Committers Classification</td>
<td style="vertical-align: top; text-align: left">6, 20, 46, 50, 87, 96, 98, 111, 126, 142, 146, 157, 176, 178, 182, 209, 210</td>
<td style="vertical-align: top; text-align: left">17</td>
<td style="vertical-align: top; text-align: left">7</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Cloning Detection</td>
<td style="vertical-align: top; text-align: left">22, 35, 81, 82, 85, 94, 112, 144, 149, 159, 188, 227, 236</td>
<td style="vertical-align: top; text-align: left">13</td>
<td style="vertical-align: top; text-align: left">5</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Maintainability Information</td>
<td style="vertical-align: top; text-align: left">7, 28, 57, 62, 68, 76, 83, 106, 143, 168, 226</td>
<td style="vertical-align: top; text-align: left">11</td>
<td style="vertical-align: top; text-align: left">4</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Design Modelling</td>
<td style="vertical-align: top; text-align: left">86, 88, 100, 103, 129, 148, 193, 206, 213, 218</td>
<td style="vertical-align: top; text-align: left">10</td>
<td style="vertical-align: top; text-align: left">4</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Commit Analysis</td>
<td style="vertical-align: top; text-align: left">14, 24, 47, 53, 122, 123, 137, 214, 219</td>
<td style="vertical-align: top; text-align: left">9</td>
<td style="vertical-align: top; text-align: left">4</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Automatic Processing</td>
<td style="vertical-align: top; text-align: left">109, 117, 126, 158, 172, 173, 183</td>
<td style="vertical-align: top; text-align: left">7</td>
<td style="vertical-align: top; text-align: left">3</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Code Review</td>
<td style="vertical-align: top; text-align: left">132, 139, 166, 201, 225</td>
<td style="vertical-align: top; text-align: left">5</td>
<td style="vertical-align: top; text-align: left">2</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Branching Analysis</td>
<td style="vertical-align: top; text-align: left">24, 75, 140, 152, 156</td>
<td style="vertical-align: top; text-align: left">5</td>
<td style="vertical-align: top; text-align: left">2</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">Testing Data</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">93, 135, 154</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">3</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">1</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>For example, study 138 (Kirinuki <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_060">2014</xref>) proposes a technique to prevent “tangled changes”. The technique identifies whether a developer’s changes are tangled, so that developers can be made aware that their changes are potentially tangled and given the opportunity to commit them separately. Study 187 (Negara <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_076">2014</xref>) presents an approach that identifies previously unknown frequent code change patterns from a sequence of fine-grained code changes.</p>
<p>The next output is tagged as “metrics/quality”, with 24 studies (10%) (see Table <xref rid="j_infor454_tab_010">10</xref>). Software metrics and measurements are processes or tools that assess the software product, project or process in order to obtain values that serve as indicators of one or more software attributes (Abuasad and Alsmadi, <xref ref-type="bibr" rid="j_infor454_ref_003">1994, (2012)</xref>). This category comprises specific outputs such as change analysis, change contracts, change histories, change impact analysis, etc. To exemplify, study 80 (Finlay <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_034">2014</xref>) describes the extraction of metrics from a repository and the application of data stream mining techniques to identify metrics that are useful for predicting the success or failure of the build. We can also mention study 145 (Kumar <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_066">2018</xref>), which proposes the creation of an effective failure prediction tool by identifying and investigating the predictive capabilities of several well-known and widely used software metrics for failure prediction.</p>
<p>Next, the output “defect/issue analysis”, with 21 studies (9%) (see Table <xref rid="j_infor454_tab_010">10</xref>), groups studies in which repository data are employed to provide software analytics and to predict where defects might appear in the future (Rosen <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_085">2015</xref>). An example is paper 207 (Rosen <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_085">2015</xref>), which presents a tool that analyses commits and predicts the risk they pose to the software. Alternatively, study 97 (Gupta <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_044">2014</xref>) derives a run-time process model for the error resolution process using a process mining tool and analyses the performance and efficiency of that process.</p>
<p>Another category of importance is “Source Code Improvements” with 21 studies (9%) (see Table <xref rid="j_infor454_tab_010">10</xref>), grouped according to identifiers in source code, source code legibility, annotations, source code plagiarism detection, scope of source code comments, etc.</p>
<p>The output “Commits/Committers Classification”, with 17 studies (7%) (see Table <xref rid="j_infor454_tab_010">10</xref>), corresponds to studies that extract information in order to classify commit messages, change messages, committers or commits from the code repository.</p>
<p>The remaining output categories in RQ3 are as follows. “Cloning Detection”, with 13 studies (5%), is made up of studies that derive information about code cloning, that is, studies that aim to detect all groups of code blocks or code fragments in a code base that are functionally equivalent (Nishi and Damevski, <xref ref-type="bibr" rid="j_infor454_ref_077">2018</xref>). “Maintainability Information”, with 11 studies (4%), is made up of studies on traceability, maintainability or technical debt. “Design Modelling”, with 10 studies (4%), contains studies that extract information on UML models, EMF patterns or patterns of social forking. Finally, “Commit Analysis” has 9 studies (4%), “Automatic Processing” 7 studies (3%), “Code Review” 5 studies (2%), “Branching Analysis” 5 studies (2%) and “Testing Data” 3 studies (1%) (see Table <xref rid="j_infor454_tab_010">10</xref>).</p>
<fig id="j_infor454_fig_006">
<label>Fig. 6</label>
<caption>
<p>Bubble graph intersecting research questions RQ1, RQ2 and RQ3.</p>
</caption>
<graphic xlink:href="infor454_g007.jpg"/>
</fig>
<p>Figure <xref rid="j_infor454_fig_006">6</xref> presents a bubble graph that summarizes the combination of the principal questions (RQ1, RQ2, RQ3), organized following the black box model: the input, the method/technique and the output (Section <xref rid="j_infor454_s_007">3.1.1</xref>). The largest bubble (47 studies) represents studies that take the category “Source Code” as input and process it with “Empirical/Experimental” methods or techniques. The second largest bubble (37 studies) represents studies in which the information taken for the analysis is “Commit Data” and the techniques used to process it are likewise empirical or experimental. The third bubble (37 studies), in terms of the information extracted from the analysis of the repositories, corresponds to “Developer Behaviour”. The bubbles for “Automatic”, “Data Mining” and “Artificial Intelligence/Machine Learning” show that these techniques have become an emerging means of processing information from the repositories. Another interesting point is that every input and output category is covered by empirical/experimental techniques and methods.</p>
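<p>The counts behind a bubble graph of this kind amount to a cross-tabulation of the classified studies. The following Python sketch illustrates the idea under the assumption that each study has been labelled with one input, one technique and one output category; the four records are invented placeholders, not data from the SMS, and all names are hypothetical.</p>
<preformat>
```python
from collections import Counter

# Hypothetical classification records: (input, technique, output) per study.
classified_studies = [
    ("Source Code", "Empirical/Experimental", "Developer Behaviour"),
    ("Source Code", "Empirical/Experimental", "Changes Analysis"),
    ("Commit Data", "Empirical/Experimental", "Developer Behaviour"),
    ("Commit Data", "Data Mining", "Changes Analysis"),
]

# Each bubble is the number of studies sharing a pair of categories.
input_vs_technique = Counter((i, t) for i, t, _ in classified_studies)
technique_vs_output = Counter((t, o) for _, t, o in classified_studies)

for (row, column), size in input_vs_technique.most_common():
    print(f"{row} x {column}: {size} studies")
```
</preformat>
<p>Pairing each technique once with the input and once with the output mirrors the black box layout, with the technique acting as the shared axis between the two halves of the graph.</p>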
<p>Analysing the data obtained from the SMS, we observe that the main trend in code repository research is the application of empirical or experimental techniques (93 studies) to source code, code reviews and code repository commits in order to obtain results, especially results related to developers’ analysis (67 studies). Research trends seem to gravitate towards analyses of code changes and the impact they have on software maintenance and evolution. Analyses, metrics, measurements and classification of developers’ feelings, efforts and contributions are further trends revealed by the SMS. Another marked trend is the analysis of defects, issues and bugs present in the software and the search for patterns or ways to find or predict these defects.</p>
</sec>
<sec id="j_infor454_s_018">
<label>4.4</label>
<title>RQ4. What Kind of Research Has Proliferated in this Field?</title>
<p>Figure <xref rid="j_infor454_fig_007">7</xref> shows the distribution of the primary studies according to research questions RQ4 and RQ5. The kind of contribution of each paper was determined during the data extraction procedure (Section <xref rid="j_infor454_s_011">3.1.5</xref>). Although some papers could fit into more than one category, each paper was classified according to its main and central contribution. Regarding the nature of the research, the graph shows that the majority of studies (90%) provide solution proposals. A further 4% of studies are applied research, and 2% are classified as validation research. Finally, the remaining 4% are classified as evaluation research (1%), opinion articles (1%), personal experience articles (1%) and philosophical articles (1%).</p>
<fig id="j_infor454_fig_007">
<label>Fig. 7</label>
<caption>
<p>Layout of the primary studies according to research questions RQ4 and RQ5.</p>
</caption>
<graphic xlink:href="infor454_g008.jpg"/>
</fig>
</sec>
<sec id="j_infor454_s_019">
<label>4.5</label>
<title>RQ5. Are Both Academia and Industry Interested in this Field?</title>
<p>With respect to academic and industry interest in the field, Fig. <xref rid="j_infor454_fig_007">7</xref> shows that 97% of the selected studies are authored by at least one affiliate of a university or research centre. This very high percentage indicates that academic researchers are interested in the different areas of code repositories. The next classification is “Both”, with 3%; these studies have mixed authorship between academia and industry.</p>
</sec>
<sec id="j_infor454_s_020">
<label>4.6</label>
<title>Quality Assessment</title>
<p>Finally, we applied the quality assessment instrument (Section <xref rid="j_infor454_s_010">3.1.4</xref>) to the primary studies. Figure <xref rid="j_infor454_fig_008">8</xref> shows the results for the seven assessment questions, each evaluated on its respective scale. AQ1 to AQ5 are questions that are evaluated in quantitative terms, while AQ6 and AQ7 are objective questions. The systematic method (AQ1) of the selected studies, which represents whether it is possible to replicate the methods and techniques systematically, received high values in most studies (mostly ‘5’); the other studies were rated between 3 and 4. Regarding the presentation of a result of the analysis of the code repository (AQ2), most of the studies (176) were rated as ‘5’ (see Fig. <xref rid="j_infor454_fig_008">8</xref>). This shows that the studies present some proposal or result of the analysis carried out. In terms of methods, tools or related aspects (AQ3), most of the studies obtained a value of ‘5’ (see Fig. <xref rid="j_infor454_fig_008">8</xref>). As for problems of quality, development or evolution of software (AQ4), the studies seek solutions through an artifact (tool, framework, methodology, etc.), and most of them were rated ‘5’ (see Fig. <xref rid="j_infor454_fig_008">8</xref>). Regarding whether the proposals can be implemented in industrial environments (AQ5), the studies were rated with a high value (‘5’). This means that a study is considered replicable, but there are strong dependencies in terms of tools, software and configurations that should be considered (see Fig. <xref rid="j_infor454_fig_008">8</xref>).</p>
<fig id="j_infor454_fig_008">
<label>Fig. 8</label>
<caption>
<p>Summary of the quality assessment.</p>
</caption>
<graphic xlink:href="infor454_g009.jpg"/>
</fig>
<p>Finally, questions AQ6 and AQ7 objectively assess the citations and the relevance of the conferences and journals in which the selected studies were published (see Fig. <xref rid="j_infor454_fig_008">8</xref>). Most studies have been cited several times. Table <xref rid="j_infor454_tab_011">11</xref> shows the most cited papers. The most cited paper is study 2 (Abdalkareem <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_001">2017</xref>), with 143 citations. That study provides insight into the potential impact of reusing repository code in mobile applications through an exploratory study in which open source applications in a code repository are analysed. Its results can benefit the research community in the development of new techniques and tools to facilitate and improve code reuse.</p>
<table-wrap id="j_infor454_tab_011">
<label>Table 11</label>
<caption>
<p>Most cited papers.</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Study</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin"># Citations</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Year</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">AQ5</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left">2</td>
<td style="vertical-align: top; text-align: left">143</td>
<td style="vertical-align: top; text-align: left">2017</td>
<td style="vertical-align: top; text-align: left">5</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">186</td>
<td style="vertical-align: top; text-align: left">72</td>
<td style="vertical-align: top; text-align: left">2014</td>
<td style="vertical-align: top; text-align: left">5</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">25</td>
<td style="vertical-align: top; text-align: left">69</td>
<td style="vertical-align: top; text-align: left">2014</td>
<td style="vertical-align: top; text-align: left">5</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">9</td>
<td style="vertical-align: top; text-align: left">66</td>
<td style="vertical-align: top; text-align: left">2013</td>
<td style="vertical-align: top; text-align: left">5</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">61</td>
<td style="vertical-align: top; text-align: left">62</td>
<td style="vertical-align: top; text-align: left">2012</td>
<td style="vertical-align: top; text-align: left">5</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">187</td>
<td style="vertical-align: top; text-align: left">46</td>
<td style="vertical-align: top; text-align: left">2014</td>
<td style="vertical-align: top; text-align: left">5</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">130</td>
<td style="vertical-align: top; text-align: left">43</td>
<td style="vertical-align: top; text-align: left">2012</td>
<td style="vertical-align: top; text-align: left">5</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">165</td>
<td style="vertical-align: top; text-align: left">42</td>
<td style="vertical-align: top; text-align: left">2012</td>
<td style="vertical-align: top; text-align: left">5</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">89</td>
<td style="vertical-align: top; text-align: left">42</td>
<td style="vertical-align: top; text-align: left">2014</td>
<td style="vertical-align: top; text-align: left">5</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">159</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">40</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">2012</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">5</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>In addition to the analysis of the most cited articles, we carried out a social network analysis (SNA), which allows us to generate a graph and identify the main groups of authors together with the most relevant authors in the area (Franco-Bedoya <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_036">2017</xref>). The methodology used for our analysis is based on Wang <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor454_ref_092">2016</xref>). The analysis is performed on the studies selected in our SMS, which form the backbone of the network.</p>
<p>In the co-citation analysis, a matrix is compiled by retrieving the co-citation counts of each pair of the important documents identified in the citation analysis; a principal component factor analysis is then applied to reveal the knowledge clusters of code repository research (Wang <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_092">2016</xref>).</p>
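<p>The counting step behind such a co-citation matrix can be illustrated with a minimal Python sketch. It is not the procedure of the cited authors or of any particular tool; the function name <monospace>cocitation_counts</monospace> and the toy reference lists are hypothetical and only show how pairwise counts can be derived from the reference lists of citing papers.</p>
<preformat>
```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how often each pair of documents is cited together."""
    counts = Counter()
    for refs in reference_lists.values():
        # Every unordered pair of distinct references within one
        # citing paper contributes one co-citation.
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts


# Hypothetical toy data: citing paper mapped to the documents it cites.
reference_lists = {
    "P1": ["A", "B", "C"],
    "P2": ["A", "B"],
    "P3": ["B", "C"],
}
counts = cocitation_counts(reference_lists)
```
</preformat>
<p>In this toy example, documents A and B are co-cited by two papers, as are B and C, whereas A and C are co-cited only once. Arranging these counts in a symmetric matrix gives the kind of input on which a subsequent factor or cluster analysis can operate.</p>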
<p>We use the VOSviewer software, which enables sophisticated cluster analysis without requiring in-depth knowledge of clustering or advanced computer skills (van Eck and Waltman, <xref ref-type="bibr" rid="j_infor454_ref_090">2017</xref>).</p>
<fig id="j_infor454_fig_009">
<label>Fig. 9</label>
<caption>
<p>Relationship between authors and research knowledge groups in code repositories.</p>
</caption>
<graphic xlink:href="infor454_g010.jpg"/>
</fig>
<p>In Fig. <xref rid="j_infor454_fig_009">9</xref>, the size of a cluster reflects the number of papers belonging to it: larger clusters include more publications. The distance between two clusters roughly indicates how closely they are related in terms of citations. Clusters that are close to each other tend to be strongly related in terms of co-citations, while clusters that are farther apart tend to be less related (van Eck and Waltman, <xref ref-type="bibr" rid="j_infor454_ref_090">2017</xref>). The curved lines between clusters also reflect the relationship between them, and the thickness of a line represents the number of citations between two clusters. VOSviewer has its own clustering technique (Waltman <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_091">2010</xref>), which was used to divide the analysed studies into 14 clusters with four main branches, based on the citation relationships between them. In Fig. <xref rid="j_infor454_fig_009">9</xref>, each cluster has a colour indicating the group to which it was assigned. Thus, a breakdown of the papers concerning code repositories into broad subfields is obtained. A rough interpretation is as follows. The branch in the bottom-left corner (green, orange and pink nodes) seems to cover research on change analysis, maintenance and code review to maintain software quality. The branch in the bottom-right corner (purple, red and brown nodes) seems to cover research on code changes, commit analysis and automatic processing, focusing on bugs and software defects. The branch in the top-left corner (blue and pink nodes) might be related to research on source code improvements and metrics/quality. Finally, the top-right branch (light blue and green nodes) is more related to research on defect/issue analysis, maintainability information and developer behaviour.</p>
</sec>
</sec>
<sec id="j_infor454_s_021">
<label>5</label>
<title>Discussion</title>
<p>This section discusses the main results obtained through the SMS and their implications for research and industry.</p>
<sec id="j_infor454_s_022">
<label>5.1</label>
<title>Principal Findings</title>
<p>The main research question, and the reason for this SMS, was to identify the information that is extracted from code repositories, the methods and tools used to process it, and the output obtained from this process. Our SMS has scrutinized and schematized this field of research and determined its current status by analysing, evaluating, and understanding the research to date in relation to extraction, methods/tools and output generated from code repositories. The main findings are: 
<list>
<list-item id="j_infor454_li_014">
<label>•</label>
<p><bold>F1</bold>. The research field of code repository analysis is still maturing. Researchers have been very active in this discipline over the last 10 years, producing several different proposals supported by some evidence. However, most of them have not been extensively applied in industry. In addition, most of the papers have been published in high-impact conferences and journals. These proposals cover several objectives, and many of them are innovative and build on previous research.</p>
</list-item>
<list-item id="j_infor454_li_015">
<label>•</label>
<p><bold>F2</bold>. The authors consider several methods/techniques for the analysis of information obtained from code repositories. The collected studies point out that there are different techniques and methods from other research areas that can be applied in the analysis of information extracted from code repositories. We can detect some recurrent patterns. For example, the most recurrent techniques are those related to empirical or experimental analyses which are present for various inputs and outputs. Another insight is the extensive use of artificial intelligence and, more specifically, machine learning to analyse the information extracted from code repositories with good results. Finally, some tools and techniques combine some automatic processes with other sources to achieve the research goal.</p>
</list-item>
<list-item id="j_infor454_li_016">
<label>•</label>
<p><bold>F3</bold>. The selected proposals contribute to the understanding of software quality and evolution. Although several methods, techniques and tools have been found for analysing information extracted from code repositories, their application in industry is minimal and is growing slowly, as shown by the few studies that demonstrate viability through empirical results. Although there are studies carried out jointly by industry and academia, few such initiatives are found in digital libraries.</p>
</list-item>
<list-item id="j_infor454_li_017">
<label>•</label>
<p><bold>F4</bold>. In most studies, the output obtained from the analysis of information from code repositories focuses on the developer, for example by classifying developers or by finding patterns of feelings (e.g. through sentiment analysis) inferred from their coding activity. In summary, this analysis makes it possible to understand the quality and evolution of the software beyond counting and measuring lines of code, and it treats the human factor as a fundamental part of software development.</p>
</list-item>
<list-item id="j_infor454_li_018">
<label>•</label>
<p><bold>F5</bold>. Finally, another important output obtained from the analysis of information from code repositories is the analysis of changes. These studies focus on determining the impact of changes in the software code under development, in order to find error patterns and predict possible failures in the code before they occur. These changes in the code can greatly influence the quality and maintenance of the software, and can have serious repercussions on the costs and the developers of the software project.</p>
</list-item>
</list>
</p>
</sec>
<sec id="j_infor454_s_023">
<label>5.2</label>
<title>Implications for Researchers and Practitioners</title>
<p>The main findings presented above have some implications for researchers and for practitioners in industry who work with code repositories. For the academic world, the most used inputs for information analysis are source code and commit data; the categories with fewer studies are areas of research still to be explored. As for the methods and techniques used for the analysis of information in the repository, a wide variety of tools, techniques and methods are used (see Fig. <xref rid="j_infor454_fig_007">7</xref>). Most approaches rely on empirical studies or artificial intelligence, which provide different ways of processing information, and most studies seek to improve data processing to meet the studied objectives. As shown in Fig. <xref rid="j_infor454_fig_003">3</xref>, research involving analysis of information from code repositories is increasing every year, which makes it a wide area for future research. Another important implication for researchers is that most of the proposed methods and techniques require several tools and software packages to replicate the studies. This makes these techniques somewhat complex to replicate in industry. Finally, researchers should also focus on how the obtained information is applied to solve problems of software quality and evolution, which is the key point for both academia and industry and ultimately serves to improve software development.</p>
</sec>
<sec id="j_infor454_s_024">
<label>5.3</label>
<title>Evaluation of Validity</title>
<p>In this section we discuss the limitations of our systematic mapping study, based on the types of validity proposed by Petersen and Gencel (<xref ref-type="bibr" rid="j_infor454_ref_082">2013</xref>), which we describe below:</p>
<sec id="j_infor454_s_025">
<label>5.3.1</label>
<title>Descriptive Validity</title>
<p>“Descriptive validity refers to threats to the ability to capture and accurately represent the observations made” (Badampudi <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_008">2016</xref>). In an SMS, the main objective is to obtain the available studies without any research bias. To avoid bias, we applied a review protocol, which was evaluated and approved by the authors as a means of quality assurance. As mentioned above, the SMS guidelines of Kitchenham <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor454_ref_064">2011</xref>) and Petersen <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor454_ref_084">2015</xref>) were important for optimizing internal validity. The authors of this SMS double-checked the results of the selection procedure: they took a random sample of 50% of the studies selected by the primary author and applied the inclusion/exclusion criteria to it. To reduce the threats to data extraction, we created a form and used it in the pilot study (see Appendix <xref rid="j_infor454_app_002">B</xref>), which was validated by researchers. As in Badampudi <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor454_ref_008">2016</xref>), the text of the selected primary studies was highlighted and marked, which made it easy to consult the document if a review was required. Finally, during the data extraction process, the researchers conducted several focus group sessions to discuss any potential controversies regarding quality assessment.</p>
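<p>The random 50% sample used for this double-check could be drawn reproducibly along the following lines. This is only a sketch under stated assumptions: the pool of 236 identifiers mirrors the number of selected documents reported in the conclusions, while the identifiers themselves, the fixed seed and the function name <monospace>draw_review_sample</monospace> are illustrative and do not describe the procedure actually followed.</p>
<preformat>
```python
import random


def draw_review_sample(study_ids, fraction=0.5, seed=0):
    """Return a reproducible random subset of the selected studies."""
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    k = round(len(study_ids) * fraction)
    return sorted(rng.sample(list(study_ids), k))


# Hypothetical identifiers standing in for the selected studies.
selected = list(range(1, 237))
sample = draw_review_sample(selected)
```
</preformat>
<p>Fixing the seed means that a second reviewer can regenerate exactly the same subset before applying the inclusion/exclusion criteria independently.</p>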
</sec>
<sec id="j_infor454_s_026">
<label>5.3.2</label>
<title>Theoretical Validity</title>
<p>Uncertainty on the part of the authors about some factors may affect theoretical validity (Badampudi <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_008">2016</xref>). In the case of our SMS, one of the main threats was the use of multiple terms and classifications to refer to code repositories. We mitigated this threat by including synonym terms in the search string, which was validated in the pilot search.</p>
</sec>
<sec id="j_infor454_s_027">
<label>5.3.3</label>
<title>Generalizability</title>
<p>Several limitations can affect the generalizability of our SMS. Petersen and Gencel (<xref ref-type="bibr" rid="j_infor454_ref_082">2013</xref>) distinguish between internal and external generalizability. As far as systematic mapping is concerned, internal generalizability is not a major threat (Petersen and Gencel, <xref ref-type="bibr" rid="j_infor454_ref_082">2013</xref>). We believe that the most important limitation in our SMS is publication bias, because it is not possible to extract all the studies published in this area of research. We mitigated this threat by using five digital scientific databases considered relevant to software engineering and recommended by Kuhrmann <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor454_ref_065">2017</xref>) as sources for study extraction. The consulted databases do not cover certain digital material that could be useful and relevant to our research, for example “technical reports, blog entries and video presentations” (Laukkanen <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_067">2017</xref>). This is a strong limitation for research on code repositories, because research done by industry is hardly ever published. This limitation is reflected in the low level of joint authorship between academia and industry. However, this does not prove that industry is not interested in analysing code repositories.</p>
<p>External generalizability measures the ability to generalize results, that is, the extent to which the results reported in a publication can be generalized to other contexts (Munir <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_074">2016</xref>; Wohlin <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_094">2012</xref>). In this regard, the main threat to external generalizability is our subjectivity in selecting, classifying, and understanding the point of view of the original authors of the studied works. A misperception or misunderstanding of a given paper on our part may have led to a misclassification of the study. To minimize the chances of this, we applied a quality assurance system (Section <xref rid="j_infor454_s_010">3.1.4</xref>). In addition, as we present the review protocol in detail in Section <xref rid="j_infor454_s_005">3</xref>, our mapping is intended to be reliable for other researchers in terms of the search strategy, inclusion/exclusion criteria, and applied data extraction (Borg <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_013">2014</xref>).</p>
</sec>
</sec>
<sec id="j_infor454_s_028">
<label>5.4</label>
<title>Interpretive Validity</title>
<p>Interpretive validity is achieved when the conclusions drawn are reasonable given the data obtained and lead to the validation of the mapping (Petersen and Gencel, <xref ref-type="bibr" rid="j_infor454_ref_082">2013</xref>). The main threat is author bias in the interpretation of the data. To mitigate this, discussion groups were held during the selection and classification of the primary studies. The researchers participated in various meetings to analyse and interpret the obtained data, in which their conclusions were discussed and they made sure to apply the same criteria.</p>
</sec>
<sec id="j_infor454_s_029">
<label>5.5</label>
<title>Reliability</title>
<p>Reliability refers to the ability of other researchers to replicate the results. To achieve reliability, research steps must be repeatable (Badampudi <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_008">2016</xref>). Insufficiently detailed reporting of the measures adopted in the searches limits theoretical validity and may lead to a lack of repeatability (Munir <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_074">2016</xref>). In our case, the search strings, the databases and the inclusion/exclusion criteria are documented, which increases reliability.</p>
<p>There is always a risk of missing primary studies when a single search string is used for all selected databases (Cosentino <italic>et al.</italic>, <xref ref-type="bibr" rid="j_infor454_ref_020">2017</xref>). Therefore, a preliminary test with several versions of the search string was performed in the pilot search.</p>
<p>In addition, the inclusion/exclusion criteria were defined as in Genero <italic>et al.</italic> (<xref ref-type="bibr" rid="j_infor454_ref_040">2011</xref>), with the aim of collecting as many articles as possible related to code repositories. Repeatability of data extraction is also important. We mitigated this threat by extracting and sorting the gathered papers in focus group sessions in which all researchers participated. The steps and information of our research are documented and published at <uri>https://GitHub.com/jaimepsayago/SMS</uri>, including the tables, graphs and analyses found in this document. In addition, all studies were classified according to the criteria of rigour and relevance adapted from Ivarsson and Gorschek (<xref ref-type="bibr" rid="j_infor454_ref_050">2011</xref>). This facilitates the traceability and repeatability of our study.</p>
</sec>
</sec>
<sec id="j_infor454_s_030">
<label>6</label>
<title>Conclusions</title>
<p>In this work, we conducted an SMS of the research published over the last eight years in five digital libraries. Through an extensive search and a systematic process, which has not been done in other similar studies, we have extracted and analysed data from more than 3700 papers, from which 236 documents have been selected. The relevant papers have been systematically analysed to answer the questions posed in this research.</p>
<p>This study reveals some trends in current software development practice and in the massive use of code repositories as a platform for software development. The projects hosted in these repositories range from academic exercises to large enterprise software projects. This allows the information from these repositories to be analysed, for example to obtain patterns, metrics and predictions in software development.</p>
<p>We believe that the conducted research is useful for developers working on software development projects that seek to improve maintenance and understand the evolution of software through the usage and analysis of the code repositories.</p>
<p>One important contribution is that we have defined a taxonomy that was divided according to input, method and output of the analysed proposals. Through this mapping study, we have identified the main information inputs used within code repositories that are commonly analysed: source code and commit information (RQ1).</p>
<p>A wide variety of tools and methods were used for the processing of information extracted from the code repository. Most studies rely on empirical and other experimental analyses, but researchers also adopt approaches to information processing from other fields of research such as artificial intelligence, with a special mention to machine learning. Together with these, data mining and other automatic techniques are employed to improve data processing in code repositories to meet the investigated objectives (RQ2).</p>
<p>Our analysis also identifies the type of information derived from the processing of information from the code repository. In this sense, most studies focus on investigating developer behaviour or change analysis (RQ3). The analysis of developer behaviour makes it possible to understand the quality and evolution of the software beyond counting and measuring lines of code, and focuses on the most important factor in software development. Meanwhile, change analysis focuses on determining the impact of changes in the code of the software being developed, in order to find patterns of errors and predict possible failures in the code.</p>
<p>In future work, we will focus on investigating the areas identified in this systematic mapping study that have not yet been taken into consideration. We will investigate artifacts for developer analysis directly. Finally, we will focus our research efforts on the analysis, measurement or testing of artifacts to determine and predict the impact of developer sentiments on software quality.</p>
</sec>
</body>
<back>
<app-group>
<app id="j_infor454_app_001"><label>A</label>
<title>Search Strings</title>
<p>This appendix shows the search strings with specific syntax for the digital libraries used in the systematic mapping study (Table <xref rid="j_infor454_tab_012">12</xref>).</p>
<table-wrap id="j_infor454_tab_012">
<label>Table 12</label>
<caption>
<p>Concrete syntax of the search string for each digital library.</p>
</caption>
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Source</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Search String</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left">Scopus</td>
<td style="vertical-align: top; text-align: left">TITLE-ABS-KEY ((“code repository” OR “software repository” OR “version control system” OR “GIT” OR “SVN”) AND (“analysis” OR “inspection” OR “mining” OR “exploring”)) AND (LIMIT-TO (DOCTYPE, “ar”) OR LIMIT-TO (DOCTYPE, “cp”) OR LIMIT-TO (DOCTYPE, “cr”) OR LIMIT-TO (DOCTYPE, “re”) OR LIMIT-TO (DOCTYPE, “ch”) OR LIMIT-TO (DOCTYPE, “bk”)) AND (LIMIT-TO (SUBJAREA, “COMP”) OR LIMIT-TO (SUBJAREA, “ENGI”) OR LIMIT-TO (SUBJAREA, “MATH”) OR LIMIT-TO (SUBJAREA, “DECI”)) AND (LIMIT-TO (PUBYEAR, 2020) OR LIMIT-TO (PUBYEAR, 2019) OR LIMIT-TO (PUBYEAR, 2018) OR LIMIT-TO (PUBYEAR, 2017) OR LIMIT-TO (PUBYEAR, 2016) OR LIMIT-TO (PUBYEAR, 2015) OR LIMIT-TO (PUBYEAR, 2014) OR LIMIT-TO (PUBYEAR, 2013) OR LIMIT-TO (PUBYEAR, 2012)) AND (LIMIT-TO (LANGUAGE, “English”))</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">IEEE Xplore</td>
<td style="vertical-align: top; text-align: left">(((“Document Title”:”code repository” OR “software repository” OR “version control system” OR git OR svn AND analysis) AND “Abstract”:”code repository” OR “software repository” OR “version control system” OR git OR svn AND analysis) AND “Author Keywords”:”code repository” OR “software repository” OR “version control system” OR git OR svn AND analysis)</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">ACM Digital Library</td>
<td style="vertical-align: top; text-align: left">“query”: {acmdlTitle:(code repository software repository git svn) AND acmdlTitle:(analysis inspection mining exploring) AND recordAbstract:(code repository software repository git svn) AND recordAbstract:(analysis inspection mining exploring) AND keywords.author.keyword:(code repository software repository git svn) AND keywords.author.keyword:(analysis inspection mining exploring) }”filter”: {”publicationYear”:{ “gte”:2012, “lte”:2019 }},{owners.owner = HOSTED}</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Science Direct</td>
<td style="vertical-align: top; text-align: left">(“code repository” OR “software repository” OR “version control system” OR git OR svn) AND (analysis OR inspection OR mining OR exploring)</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">ISI Web of Science</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">TS = (“code repository” OR “software repository” OR “version control system” OR git OR svn) AND TS = (analysis OR inspection OR mining OR exploring)</td>
</tr>
</tbody>
</table>
</table-wrap>
</app>
<app id="j_infor454_app_002"><label>B</label>
<title>Data Extraction Form</title>
<table-wrap id="j_infor454_tab_013">
<table>
<thead>
<tr>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">Information</td>
<td style="vertical-align: top; text-align: left; border-top: solid thin; border-bottom: solid thin">RQ/AQ</td>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top; text-align: left">Meta-Information</td>
<td style="vertical-align: top; text-align: left"/>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"> number</td>
<td style="vertical-align: top; text-align: left"/>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"> Author</td>
<td style="vertical-align: top; text-align: left"/>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"> Title</td>
<td style="vertical-align: top; text-align: left"/>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"> Abstract</td>
<td style="vertical-align: top; text-align: left"/>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"> Keywords</td>
<td style="vertical-align: top; text-align: left"/>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"> Conference/Journal</td>
<td style="vertical-align: top; text-align: left"/>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"> Year</td>
<td style="vertical-align: top; text-align: left"/>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"> Reference Type</td>
<td style="vertical-align: top; text-align: left"/>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"> DOI</td>
<td style="vertical-align: top; text-align: left"/>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"> Tracking information about the selection of primary studies</td>
<td style="vertical-align: top; text-align: left"/>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">…</td>
<td style="vertical-align: top; text-align: left"/>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Classification</td>
<td style="vertical-align: top; text-align: left"/>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"> Type Information Extract</td>
<td style="vertical-align: top; text-align: left">RQ1</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"> Methods/Techniques</td>
<td style="vertical-align: top; text-align: left">RQ2</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"> Type Information Result</td>
<td style="vertical-align: top; text-align: left">RQ3</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"> Type of Research</td>
<td style="vertical-align: top; text-align: left">RQ4</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"> Industry/Academia</td>
<td style="vertical-align: top; text-align: left">RQ5</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Journal and Conference Relevance/Citations</td>
<td style="vertical-align: top; text-align: left"/>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">ERA</td>
<td style="vertical-align: top; text-align: left"/>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">QUALIS</td>
<td style="vertical-align: top; text-align: left"/>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">JCR</td>
<td style="vertical-align: top; text-align: left"/>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Q-JCR</td>
<td style="vertical-align: top; text-align: left"/>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">cited by (*Scopus)</td>
<td style="vertical-align: top; text-align: left"/>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Score</td>
<td style="vertical-align: top; text-align: left"/>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Quality Assessment</td>
<td style="vertical-align: top; text-align: left"/>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Information Extract</td>
<td style="vertical-align: top; text-align: left">AQ1</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Information Result</td>
<td style="vertical-align: top; text-align: left">AQ2</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Methods/Techniques</td>
<td style="vertical-align: top; text-align: left">AQ3</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Solution Problem</td>
<td style="vertical-align: top; text-align: left">AQ4</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left">Application</td>
<td style="vertical-align: top; text-align: left">AQ5</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left"><italic>Q</italic>-cited</td>
<td style="vertical-align: top; text-align: left">AQ6</td>
</tr>
<tr>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">Relevance of Conference/Journal</td>
<td style="vertical-align: top; text-align: left; border-bottom: solid thin">AQ7</td>
</tr>
</tbody>
</table>
</table-wrap>
</app>
<app id="j_infor454_app_003"><label>C</label>
<title>Selected Primary Studies</title>
<p>The selected primary studies of this SMS are available at <uri>https://GitHub.com/jaimepsayago/SMS</uri>.</p></app></app-group>
<ref-list id="j_infor454_reflist_001">
<title>References</title>
<ref id="j_infor454_ref_001">
<mixed-citation publication-type="journal"><string-name><surname>Abdalkareem</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Shihab</surname>, <given-names>E.</given-names></string-name>, <string-name><surname>Rilling</surname>, <given-names>J.</given-names></string-name> (<year>2017</year>). <article-title>On code reuse from StackOverflow: an exploratory study on Android apps</article-title>. <source>Information and Software Technology</source>, <volume>88</volume>, <fpage>148</fpage>–<lpage>158</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.infsof.2017.04.005" xlink:type="simple">https://doi.org/10.1016/j.infsof.2017.04.005</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_002">
<mixed-citation publication-type="journal"><string-name><surname>Abdeen</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Bali</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Sahraoui</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Dufour</surname>, <given-names>B.</given-names></string-name> (<year>2015</year>). <article-title>Learning dependency-based change impact predictors using independent change histories</article-title>. <source>Information and Software Technology</source>, <volume>67</volume>, <fpage>220</fpage>–<lpage>235</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.infsof.2015.07.007" xlink:type="simple">https://doi.org/10.1016/j.infsof.2015.07.007</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_003">
<mixed-citation publication-type="chapter"><string-name><surname>Abuasad</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Alsmadi</surname>, <given-names>I.M.</given-names></string-name> (<year>2012</year>). <chapter-title>The correlation between source code analysis change recommendations and software metrics</chapter-title>. In: <source>ICICS ’12: Proceedings of the 3rd International Conference on Information and Communication Systems</source>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1145/2222444.2222446" xlink:type="simple">https://doi.org/10.1145/2222444.2222446</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_004">
<mixed-citation publication-type="chapter"><string-name><surname>Agarwal</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Husain</surname>, <given-names>F.</given-names></string-name>, <string-name><surname>Saini</surname>, <given-names>P.</given-names></string-name> (<year>2019</year>). <chapter-title>Next generation noise and affine invariant video watermarking scheme using Harris feature extraction</chapter-title>. In: <source>Third International Conference, ICACDS 2019, Ghaziabad, India, April 12–13, 2019, Revised Selected Papers, Part II, Advances in Computing and Data Sciences</source>. <publisher-name>Springer</publisher-name>, <publisher-loc>Singapore</publisher-loc>, pp. <fpage>655</fpage>–<lpage>665</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1007/978-981-13-9942-8" xlink:type="simple">https://doi.org/10.1007/978-981-13-9942-8</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_005">
<mixed-citation publication-type="journal"><string-name><surname>de Almeida Biolchini</surname>, <given-names>J.C.</given-names></string-name>, <string-name><surname>Mian</surname>, <given-names>P.G.</given-names></string-name>, <string-name><surname>Natali</surname>, <given-names>A.C.C.</given-names></string-name>, <string-name><surname>Conte</surname>, <given-names>T.U.</given-names></string-name>, <string-name><surname>Travassos</surname>, <given-names>G.H.</given-names></string-name> (<year>2007</year>). <article-title>Scientific research ontology to support systematic review in software engineering</article-title>. <source>Advanced Engineering Informatics</source>, <volume>21</volume>(<issue>2</issue>), <fpage>133</fpage>–<lpage>151</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.aei.2006.11.006" xlink:type="simple">https://doi.org/10.1016/j.aei.2006.11.006</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_006">
<mixed-citation publication-type="chapter"><string-name><surname>Amann</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Beyer</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Kevic</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Gall</surname>, <given-names>H.</given-names></string-name> (<year>2015</year>). <chapter-title>Software mining studies: goals, approaches, artifacts, and replicability</chapter-title>. In: <string-name><surname>Meyer</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Nordio</surname>, <given-names>M.</given-names></string-name> (Eds.), <source>Software Engineering. LASER 2013, LASER 2014</source>, <series>Lecture Notes in Computer Science</series>, Vol. <volume>8987</volume>. <publisher-name>Springer</publisher-name>, <publisher-loc>Cham</publisher-loc>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1007/978-3-319-28406-4_5" xlink:type="simple">https://doi.org/10.1007/978-3-319-28406-4_5</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_007">
<mixed-citation publication-type="journal"><string-name><surname>Arora</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Garg</surname>, <given-names>A.</given-names></string-name> (<year>2018</year>). <article-title>Analysis of software repositories using process mining</article-title>. <source>Smart Computing and Informatics. Smart Innovation, Systems and Technologies</source>, <volume>78</volume>, <fpage>637</fpage>–<lpage>643</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1007/978-981-10-5547-8_65" xlink:type="simple">https://doi.org/10.1007/978-981-10-5547-8_65</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_008">
<mixed-citation publication-type="journal"><string-name><surname>Badampudi</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Wohlin</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Petersen</surname>, <given-names>K.</given-names></string-name> (<year>2016</year>). <article-title>Software component decision-making: in-house, OSS, COTS or outsourcing – a systematic literature review</article-title>. <source>Journal of Systems and Software</source>, <volume>121</volume>, <fpage>105</fpage>–<lpage>124</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.jss.2016.07.027" xlink:type="simple">https://doi.org/10.1016/j.jss.2016.07.027</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_009">
<mixed-citation publication-type="chapter"><string-name><surname>Bailey</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Budgen</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Turner</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Kitchenham</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Brereton</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Linkman</surname>, <given-names>S.</given-names></string-name> (<year>2007</year>). <chapter-title>Evidence relating to object-oriented software design: a survey</chapter-title>. In: <source>Proceedings of the First International Symposium on Empirical Software Engineering and Measurement</source>. <publisher-name>IEEE Computer Society</publisher-name>, <publisher-loc>USA</publisher-loc>, pp. <fpage>482</fpage>–<lpage>484</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1109/ESEM.2007.46" xlink:type="simple">https://doi.org/10.1109/ESEM.2007.46</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_010">
<mixed-citation publication-type="chapter"><string-name><surname>Ball</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Kim</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Siy</surname>, <given-names>H.P.</given-names></string-name> (<year>1997</year>). <article-title>If your version control system could talk</article-title>. In: <source>ICSE Workshop on Process Modelling and Empirical Studies of Software Engineering</source>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1.1.48.910" xlink:type="simple">https://doi.org/10.1.1.48.910</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_011">
<mixed-citation publication-type="journal"><string-name><surname>Baltrusaitis</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Ahuja</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Morency</surname>, <given-names>L.</given-names></string-name> (<year>2019</year>). <article-title>Multimodal machine learning: a survey and taxonomy</article-title>. <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>, <volume>41</volume>(<issue>2</issue>), <fpage>423</fpage>–<lpage>443</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1109/TPAMI.2018.2798607" xlink:type="simple">https://doi.org/10.1109/TPAMI.2018.2798607</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_012">
<mixed-citation publication-type="journal"><string-name><surname>Barredo Arrieta</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Díaz-Rodríguez</surname>, <given-names>N.</given-names></string-name>, <string-name><surname>Del Ser</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Bennetot</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Tabik</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Barbado</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Garcia</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Gil-Lopez</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Molina</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Benjamins</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Chatila</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Herrera</surname>, <given-names>F.</given-names></string-name> (<year>2020</year>). <article-title>Explainable Artificial Intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI</article-title>. <source>Information Fusion</source>, <volume>58</volume>, <fpage>82</fpage>–<lpage>115</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.inffus.2019.12.012" xlink:type="simple">https://doi.org/10.1016/j.inffus.2019.12.012</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_013">
<mixed-citation publication-type="journal"><string-name><surname>Borg</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Runeson</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Ardö</surname>, <given-names>A.</given-names></string-name> (<year>2014</year>). <article-title>Recovering from a decade: a systematic mapping of information retrieval approaches to software traceability</article-title>. <source>Empirical Software Engineering</source>, <volume>19</volume>(<issue>6</issue>), <fpage>1565</fpage>–<lpage>1616</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1007/s10664-013-9255-y" xlink:type="simple">https://doi.org/10.1007/s10664-013-9255-y</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_014">
<mixed-citation publication-type="journal"><string-name><surname>Borges</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Tulio Valente</surname>, <given-names>M.</given-names></string-name> (<year>2018</year>). <article-title>What’s in a GitHub Star? Understanding repository starring practices in a social coding platform</article-title>. <source>Journal of Systems and Software</source>, <volume>146</volume>, <fpage>112</fpage>–<lpage>129</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.jss.2018.09.016" xlink:type="simple">https://doi.org/10.1016/j.jss.2018.09.016</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_015">
<mixed-citation publication-type="journal"><string-name><surname>Cavalcanti</surname>, <given-names>Y.C.</given-names></string-name>, <string-name><surname>da Mota Silveira Neto</surname>, <given-names>P.A.</given-names></string-name>, <string-name><surname>do Carmo Machado</surname>, <given-names>I.</given-names></string-name>, <string-name><surname>Vale</surname>, <given-names>T.F.</given-names></string-name>, <string-name><surname>de Almeida</surname>, <given-names>E.S.</given-names></string-name>, <string-name><surname>de Lemos Meira</surname>, <given-names>S.R.</given-names></string-name> (<year>2014</year>). <article-title>Challenges and opportunities for software change request repositories: a systematic mapping study</article-title>. <source>Journal of Software: Evolution and Process</source>, <volume>26</volume>(<issue>7</issue>), <fpage>620</fpage>–<lpage>653</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1002/smr.1639" xlink:type="simple">https://doi.org/10.1002/smr.1639</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_016">
<mixed-citation publication-type="journal"><string-name><surname>Chahal</surname>, <given-names>K.K.</given-names></string-name>, <string-name><surname>Saini</surname>, <given-names>M.</given-names></string-name> (<year>2016</year>). <article-title>Open source software evolution: a systematic literature review (Part 1)</article-title>. <source>International Journal of Open Source Software and Processes</source>, <volume>7</volume>(<issue>1</issue>), <fpage>1</fpage>–<lpage>27</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.4018/IJOSSP.2016010101" xlink:type="simple">https://doi.org/10.4018/IJOSSP.2016010101</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_017">
<mixed-citation publication-type="chapter"><string-name><surname>Chaturvedi</surname>, <given-names>K.K.</given-names></string-name>, <string-name><surname>Singh</surname>, <given-names>V.B.</given-names></string-name>, <string-name><surname>Singh</surname>, <given-names>P.</given-names></string-name> (<year>2013</year>). <chapter-title>Tools in mining software repositories</chapter-title>. In: <source>Proceedings of the 2013 13th International Conference on Computational Science and Its Applications, ICCSA 2013</source>, pp. <fpage>89</fpage>–<lpage>98</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1109/ICCSA.2013.22" xlink:type="simple">https://doi.org/10.1109/ICCSA.2013.22</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_018">
<mixed-citation publication-type="journal"><string-name><surname>Chen</surname>, <given-names>T.H.</given-names></string-name>, <string-name><surname>Thomas</surname>, <given-names>S.W.</given-names></string-name>, <string-name><surname>Hassan</surname>, <given-names>A.E.</given-names></string-name> (<year>2016</year>). <article-title>A survey on the use of topic models when mining software repositories</article-title>. <source>Empirical Software Engineering</source>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1007/s10664-015-9402-8" xlink:type="simple">https://doi.org/10.1007/s10664-015-9402-8</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_019">
<mixed-citation publication-type="journal"><string-name><surname>Cornelissen</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Zaidman</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>van Deursen</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Moonen</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Koschke</surname>, <given-names>R.</given-names></string-name> (<year>2009</year>). <article-title>A systematic survey of program comprehension through dynamic analysis</article-title>. <source>IEEE Transactions on Software Engineering</source>, <volume>35</volume>(<issue>5</issue>), <fpage>684</fpage>–<lpage>702</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1109/TSE.2009.28" xlink:type="simple">https://doi.org/10.1109/TSE.2009.28</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_020">
<mixed-citation publication-type="journal"><string-name><surname>Cosentino</surname>, <given-names>V.</given-names></string-name>, <string-name><surname>Cánovas Izquierdo</surname>, <given-names>J.L.</given-names></string-name>, <string-name><surname>Cabot</surname>, <given-names>J.</given-names></string-name> (<year>2017</year>). <article-title>A systematic mapping study of software development with GitHub</article-title>. <source>IEEE Access</source>, <volume>5</volume>, <fpage>7173</fpage>–<lpage>7192</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1109/ACCESS.2017.2682323" xlink:type="simple">https://doi.org/10.1109/ACCESS.2017.2682323</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_021">
<mixed-citation publication-type="chapter"><string-name><surname>Costa</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Murta</surname>, <given-names>L.</given-names></string-name> (<year>2013</year>). <chapter-title>Version control in Distributed Software Development: a systematic mapping study</chapter-title>. In: <source>IEEE 8th International Conference on Global Software Engineering, ICGSE 2013</source>, pp. <fpage>90</fpage>–<lpage>99</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1109/ICGSE.2013.19" xlink:type="simple">https://doi.org/10.1109/ICGSE.2013.19</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_022">
<mixed-citation publication-type="chapter"><string-name><surname>Datta</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Datta</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Naidu</surname>, <given-names>K.V.M.</given-names></string-name> (<year>2012</year>). <chapter-title>Capacitated team formation problem on social networks</chapter-title>. In: <source>Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>, pp. <fpage>1005</fpage>–<lpage>1013</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1145/2339530.2339690" xlink:type="simple">https://doi.org/10.1145/2339530.2339690</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_023">
<mixed-citation publication-type="chapter"><string-name><surname>De Farias</surname>, <given-names>M.A.F.</given-names></string-name>, <string-name><surname>Novais</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Colaço Júnior</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>da Silva Carvalho</surname>, <given-names>L.P.</given-names></string-name> (<year>2016</year>). <chapter-title>A systematic mapping study on mining software repositories</chapter-title>. In: <source>SAC ’16: Proceedings of the 31st Annual ACM Symposium on Applied Computing</source>, pp. <fpage>1472</fpage>–<lpage>1479</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1145/2851613.2851786" xlink:type="simple">https://doi.org/10.1145/2851613.2851786</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_024">
<mixed-citation publication-type="journal"><string-name><surname>Del Carpio</surname>, <given-names>P.M.</given-names></string-name> (<year>2017</year>). <article-title>Extracción de Nubes de Palabras en Repositorios Git</article-title>. <source>2017 12th Iberian Conference on Information Systems and Technologies (CISTI)</source>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.23919/CISTI.2017.7975911" xlink:type="simple">https://doi.org/10.23919/CISTI.2017.7975911</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_025">
<mixed-citation publication-type="other"><string-name><surname>Demeyer</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Murgia</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Wyckmans</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Lamkanfi</surname>, <given-names>A.</given-names></string-name> (<year>2013</year>). Happy birthday! A trend analysis on past MSR papers. In: <italic>2013 10th Working Conference on Mining Software Repositories (MSR)</italic>, pp. 353–362. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1109/MSR.2013.6624049" xlink:type="simple">https://doi.org/10.1109/MSR.2013.6624049</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_026">
<mixed-citation publication-type="chapter"><string-name><surname>Dias de Moura</surname>, <given-names>M.H.</given-names></string-name>, <string-name><surname>Dantas do Nascimento</surname>, <given-names>H.A.</given-names></string-name>, <string-name><surname>Couto Rosa</surname>, <given-names>T.</given-names></string-name> (<year>2014</year>). <chapter-title>Extracting new metrics from version control system for the comparison of software developers</chapter-title>. In: <source>ARES ’14: Proceedings of the 2014 Ninth International Conference on Availability, Reliability and Security</source>, pp. <fpage>41</fpage>–<lpage>50</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1109/SBES.2014.25" xlink:type="simple">https://doi.org/10.1109/SBES.2014.25</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_027">
<mixed-citation publication-type="chapter"><string-name><surname>Dias</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Bacchelli</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Gousios</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Cassou</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Ducasse</surname>, <given-names>S.</given-names></string-name> (<year>2015</year>). <chapter-title>Untangling fine-grained code changes</chapter-title>. In: <source>2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER)</source>, pp. <fpage>341</fpage>–<lpage>350</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1109/SANER.2015.7081844" xlink:type="simple">https://doi.org/10.1109/SANER.2015.7081844</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_028">
<mixed-citation publication-type="journal"><string-name><surname>Dit</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Revelle</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Gethers</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Poshyvanyk</surname>, <given-names>D.</given-names></string-name> (<year>2013</year>). <article-title>Feature location in source code: a taxonomy and survey</article-title>. <source>Journal of Software: Evolution and Process</source>, <volume>25</volume>(<issue>1</issue>), <fpage>53</fpage>–<lpage>95</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1002/smr.567" xlink:type="simple">https://doi.org/10.1002/smr.567</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_029">
<mixed-citation publication-type="journal"><string-name><surname>Dyer</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Nguyen</surname>, <given-names>H.A.</given-names></string-name>, <string-name><surname>Rajan</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Nguyen</surname>, <given-names>T.N.</given-names></string-name> (<year>2015</year>). <article-title>Boa: Ultra-large-scale software repository and source-code mining</article-title>. <source>ACM Transactions on Software Engineering and Methodology</source>, <volume>25</volume>, <fpage>1</fpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1145/2803171" xlink:type="simple">https://doi.org/10.1145/2803171</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_030">
<mixed-citation publication-type="chapter"><string-name><surname>Elsen</surname>, <given-names>S.</given-names></string-name> (<year>2013</year>). <chapter-title>VisGi: visualizing Git branches</chapter-title>. In: <source>2013 First IEEE Working Conference on Software Visualization (VISSOFT)</source>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1109/VISSOFT.2013.6650522" xlink:type="simple">https://doi.org/10.1109/VISSOFT.2013.6650522</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_031">
<mixed-citation publication-type="chapter"><string-name><surname>Falessi</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Reichel</surname>, <given-names>A.</given-names></string-name> (<year>2015</year>). <chapter-title>Towards an open-source tool for measuring and visualizing the interest of technical debt</chapter-title>. In: <source>2015 IEEE 7th International Workshop on Managing Technical Debt (MTD)</source>, pp. <fpage>1</fpage>–<lpage>8</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1109/MTD.2015.7332618" xlink:type="simple">https://doi.org/10.1109/MTD.2015.7332618</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_032">
<mixed-citation publication-type="chapter"><string-name><surname>Farias</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Novais</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Ortins</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Colaço</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Mendonça</surname>, <given-names>M.</given-names></string-name> (<year>2015</year>). <chapter-title>Analyzing distributions of emails and commits from OSS contributors through mining software repositories: an exploratory study</chapter-title>. In: <source>ICEIS 2015: Proceedings of the 17th International Conference on Enterprise Information Systems, Vol. 2</source>, pp. <fpage>303</fpage>–<lpage>310</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.5220/0005368603030310" xlink:type="simple">https://doi.org/10.5220/0005368603030310</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_033">
<mixed-citation publication-type="chapter"><string-name><surname>Feldt</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>de Oliveira Neto</surname>, <given-names>F.G.</given-names></string-name>, <string-name><surname>Torkar</surname>, <given-names>R.</given-names></string-name> (<year>2018</year>). <chapter-title>Ways of applying artificial intelligence in software engineering</chapter-title>. In: <source>2018 IEEE/ACM 6th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE)</source>, pp. <fpage>35</fpage>–<lpage>41</lpage>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_034">
<mixed-citation publication-type="journal"><string-name><surname>Finlay</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Pears</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Connor</surname>, <given-names>A.M.</given-names></string-name> (<year>2014</year>). <article-title>Data stream mining for predicting software build outcomes using source code metrics</article-title>. <source>Information and Software Technology</source>, <volume>56</volume>(<issue>2</issue>), <fpage>183</fpage>–<lpage>198</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.infsof.2013.09.001" xlink:type="simple">https://doi.org/10.1016/j.infsof.2013.09.001</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_035">
<mixed-citation publication-type="chapter"><string-name><surname>Foucault</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Palyart</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Blanc</surname>, <given-names>X.</given-names></string-name>, <string-name><surname>Murphy</surname>, <given-names>G.C.</given-names></string-name>, <string-name><surname>Falleri</surname>, <given-names>J.-R.</given-names></string-name> (<year>2015</year>). <chapter-title>Impact of developer turnover on quality in open-source software</chapter-title>. In: <source>ESEC/FSE 2015: Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering</source>, pp. <fpage>829</fpage>–<lpage>841</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1145/2786805.2786870" xlink:type="simple">https://doi.org/10.1145/2786805.2786870</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_036">
<mixed-citation publication-type="journal"><string-name><surname>Franco-Bedoya</surname>, <given-names>O.</given-names></string-name>, <string-name><surname>Ameller</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Costal</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Franch</surname>, <given-names>X.</given-names></string-name> (<year>2017</year>). <article-title>Open source software ecosystems: a systematic mapping</article-title>. <source>Information and Software Technology</source>, <volume>91</volume>, <fpage>160</fpage>–<lpage>185</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.infsof.2017.07.007" xlink:type="simple">https://doi.org/10.1016/j.infsof.2017.07.007</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_037">
<mixed-citation publication-type="journal"><string-name><surname>Fu</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Yan</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Zhang</surname>, <given-names>X.</given-names></string-name>, <string-name><surname>Xu</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Yang</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Kymer</surname>, <given-names>J.D.</given-names></string-name> (<year>2015</year>). <article-title>Automated classification of software change messages by semi-supervised Latent Dirichlet Allocation</article-title>. <source>Information and Software Technology</source>, <volume>57</volume>(<issue>1</issue>), <fpage>369</fpage>–<lpage>377</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.infsof.2014.05.017" xlink:type="simple">https://doi.org/10.1016/j.infsof.2014.05.017</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_038">
<mixed-citation publication-type="journal"><string-name><surname>Gamalielsson</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Lundell</surname>, <given-names>B.</given-names></string-name> (<year>2014</year>). <article-title>Sustainability of Open Source software communities beyond a fork: How and why has the LibreOffice project evolved?</article-title> <source>Journal of Systems and Software</source>, <volume>89</volume>(<issue>1</issue>), <fpage>128</fpage>–<lpage>145</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.jss.2013.11.1077" xlink:type="simple">https://doi.org/10.1016/j.jss.2013.11.1077</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_039">
<mixed-citation publication-type="journal"><string-name><surname>Gani</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Siddiqa</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Shamshirband</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Hanum</surname>, <given-names>F.</given-names></string-name> (<year>2016</year>). <article-title>A survey on indexing techniques for big data: taxonomy and performance evaluation</article-title>. <source>Knowledge and Information Systems</source>, <volume>46</volume>(<issue>2</issue>), <fpage>241</fpage>–<lpage>284</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1007/s10115-015-0830-y" xlink:type="simple">https://doi.org/10.1007/s10115-015-0830-y</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_040">
<mixed-citation publication-type="journal"><string-name><surname>Genero</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Fernandez</surname>, <given-names>A.M.</given-names></string-name>, <string-name><surname>James Nelson</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Poels</surname>, <given-names>G.</given-names></string-name> (<year>2011</year>). <article-title>A systematic literature review on the quality of UML models</article-title>. <source>Journal of Database Management</source>, <volume>22</volume>(<issue>3</issue>), <fpage>46</fpage>–<lpage>66</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.4018/jdm.2011070103" xlink:type="simple">https://doi.org/10.4018/jdm.2011070103</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_041">
<mixed-citation publication-type="book"><string-name><surname>Genero Bocco</surname>, <given-names>M.F.</given-names></string-name>, <string-name><surname>Cruz-Lemus</surname>, <given-names>J.A.</given-names></string-name>, <string-name><surname>Piattini Velthuis</surname>, <given-names>M.G.</given-names></string-name> (<year>2014</year>). <source>Métodos de investigación en ingeniería del software</source>. <publisher-name>Ra-Ma</publisher-name>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_042">
<mixed-citation publication-type="journal"><string-name><surname>Grossi</surname>, <given-names>V.</given-names></string-name>, <string-name><surname>Romei</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Turini</surname>, <given-names>F.</given-names></string-name> (<year>2017</year>). <article-title>Survey on using constraints in data mining</article-title>. <source>Data Mining and Knowledge Discovery</source>, <volume>31</volume>(<issue>2</issue>), <fpage>424</fpage>–<lpage>464</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1007/s10618-016-0480-z" xlink:type="simple">https://doi.org/10.1007/s10618-016-0480-z</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_043">
<mixed-citation publication-type="journal"><string-name><surname>Güemes-Peña</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>López-Nozal</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Marticorena-Sánchez</surname>, <given-names>R.</given-names></string-name> (<year>2018</year>). <article-title>Emerging topics in mining software repositories: machine learning in software repositories and datasets</article-title>. <source>Progress in Artificial Intelligence</source>, <volume>7</volume>, <fpage>237</fpage>–<lpage>247</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1007/s13748-018-0147-7" xlink:type="simple">https://doi.org/10.1007/s13748-018-0147-7</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_044">
<mixed-citation publication-type="chapter"><string-name><surname>Gupta</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Sureka</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Padmanabhuni</surname>, <given-names>S.</given-names></string-name> (<year>2014</year>). <chapter-title>Process mining multiple repositories for software defect resolution from control and organizational perspective</chapter-title>. In: <source>MSR 2014: Proceedings of the 11th Working Conference on Mining Software Repositories</source>, pp. <fpage>122</fpage>–<lpage>131</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1145/2597073.2597081" xlink:type="simple">https://doi.org/10.1145/2597073.2597081</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_045">
<mixed-citation publication-type="journal"><string-name><surname>Haddaway</surname>, <given-names>N.R.</given-names></string-name>, <string-name><surname>Macura</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Whaley</surname>, <given-names>P.</given-names></string-name> (<year>2018</year>). <article-title>ROSES Reporting standards for Systematic Evidence Syntheses: Pro forma, flow-diagram and descriptive summary of the plan and conduct of environmental systematic reviews and systematic maps</article-title>. <source>Environmental Evidence</source>, <volume>7</volume>(<issue>1</issue>), <fpage>4</fpage>–<lpage>11</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1186/s13750-018-0121-7" xlink:type="simple">https://doi.org/10.1186/s13750-018-0121-7</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_046">
<mixed-citation publication-type="chapter"><string-name><surname>Harman</surname>, <given-names>M.</given-names></string-name> (<year>2012</year>). <chapter-title>The role of artificial intelligence in software engineering</chapter-title>. In: <source>2012 First International Workshop on Realizing AI Synergies in Software Engineering (RAISE)</source>. <publisher-name>IEEE</publisher-name>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1109/RAISE.2012.6227961" xlink:type="simple">https://doi.org/10.1109/RAISE.2012.6227961</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_047">
<mixed-citation publication-type="chapter"><string-name><surname>Hassan</surname>, <given-names>A.E.</given-names></string-name> (<year>2008</year>). <chapter-title>The road ahead for mining software repositories</chapter-title>. In: <source>2008 Frontiers of Software Maintenance</source>, pp. <fpage>48</fpage>–<lpage>57</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1109/FOSM.2008.4659248" xlink:type="simple">https://doi.org/10.1109/FOSM.2008.4659248</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_048">
<mixed-citation publication-type="journal"><string-name><surname>Herzig</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Just</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Zeller</surname>, <given-names>A.</given-names></string-name> (<year>2016</year>). <article-title>The impact of tangled code changes on defect prediction models</article-title>. <source>Empirical Software Engineering</source>, <volume>21</volume>(<issue>2</issue>), <fpage>303</fpage>–<lpage>336</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1007/s10664-015-9376-6" xlink:type="simple">https://doi.org/10.1007/s10664-015-9376-6</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_049">
<mixed-citation publication-type="chapter"><string-name><surname>Hidalgo Suarez</surname>, <given-names>C.G.</given-names></string-name>, <string-name><surname>Bucheli</surname>, <given-names>V.A.</given-names></string-name>, <string-name><surname>Restrepo-Calle</surname>, <given-names>F.</given-names></string-name>, <string-name><surname>Gonzalez</surname>, <given-names>F.A.</given-names></string-name> (<year>2018</year>). <chapter-title>A strategy based on technological maps for the identification of the state-of-the-art techniques in software development projects: Virtual judge projects as a case study</chapter-title>. In: <string-name><surname>Serrano</surname>, <given-names>C.J.</given-names></string-name>, <string-name><surname>Martínez-Santos</surname>, <given-names>J.</given-names></string-name> (Eds.), <source>Advances in Computing. CCC 2018</source>, <series>Communications in Computer and Information Science</series>, Vol. <volume>885</volume>. <publisher-name>Springer</publisher-name>, <publisher-loc>Cham</publisher-loc>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1007/978-3-319-98998-3_27" xlink:type="simple">https://doi.org/10.1007/978-3-319-98998-3_27</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_050">
<mixed-citation publication-type="journal"><string-name><surname>Ivarsson</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Gorschek</surname>, <given-names>T.</given-names></string-name> (<year>2011</year>). <article-title>A method for evaluating rigor and industrial relevance of technology evaluations</article-title>. <source>Empirical Software Engineering</source>, <volume>16</volume>(<issue>3</issue>), <fpage>365</fpage>–<lpage>395</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1007/s10664-010-9146-4" xlink:type="simple">https://doi.org/10.1007/s10664-010-9146-4</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_051">
<mixed-citation publication-type="journal"><string-name><surname>Jarczyk</surname>, <given-names>O.</given-names></string-name>, <string-name><surname>Jaroszewicz</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Wierzbicki</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Pawlak</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Jankowski-Lorek</surname>, <given-names>M.</given-names></string-name> (<year>2017</year>). <article-title>Surgical teams on GitHub: modeling performance of GitHub project development processes</article-title>. <source>Information and Software Technology</source>, <volume>100</volume>, <fpage>32</fpage>–<lpage>46</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.infsof.2018.03.010" xlink:type="simple">https://doi.org/10.1016/j.infsof.2018.03.010</ext-link>. <comment>2018</comment>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_052">
<mixed-citation publication-type="journal"><string-name><surname>Jiang</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Lo</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Zheng</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Xia</surname>, <given-names>X.</given-names></string-name>, <string-name><surname>Yang</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Zhang</surname>, <given-names>L.</given-names></string-name> (<year>2019</year>). <article-title>Who should make decision on this pull request? Analyzing time-decaying relationships and file similarities for integrator prediction</article-title>. <source>Journal of Systems and Software</source>, <volume>154</volume>, <fpage>196</fpage>–<lpage>210</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.jss.2019.04.055" xlink:type="simple">https://doi.org/10.1016/j.jss.2019.04.055</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_053">
<mixed-citation publication-type="chapter"><string-name><surname>Joblin</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Apel</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Riehle</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Mauerer</surname>, <given-names>W.</given-names></string-name>, <string-name><surname>Siegmund</surname>, <given-names>J.</given-names></string-name> (<year>2015</year>). <chapter-title>From developer networks to verified communities: a fine-grained approach</chapter-title>. In: <source>2015 IEEE/ACM 37th IEEE International Conference on Software Engineering (ICSE)</source>, pp. <fpage>563</fpage>–<lpage>573</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1109/ICSE.2015.73" xlink:type="simple">https://doi.org/10.1109/ICSE.2015.73</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_054">
<mixed-citation publication-type="chapter"><string-name><surname>Joy</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Thangavelu</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Jyotishi</surname>, <given-names>A.</given-names></string-name> (<year>2018</year>). <chapter-title>Performance of GitHub open-source software project: an empirical analysis</chapter-title>. In: <source>2018 Second International Conference on Advances in Electronics, Computers and Communications (ICAECC)</source>, pp. <fpage>1</fpage>–<lpage>6</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1109/ICAECC.2018.8479462" xlink:type="simple">https://doi.org/10.1109/ICAECC.2018.8479462</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_055">
<mixed-citation publication-type="chapter"><string-name><surname>Just</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Herzig</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Czerwonka</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Murphy</surname>, <given-names>B.</given-names></string-name> (<year>2016</year>). <chapter-title>Switching to git: the good, the bad, and the ugly</chapter-title>. In: <source>2016 IEEE 27th International Symposium on Software Reliability Engineering (ISSRE)</source>, pp. <fpage>400</fpage>–<lpage>411</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1109/ISSRE.2016.38" xlink:type="simple">https://doi.org/10.1109/ISSRE.2016.38</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_056">
<mixed-citation publication-type="journal"><string-name><surname>Kagdi</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Collard</surname>, <given-names>M.L.</given-names></string-name>, <string-name><surname>Maletic</surname>, <given-names>J.I.</given-names></string-name> (<year>2007</year>). <article-title>A survey and taxonomy of approaches for mining software repositories in the context of software evolution</article-title>. <source>Journal of Software: Evolution and Process</source>, <volume>19</volume>(<issue>2</issue>), <fpage>77</fpage>–<lpage>131</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1002/smr.344" xlink:type="simple">https://doi.org/10.1002/smr.344</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_057">
<mixed-citation publication-type="journal"><string-name><surname>Kagdi</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Gethers</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Poshyvanyk</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Hammad</surname>, <given-names>M.</given-names></string-name> (<year>2014</year>). <article-title>Assigning change requests to software developers</article-title>. <source>Journal of Software: Evolution and Process</source>, <volume>26</volume>(<issue>12</issue>), <fpage>1172</fpage>–<lpage>1192</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1002/smr.530" xlink:type="simple">https://doi.org/10.1002/smr.530</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_058">
<mixed-citation publication-type="journal"><string-name><surname>Kalliamvakou</surname>, <given-names>E.</given-names></string-name>, <string-name><surname>Gousios</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Blincoe</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Singer</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>German</surname>, <given-names>D.M.</given-names></string-name>, <string-name><surname>Damian</surname>, <given-names>D.</given-names></string-name> (<year>2016</year>). <article-title>An in-depth study of the promises and perils of mining GitHub</article-title>. <source>Empirical Software Engineering</source>, <volume>21</volume>(<issue>5</issue>), <fpage>2035</fpage>–<lpage>2071</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1007/s10664-015-9393-5" xlink:type="simple">https://doi.org/10.1007/s10664-015-9393-5</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_059">
<mixed-citation publication-type="journal"><string-name><surname>Kasurinen</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Knutas</surname>, <given-names>A.</given-names></string-name> (<year>2018</year>). <article-title>Publication trends in gamification: a systematic mapping study</article-title>. <source>Computer Science Review</source>, <volume>27</volume>, <fpage>33</fpage>–<lpage>44</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.cosrev.2017.10.003" xlink:type="simple">https://doi.org/10.1016/j.cosrev.2017.10.003</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_060">
<mixed-citation publication-type="chapter"><string-name><surname>Kirinuki</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Higo</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Hotta</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Kusumoto</surname>, <given-names>S.</given-names></string-name> (<year>2014</year>). <chapter-title>Hey! Are you committing tangled changes?</chapter-title> In: <source>ICPC 2014: Proceedings of the 22nd International Conference on Program Comprehension</source>, pp. <fpage>262</fpage>–<lpage>265</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1145/2597008.2597798" xlink:type="simple">https://doi.org/10.1145/2597008.2597798</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_061">
<mixed-citation publication-type="other"><string-name><surname>Kitchenham</surname>, <given-names>B.</given-names></string-name> (<year>2007</year>). <italic>Guidelines for performing Systematic Literature Reviews in Software Engineering</italic>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1145/1134285.1134500" xlink:type="simple">https://doi.org/10.1145/1134285.1134500</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_062">
<mixed-citation publication-type="journal"><string-name><surname>Kitchenham</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Pearl Brereton</surname>, <given-names>O.</given-names></string-name>, <string-name><surname>Budgen</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Turner</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Bailey</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Linkman</surname>, <given-names>S.</given-names></string-name> (<year>2009</year>). <article-title>Systematic literature reviews in software engineering – a systematic literature review</article-title>. <source>Information and Software Technology</source>, <volume>51</volume>(<issue>1</issue>), <fpage>7</fpage>–<lpage>15</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.infsof.2008.09.009" xlink:type="simple">https://doi.org/10.1016/j.infsof.2008.09.009</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_063">
<mixed-citation publication-type="journal"><string-name><surname>Kitchenham</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Sjøberg</surname>, <given-names>D.I.K.</given-names></string-name>, <string-name><surname>Dyba</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Pearl Brereton</surname>, <given-names>O.</given-names></string-name>, <string-name><surname>Budgen</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Höst</surname>, <given-names>M.</given-names></string-name> (<year>2013</year>). <article-title>Trends in the quality of human-centric software engineering experiments – a quasi-experiment</article-title>. <source>IEEE Transactions on Software Engineering</source>, <volume>39</volume>(<issue>7</issue>), <fpage>1002</fpage>–<lpage>1017</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1109/TSE.2012.76" xlink:type="simple">https://doi.org/10.1109/TSE.2012.76</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_064">
<mixed-citation publication-type="journal"><string-name><surname>Kitchenham</surname>, <given-names>B.A.</given-names></string-name>, <string-name><surname>Budgen</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Pearl Brereton</surname>, <given-names>O.</given-names></string-name> (<year>2011</year>). <article-title>Using mapping studies as the basis for further research – a participant-observer case study</article-title>. <source>Information and Software Technology</source>, <volume>53</volume>(<issue>6</issue>), <fpage>638</fpage>–<lpage>651</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.infsof.2010.12.011" xlink:type="simple">https://doi.org/10.1016/j.infsof.2010.12.011</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_065">
<mixed-citation publication-type="journal"><string-name><surname>Kuhrmann</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Méndez Fernández</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Daneva</surname>, <given-names>M.</given-names></string-name> (<year>2017</year>). <article-title>On the pragmatic design of literature studies in software engineering: an experience-based guideline</article-title>. <source>Empirical Software Engineering</source>, <volume>22</volume>(<issue>6</issue>), <fpage>2852</fpage>–<lpage>2891</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1007/s10664-016-9492-y" xlink:type="simple">https://doi.org/10.1007/s10664-016-9492-y</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_066">
<mixed-citation publication-type="journal"><string-name><surname>Kumar</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Sripada</surname>, <given-names>S.K.</given-names></string-name>, <string-name><surname>Sureka</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Rath</surname>, <given-names>S.K.</given-names></string-name> (<year>2018</year>). <article-title>Effective fault prediction model developed using Least Square Support Vector Machine (LSSVM)</article-title>. <source>Journal of Systems and Software</source>, <volume>137</volume>, <fpage>686</fpage>–<lpage>712</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.jss.2017.04.016" xlink:type="simple">https://doi.org/10.1016/j.jss.2017.04.016</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_067">
<mixed-citation publication-type="journal"><string-name><surname>Laukkanen</surname>, <given-names>E.</given-names></string-name>, <string-name><surname>Itkonen</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Lassenius</surname>, <given-names>C.</given-names></string-name> (<year>2017</year>). <article-title>Problems, causes and solutions when adopting continuous delivery – a systematic literature review</article-title>. <source>Information and Software Technology</source>, <volume>82</volume>, <fpage>55</fpage>–<lpage>79</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.infsof.2016.10.001" xlink:type="simple">https://doi.org/10.1016/j.infsof.2016.10.001</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_068">
<mixed-citation publication-type="chapter"><string-name><surname>Lee</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Seo</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Seo</surname>, <given-names>E.</given-names></string-name> (<year>2013</year>). <chapter-title>A git source repository analysis tool based on a novel branch-oriented approach</chapter-title>. In: <source>2013 International Conference on Information Science and Applications (ICISA)</source>, pp. <fpage>1</fpage>–<lpage>4</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1109/ICISA.2013.6579457" xlink:type="simple">https://doi.org/10.1109/ICISA.2013.6579457</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_069">
<mixed-citation publication-type="journal"><string-name><surname>Li</surname>, <given-names>H.Y.</given-names></string-name>, <string-name><surname>Li</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Zhou</surname>, <given-names>Z.-H.</given-names></string-name> (<year>2019</year>). <article-title>Towards one reusable model for various software defect mining tasks</article-title>. <source>Lecture Notes in Computer Science</source>, <volume>11441</volume>, <fpage>212</fpage>–<lpage>224</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1007/978-3-030-16142-2_17" xlink:type="simple">https://doi.org/10.1007/978-3-030-16142-2_17</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_070">
<mixed-citation publication-type="chapter"><string-name><surname>Liu</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Li</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>He</surname>, <given-names>L.</given-names></string-name> (<year>2016</year>). <chapter-title>A comparative study of the effects of pull request on GitHub projects</chapter-title>. In: <source>2016 IEEE 40th Annual Computer Software and Applications Conference (COMPSAC)</source>, pp. <fpage>313</fpage>–<lpage>322</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1109/COMPSAC.2016.27" xlink:type="simple">https://doi.org/10.1109/COMPSAC.2016.27</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_071">
<mixed-citation publication-type="chapter"><string-name><surname>Malheiros</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Moraes</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Trindade</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Meira</surname>, <given-names>S.</given-names></string-name> (<year>2012</year>). <chapter-title>A source code recommender system to support newcomers</chapter-title>. In: <source>COMPSAC ’12: Proceedings of the 2012 IEEE 36th Annual Computer Software and Applications Conference</source>, pp. <fpage>19</fpage>–<lpage>24</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1109/COMPSAC.2012.11" xlink:type="simple">https://doi.org/10.1109/COMPSAC.2012.11</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_072">
<mixed-citation publication-type="chapter"><string-name><surname>Maqsood</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Eshraghi</surname>, <given-names>I.</given-names></string-name>, <string-name><surname>Sarmad Ali</surname>, <given-names>S.</given-names></string-name> (<year>2017</year>). <chapter-title>Success or failure identification for GitHub’s open source projects</chapter-title>. In: <source>ICMSS ’17: Proceedings of the 2017 International Conference on Management Engineering, Software Engineering and Service Sciences</source>, pp. <fpage>145</fpage>–<lpage>150</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1145/3034950.3034957" xlink:type="simple">https://doi.org/10.1145/3034950.3034957</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_073">
<mixed-citation publication-type="journal"><string-name><surname>Martínez-Torres</surname>, <given-names>M.R.</given-names></string-name>, <string-name><surname>Toral</surname>, <given-names>S.L.</given-names></string-name>, <string-name><surname>Barrero</surname>, <given-names>F.J.</given-names></string-name>, <string-name><surname>Gregor</surname>, <given-names>D.</given-names></string-name> (<year>2013</year>). <article-title>A text categorisation tool for open source communities based on semantic analysis</article-title>. <source>Behaviour &amp; Information Technology</source>, <volume>32</volume>(<issue>6</issue>), <fpage>532</fpage>–<lpage>544</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1080/0144929X.2011.624634" xlink:type="simple">https://doi.org/10.1080/0144929X.2011.624634</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_074">
<mixed-citation publication-type="journal"><string-name><surname>Munir</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Wnuk</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Runeson</surname>, <given-names>P.</given-names></string-name> (<year>2016</year>). <article-title>Open innovation in software engineering: a systematic mapping study</article-title>. <source>Empirical Software Engineering</source>, <volume>21</volume>(<issue>2</issue>), <fpage>684</fpage>–<lpage>723</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1007/s10664-015-9380-x" xlink:type="simple">https://doi.org/10.1007/s10664-015-9380-x</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_075">
<mixed-citation publication-type="chapter"><string-name><surname>Murgia</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Tourani</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Adams</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Ortu</surname>, <given-names>M.</given-names></string-name> (<year>2014</year>). <chapter-title>Do developers feel emotions? An exploratory analysis of emotions in software artifacts</chapter-title>. In: <source>MSR 2014: Proceedings of the 11th Working Conference on Mining Software Repositories</source>, pp. <fpage>262</fpage>–<lpage>271</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1145/2597073.2597086" xlink:type="simple">https://doi.org/10.1145/2597073.2597086</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_076">
<mixed-citation publication-type="chapter"><string-name><surname>Negara</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Codoban</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Dig</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Johnson</surname>, <given-names>R.E.</given-names></string-name> (<year>2014</year>). <chapter-title>Mining fine-grained code changes to detect unknown change patterns</chapter-title>. In: <source>ICSE 2014: Proceedings of the 36th International Conference on Software Engineering</source>, pp. <fpage>803</fpage>–<lpage>813</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1145/2568225.2568317" xlink:type="simple">https://doi.org/10.1145/2568225.2568317</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_077">
<mixed-citation publication-type="journal"><string-name><surname>Nishi</surname>, <given-names>M.A.</given-names></string-name>, <string-name><surname>Damevski</surname>, <given-names>K.</given-names></string-name> (<year>2018</year>). <article-title>Scalable code clone detection and search based on adaptive prefix filtering</article-title>. <source>Journal of Systems and Software</source>, <volume>137</volume>, <fpage>130</fpage>–<lpage>142</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.jss.2017.11.039" xlink:type="simple">https://doi.org/10.1016/j.jss.2017.11.039</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_078">
<mixed-citation publication-type="chapter"><string-name><surname>Novielli</surname>, <given-names>N.</given-names></string-name>, <string-name><surname>Girardi</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Lanubile</surname>, <given-names>F.</given-names></string-name> (<year>2018</year>). <chapter-title>A benchmark study on sentiment analysis for software engineering research</chapter-title>. In: <source>MSR ’18: Proceedings of the 15th International Conference on Mining Software Repositories</source>, pp. <fpage>364</fpage>–<lpage>375</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1145/3196398.3196403" xlink:type="simple">https://doi.org/10.1145/3196398.3196403</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_079">
<mixed-citation publication-type="chapter"><string-name><surname>Ozbas-Caglayan</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Dogru</surname>, <given-names>A.H.</given-names></string-name> (<year>2013</year>). <chapter-title>Software repository analysis for investigating design-code compliance</chapter-title>. In: <source>2013 Joint Conference of the 23rd International Workshop on Software Measurement and the 8th International Conference on Software Process and Product Measurement</source>, pp. <fpage>231</fpage>–<lpage>233</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1109/IWSM-Mensura.2013.40" xlink:type="simple">https://doi.org/10.1109/IWSM-Mensura.2013.40</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_080">
<mixed-citation publication-type="journal"><string-name><surname>Pedreira</surname>, <given-names>O.</given-names></string-name>, <string-name><surname>García</surname>, <given-names>F.</given-names></string-name>, <string-name><surname>Brisaboa</surname>, <given-names>N.</given-names></string-name>, <string-name><surname>Piattini</surname>, <given-names>M.</given-names></string-name> (<year>2015</year>). <article-title>Gamification in software engineering – a systematic mapping</article-title>. <source>Information and Software Technology</source>, <volume>57</volume>(<issue>1</issue>), <fpage>157</fpage>–<lpage>168</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.infsof.2014.08.007" xlink:type="simple">https://doi.org/10.1016/j.infsof.2014.08.007</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_081">
<mixed-citation publication-type="journal"><string-name><surname>Perez-Castillo</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Ruiz-Gonzalez</surname>, <given-names>F.</given-names></string-name>, <string-name><surname>Genero</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Piattini</surname>, <given-names>M.</given-names></string-name> (<year>2019</year>). <article-title>A systematic mapping study on enterprise architecture mining</article-title>. <source>Enterprise Information Systems</source>, <volume>13</volume>(<issue>5</issue>), <fpage>675</fpage>–<lpage>718</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1080/17517575.2019.1590859" xlink:type="simple">https://doi.org/10.1080/17517575.2019.1590859</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_082">
<mixed-citation publication-type="chapter"><string-name><surname>Petersen</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Gencel</surname>, <given-names>C.</given-names></string-name> (<year>2013</year>). <chapter-title>Worldviews, research methods, and their relationship to validity in empirical software engineering research</chapter-title>. In: <source>2013 Joint Conference of the 23rd International Workshop on Software Measurement and the 8th International Conference on Software Process and Product Measurement</source>, pp. <fpage>81</fpage>–<lpage>89</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1109/IWSM-Mensura.2013.22" xlink:type="simple">https://doi.org/10.1109/IWSM-Mensura.2013.22</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_083">
<mixed-citation publication-type="chapter"><string-name><surname>Petersen</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Feldt</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Mujtaba</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Mattsson</surname>, <given-names>M.</given-names></string-name> (<year>2008</year>). <chapter-title>Systematic mapping studies in software engineering</chapter-title>. In: <source>EASE’08: Proceedings of the 12th International Conference on Evaluation and Assessment in Software Engineering</source>, pp. <fpage>68</fpage>–<lpage>77</lpage>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_084">
<mixed-citation publication-type="journal"><string-name><surname>Petersen</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Vakkalanka</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Kuzniarz</surname>, <given-names>L.</given-names></string-name> (<year>2015</year>). <article-title>Guidelines for conducting systematic mapping studies in software engineering: an update</article-title>. <source>Information and Software Technology</source>, <volume>64</volume>, <fpage>1</fpage>–<lpage>18</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.infsof.2015.03.007" xlink:type="simple">https://doi.org/10.1016/j.infsof.2015.03.007</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_085">
<mixed-citation publication-type="chapter"><string-name><surname>Rosen</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Grawi</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Shihab</surname>, <given-names>E.</given-names></string-name> (<year>2015</year>). <chapter-title>Commit guru: analytics and risk prediction of software commits</chapter-title>. In: <source>ESEC/FSE 2015: Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering</source>, pp. <fpage>966</fpage>–<lpage>969</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1145/2786805.2803183" xlink:type="simple">https://doi.org/10.1145/2786805.2803183</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_086">
<mixed-citation publication-type="journal"><string-name><surname>Shamseer</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Moher</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Clarke</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Ghersi</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Liberati</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Petticrew</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Shekelle</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Stewart</surname>, <given-names>L.A.</given-names></string-name> (<year>2015</year>). <article-title>Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015: elaboration and explanation</article-title>. <source>The BMJ</source>, <volume>349</volume>, <fpage>1</fpage>–<lpage>25</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1136/bmj.g7647" xlink:type="simple">https://doi.org/10.1136/bmj.g7647</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_087">
<mixed-citation publication-type="journal"><string-name><surname>Siddiqui</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Ahmad</surname>, <given-names>A.</given-names></string-name> (<year>2018</year>). <article-title>Data mining tools and techniques for mining software repositories: a systematic review</article-title>. <source>Advances in Intelligent Systems and Computing</source>, <volume>654</volume>, <fpage>717</fpage>–<lpage>726</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1007/978-981-10-6620-7_70" xlink:type="simple">https://doi.org/10.1007/978-981-10-6620-7_70</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_088">
<mixed-citation publication-type="chapter"><string-name><surname>Stol</surname>, <given-names>K.J.</given-names></string-name>, <string-name><surname>Ralph</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Fitzgerald</surname>, <given-names>B.</given-names></string-name> (<year>2016</year>). <chapter-title>Grounded theory in software engineering research: a critical review and guidelines</chapter-title>. In: <source>ICSE ’16: Proceedings of the 38th International Conference on Software Engineering</source>, pp. <fpage>120</fpage>–<lpage>131</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1145/2884781.2884833" xlink:type="simple">https://doi.org/10.1145/2884781.2884833</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_089">
<mixed-citation publication-type="journal"><string-name><surname>Tahir</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Tosi</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Morasca</surname>, <given-names>S.</given-names></string-name> (<year>2013</year>). <article-title>A systematic review on the functional testing of semantic web services</article-title>. <source>Journal of Systems and Software</source>, <volume>86</volume>(<issue>11</issue>), <fpage>2877</fpage>–<lpage>2889</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.jss.2013.06.064" xlink:type="simple">https://doi.org/10.1016/j.jss.2013.06.064</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_090">
<mixed-citation publication-type="journal"><string-name><surname>van Eck</surname>, <given-names>N.J.</given-names></string-name>, <string-name><surname>Waltman</surname>, <given-names>L.</given-names></string-name> (<year>2017</year>). <article-title>Citation-based clustering of publications using CitNetExplorer and VOSviewer</article-title>. <source>Scientometrics</source>, <volume>111</volume>(<issue>2</issue>), <fpage>1053</fpage>–<lpage>1070</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1007/s11192-017-2300-7" xlink:type="simple">https://doi.org/10.1007/s11192-017-2300-7</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_091">
<mixed-citation publication-type="journal"><string-name><surname>Waltman</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>van Eck</surname>, <given-names>N.J.</given-names></string-name>, <string-name><surname>Noyons</surname>, <given-names>Ed.C.M.</given-names></string-name> (<year>2010</year>). <article-title>A unified approach to mapping and clustering of bibliometric networks</article-title>. <source>Journal of Informetrics</source>, <volume>4</volume>(<issue>4</issue>), <fpage>629</fpage>–<lpage>635</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.joi.2010.07.002" xlink:type="simple">https://doi.org/10.1016/j.joi.2010.07.002</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_092">
<mixed-citation publication-type="journal"><string-name><surname>Wang</surname>, <given-names>N.</given-names></string-name>, <string-name><surname>Liang</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Jia</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Ge</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Xue</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Wang</surname>, <given-names>Z.</given-names></string-name> (<year>2016</year>). <article-title>Cloud computing research in the IS discipline: a citation/co-citation analysis</article-title>. <source>Decision Support Systems</source>, <volume>86</volume>, <fpage>35</fpage>–<lpage>47</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.dss.2016.03.006" xlink:type="simple">https://doi.org/10.1016/j.dss.2016.03.006</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_093">
<mixed-citation publication-type="journal"><string-name><surname>Wijesiriwardana</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Wimalaratne</surname>, <given-names>P.</given-names></string-name> (<year>2018</year>). <article-title>Fostering real-time software analysis by leveraging heterogeneous and autonomous software repositories</article-title>. <source>IEICE Transactions on Information and Systems</source>, <volume>E101-D</volume>(<issue>11</issue>), <fpage>2730</fpage>–<lpage>2743</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1587/transinf.2018EDP7094" xlink:type="simple">https://doi.org/10.1587/transinf.2018EDP7094</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_094">
<mixed-citation publication-type="book"><string-name><surname>Wohlin</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Runeson</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Höst</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Ohlsson</surname>, <given-names>M.C.</given-names></string-name>, <string-name><surname>Regnell</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Wesslén</surname>, <given-names>A.</given-names></string-name> (<year>2012</year>). <source>Experimentation in Software Engineering</source>. <publisher-name>Springer Publishing Company, Incorporated</publisher-name>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_095">
<mixed-citation publication-type="chapter"><string-name><surname>Wu</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Kropczynski</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Shih</surname>, <given-names>P.C.</given-names></string-name>, <string-name><surname>Carroll</surname>, <given-names>J.M.</given-names></string-name> (<year>2014</year>). <chapter-title>Exploring the ecosystem of software developers on GitHub and other platforms</chapter-title>. In: <source>CSCW Companion ’14: Proceedings of the companion publication of the 17th ACM conference on Computer Supported Cooperative Work &amp; Social Computing</source>, pp. <fpage>265</fpage>–<lpage>268</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1145/2556420.2556483" xlink:type="simple">https://doi.org/10.1145/2556420.2556483</ext-link>.</mixed-citation>
</ref>
<ref id="j_infor454_ref_096">
<mixed-citation publication-type="journal"><string-name><surname>Zolkifli</surname>, <given-names>N.N.</given-names></string-name>, <string-name><surname>Ngah</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Deraman</surname>, <given-names>A.</given-names></string-name> (<year>2018</year>). <article-title>Version control system: a review</article-title>. <source>Procedia Computer Science</source>, <volume>135</volume>, <fpage>408</fpage>–<lpage>415</lpage>. <ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.procs.2018.08.191" xlink:type="simple">https://doi.org/10.1016/j.procs.2018.08.191</ext-link>.</mixed-citation>
</ref>
</ref-list>
</back>
</article>
