TOSEM 2018 top-journal paper: How Far We Have Progressed in the Journey? An Examination of Cross-Project Defect ...

Copyright notice: if you repost or quote this article, please cite the source. https://blog.csdn.net/weixin_39278265/article/details/82594098

Preface

This post looks at the TOSEM 2018 paper "How Far We Have Progressed in the Journey? An Examination of Cross-Project Defect Prediction".

1 Author information

YUMING ZHOU, YIBIAO YANG, HONGMIN LU, LIN CHEN, YANHUI LI, and YANGYANG ZHAO, Nanjing University
JUNYAN QIAN, Guilin University of Electronic Technology
BAOWEN XU, Nanjing University

Nanjing University, impressive.

2 Abstracts these days are clear and well structured

1) Background
Background. Recent years have seen an increasing interest in cross-project defect prediction (CPDP), which
aims to apply defect prediction models built on source projects to a target project. Currently, a variety of
(complex) CPDP models have been proposed with a promising prediction performance.

I did not expect cross-project defect prediction to be this popular. I had not come across it before.

2) The problem
Most, if not all, of the existing CPDP models are not compared against those simple module size
models that are easy to implement and have shown a good performance in defect prediction in the literature.
I do not fully understand this part yet.

What are simple module size models?

3) Objective
We aim to investigate how far we have really progressed in the journey by comparing the performance in defect prediction between the existing CPDP models and simple module size models.

Oh, so the idea is to compare CPDP models against simple module size models in an empirical study. I did not expect it, but empirical studies like this really do seem to be widespread.

4) Method

We first use module size in the target project to build two simple defect prediction models, ManualDown and ManualUp, which do not require any training data from source projects. ManualDown considers a larger module as more defect-prone, while ManualUp considers a smaller module as more defect-prone.

Then, we take the following measures to ensure a fair comparison on the performance in defect prediction between the existing CPDP models and the simple module size models: using the same publicly available data sets, using the same performance indicators, and using the prediction performance reported in the original cross-project defect prediction studies.

The comparison controls the variables. So far this is easy to follow.
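To make the two size-based models from the method above concrete, here is a minimal sketch. It assumes module size is measured in lines of code and that ties are broken by module name; both choices, along with the function names and the sample data, are illustrative assumptions and are not taken from the paper.

```python
from typing import Dict, List


def manual_down(module_sizes: Dict[str, int]) -> List[str]:
    """Rank modules largest first: a larger module is treated as more defect-prone."""
    # Tie-breaking by name is an assumption made only to keep the order deterministic.
    return sorted(module_sizes, key=lambda name: (-module_sizes[name], name))


def manual_up(module_sizes: Dict[str, int]) -> List[str]:
    """Rank modules smallest first: a smaller module is treated as more defect-prone."""
    return sorted(module_sizes, key=lambda name: (module_sizes[name], name))


# Hypothetical target project: module name -> size in lines of code.
target_sizes = {"parser.c": 1200, "util.c": 150, "net.c": 640, "main.c": 90}

down_ranking = manual_down(target_sizes)
up_ranking = manual_up(target_sizes)
print(down_ranking)
print(up_ranking)
```

Neither function looks at any source project or any defect labels. Only the sizes of the target project's own modules are used, which is why these models need no training data.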

5) Results

The simple module size models have a prediction performance comparable or even superior to most of the existing CPDP models in the literature, including many newly proposed models.

Are these simple module size models really that good?
This result does look meaningful.

6) Conclusion

The results caution us that, if the prediction performance is the goal, the real progress in CPDP
is not being achieved as it might have been envisaged. We hence recommend that future studies should
include ManualDown/ManualUp as the baseline models for comparison when developing new CPDP models
to predict defects in a complete target project.

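The authors arrive at a conclusion the research community never saw coming (that is, a fact that had been overlooked), so it should be quite instructive.

3 The concept of a defect prediction model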

A defect prediction model can predict the defect-proneness of the modules (e.g., files, classes, or
functions) in a software project. Given the prediction result, a project manager can (1) classify
the modules into two categories, high defect-prone or low defect-prone, or (2) rank the modules
from the highest to lowest defect-proneness.
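The two uses of a prediction result described above can be sketched as follows. The scores, the file names, and the 0.5 threshold are all hypothetical values chosen for illustration; the paper excerpt does not prescribe any of them.

```python
from typing import Dict, List


def classify(scores: Dict[str, float], threshold: float = 0.5) -> Dict[str, str]:
    """Use (1): split modules into high and low defect-prone groups."""
    return {name: ("high" if score >= threshold else "low")
            for name, score in scores.items()}


def rank(scores: Dict[str, float]) -> List[str]:
    """Use (2): order modules from highest to lowest defect-proneness."""
    return sorted(scores, key=lambda name: (-scores[name], name))


# Hypothetical predicted defect-proneness for three modules.
predicted = {"A.java": 0.82, "B.java": 0.35, "C.java": 0.61}

labels = classify(predicted)
ordering = rank(predicted)
print(labels)
print(ordering)
```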

So the model is trained on historical releases:
This will help to find more defects in the project if the model can accurately predict defect-proneness. For a given project, it is common to use the historical project data (e.g., module-level complexity metric data and defect data in the previous releases of the project) to train the model.

The problem (limitation):
Prior studies have shown that the model predicts defects well in the test data if it is trained using a sufficiently large amount of data [23]. However, in practice, it might be difficult to obtain sufficient training data [129]. This is especially true for a new type of projects or projects with little historical data collected.

4 Continuing from 3: how to overcome the limitation

One way to deal with the shortage of training data is to leverage the data from other projects
(i.e., source projects) to build the model and apply it to predict the defects in the current project
(i.e., the target project) [9]. However, in practice, it is challenging to achieve accurate cross-project
defect prediction (CPDP) [129].

[9] Lionel C. Briand, Walcelio L. Melo, and Jurgen Wust. 2002. Assessing the applicability of fault-proneness models
across object-oriented software projects. IEEE Trans. Softw. Eng. 28, 7 (2002), 706–720.
This reference is not especially early, either.

[129] Thomas Zimmermann, Nachiappan Nagappan, Harald Gall, Emanuel Giger, and Brendan Murphy. 2009. Cross-project defect prediction: A large scale experiment on data vs. domain vs. process. In Proceedings of the 7th joint meeting of the European Software Engineering Conference and the ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE’09). 91–100.
These people all seem really strong. So much talent.

However, there is still a problem:
The main reason is that the source and target project data usually
exhibit significantly different distributions, which violates the similar distribution assumption of
the training and test data required by most modeling techniques [61, 80, 123].
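The distribution problem described above can be illustrated with a toy example. The sketch below fits a one-metric threshold classifier on a source project and applies it unchanged to a target project whose metric values sit on a different scale. The classifier is a deliberately simple stand-in, not one of the CPDP models from the paper, and all data values are made up.

```python
from typing import List, Tuple

Module = Tuple[float, int]  # (metric value, 1 if defective else 0)


def accuracy(data: List[Module], cut: float) -> float:
    """Fraction of modules whose label matches the rule 'defective if value >= cut'."""
    hits = sum(1 for value, label in data if int(value >= cut) == label)
    return hits / len(data)


def fit_threshold(train: List[Module]) -> float:
    """Pick the cut-off, among observed values, with the best training accuracy."""
    candidates = sorted({value for value, _ in train})
    best_cut, best_acc = candidates[0], -1.0
    for cut in candidates:
        acc = accuracy(train, cut)
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return best_cut


# Hypothetical data: the target project has the same pattern at one tenth the scale.
source = [(100, 0), (200, 0), (300, 0), (800, 1), (900, 1), (1000, 1)]
target = [(10, 0), (20, 0), (30, 0), (80, 1), (90, 1), (100, 1)]

cut = fit_threshold(source)
source_acc = accuracy(source, cut)
target_acc = accuracy(target, cut)
print(cut, source_acc, target_acc)
```

The cut-off learned from the source project separates its modules perfectly, but every target module falls below it, so the model labels all of them as non-defective and is right only half the time on this toy data.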

However, solutions have been proposed in recent years. Quite a series of twists and turns:

In recent years, various techniques have been proposed to address these challenges and a
large number of CPDP models have hence been developed (see Section 2.4). In particular, it has
been reported that these CPDP models produce a promising prediction performance [27, 41, 61,
79, 80].

[123] Feng Zhang, Audris Mockus, Iman Keivanloo, and Ying Zou. 2016. Towards building a universal defect prediction
model with rank transformed predictors. Emp. Softw. Eng. 21, 5 (2016), 2107–2145.
[80] Jaechang Nam, Sinno Jialin Pan, and Sunghun Kim. 2013. Transfer defect learning. In Proceedings of the 35th International Conference on Software Engineering (ICSE’13). 382–391.
[79] Jaechang Nam and Sunghun Kim. 2015. Heterogeneous defect prediction. In Proceedings of the 10th Joint Meeting
of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software
Engineering (FSE’15). 508–519.
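This does not seem related to automated program repair, but the two might intersect.

5 A hasty summary

At first glance, this paper has no direct connection to automated program repair. It mainly studies defect prediction, using a family of models (CPDP), so I think reading this far is enough for now and there is no need to go further.

Of course, if I have time later, I think it is still well worth reading in full. After all, the subfields of software engineering likely overlap, and techniques from this one could carry over to others.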
