Towards a Model-based Software Mining Infrastructure
Abstract
Software mining has two primary goals: extracting basic facts from software repositories and deriving knowledge from the assessment of those facts. Fact extraction approaches rely on custom, task-specific infrastructures and tools. The resulting facts are usually represented in heterogeneous formats at a low level of abstraction. As a result, facts extracted from different sources are poorly integrated, even when they are related. To address this, existing infrastructures often support all-in-one information meta-structures that try to integrate all facts into one connected whole.
We propose a generic infrastructure that works in two steps. It first translates extracted facts to homogeneous high-level representations conforming to domain-specific meta-models. It then transforms these high-level model instances to instances of domain-specific models related to a particular assessment task, which can be incrementally enriched with additional facts as these become available or necessary. This allows researchers and practitioners to focus on the assessment task at hand, without being concerned with low-level representation details or complex data models containing large amounts of often irrelevant data. We present an example scenario with a concrete instantiation of the proposed infrastructure targeting the assessment of developer behaviour.
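The two-step idea in the abstract (raw facts to a homogeneous model, then to a task-specific model that can be enriched later) might be sketched as follows. This is only an illustrative sketch, not the infrastructure described in the paper: the raw record layout, the `Commit` and `DeveloperActivity` classes, and all function names are hypothetical, and plain Python dataclasses stand in for real meta-models.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical low-level facts, as a custom extractor might emit them.
RAW_COMMITS = [
    {"sha": "a1", "author": "alice", "files": "src/A.java;src/B.java"},
    {"sha": "b2", "author": "bob", "files": "src/A.java"},
    {"sha": "c3", "author": "alice", "files": "docs/README"},
]


@dataclass(frozen=True)
class Commit:
    """Instance of an assumed 'version history' meta-model."""
    sha: str
    author: str
    files: tuple


@dataclass
class DeveloperActivity:
    """Instance of an assumed task-specific model for developer behaviour."""
    developer: str
    commits: int = 0
    files_touched: set = field(default_factory=set)
    issues_closed: Optional[int] = None  # filled in only if enriched later


def to_history_model(raw):
    """Step 1: lift raw records to homogeneous high-level instances."""
    return [
        Commit(r["sha"], r["author"], tuple(r["files"].split(";")))
        for r in raw
    ]


def to_activity_model(commits):
    """Step 2: derive the task-specific model, keeping only what the task needs."""
    activity = {}
    for c in commits:
        entry = activity.setdefault(c.author, DeveloperActivity(c.author))
        entry.commits += 1
        entry.files_touched.update(c.files)
    return activity


def enrich_with_issues(activity, closed_by):
    """Incremental enrichment: add facts from another source when available."""
    for developer, count in closed_by.items():
        if developer in activity:
            activity[developer].issues_closed = count
    return activity


history = to_history_model(RAW_COMMITS)
activity = to_activity_model(history)
enrich_with_issues(activity, {"alice": 2})  # hypothetical issue-tracker facts
```

In this sketch the assessment code only ever sees `DeveloperActivity` objects, so it does not depend on the semicolon-separated file format of the raw records, and developers without issue-tracker facts simply keep `issues_closed` unset.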
Keywords:
data mining, facts extraction, data integration, domain modeling
Document Type:
Journal Articles
Publisher:
ACM
Journal:
SIGSOFT Softw. Eng. Notes
Volume:
40
Number:
1
Pages:
1-8
Month:
1
Year:
2015
DOI:
10.1145/2693208.2693224