The following sections describe how to identify and extract eligible objects from primary studies. Mandatory attributes are marked with an asterisk.
A quality factor (QF) represents a normative metric which maps a textual requirement of a specific granularity to a scale and therefore informs about the quality of that input.
scope note*
The name of a quality factor shall reflect the quality factor in relation to the whole set of quality factors. This means that the name can be extracted from the paper directly, but if the name does not sufficiently demarcate the quality factor from others, then it is feasible to select a new name. This is due to the fact that several publications propose and name a quality factor without awareness of other quality factors, which makes demarcation by design impossible.
dimension cluster*
This dimension cluster is built on the conceptual notion that a quality factor is a normative rule, where a violation against it is hypothesized to have an impact of the type aspect on an activity in which the requirement is used. It captures all quality aspects that are explicitly mentioned in the description of the impact of the quality factor.
Extraction rule: A violation against the quality factor has an impact on
Dimensions | Extraction Rule |
adequacy | the appropriateness of the requirement in its respective context |
atomicity | the confinement of the requirement's scope to only one, not further splittable element |
completeness | the explicit availability of all relevant information |
compliance | the adherence to external rules |
consistency | the satisfiability of all requirements in conjunction |
correctness | the alignment of the stated text with the intended objects |
designindependence | the confinement of the requirement to the problem space |
feasibility | the chance of realistically implementing the requirement |
maintainability | the ability to continuously ensure the quality of the requirement |
modifiability | the ability to change the requirement |
necessity | the singularity of written text |
precision | the level of unique specification of the text |
reusability | the ease of reuse |
simplicity | the intricacy of the written text |
traceability | the explicit connections to other artifacts |
unambiguousness | the unique interpretation of a requirement |
understandability | the comprehensibility of written text |
verifiability | the ability to assess whether a requirement is met |
Characteristics | Extraction Rule |
- | in a negative way |
+ | in a positive way |
? | in an unknown way |
(blank) | in no way at all |
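The extraction result for this dimension cluster can be thought of as a mapping from each dimension to one characteristic. The following is a minimal sketch of such a record, not a prescribed format: the function name `validate_impact` is hypothetical, and encoding the blank characteristic (no impact at all) as an empty string is an assumption.

```python
from typing import Dict

# Dimension names as listed in the table above.
DIMENSIONS = (
    "adequacy", "atomicity", "completeness", "compliance", "consistency",
    "correctness", "designindependence", "feasibility", "maintainability",
    "modifiability", "necessity", "precision", "reusability", "simplicity",
    "traceability", "unambiguousness", "understandability", "verifiability",
)

# "-" negative, "+" positive, "?" unknown, "" no impact at all (assumed encoding).
CHARACTERISTICS = {"-", "+", "?", ""}


def validate_impact(impact: Dict[str, str]) -> Dict[str, str]:
    """Check an extracted impact record and fill unmentioned dimensions with ""."""
    unknown_dimensions = set(impact) - set(DIMENSIONS)
    if unknown_dimensions:
        raise ValueError(f"unknown dimensions: {sorted(unknown_dimensions)}")
    invalid = {d: c for d, c in impact.items() if c not in CHARACTERISTICS}
    if invalid:
        raise ValueError(f"invalid characteristics: {invalid}")
    # Dimensions that the paper does not mention default to "no impact at all".
    return {d: impact.get(d, "") for d in DIMENSIONS}


record = validate_impact({"completeness": "-", "verifiability": "?"})
```

Here `record` contains an entry for every dimension, with only the two explicitly mentioned aspects carrying a non-blank characteristic.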
dimension*
The linguistic complexity classifies a quality factor regarding the type of information that needs to be available to determine a violation against the quality factor. This informs about the complexity of automatically detecting violations against this quality factor: lexical factors can be decided using, for example, regular expressions; syntactic factors using POS tagging, constituency parsing, or dependency parsing; and structural factors using metadata. Semantic factors require an understanding of the input, which might only be approximated using thesauri or a relationship to an ontology.
Extraction rule: In order to determine a violation against the quality factor, one must know at least …
Characteristic | Extraction Rule |
lexical | the literal words (i.e., a regular expression can be used to automate the rule). |
structural | the structure of sentences (i.e., metadata (headings, emphasis, …) is necessary to automate the rule). |
syntactic | the grammatical relationships between words (i.e., POS tags, constituency tags, dependency tags, etc. can be used to automate the rule). |
semantic | the meaning of the words (i.e., a semantic comprehension of the text is necessary to automate the rule). |
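To illustrate the lexical characteristic, the following minimal sketch detects violations of a hypothetical "no vague words" quality factor using nothing but a regular expression over the literal words. The word list and the function name `find_vague_words` are assumptions made for illustration and are not taken from any primary study.

```python
import re
from typing import List

# Hypothetical list of vague words; a real quality factor would define its own.
VAGUE_WORDS = ("appropriate", "user-friendly", "fast", "as needed")

# Word boundaries prevent matches inside longer words such as "breakfast".
VAGUE_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(word) for word in VAGUE_WORDS) + r")\b",
    re.IGNORECASE,
)


def find_vague_words(requirement: str) -> List[str]:
    """Return every vague word found in the requirement, lower-cased, in order."""
    return [match.group(0).lower() for match in VAGUE_PATTERN.finditer(requirement)]


violations = find_vague_words(
    "The system shall respond fast and show an Appropriate message."
)
```

Because the rule needs only the literal words, it would be classified as lexical; a rule that needed part-of-speech tags to distinguish word uses would be syntactic instead.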
dimension*
The scope classifies a quality factor regarding the extent of information that is necessary in order to determine a violation against the formal rule of the quality factor. The minimal scope shall be chosen, i.e., for one violation against a given quality factor, how much textual information must be seen to detect that violation? The classification of the scope shall not be approached from a standpoint of 'ensuring that a given input document is free of violations against that factor', because that would always entail a global/document scope.
Extraction rule: To determine one violation against the formal rule of the quality attribute, it suffices to see …
Characteristic | Extraction Rule |
word | a single token/word |
phrase | multiple, coherent words |
sentence | a full, grammatically correct sentence |
structured/tabular text | a structured text (use case specification, user story, feature table) |
user story | a structured text following the Cohn/Connextra template ('As a <user> I want to <goal> so that <justification>.') |
use case | a structured text describing a set of connected scenarios in the form of consecutive steps |
requirement | a structured functional, non-functional, or process requirement |
section | a full, coherent section |
document | a full, coherent document |
global | all textual requirements artifacts associated to the product/service |
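As an example of the user story characteristic, the template quoted in the table can be approximated with a regular expression. This is a rough sketch only: the pattern `USER_STORY` and the function `parse_user_story` are hypothetical, and the tolerance for "an", optional commas, and a missing final period is an assumption beyond the literal template.

```python
import re
from typing import Dict, Optional

# Approximation of 'As a <user> I want to <goal> so that <justification>.'
USER_STORY = re.compile(
    r"^As an? (?P<user>.+?),? I want to (?P<goal>.+?),? "
    r"so that (?P<justification>.+?)\.?$",
    re.IGNORECASE,
)


def parse_user_story(text: str) -> Optional[Dict[str, str]]:
    """Return the three template slots, or None if the text does not fit."""
    match = USER_STORY.match(text.strip())
    return match.groupdict() if match else None


story = parse_user_story(
    "As a customer I want to reset my password so that I can regain access."
)
```

A quality factor whose violation can only be detected by inspecting all three slots together would take the user story scope, whereas one that inspects a single slot may be satisfied by a smaller scope such as phrase.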
A description object explains (a) what the quality factor means and (b) how this quality factor is hypothesized to inform about the quality of the requirement. A quality attribute can be associated with multiple descriptions, which may be the result of parallel work or updating definitions or impact descriptions.
scope note*
A definition is an informal rule, which must be complied with in order to ensure good quality of the requirement according to the authors. The definition may be simply postulated, but can also be derived empirically from data or developed in collaboration with industry.
scope note
The impact scope note explicitly describes how the quality factor affects the actual quality of the requirements. A manuscript should make the hypothesized impact explicit, but does not need to do so in order to be included.
dimension*
This dimension captures whether the given description and/or impact is rooted in any sort of empirical evidence. This may simply be practitioners reporting violations against the quality factor as a challenge, or an investigation of requirements artifacts. Empirical evidence for the description or impact corroborates the relevance of this QF.
Characteristic | Extraction Rule |
true | An empirical method has been applied to validate the definition or the impact (or both) of the quality attribute. |
false | The quality attribute has simply been postulated without any empirical validation of its definition or impact. |
dimension*
This dimension indicates whether practitioners (collaborators working primarily in industry) were involved in the creation or validation of the quality attribute.
Characteristic | Extraction Rule |
true | Practitioners were involved in the validation of the description or impact of the quality attribute. |
false | No practitioners were involved in the validation of the description or impact of the quality attribute or there was no empirical method applied at all. |
A data set object is an arbitrarily large set of natural language requirements, which may make one or more specific quality factors explicit (e.g., through annotations) and is usable as a gold standard to evaluate newly proposed approaches.
scope note*
The description of a data set object contains information about the origin of this data. Descriptions may be vague in the case of confidential data or explicit in the case of open-source data.
dimension*
This dimension classifies a data set object regarding the type of the author. If no author is explicitly mentioned in the reference and accessing the reference does not reveal the author either, the data set's author must be recorded as unknown.
Characteristic | Extraction Rule |
practitioner data | Data that was extracted from contexts in which practitioners work |
student data | Data that was created or extracted in the context of student work. |
mocked data | Data that was fabricated for the purpose of being studied. |
unknown | A dataset exists, but it is not clear who created the data. |
dimension*
This dimension classifies an object regarding who is responsible for annotating the ground truth embedded in the data set, if such an annotation exists. Data that is used as is, without any additional information embedded into it, has a ground truth annotator of none.
Characteristic | Extraction Rule |
practitioners | Practitioners annotated the data. |
researchers | Ph.D. students, postdocs, professors, or independent researchers annotated the data. |
students | BSc or MSc students annotated the data. |
authors | Researchers listed as authors on the paper annotated the data. |
inherent | The truth is embedded in the data in some way: either the data is analysed the way it is, or the truth was added to the data in the way it was created. |
none | Data was not annotated |
unknown | The dataset was annotated, but it is not clear who annotated the data |
numeric
This dimension quantifies an object regarding the number of contained elements, which helps to estimate whether a data set contains a sufficient number of entries for specific training tasks.
dimension*
The granularity classifies an object regarding the scope of the elements contained in the data set.
Characteristic | Extraction Rule |
word | a single token/word |
phrase | a substring of a sentence |
sentence | a full, grammatically correct sentence |
structured/tabular text | a structured text (use case specification, user story, feature table) |
user story | a structured text following the Cohn/Connextra template ('As a <user> I want to <goal> so that <justification>.') |
use case | a structured text describing a set of connected scenarios in the form of consecutive steps |
requirement | a structured functional, non-functional, or process requirement |
section | a full, coherent section |
document | a full, coherent document |
global | all textual requirements artifacts associated to the product/service |
dimension*
The accessibility classifies an object regarding the degree to which it is currently available and usable.
Characteristic | Extraction Rule |
open access | The dataset is hosted in a service that satisfies the following criteria: (1) Immutable URL: cannot be altered by the author or someone else, (2) Permanent: the hosting organization has a mission to maintain artefacts for the foreseeable future, (3) Accessible: There is a DOI pointing to the real datasource URL, (4) Open-Source License: The dataset has a proper licence which grants access and re-use of data, material, and source code |
available in paper | The dataset is small enough that the authors disclose the entire dataset in the paper itself (e.g. a set of 14 requirements, listed in a table). |
reachable link | The dataset is reachable now, but is missing some aspect above to be considered Open Access. |
broken link | A link is given in the paper, but does not resolve. |
no link | A dataset is discussed, but no link is provided. |
upon request | Authors say the dataset is available upon request. |
private | The authors say that a dataset exists, but is private for some reasons (such as industry collaboration with private data, etc.). |
proprietary | The dataset is available but proprietary |
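The link-related characteristics in the table above follow a simple decision order, which can be sketched as below. The class `HostingInfo`, its field names, and the function `classify_link` are hypothetical and exist only for illustration; the remaining characteristics (available in paper, upon request, private, proprietary) depend on statements by the authors and are deliberately not covered by this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HostingInfo:
    """Facts an extractor records about a dataset link (hypothetical fields)."""
    link: Optional[str]
    resolves: bool = False
    immutable_url: bool = False   # criterion (1)
    permanent_host: bool = False  # criterion (2)
    has_doi: bool = False         # criterion (3)
    open_license: bool = False    # criterion (4)


def classify_link(info: HostingInfo) -> str:
    """Map the recorded facts to one of the four link-related characteristics."""
    if not info.link:
        return "no link"
    if not info.resolves:
        return "broken link"
    criteria = (info.immutable_url, info.permanent_host,
                info.has_doi, info.open_license)
    # Open access requires all four criteria; anything less is merely reachable.
    return "open access" if all(criteria) else "reachable link"


label = classify_link(
    HostingInfo(link="https://example.org/data", resolves=True, has_doi=True)
)
```

In this usage example the link resolves and has a DOI but fails the other criteria, so it is classified as a reachable link rather than open access.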
scope note
The source or link is the pointer towards the location where the object can be found.
An approach is an implementation of automatic detection of a violation against the formal rule which the quality attribute entails.
dimension*
The type of proposed solution classifies an approach regarding the general paradigm utilized to implement a detection algorithm.
Characteristic | Extraction Rule |
rule-based | Violations against the quality attribute are detected based on a static set of predefined rules. |
supervised ml | The detection of violations against the quality attribute is realized through a supervised machine learning approach. |
unsupervised ml | The detection of violations against the quality attribute is realized through an unsupervised machine learning approach. |
supervised dl | The detection of violations against the quality attribute is realized through a supervised deep learning approach. |
unsupervised dl | The detection of violations against the quality attribute is realized through an unsupervised deep learning approach. |
dimension*
The accessibility classifies an approach regarding the degree to which it is available.
Characteristic | Extraction Rule |
open access | The approach is hosted in a service that satisfies all of the following criteria: (1) Immutable URL: cannot be altered by the author or someone else, (2) Permanent: the hosting organization has a mission to maintain artefacts for the foreseeable future, (3) Accessible: There is a DOI pointing to the real approach URL, (4) Open-Source License: The approach has a proper licence which grants access and re-use of data, material, and source code |
open source | The approach is available for all to use and the codebase has been disclosed |
reachable link | The approach is reachable now, but is missing some aspect above to be considered Open Access. |
broken link | A link is given in the paper, but does not resolve. |
no link | An approach is discussed, but no link is provided. |
upon request | Authors say the approach is available upon request. |
private | The authors say that an approach exists, but is private for some reasons (such as industry collaboration with private data, etc.) |
proprietary | The approach is available but proprietary |
scope note
The source or link is the pointer towards the location where the object can be found.
dimension*
This dimension determines whether an approach has been evaluated with some sort of empirical method: this can be a formal experiment comparing the efficiency of the approach, but may also appear in the form of interviews confirming the findings of the approach.
Characteristic | Extraction Rule |
true | An empirical method has been applied to validate the approach. |
false | The approach has simply been postulated without any empirical validation. |
dimension*
This dimension captures whether the application/evaluation of the approach involved actual practitioners. We currently do not differentiate whether the practitioners involved with the evaluation were also the practitioners who worked with the data set used for the evaluation.
Characteristic | Extraction Rule |
true | The evaluation of the approach involved practitioners who primarily work in industry. |
false | The evaluation of the approach involved no practitioners (hence instead: authors, research staff, students, etc.). |
dimension cluster
The release classifies an approach regarding the type of solution that was disclosed to the public. While some approaches are disclosed in the form of executable tools, publishing the source code as well is encouraged in order to improve reuse and maintenance.
Dimensions | Extraction Rule |
tool | A standalone tool |
webservice | An online interface hosted as a webservice |
library | A library |
api | An API or library |
code | The source code of the approach |
notebook | A (Jupyter) notebook demonstrating the approach |
model | A pre-trained model (resulting from an ML/DL solution) |
Characteristics | Extraction Rule |
y | has been released |
(blank) | has not been released |
dimension cluster
The necessary information classifies an approach regarding the type of information that needs to be available in order to automatically determine a violation against the formal rule.
Extraction rule: To automatically determine a violation against the formal rule of the quality factor,
Dimensions | Extraction Rule |
part-of-speech tags | an association of each token with its corresponding part-of-speech tag |
dependency tags | an association of each token with the token it depends on |
constituency tags | an association of each token with its parenting constituent |
lemmatization | an association of each token with its lemmatized form |
stemming | an association of each token with its word stem |
phrase chunks | an association of phrases to containing chunks |
stop word removal | the automatic removal of words that do not add value to the text |
semantic role labeling | the annotation of semantic roles to parts of the text |
thesaurus | a graph connecting words with synonyms |
named entity recognition | the automatic recognition of named entities from noun phrases |
parse tree | an acyclic graph representing the syntactical hierarchy of a sentence |
Characteristics | Extraction Rule |
y | is necessary |
? | is unclear whether it is necessary |
(blank) | is not necessary |