Requirements Quality Factor Ontology

RQFO

Guideline

The following sections describe how to identify and extract eligible objects from primary studies. Mandatory attributes are marked with an asterisk.

factor

A quality factor (QF) represents a normative metric which maps a textual requirement of a specific granularity to a scale and therefore informs about the quality of that input.

name

scope note*

The name of a quality factor shall distinguish it from all other quality factors in the set. The name can be taken directly from the paper, but if it does not sufficiently demarcate the quality factor from others, a new name may be chosen. This is necessary because several publications propose and name a quality factor without awareness of other quality factors, which rules out demarcation by design.

aspect

dimension cluster*

This dimension cluster is built on the conceptual notion that a quality factor is a normative rule, where a violation against it is hypothesized to have an impact of the type aspect on an activity in which the requirement is used. It captures all quality aspects that are explicitly mentioned in the description of the impact of the QF.

Extraction rule: A violation against the quality factor has an impact on

Dimensions	Extraction Rule
adequacy	the appropriateness of the requirement in its respective context
atomicity	the confinement of the requirement's scope to only one, not further splittable element
completeness	the explicit availability of all relevant information
compliance	the adherence to external rules
consistency	the satisfiability of all requirements in conjunction
correctness	the alignment of the stated text with the intended objects
designindependence	the confinement of the requirement to the problem space
feasibility	the chance of realistically implementing the requirement
maintainability	the ability to continuously ensure the quality of the requirement
modifiability	the ability to change the requirement
necessity	the singularity of written text
precision	the level of unique specification of the text
reusability	the ease of reuse
simplicity	the intricacy of the written text
traceability	the explicit connections to other artifacts
unambiguousness	the unique interpretation of a requirement
understandability	the comprehensibility of written text
verifiability	the ability to assess whether a requirement is met

Characteristics	Extraction Rule
-	in a negative way
+	in a positive way
?	in an unknown way
(empty)	in no way at all
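
As an illustration, the extracted aspects of one quality factor could be recorded as a mapping from dimension to characteristic symbol. The following is only a sketch: the names `IMPACT_PHRASES` and `describe_aspects` and the example values are assumptions for illustration and not part of the ontology. The empty string stands for an empty cell.

```python
# Characteristic symbols from the table above; "" stands for an empty cell.
IMPACT_PHRASES = {
    "-": "in a negative way",
    "+": "in a positive way",
    "?": "in an unknown way",
    "": "in no way at all",
}


def describe_aspects(aspects):
    """Turn {dimension: symbol} into sentences that follow the extraction rule."""
    sentences = []
    for dimension, symbol in sorted(aspects.items()):
        if symbol not in IMPACT_PHRASES:
            raise ValueError(f"unknown characteristic {symbol!r} for {dimension}")
        sentences.append(
            f"A violation has an impact on {dimension} {IMPACT_PHRASES[symbol]}."
        )
    return sentences


# Hypothetical extraction result for a single quality factor.
example = {"understandability": "-", "verifiability": "?"}
for sentence in describe_aspects(example):
    print(sentence)
```

Rejecting unknown symbols early keeps extraction sheets from silently accumulating characteristics that the cluster does not define.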

linguistic complexity

dimension*

The linguistic complexity classifies a quality factor by the type of information that must be available to determine a violation against it. This indicates how difficult it is to detect violations automatically: lexical factors can be decided using, for example, regular expressions; syntactic factors using part-of-speech (POS) tagging, constituency parsing, or dependency parsing; and structural factors using metadata. Semantic factors require an understanding of the input, which might only be approximated using thesauri or a relationship to an ontology.

Extraction rule: In order to determine a violation against the quality factor, one must know at least …

Characteristic	Extraction Rule
lexical	the literal words (i.e., a regular expression can be used to automate the rule).
structural	the structure of sentences (i.e., metadata (headings, emphasis, …) is necessary to automate the rule).
syntactic	the grammatical relationships between words (i.e., POS tags, constituency tags, dependency tags, etc. can be used to automate the rule).
semantic	the meaning of the words (i.e., a semantic comprehension of the text is necessary to automate the rule).
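
To make the lexical characteristic concrete, the sketch below detects violations of a hypothetical "vague terms" quality factor with a single regular expression. The word list and the names `VAGUE_TERMS`, `VAGUE_PATTERN`, and `find_vague_terms` are assumptions chosen for illustration; they are not taken from any primary study.

```python
import re

# Assumed, illustrative list of vague terms; a real factor would define its own.
VAGUE_TERMS = ["appropriate", "fast", "user-friendly", "as soon as possible"]

# Word boundaries keep "fast" from matching inside "breakfast".
VAGUE_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in VAGUE_TERMS) + r")\b",
    re.IGNORECASE,
)


def find_vague_terms(requirement):
    """Return the vague terms found in a requirement, in order of appearance."""
    return [match.group(0).lower() for match in VAGUE_PATTERN.finditer(requirement)]


print(find_vague_terms("The system shall respond fast and be user-friendly."))
# prints ['fast', 'user-friendly']
```

Because the rule only inspects literal words, it needs no tagging or parsing, which is what separates a lexical factor from a syntactic or semantic one.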

scope

dimension*

The scope classifies a quality factor regarding the extent of information that is necessary in order to determine a violation against the formal rule of the quality factor. The minimal scope shall be chosen: how much text must be seen to detect a single violation against the given quality factor? The classification of the scope shall not be approached from a standpoint of 'ensuring that a given input document is free of violations against that factor', because that would always entail a global/document scope.

Extraction rule: To determine one violation against the formal rule of the quality attribute, it suffices to see …

Characteristic	Extraction Rule
word	a single token/word
phrase	multiple, coherent words
sentence	a full, grammatically correct sentence
structured/tabular text	a structured text (use case specification, user story, feature table)
user story	a structured text following the Cohn/Connextra template ('As a <user> I want to <goal> so that <justification>.')
use case	a structured text describing a set of connected scenarios in the form of consecutive steps
requirement	a structured functional, non-functional, or process requirement
section	a full, coherent section
document	a full, coherent document
global	all textual requirements artifacts associated to the product/service
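
The user story scope can be illustrated with a check that needs the complete story to decide conformance with the template quoted above. This is a minimal sketch: the pattern `USER_STORY` and the function `parse_user_story` are hypothetical, and tolerating "an", an optional comma, and a missing final period are assumptions that go beyond the quoted template.

```python
import re

# Pattern for 'As a <user> I want to <goal> so that <justification>.'
# Assumption: "an", an optional comma after the user, and a missing
# final period are tolerated.
USER_STORY = re.compile(
    r"^As an? (?P<user>.+?),? I want to (?P<goal>.+?) "
    r"so that (?P<justification>.+?)\.?$",
    re.IGNORECASE,
)


def parse_user_story(text):
    """Return the three template parts as a dict, or None if the text does not fit."""
    match = USER_STORY.match(text.strip())
    return match.groupdict() if match else None


story = "As a customer, I want to reset my password so that I can regain access."
print(parse_user_story(story))
print(parse_user_story("The system shall log all errors."))
```

A quality factor such as "the justification part is missing" cannot be decided from a single word or phrase, so its minimal scope under this reading would be the user story.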

description

A description object explains (a) what the quality factor means and (b) how this quality factor is hypothesized to inform about the quality of the requirement. A quality attribute can be associated with multiple descriptions, which may be the result of parallel work or updating definitions or impact descriptions.

definition

scope note*

A definition is an informal rule, which must be complied with in order to ensure good quality of the requirement according to the authors. The definition may be simply postulated, but can also be derived empirically from data or developed in collaboration with industry.

impact

scope note

The impact scope note explicitly describes how the QF affects the actual quality of the requirements. A manuscript should make the hypothesized impact explicit, but need not do so in order to be included.

empirical evidence

dimension*

This dimension captures whether the given description and/or impact is rooted in any sort of empirical evidence. This may simply be practitioners reporting violations against the quality factor as a challenge, or an investigation of requirements artifacts. Empirical evidence for the description or impact corroborates the relevance of this QF.

Characteristic	Extraction Rule
true	An empirical method has been applied to validate the definition or the impact (or both) of the quality attribute.
false	The quality attribute has simply been postulated without any empirical validation of its definition or impact.

practitioners involved

dimension*

This dimension indicates whether practitioners - collaborators working primarily in industry - were involved in the creation or validation of the quality attribute.

Characteristic	Extraction Rule
true	Practitioners were involved in the validation of the description or impact of the quality attribute.
false	No practitioners were involved in the validation of the description or impact of the quality attribute or there was no empirical method applied at all.
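
Read together, the two tables above exclude one combination: practitioners cannot be marked as involved when no empirical method was applied. The sketch below encodes that reading as a consistency check for extracted description objects; the function name `is_consistent` and its parameters are hypothetical.

```python
def is_consistent(empirical_evidence: bool, practitioners_involved: bool) -> bool:
    """Return False only for the one combination the rules above exclude."""
    # practitioners_involved=True presupposes that an empirical method was applied.
    return empirical_evidence or not practitioners_involved


# Enumerate all four combinations of the two dimensions.
for evidence in (True, False):
    for practitioners in (True, False):
        print(evidence, practitioners, is_consistent(evidence, practitioners))
```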

dataset

A data set object is an arbitrarily large set of natural language requirements, which may make one or more specific quality factors explicit (e.g., through annotations) and is usable as a gold standard to evaluate newly proposed approaches.

description

scope note*

The description of a data set object contains information about the origin of this data. Descriptions may be vague in the case of confidential data or explicit in the case of open-source data.

origin

dimension*

This dimension classifies a data set object regarding the type of the author. If no author is explicitly mentioned in the reference and accessing the reference does not reveal the author either, the data set's author must be recorded as unknown.

Characteristic	Extraction Rule
practitioner data	Data that was extracted from contexts in which practitioners work.
student data	Data that was created or extracted in the context of student work.
mocked data	Data that was fabricated for the purpose of being studied.
unknown	A dataset exists, but it is not clear who created the data.

ground truth annotators

dimension*

This dimension classifies an object regarding who is responsible for annotating the ground truth embedded in the data set, if such an annotation exists. Data that is used as is, without any additional information embedded into it, has a ground truth annotator of none.

Characteristic	Extraction Rule
practitioners	Practitioners annotated the data.
researchers	PhDs, postdocs, professors, or independent researchers annotated the data.
students	BSc or MSc students annotated the data.
authors	The researchers listed as authors of the paper annotated the data.
inherent	The truth is embedded in the data in some way, either because the data is analysed as is or because the truth was added when the data was created.
none	The data was not annotated.
unknown	The dataset was annotated, but it is not clear who annotated the data.

size

numeric

This dimension quantifies an object regarding the number of contained elements, which helps to estimate whether a data set contains enough entries for specific training tasks.

granularity

dimension*

The granularity classifies an object regarding the scope of the elements contained in the data set.

Characteristic	Extraction Rule
word	a single token/word
phrase	a substring of a sentence
sentence	a full, grammatically correct sentence
structured/tabular text	a structured text (use case specification, user story, feature table)
user story	a structured text following the Cohn/Connextra template ('As a <user> I want to <goal> so that <justification>.')
use case	a structured text describing a set of connected scenarios in the form of consecutive steps
requirement	a structured functional, non-functional, or process requirement
section	a full, coherent section
document	a full, coherent document
global	all textual requirements artifacts associated to the product/service

accessibility

dimension*

The accessibility classifies an object regarding the degree to which it is currently available and usable.

Characteristic	Extraction Rule
open access	The dataset is hosted in a service that satisfies the following criteria: (1) Immutable URL: cannot be altered by the author or someone else, (2) Permanent: the hosting organization has a mission to maintain artefacts for the foreseeable future, (3) Accessible: There is a DOI pointing to the real datasource URL, (4) Open-Source License: The dataset has a proper licence which grants access and re-use of data, material, and source code
available in paper	The dataset is small enough that the authors disclose the entire dataset in the paper itself (e.g. a set of 14 requirements, listed in a table).
reachable link	The dataset is reachable now, but is missing some aspect above to be considered Open Access.
broken link	A link is given in the paper, but it does not resolve.
no link	A dataset is discussed, but no link is provided.
upon request	Authors say the dataset is available upon request.
private	The authors say that a dataset exists, but is private for some reasons (such as industry collaboration with private data, etc.).
proprietary	The dataset is available but proprietary.
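
The link-based characteristics above can be told apart mechanically once the link has been inspected. The sketch below covers only open access, reachable link, broken link, and no link. The remaining characteristics depend on statements by the authors and are not modelled. The record `LinkInfo`, its field names, and the function `classify_link` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkInfo:
    """Hypothetical record of what an extractor observed about a dataset link."""
    url: Optional[str]            # None if the paper gives no link
    resolves: bool = False        # the link can currently be opened
    immutable_url: bool = False   # criterion (1)
    permanent_host: bool = False  # criterion (2)
    has_doi: bool = False         # criterion (3)
    open_license: bool = False    # criterion (4)


def classify_link(info: LinkInfo) -> str:
    """Map the observations to one of the four link-based characteristics."""
    if info.url is None:
        return "no link"
    if not info.resolves:
        return "broken link"
    criteria = (info.immutable_url, info.permanent_host, info.has_doi, info.open_license)
    # Open access requires all four criteria; anything less is a reachable link.
    return "open access" if all(criteria) else "reachable link"


# Placeholder URL for illustration only.
print(classify_link(LinkInfo(url="https://example.org/data", resolves=True, has_doi=True)))
# prints reachable link
```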

source link

scope note

The source or link is the pointer towards the location where the object can be found.

approach

An approach is an implementation of automatic detection of a violation against the formal rule which the quality attribute entails.

type

dimension*

The type of proposed solution classifies an approach regarding the general paradigm utilized to implement a detection algorithm.

Characteristic	Extraction Rule
rule-based	Violations against the quality attribute are detected based on a static set of predefined rules.
supervised ml	The detection of violations against the quality attribute is realized through a supervised machine learning approach.
unsupervised ml	The detection of violations against the quality attribute is realized through an unsupervised machine learning approach.
supervised dl	The detection of violations against the quality attribute is realized through a supervised deep learning approach.
unsupervised dl	The detection of violations against the quality attribute is realized through an unsupervised deep learning approach.

accessibility

dimension*

The accessibility classifies an approach regarding the degree to which it is available.

Characteristic	Extraction Rule
open access	The approach is hosted in a service that satisfies all of the following criteria: (1) Immutable URL: cannot be altered by the author or someone else, (2) Permanent: the hosting organization has a mission to maintain artefacts for the foreseeable future, (3) Accessible: There is a DOI pointing to the real approach URL, (4) Open-Source License: The approach has a proper licence which grants access and re-use of data, material, and source code
open source	The approach is available for all to use and the codebase has been disclosed
reachable link	The approach is reachable now, but is missing some aspect above to be considered Open Access.
broken link	A link is given in the paper, but it does not resolve.
no link	An approach is discussed, but no link is provided.
upon request	Authors say the approach is available upon request.
private	The authors say that an approach exists, but is private for some reasons (such as industry collaboration with private data, etc.)
proprietary	The approach is available but proprietary

source link

scope note

The source or link is the pointer towards the location where the object can be found.

empirical method applied

dimension*

This dimension determines whether an approach has been evaluated with some sort of empirical method: this can be a formal experiment comparing the efficiency of the approach, but may also appear in the form of interviews confirming the findings of the approach.

Characteristic	Extraction Rule
true	An empirical method has been applied to validate the approach.
false	The approach has simply been postulated without any empirical validation.

practitioners involved

dimension*

This dimension captures whether the application/evaluation of the approach involved actual practitioners. We currently do not differentiate whether the practitioners involved with the evaluation were also the practitioners who worked with the data set used for the evaluation.

Characteristic	Extraction Rule
true	The evaluation of the approach involved practitioners, who primarily work in industry.
false	The evaluation of the approach involved no practitioners (hence instead: authors, research staff, students, etc.).

releases

dimension cluster

The release classifies an approach regarding the type of solution that was disclosed to the public. While some approaches are disclosed only in the form of executable tools, publishing the source code as well is encouraged in order to improve reuse and maintenance.

Dimensions	Extraction Rule
tool	A standalone tool
webservice	An online interface hosted as a webservice
library	A library
api	An API or library
code	The source code of the approach
notebook	A (Jupyter) notebook demonstrating the approach
model	A pre-trained model (resulting from an ML/DL solution)

Characteristics	Extraction Rule
y	has been released
(empty)	has not been released

necessary information

dimension cluster

The necessary information classifies an approach regarding the type of information that needs to be available in order to automatically determine a violation against the formal rule.

Extraction rule: To automatically determine a violation against the formal rule of the quality factor,

Dimensions	Extraction Rule
part-of-speech tags	an association of each token with its corresponding part-of-speech tag
dependency tags	an association of each token with the token it depends on
constituency tags	an association of each token with its parent constituent
lemmatization	an association of each token with its lemmatized form
stemming	an association of each token with its word stem
phrase chunks	an association of each phrase with the chunk that contains it
stop word removal	the automatic removal of words that do not add value to the text
semantic role labeling	the annotation of semantic roles to parts of the text
thesaurus	a graph connecting words with synonyms
named entity recognition	the automatic recognition of named entities from noun phrases
parse tree	an acyclic graph representing the syntactical hierarchy of a sentence

Characteristics	Extraction Rule
y	is necessary
?	is unclear whether it is necessary
(empty)	is not necessary