Natural Language Processing for Requirements Engineering

Aug 27, 2023


Natural Language Processing (NLP) is a branch of artificial intelligence that aims to enable machines to comprehend, interpret, and generate human language. It involves developing algorithms and models capable of processing natural language input such as text, voice, and images in order to perform activities traditionally carried out by humans. Advances in machine learning technology, together with the availability of large natural language datasets, have allowed NLP to make great strides in recent years. NLP tools and studies for requirements engineering (RE) focus on six main areas: quality checks, extraction, classification of requirements, requirements modeling, traceability of requirements, and search and retrieval. In this article, I discuss these advances. NLP for RE is a growing field of study, yet there is a disconnect between research findings and practice, largely because high-quality data sources and domain-specific requirements sources are scarce. Despite this, research has made progress in demonstrating the field's potential. The community of practitioners should collaborate with academics and tool suppliers to influence the direction of NLP for RE.

1 Introduction

Natural Language Processing (NLP) is a subfield of Artificial Intelligence (AI) that seeks to enable machines to comprehend, interpret, and produce human language. It entails creating algorithms and models capable of processing natural language data such as text, audio, and images in order to perform tasks that previously required human intelligence, such as language translation, sentiment analysis, speech recognition, and text summarization. Large datasets, faster processing power, and more powerful machine learning techniques have enabled NLP to make enormous advances in recent years.

Part-of-speech tagging, named entity recognition, text translation, sentiment analysis, and text classification are some typical NLP applications. NLP may be used to translate text from one language to another automatically. Google Translate, for example, employs NLP algorithms to translate text between more than 100 languages. NLP may also be used to identify sentiment in text or speech, which is useful for analyzing consumer feedback or social media posts. A company may employ sentiment analysis to monitor social media and understand how customers feel about its goods and services.

Since the early 1990s, requirements engineering (RE) researchers have been investigating the role of natural language processing and its potential applications. The academic community has looked at applications including detecting quality faults and ambiguity in requirements, classifying and grouping large sets of textual requirements, extracting important abstractions, creating models, and tracing between requirements and code. Until recently, these endeavors were limited to research activities. However, developments in machine learning technology and the availability of massive natural language datasets have opened unprecedented opportunities to apply NLP approaches to RE practice.

2 NLP for Requirements Engineering

NLP tools and research are classified into six major categories: (1) quality checks, (2) extraction, (3) classification, (4) modeling, (5) traceability, and (6) search and retrieval.

  • Quality checks: This research stream and its technologies aim to impose rigor on requirements wording and to execute quality checks. The goal of research initiatives has been to help requirements reviewers ensure that requirements artifacts are unambiguous. By analyzing the language used in a requirements document, NLP may be used to automatically assess the quality of software requirements. These quality checks include recognizing inconsistencies and ambiguities in requirements language, detecting requirements that are too vague or too complex, recommending adjustments to requirements language, and identifying requirements that are incomplete or lacking critical information. Tools specialized for this job, such as ScopeMaster [5] and QVscribe [6], are emerging on the market. These are just a handful of the numerous technologies that use NLP to validate requirements. As NLP technology advances, we are likely to see ever more inventive applications devoted to requirements quality checks.
  • Extraction: This category’s research and technologies have focused on identifying essential domain abstractions and notions. As requirements often contain domain-specific, complex phrases that are not in common use, this work typically seeks to extract single-word or multi-word terms from requirements documents in order to construct domain-specific and project-specific glossaries. The extracted glossaries may be used for a variety of purposes, including verification of completeness or consistency, product comparison, categorization, and modeling. Tools in this category have been proposed in academic work since the 1990s, e.g., AbstFinder [7]. However, to the best of our knowledge, commercial tools in this category are still not available, most likely because demand is low and the features proposed so far are not yet a priority for requirements engineering practice.
  • Classification: Research in this field explores the use of natural language processing to categorize requirements. The goal is to sort requirements into different groups depending on what we are trying to learn from the text. Classification, when applied to customer feedback and online conversations, may assist in identifying input that concerns new requirements, or comments relating to specific aspects of interest, sometimes together with the attitude and sentiment conveyed by the product’s users. A few tools have emerged from research projects, e.g., NFRLocator [3], which categorizes non-functional requirements in natural language text.
  • Requirements modeling: The goal of this line of study and the resulting set of tools is to facilitate the process of building conceptual models from textual requirements. It includes a wide range of outcomes, such as UML models for analysis and design, feature model synthesis in a product-line engineering framework, and high-level models of early requirements or user stories for project scoping. The tools in this area are promising, and they are improving. The Natural Language Processing to Feature Diagram (NLPtoFP) [1] tool, for instance, analyzes NL documents (natural language requirements, or publicly accessible natural language descriptions of products), extracts common and variant characteristics, and visually constructs the product family in the form of a feature diagram. Visual Narrator [2] is another example; it transforms user stories into an entity-relationship conceptual model. While still in their infancy, these tools pave the way for future exploration.
  • Traceability: Traceability of requirements refers to establishing links or relationships that exist between requirements or between requirements and other software artifacts. This research stream’s main goal is to replace manual traceability efforts. Automatic Traceability Analysis and Textual Similarity are two popular techniques used for implementing requirements traceability. Automatic Traceability Analysis: This NLP approach is used to automatically evaluate and extract useful information from a variety of sources, including requirements documents, design documents, test cases, and code. This analysis may aid in detecting relationships between distinct artifacts and tracking requirements across the development process. Textual Similarity: This NLP approach is used to assess the similarity of various documents and requirements. This may help find possible matches between requirements and other documents, as well as missing or incomplete requirements. To the best of our knowledge, tools using these techniques for traceability are yet to come.
  • Search & retrieval: The aim of this area of research is to reuse existing requirements sources to propose new requirements, for example by recommending specific features based on software descriptions available online or on other sources such as online reviews. RE-SWOT [4], for example, uses NLP and visualization to support competitive analysis from app reviews.
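
To make the quality checks category more concrete, here is a minimal sketch of a lexicon-based check written in Python. It is not how ScopeMaster or QVscribe work internally; the vague-term list, the word-count threshold, and the function name `check_requirement` are all assumptions chosen for illustration.

```python
import re

# Hypothetical lexicon of vague terms; real tools rely on richer, curated lists.
VAGUE_TERMS = {"fast", "easy", "user-friendly", "appropriate", "adequate",
               "flexible", "efficient", "as needed", "etc"}
MAX_WORDS = 30  # assumed complexity threshold, not taken from any standard


def check_requirement(text):
    """Return a list of human-readable findings for one requirement."""
    findings = []
    lowered = text.lower()
    for term in sorted(VAGUE_TERMS):
        # Guard both sides so "fast" does not fire inside "breakfast".
        pattern = r"(?<![\w-])" + re.escape(term) + r"(?![\w-])"
        if re.search(pattern, lowered):
            findings.append(f"vague term: '{term}'")
    word_count = len(text.split())
    if word_count > MAX_WORDS:
        findings.append(f"too long: {word_count} words")
    if not re.search(r"\b(shall|must|will)\b", lowered):
        findings.append("no binding modal verb (shall/must/will)")
    return findings


if __name__ == "__main__":
    samples = [
        "The system shall respond fast and be user-friendly.",
        "The system shall log every failed login attempt.",
    ]
    for requirement in samples:
        print(requirement, "->", check_requirement(requirement))
```

A rule-based check like this only catches surface symptoms. Detecting inconsistencies between requirements or missing information, as described above, would need considerably more linguistic analysis.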
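
The textual similarity technique mentioned under traceability can be sketched in a few lines of Python. The example below scores requirement and test-case pairs with TF-IDF weighted cosine similarity and keeps pairs above a threshold as candidate trace links. The sample requirements, the artifact identifiers, the threshold of 0.2, and the function names are assumptions made for illustration; this is one common way to compute textual similarity, not a description of any specific tool.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict) per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF: terms found in every document keep a small positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in Counter(tokens).items()}
            for tokens in tokenized]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def candidate_links(requirements, artifacts, threshold=0.2):
    """Return (requirement id, artifact id, score) triples, best first."""
    vectors = tfidf_vectors(list(requirements.values()) + list(artifacts.values()))
    req_vecs = dict(zip(requirements, vectors[:len(requirements)]))
    art_vecs = dict(zip(artifacts, vectors[len(requirements):]))
    links = []
    for req_id, req_vec in req_vecs.items():
        for art_id, art_vec in art_vecs.items():
            score = cosine(req_vec, art_vec)
            if score >= threshold:  # assumed cut-off; tune per project
                links.append((req_id, art_id, round(score, 3)))
    return sorted(links, key=lambda link: -link[2])


# Hypothetical requirements and test cases.
requirements = {
    "R1": "The user shall reset the password by email",
    "R2": "The system shall export reports as PDF",
}
artifacts = {
    "T1": "Test password reset email is sent to the user",
    "T2": "Test PDF export of monthly reports",
}
links = candidate_links(requirements, artifacts)
for link in links:
    print(link)
```

Links recovered this way are suggestions for a human analyst to confirm. Purely lexical similarity misses pairs that use different vocabulary for the same concept, which is one reason domain-specific data matters.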

     The best of NLP is yet to be seen! NLP for requirements engineering is a dynamic area of study, with a wide range of possible solutions and applications of NLP to requirements activities. However, we cannot help but note a significant gap between research results and practice. The small number of commercial tools available suggests that uptake is minimal. This might be attributed to two major issues. The first is the quality and quantity of data available to train NLP models. Large volumes of high-quality data are necessary to construct accurate and trustworthy NLP models, yet in industrial contexts such data may not be readily available or easily accessible. The second is that requirements documents are domain-specific. NLP models trained on general text data may be ineffective when processing requirements documents containing domain-specific terminology or language. This necessitates the creation of domain-specific models or the use of transfer learning methods to tailor generic models to particular domains.

     Despite these constraints, the scientific community has made tremendous strides in demonstrating the potential of NLP for RE. Once progress is achieved on major adoption hurdles such as data quality and quantity, as well as domain-specific data, we should see genuine excitement from the industry. As a community of practice, we should actively engage with both researchers and tool vendors to influence the direction that NLP may take in shaping our practice.

Author: Adam Alami

Adam Alami is an assistant professor at Aalborg University, Denmark. He has broad experience in information technology practices. His career began in software development before progressing to business analysis and project management. For twenty years, involvement in major IT transformation projects has been the mainstay of his work. His chosen fields of research fit within the broad topic of cooperative, social, and human aspects of software engineering. He has a keen interest in business analysis and contemporary software development practices. He holds a PhD in Computer Science from the IT University of Copenhagen, Denmark, a Master's degree in Computer Science from the University of Technology Sydney (UTS), and a Bachelor's degree in Software Engineering from the Université du Québec à Montréal. Twitter: @AdamAlamiDK.


  1. isti-fmt-nlp/tool-nlptofp: CMT and FDE tools for feature mining and feature diagram design. (Accessed on 04/27/2023).
  2. Marcelrobeer/visualnarrator: Marcel Robeer (2015-2019), developed for the Utrecht University RE Lab. (Accessed on 04/27/2023).
  3. Realsearchgroup/nfrlocator: Project to find and categorize non-functional requirements within unconstrained natural language documents. Contains NFR category listing, labeled documents, and software. (Accessed on 04/27/2023).
  4. RELabUU/re-swot: Visualization tool for competitive analysis from app reviews. (Accessed on 04/27/2023).
  5. Software requirements analyser - 10x with ScopeMaster®. (Accessed on 04/27/2023).
  6. Try QVscribe - QRA. (Accessed on 04/27/2023).
  7. L. Goldin and D. M. Berry. AbstFinder, a prototype natural language text abstraction finder for use in requirements elicitation. Automated Software Engineering, 4:375–412, 1997.



Copyright 2006-2024 by Modern Analyst Media LLC