NOISE DETECTION IN SOFTWARE REQUIREMENTS SPECIFICATION DOCUMENT USING SPECTRAL CLUSTERING

Software Requirements Specification (SRS) is a specification of the functional (capabilities) and non-functional (constraints, characteristics, qualities, and properties) requirements of the software to be developed, based on the conditions or capabilities that the software must have or that the users require. For developers, the SRS document is used as a reference in every stage of development. Defects can be introduced while the SRS document is being written. One of these defects, listed among the Seven Sins of the Specifier, is noise. Noise can occur during the software requirements specification process and materializes as information that is not relevant to the list of user requirements. This study aims to detect noise within the SRS document. The noise detection uses spectral clustering to separate noise (which is treated as an outlier) from the rest of the requirements. In this method, natural language processing is used to represent each requirement statement as a vector of the frequencies of the unique words that appear in it. The cluster with the largest average distance between its members and its centroid is considered to contain noise. The experiments show that the proposed method has a high sensitivity, i.e. 1.0, for both real and synthetic data. Nevertheless, the method has a low specificity for real data, i.e. 0.19, compared to synthetic data, i.e. 0.83.


I. INTRODUCTION
In software development, the requirements specification stage is an important process, according to Boehm and Basili [1]. Fixing errors during implementation costs exponentially more than fixing them during the requirements and design processes [2]. Therefore, developers should pay more attention to the requirements process.
There are three approaches that can be used to analyze a requirements specification, namely natural language, semi-formal language, and formal language. The natural language approach formulates requirements specifications in common language. The semi-formal approach uses a graphical language with textual explanations. The formal approach uses mathematical concepts such as finite-state machines. According to Rossi [3], 71.8% of software requirements specification documents have been created using the natural language approach. This is because the approach is easily understood by software engineers as well as by customers with various backgrounds.
Meyer [4]-[8] explains that specifying software requirements in natural language has several weaknesses compared to formal language. These weaknesses are caused by seven common mistakes that software developers often make, which Meyer formulated as "the seven sins of the specifier". One of these seven sins is noise, which may appear in a Software Requirements Specification (SRS) document. In a requirements specification, noise occurs when software developers add information that is irrelevant to the overall software requirements.
With the development of information technology, the term noise in data has come to carry many meanings. Noise can represent useless data, imperfect data, or data that cannot be read by a computer [9], and it can affect the results of data mining analysis. In this study, noise is any statement within the requirements specification that is not relevant to the system being developed or to the requirements engineering process [2], [10]. There are several studies on noise or outlier detection in text data. For example, the detection of text deviations has been developed using conceptual graphs [11]; that study provides graph visualizations of the deviations that occur in text data.
Clustering methods can be used to detect noise or outliers in data [12]-[17]. Clustering groups objects with similar characteristics into the same cluster, whereas an outlier is an object whose characteristics differ from those of its neighbors. Cai et al. [12] developed an application based on spectral clustering that detects sentences considered as noise in the summarization section of a document; its output is the set of outlier sentences in that section. Another study used one-pass clustering to detect outliers in existing datasets [18]. That method divides the dataset into several hyperspheres of almost the same radius, and the clusters are then labeled either "normal" or "outlier" based on their outlier factors. In addition, the k-Means clustering method has also been used to detect noise [13], [19], [20].
Noise detection based on spectral clustering in textual documents has been developed as a preprocessing stage of a text summarization method [12], where the spectral clustering algorithm detects noisy sentences. Spectral clustering with noise detection [21] has also been applied to detecting noise in software requirements, although that study used synthetic data for its experiments. Spectral clustering methods have shown better performance than k-Means clustering or hierarchical clustering [21]. One of the advantages of spectral clustering is that it can adapt to data of various shapes. Clustering methods can also be used to find data that share few characteristics with the rest of the data and can therefore be considered noise.
This research proposes a method for detecting noise in a Software Requirements Specification (SRS) by using spectral clustering. It is intended to give developers information about the characteristics of the set of requirement statements in an SRS.

II. RESEARCH METHODOLOGY
The proposed method consists of three main steps, i.e. text preprocessing, spectral clustering, and outlier selection.

A. Text Preprocessing
A Software Requirements Specification (SRS) document contains a list of requirement statements. Before the outlier detection process, each statement needs to be textually preprocessed. The preprocessing involves tokenizing, lowercase conversion, stopword removal, and lemmatization of each requirement statement. For the sake of illustration, consider the following requirement statement.
Submit jobs with the associated deadline cost and execution time.
Tokenizing is a process that breaks a text into its individual linguistic units. The statement is first tokenized into the following tokens.
Submit / jobs / with / the / associated / deadline / cost / and / execution / time / .
Then, each token is converted into lowercase.
submit / jobs / with / the / associated / deadline / cost / and / execution / time / .
The next process is stopword removal, which removes every token that is listed in a predefined stopword corpus. The stopword removal produces the following tokens.
submit / jobs / associated / deadline / cost / execution / time
The final process of text preprocessing is lemmatizing each token. Lemmatization identifies the inflected forms of a word and returns its base or dictionary form. In this study, it changes plural nouns into singular nouns and inflected verb endings into their base forms.
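The preprocessing steps, and the TF-IDF vectors used in the next subsection, can be sketched in plain Python. This is a hedged illustration rather than the study's pipeline: the stopword set `STOPWORDS` and the lemma table `LEMMAS` are tiny hypothetical stand-ins for a full stopword corpus and a dictionary-based lemmatizer, and the TF-IDF weighting shown (raw term count times log(n / document frequency)) is one common variant, assumed here because the text does not specify it.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword set; a real pipeline would load a full
# stopword corpus instead.
STOPWORDS = {"a", "an", "and", "the", "with", "of", "to", "in", "for", "is", "are"}

# Hypothetical lookup table standing in for a dictionary-based lemmatizer;
# it only covers the plural noun in the example statement.
LEMMAS = {"jobs": "job"}


def tokenize(text: str) -> list[str]:
    # Words become tokens and each punctuation mark becomes its own token.
    return re.findall(r"\w+|[^\w\s]", text)


def preprocess(statement: str) -> list[str]:
    tokens = [t.lower() for t in tokenize(statement)]
    # Stopword removal; punctuation-only tokens are dropped at the same time.
    tokens = [t for t in tokens if t.isalnum() and t not in STOPWORDS]
    # Lemmatization via the lookup table; unknown tokens are kept unchanged.
    return [LEMMAS.get(t, t) for t in tokens]


def tfidf_matrix(docs: list[list[str]]) -> tuple[list[str], list[list[float]]]:
    # One row per statement, one column per vocabulary term.
    vocab = sorted({t for doc in docs for t in doc})
    n = len(docs)
    doc_freq = Counter(t for doc in docs for t in set(doc))
    idf = {t: math.log(n / doc_freq[t]) for t in vocab}
    rows = []
    for doc in docs:
        tf = Counter(doc)
        rows.append([tf[t] * idf[t] for t in vocab])
    return vocab, rows


print(preprocess("Submit jobs with the associated deadline cost and execution time."))
# ['submit', 'job', 'associated', 'deadline', 'cost', 'execution', 'time']
```

With this weighting, a term that appears in every statement gets an IDF of zero, so it contributes nothing to the distances used later in clustering.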

B. Spectral Clustering
The next step is clustering the set of preprocessed requirement statements using spectral clustering. The clustering should group statements that describe closely related functions into the same cluster. Assume $G = (V, W)$ is an undirected weighted graph whose node set $V$ contains the points $\{v_i \in \mathbb{R}^d \mid i = 1, 2, \ldots, n\}$, where $d$ is the number of TF-IDF features. $W$ is a symmetric matrix whose elements are the Gaussian similarity values between the data points in $V$. $D$ is a diagonal matrix with $d_{ii} = \sum_j w_{ij}$, from which the normalized Laplacian matrix $L_{norm}$ is constructed.
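Written out, one standard form of these quantities is the following. It assumes a Gaussian kernel width $\sigma$ (a parameter the text does not specify) and the symmetric normalization of Ng, Jordan, and Weiss; the paper does not state which normalized Laplacian it uses.

$$
\begin{aligned}
w_{ij} &= \exp\!\left(-\frac{\lVert v_i - v_j \rVert^2}{2\sigma^2}\right) \quad (i \neq j), \qquad w_{ii} = 0,\\
d_{ii} &= \sum_{j=1}^{n} w_{ij},\\
L_{norm} &= I - D^{-1/2} W D^{-1/2}.
\end{aligned}
$$

Under this form, the "first $k$ eigenvectors" are those belonging to the $k$ smallest eigenvalues of $L_{norm}$, which are the same as the eigenvectors for the $k$ largest eigenvalues of $D^{-1/2} W D^{-1/2}$.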
Then, the eigenvectors of $L_{norm}$ are calculated. Only the first $k$ eigenvectors are taken, forming the matrix $X = [x_{ij}]_{n \times k}$, where $k$ is the number of clusters. Next, each row of $X$ is normalized to unit length. Each row of $X$ corresponds to one TF-IDF data point of the original data. k-Means is then applied to the rows of $X$, each treated as a data point in $k$ dimensions. The clustering process produces $k$ clusters of these data points. The details of spectral clustering are described in Figure 1.
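The steps above can be sketched in NumPy as follows. This is a minimal sketch, not the authors' implementation: it assumes the Gaussian width `sigma` and the number of clusters `k` are given (the text does not state the values used), uses the symmetric normalized Laplacian of Ng, Jordan, and Weiss, and runs a plain k-means with a deterministic farthest-point initialization. In practice an established implementation such as scikit-learn's `SpectralClustering` would be the more likely choice.

```python
import numpy as np


def spectral_clustering(V: np.ndarray, k: int, sigma: float = 1.0, n_iter: int = 100) -> np.ndarray:
    """Return a cluster label in 0..k-1 for each row of V (one row per statement vector)."""
    n = V.shape[0]
    # Gaussian similarity w_ij = exp(-||v_i - v_j||^2 / (2 sigma^2)), with w_ii = 0.
    sq_dist = np.sum((V[:, None, :] - V[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Degrees d_ii = sum_j w_ij and L_norm = I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    L_norm = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    # eigh sorts eigenvalues in ascending order, so the first k columns belong
    # to the k smallest eigenvalues of L_norm.
    _, eigvecs = np.linalg.eigh(L_norm)
    X = eigvecs[:, :k]

    # Normalize every row of X to unit length.
    X = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)

    # k-means on the rows of X, seeded by farthest-point initialization.
    centroids = [X[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((X - c) ** 2, axis=1) for c in centroids], axis=0)
        centroids.append(X[np.argmax(dist)])
    centroids = np.array(centroids)

    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        dists = np.sum((X[:, None, :] - centroids[None, :, :]) ** 2, axis=-1)
        labels = np.argmin(dists, axis=1)
        new_centroids = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels
```

The row normalization matters: after it, statements from the same well-separated group map to nearly the same unit vector, which is what makes the final k-means step easy.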

C. Outlier Selection
The clustering from the previous step is supposed to separate the cluster that contains noise statements from the remaining clusters, which contain requirement statements. The noise statements are considered the outliers, and the cluster with the largest average distance between its members and its centroid is selected as the outlier cluster. That means that this outlier cluster contains the noise statements.
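The selection rule can be sketched as below. It is a hedged sketch: the text does not say whether distances are measured on the TF-IDF vectors or on the spectral embedding $X$, so `features` is simply whichever matrix is passed in. The hypothetical `farthest` flag also covers the opposite policy (closest distance) that the results section compares against.

```python
import numpy as np


def select_outlier_cluster(features: np.ndarray, labels: np.ndarray, farthest: bool = True) -> int:
    """Pick the cluster whose members are, on average, farthest from (or closest to) its centroid."""
    spreads = {}
    for c in np.unique(labels):
        members = features[labels == c]
        centroid = members.mean(axis=0)
        # Average Euclidean distance between each member and the cluster centroid.
        spreads[int(c)] = float(np.linalg.norm(members - centroid, axis=1).mean())
    pick = max if farthest else min
    return pick(spreads, key=spreads.get)


def noise_labels(features: np.ndarray, labels: np.ndarray, farthest: bool = True) -> np.ndarray:
    outlier = select_outlier_cluster(features, labels, farthest)
    # 1 = noise (member of the outlier cluster), 0 = requirement statement.
    return (labels == outlier).astype(int)
```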

III. RESULTS AND DISCUSSION
For the experiments, this study uses 648 software requirement statements manually extracted from 14 SRS documents. Table I shows the dataset used in this study. Each document describes a different project from one of various problem domains. Each requirement statement was labeled by three human annotators with a Boolean value. An annotator labels a statement with 1 (true) if and only if he or she considers the statement to be noise, and with 0 (false) if and only if he or she considers it a relevant requirement statement. In this study, the method is treated as a fourth annotator whose reliability is evaluated with respect to the human annotators, so the same labeling rules apply to the method. The reliability between the method and each annotator is measured using a confusion matrix. Table II shows the confusion matrix used in this study.
The experimental scenario follows these steps:
a. Store the preprocessed statements of each dataset in a single .csv document, one statement per line.
b. Create a questionnaire form for each dataset. Each human annotator labels each statement in the questionnaire with 1 (noise) or 0 (not noise, i.e. a good requirement statement). Validate the results and aggregate the answers from all annotators by majority rule.
c. Cluster each dataset (.csv) using the spectral clustering algorithm, select the outlier cluster, and label each statement based on the selected outlier cluster.
d. Measure the reliability between the method and each human annotator.
The noise predictions generated by the proposed method were evaluated using kappa statistics, which measure the reliability of the method with respect to the human annotators and are expressed as a kappa value [22]. Table II shows the confusion matrix used for calculating the kappa coefficient. The kappa coefficient measures the degree of agreement between two assessments. In this study, it represents the consistency between the noise predictions and the noise assessments by the human annotators, and it is also used to determine the consistency of the noise assessments among the human annotators. Calculating the kappa coefficient requires the number of times assessor 1 and assessor 2 give each noise assessment to the software requirement statements. Here, assessor 1 and assessor 2 can be either the noise detector and a human annotator, or two human annotators. Each assessor gives each requirement statement a true or false value as its noise assessment. The inputs of the kappa statistics calculation are:
- Ptp = the number of requirement statements to which both the human annotator and the method (second annotator) give a true value (TP).
- Pfn = the number of requirement statements to which the human annotator gives a true value and the method (second annotator) gives a false value (FN).
- Pfp = the number of requirement statements to which the human annotator gives a false value and the method (second annotator) gives a true value (FP).
- Ptn = the number of requirement statements to which both the human annotator and the method (second annotator) give a false value (TN).
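The majority-rule aggregation in step b and the four counts above can be sketched as follows. The function names are hypothetical, and the sketch assumes 0/1 labels with an odd number of annotators (three in this study), so a strict majority always exists.

```python
def majority_vote(labels_per_annotator: list[list[int]]) -> list[int]:
    """Aggregate 0/1 labels (one list per annotator) by strict majority."""
    n_annotators = len(labels_per_annotator)
    # zip(*...) walks the statements; each `votes` tuple holds one label per annotator.
    return [int(2 * sum(votes) > n_annotators) for votes in zip(*labels_per_annotator)]


def confusion_counts(reference: list[int], predicted: list[int]) -> dict[str, int]:
    """Count Ptp, Pfn, Pfp, Ptn with `reference` as the human side and `predicted` as the method."""
    counts = {"tp": 0, "fn": 0, "fp": 0, "tn": 0}
    for ref, pred in zip(reference, predicted):
        if ref == 1 and pred == 1:
            counts["tp"] += 1
        elif ref == 1:
            counts["fn"] += 1
        elif pred == 1:
            counts["fp"] += 1
        else:
            counts["tn"] += 1
    return counts
```

For example, `majority_vote([[1, 0, 1, 0], [1, 1, 0, 0], [0, 1, 1, 0]])` gives `[1, 1, 1, 0]`, since each of the first three statements has two noise votes out of three.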
Based on Gwet's AC1, the kappa value ($\kappa$) can be calculated once the numbers of assessments of assessor 1 and assessor 2 are known. First, the probability of observed agreement between the two annotators, $P_o$, is calculated by (2), where $N = P_{tp} + P_{fn} + P_{fp} + P_{tn}$ is the number of requirement statements.

$$P_o = \frac{P_{tp} + P_{tn}}{N} \tag{2}$$

Then, the probability that the two annotators both give a true value by chance, $P_T$, is calculated, and likewise $P_F$ for a false value. The probability of chance agreement, $P_e$, is calculated by (3), and the kappa coefficient ($\kappa$) by (4).

$$P_e = P_T + P_F \tag{3}$$

$$\kappa = \frac{P_o - P_e}{1 - P_e} \tag{4}$$
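Equations (2)-(4) can be sketched from the four counts. The text names Gwet's AC1 but writes $P_e = P_T + P_F$, which is the Cohen form when $P_T$ and $P_F$ are products of each assessor's marginal proportions; the sketch therefore shows both statistics under their standard definitions, as an assumption about what was computed. The counts in the usage lines are made-up illustration values.

```python
def cohen_kappa(tp: int, fn: int, fp: int, tn: int) -> float:
    n = tp + fn + fp + tn
    p_o = (tp + tn) / n                       # equation (2)
    human_true = (tp + fn) / n                # share of "true" labels from the human annotator
    method_true = (tp + fp) / n               # share of "true" labels from the method
    p_t = human_true * method_true            # both say "true" by chance
    p_f = (1 - human_true) * (1 - method_true)
    p_e = p_t + p_f                           # equation (3)
    return (p_o - p_e) / (1 - p_e)            # equation (4); undefined when p_e == 1


def gwet_ac1(tp: int, fn: int, fp: int, tn: int) -> float:
    n = tp + fn + fp + tn
    p_o = (tp + tn) / n
    # Average propensity to answer "true" across both assessors (two categories).
    pi_true = ((tp + fn) / n + (tp + fp) / n) / 2
    p_e = 2 * pi_true * (1 - pi_true)
    return (p_o - p_e) / (1 - p_e)


print(round(cohen_kappa(20, 5, 10, 65), 3))  # 0.625
print(round(gwet_ac1(20, 5, 10, 65), 3))     # 0.751
```

Both statistics share the observed agreement $P_o$ and differ only in how chance agreement is estimated, which is why AC1 is typically less sensitive to heavily imbalanced labels such as a small share of noise statements.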
As already mentioned, to evaluate the performance of the proposed method, this study compares the annotations by the human annotators with the annotations by the method, by calculating the kappa coefficient of the noise assessments made by the noise detector and by the human annotators. Three human annotators were involved in this study. Each annotator was given 14 questionnaires, each containing the requirement statements of a specific software project, and annotated each statement of a given questionnaire manually. Each questionnaire also included the corresponding SRS document so that the annotator could grasp the mission and business background of the developed system.
The human annotators were selected based on their professional and academic experience in the field of software engineering. The noise predictions from the noise detection system are compared with the noise assessments from each human annotator and with the majority answer of the three human annotators, as shown in Table III. As the table shows, the consistency between the noise predictions and the human noise assessments is still very low. The highest consistency is obtained between the noise predictions and the majority noise assessment of the three human annotators, with a kappa coefficient of 0.4426.
A comparison of the reliability of the assessments among the three human annotators was also carried out to deepen the evaluation of noise detection in the software requirements specification documents, as shown in IV. This scenario shows how consistent the noise assessments of the three human annotators are with one another.
Each SRS document has a different natural language structure, which could be the reason why the kappa values of each dataset were low. Table IV shows examples of software requirement statements together with the kappa value between the noise prediction and the majority noise assessment of the three human annotators. As the table shows, the requirement statements that produce a high kappa coefficient have a simple structure and a short length.
To analyze the results further, Table V shows the detailed reliability measurements for each dataset. The table reports five measurements, i.e. the kappa score, Gwet's AC1, sensitivity, specificity, and F1-score. The kappa score and Gwet's AC1 measure the reliability of the annotators. Sensitivity measures the ability of the method to correctly identify noise statements, and specificity measures its ability to correctly identify non-noise statements. The F1-score is the harmonic mean of precision and sensitivity. This experiment also compared two policies for selecting the outlier cluster. Column 'A' refers to the policy in which the cluster with the furthest distance is the outlier cluster (the policy proposed in this method), while column 'B' refers to the policy in which the cluster with the closest distance is the outlier cluster. The results show that, in general, the method performs better with the furthest-distance policy. This indicates that the clustering process tends to group noise statements apart from the rest of the requirement statements. Nevertheless, the balance between sensitivity and specificity is low, which may be due to two reasons. First, some statements are relevant to the system but are not requirements. During the requirements engineering process, engineers might add statements related to project management (e.g. project schedule, human resources, deliverables), design (e.g. approach, modeling language), implementation (e.g. framework, programming language, vendors), or testing (e.g. testers, test methods, test data). These statements may be close to the rest of the requirement statements, yet they are also noise. Second, the text preprocessing keeps all parts of speech (POS). However, a requirement statement primarily contains two important parts, i.e. an actor and an action. An actor is formed by a noun phrase, and an action is formed by a verb phrase. Therefore, many unnecessary POS were included in the sentence vectors that shape the affinity matrix. As a result, the similarity values between sentences do not separate noise from non-noise statements well.
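The detection measurements in Table V can be computed from the same four counts used for kappa. This sketch uses the standard definitions; in particular, F1 is taken as the harmonic mean of precision and sensitivity (recall), which is an assumption about how the reported F1-score was computed. The function name is hypothetical.

```python
def detection_metrics(tp: int, fn: int, fp: int, tn: int) -> dict[str, float]:
    def ratio(num: float, den: float) -> float:
        # Guard against empty classes (e.g. a dataset with no labeled noise).
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)   # share of noise statements flagged as noise
    specificity = ratio(tn, tn + fp)   # share of requirement statements kept as non-noise
    precision = ratio(tp, tp + fp)     # share of flagged statements that really are noise
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f1": f1}
```

A method that flags every statement as noise reaches a sensitivity of 1.0 with a specificity of 0.0, which is why the two measures have to be read together.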
Based on the above experimental results, the noise predictions and the noise assessments from the three human annotators yield very low kappa values. Thus, the noise predictions have low consistency with the human noise assessments, meaning that the method can be considered to be in fair agreement with the human annotators. This low consistency could result from the natural language structure of the software requirement statements. Further observation of each statement also indicates that sentences with a simpler structure are predicted more accurately than complex ones.

IV. CONCLUSION
Noise detection in the software requirements specification benefits software development. It helps to catch errors as early as possible, which reduces repair costs, and it can provide information about the priorities of software requirements.
Spectral clustering methods can be used to detect noise in software requirements. However, several factors affect the performance of the noise detector, such as sentence structure, sentence length, and the number of software requirements in a software requirements engineering document. The experiment in this research uses the kappa coefficient, which measures the level of agreement between the noise predictions of the noise detector and the noise assessments of three human annotators. The resulting kappa coefficient of 0.4426 is still unsatisfactory.
Further study may add a classification step before the clustering process to improve the performance of the method. The classification step would separate statements that are project related but are not requirements from the rest of the requirement statements. In addition, further research is required to improve the text preprocessing so that only actor- or action-related tokens are included in the sentence vector.
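The actor/action filtering suggested above could start as simply as keeping noun and verb tokens. This is a hypothetical sketch: it assumes the tokens have already been tagged with Penn Treebank tags by an external POS tagger (for example, NLTK's `pos_tag` produces such tags), and the tags in the usage example are hand-assigned for illustration. It keeps only nouns and verbs, so modifiers inside noun or verb phrases are dropped.

```python
# Penn Treebank tag prefixes kept by this sketch: nouns (actor) and verbs (action).
KEEP_PREFIXES = ("NN", "VB")


def keep_actor_action_tokens(tagged_tokens: list[tuple[str, str]]) -> list[str]:
    # tagged_tokens is assumed to come from an external POS tagger that emits
    # Penn Treebank tags, e.g. [("submit", "VB"), ("jobs", "NNS"), ...].
    return [token for token, tag in tagged_tokens if tag.startswith(KEEP_PREFIXES)]


tagged = [("submit", "VB"), ("jobs", "NNS"), ("with", "IN"), ("the", "DT"),
          ("associated", "VBN"), ("deadline", "NN"), ("cost", "NN")]
print(keep_actor_action_tokens(tagged))
# ['submit', 'jobs', 'associated', 'deadline', 'cost']
```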

V. RECOMMENDATION
In further research, the context of each sentence could be analyzed when detecting noise. The noise rating would also be more reliable if done by experts in software requirements engineering.

Fig. 1. Spectral clustering flowchart

All human annotators have an academic background in software engineering. They have experience teaching requirements engineering courses at university level, and they have been working on software requirements specification documents for at least the last five years.