Journal of Information & Computational Science 9: 18 (2012) 5755–5764 Available at http://www.joics.com
A Novel DB and RB Design Method of IDS System by Data Mining
Dongying WANG ∗
Chongqing University of Posts and Telecommunications, Chongqing 400065, China
Abstract
Because intrusion techniques change rapidly, the original database defenses can no longer meet the security needs of databases. Data mining technology was adopted to improve the rule base of the IDS. Based on the two kinds of intrusion rule bases and their data solutions, an improved IDS was designed whose rule base renews and updates itself automatically, and complete functions were realized for the data rules. Experimental results show that the system greatly improves defense performance and detection capability and realizes automatic defense against intrusion.
Keywords: Data Mining; IDS; Database and Rulebase; Improvement; Intrusion Detection System
1 Introduction
Computer technology has been widely popularized since the twentieth century. By 2012, networks had gradually extended into daily life, enterprises, and nations. At the same time, database capacity and the volume of synchronized information have grown geometrically, and so have the numbers of network attackers and victims. Recently, network security incidents, such as the data leak at the CASN website after repeated intrusions, have been reported frequently. These incidents are a serious reminder that network security is in jeopardy and that database rules need improvement. In research on the Intrusion Detection System (IDS), Arman Tajbakhsh proposed a fuzzy algorithm to handle the uncertainty caused by poorly defined boundaries of numerical attributes. This algorithm improves mining accuracy, but it needs many more scans of the sample database, and the database rules are not easy to change [1]. Han et al. proposed a mining method that scans the sample database only twice; it is more efficient but less accurate, and the database rules remain essentially unchanged [2]. Both methods share a major problem in the database rules. To address the problem of rule improvement, this paper puts forward a method for improving IDS database rules based on data mining.

∗ Corresponding author. Email address: [email protected] (Dongying WANG).
1548–7741 / Copyright © 2012 Binary Information Press December 30, 2012
2 Study on the IDS System in the Data Mining
2.1 IDS based on host machine
A Host Intrusion Detection System (HIDS) is an intrusion detection system whose detection component resides on the monitored host. It extracts and analyzes the operation data of the monitored host system to detect intrusions [3]. HIDS examines information not from the computer network but from the usage of the host system and the local system, focusing on screening for abuse of internal privileges, modification of key data, and changes to security configuration. HIDS products focus on detecting real-time network data as well as audit data on key computers, connecting on-line with this information and analyzing the audit data synchronously to determine whether an intrusion has occurred.
2.2 IDS based on network
The detection region of a Network Intrusion Detection System (NIDS) is the key area of the network infrastructure, and its object is the traffic flowing to other hosts. With the increasing popularity of computer networks, information security faces more and more threats from network intrusion, which has therefore become the target of IDS [4]. NIDS mainly captures data on important network segments by packet sniffing, processes the packets, extracts the useful information, and then determines whether the behavior is an intrusion by comparing it with known intrusion features or with normal network behavior. NIDS detects intrusion behavior on the basis of data such as protocol analysis, network traffic, and network management protocols.
2.3 Mixed distributed IDS
After a comprehensive analysis of the advantages of the two kinds of systems above, a mixed distributed intrusion detection system was developed. With the rapid development of network technology, network intrusions have become increasingly sophisticated, evolving into complex behaviors that fuse several methods and follow more concealed paths. Neither HIDS nor NIDS alone can detect them completely, which exposes security vulnerabilities in the active defense system. Only by coordinating different types of intrusion detection systems for joint detection, combining the respective advantages of NIDS and HIDS, can a comprehensive mixed distributed intrusion detection system be built that detects a wide variety of network intrusion behaviors [5].
3 Problem Research and General Design of the System
3.1 Problem research of database and rulebase in the IDS
In some complex application environments, using rules to realize the active mechanism brings new problems.
(1) The activity of an active database is mainly reflected in its predefined rules, so rule design directly determines whether the active database system can be implemented successfully. However, practical experience with using rules to realize the active mechanism is still lacking. For example, in practical applications there are few precedents for deciding whether semantics should be expressed as rules, reflected in the database structure, or written in the code. A common database design cannot solve these problems, so a new design method is badly needed to define the active behavior of a system [6].
(2) In practice, especially in dynamic environments, most data appear at the lower (metalinguistic) layer, where they show no clear relationships or regularities. Users, however, are accustomed to analyzing system conditions and expressing semantics at a higher level. The higher-level expression of control semantics therefore differs from data processing at the lower layer, and new technology is needed to analyze and abstract the lower-layer data.
(3) Part of the active semantics of the application background is reflected in the active rules, but there is no guarantee that the rules fully reflect users' needs and expectations. In a passive database application, the programmer is completely responsible for the semantics, including the quality of the stored data, so control always stays with the application program. During the execution of an active application, however, control switches continually between the application program and rule processing as events occur. Rules execute on their own without outside interference, yet invisible and unpredictable interactions may exist between them. As a result, controlling the behavior of an active application, and especially of its active rules, is more difficult. If a badly behaved rule cannot be terminated, rules may trigger one another indefinitely, which is difficult to detect and keeps the automated system program running [7].
(4) In the operating environment of an active system, especially a dynamic one, the rule set (library) must be updated constantly so that the active rules reflect the latest application semantics in a timely manner. However, because the rules are designed at different times and by different developers, converting and understanding them is difficult, and the dependencies between rules are fuzzy. Designers therefore easily introduce redundancy or semantic errors, which pose serious hidden dangers to system operation. When rules are updated, it should be clear which rules need to be consolidated, which need to be changed, and which no longer reflect reality. The more rules the rule set (library) contains, the more serious these problems become. Discovering new semantic knowledge and analyzing and updating the rules are therefore also very important.
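The endless mutual triggering described in item (3) can in principle be checked before rules are deployed. The following is a minimal sketch, not part of the original design: it assumes that the rule base can be summarized as a hypothetical "triggering graph" in which each rule lists the rules its action may fire, and it uses a depth-first search to report one cycle. The function name `find_trigger_cycle` and the sample rule names are illustrative only.

```python
from typing import Dict, List, Optional


def find_trigger_cycle(triggers: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle of rule names if rules can trigger each other forever, else None.

    triggers[r] lists the rules whose events may be raised by the action of rule r
    (a hypothetical summary of the rule base, assumed to be available).
    """
    WHITE, GREY, BLACK = 0, 1, 2            # unvisited, on the current path, finished
    color = {rule: WHITE for rule in triggers}
    path: List[str] = []

    def visit(rule: str) -> Optional[List[str]]:
        color[rule] = GREY
        path.append(rule)
        for nxt in triggers.get(rule, []):
            state = color.get(nxt, WHITE)
            if state == GREY:                # back edge: nxt is already on the path
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[rule] = BLACK
        return None

    for rule in list(triggers):
        if color[rule] == WHITE:
            found = visit(rule)
            if found:
                return found
    return None


# Illustrative rule base: three rules that fire one another in a loop.
rules = {
    "log_failed_login": ["lock_account"],
    "lock_account": ["notify_admin"],
    "notify_admin": ["log_failed_login"],    # closes the loop
    "archive_audit": [],
}
cycle = find_trigger_cycle(rules)
print(cycle)
```

A static check of this kind is conservative: a cycle in the graph only means that endless triggering is possible, since the conditions of the rules may still stop the chain at run time.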
3.2 Selection of system tools
SQL Server 2005 is a relational database management system around which a data management system can be built. It provides a powerful platform for designing and implementing data analysis plans. The many data mining algorithms embedded in its Analysis Services (AS) bring great convenience to the design and implementation of a data mining system. Its Reporting Services also provide a convenient platform for extracting data patterns from a data set, solving problems, and presenting the results.
3.3 General design of the database
When data mining techniques are applied to a database IDS, detection rules and methods for responding to intrusion can be found effectively in large volumes of audit data, so that the internal rule base improves and strengthens itself and the system achieves self-improving, intelligent resistance to intrusion. Therefore, when establishing the intrusion detection rule base of a database, data mining technology is needed to search for methods of intrusion detection and intrusion prevention. Once a user's behavior is abnormal, a corresponding rule base is generated immediately, and the current audit data are then checked to judge whether an intrusion has occurred and to take corresponding measures. The premise of this model is that users' long-term operation of the database reflects a certain behavioral regularity [8]. Based on the above analysis, we construct the entire database structure model on data mining in accordance with these requirements. The function modules and their roles are as follows:
(1) Fault case library: stores the condition part that every fault case must have; the decision conclusion part is stored in the result table, and the two tables carry their own numbers for external connection.
(2) Signature database: stores the signature data collection obtained through data mining, together with the optimal solution set, as a standby.
(3) Conclusion diagnosis library: the most important part of the whole system, consisting of a self-perfecting diagnostic conclusion table. Whenever the IDS detects abnormal data in the database, the diagnostic conclusion table is searched automatically and the conclusion for a similar situation is called directly; if the problem still cannot be solved, the diagnosis library uploads the data for manual processing.
Fig. 1: Main function module chart of database

The fault case library and the conclusion diagnosis library together make up the fault knowledge base. Maintenance of the knowledge base mainly includes adding, modifying, deleting, and saving cases.
4 Design Principles of Data Mining Rulebase
According to the detection method, intrusion detection can be subdivided into two types: misuse detection and anomaly detection. Correspondingly, the rule base can be subdivided into a misuse detection rule base and an anomaly detection rule base [9]. An active database emphasizes initiative and intelligence, and this initiative is realized through an active mechanism. The most popular active mechanism at present is the trigger. In the early CODASYL database systems, the ON condition already embodied the embryonic concept of a trigger; later, the TRIGGER and ASSERT commands of System R reflected the necessity of triggers. Today's popular commercial database systems, such as Oracle, SQL Server, and Sybase, also provide simple trigger mechanisms. In the strict sense, however, these are not true active database systems, for two main reasons:
(1) The triggers in these systems have simple structures; they cannot form more complex events or meet users' requirements for setting up desired events. In addition, a trigger fires only by immediate execution, and its application scope is limited to the data integrity constraints of the database system, such as entity, referential, and user-defined integrity constraints.
(2) The trigger mechanism is implemented with special facilities of a particular system, so it lacks generality and uniformity; it depends heavily on the software system, so the active function is implemented differently in different applications.
In addition to the general database functions of data definition, operation, maintenance, and management, an active database monitors any event or state change and responds by triggering execution [10]. Moving the semantics that reflect behavior from program code into the database management system is an important characteristic that distinguishes an active database from other databases. An active database should therefore include the following mechanisms:
(a) Determination of active behavior: ECA (event-condition-action) rules are usually embedded in the active database system to ensure a timely response to emergencies. That is, when an event changes the state of the database and the system detects it, the action executes automatically as long as the condition is met.
(b) Implementation of active behavior: the system monitors the occurrence of the corresponding events and changes of the system state and responds to them. This process is completed automatically by the system without manual intervention. That is, during operation the system should be able to process active rules (ECA rules) according to an execution model [11].
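The ECA mechanism in (a) and (b) can be illustrated with a small in-memory sketch. It is only an illustration under stated assumptions and not the execution model of any particular database system: the classes `EcaRule` and `RuleEngine`, the event fields `type`, `user`, and `rows`, and the 1000-row threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]


@dataclass
class EcaRule:
    name: str
    event_type: str                       # E: the kind of event the rule listens for
    condition: Callable[[Event], bool]    # C: predicate evaluated on the event
    action: Callable[[Event], None]       # A: executed automatically if C holds


class RuleEngine:
    """Minimal immediate-execution model: rules are checked as soon as an event is signalled."""

    def __init__(self) -> None:
        self.rules: List[EcaRule] = []
        self.fired: List[str] = []        # names of rules whose action was executed

    def register(self, rule: EcaRule) -> None:
        self.rules.append(rule)

    def signal(self, event: Event) -> None:
        for rule in self.rules:
            if rule.event_type == event["type"] and rule.condition(event):
                rule.action(event)
                self.fired.append(rule.name)


alerts: List[str] = []
engine = RuleEngine()
engine.register(EcaRule(
    name="bulk_delete_alert",
    event_type="DELETE",
    condition=lambda e: e["rows"] > 1000,          # hypothetical threshold
    action=lambda e: alerts.append(f"{e['user']} deleted {e['rows']} rows"),
))

engine.signal({"type": "DELETE", "user": "u42", "rows": 5000})  # event and condition match
engine.signal({"type": "DELETE", "user": "u42", "rows": 3})     # condition fails
engine.signal({"type": "SELECT", "user": "u42", "rows": 5000})  # event type differs
print(alerts)
```

Only the first event fires the rule, which mirrors the description above: the event is detected, the condition is checked, and the action runs without manual intervention.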
5 Design of Base Based on Data Mining
5.1 Overall design of rule database
Figure 2 shows a database IDS with self-improvement.

Fig. 2: Improvement of database intrusion detection system structure

In terms of system functions, the system can be divided into four main modules: the intrusion detection module, the rule generation module, the data processing module, and the operation module [12]. We add a self-improvement module to strengthen the adaptability and self-perfection of the intrusion detection system.
5.2 Algorithm design of database
Many data mining methods can be used for database intrusion detection. Our detection module is designed on this basis, and a new algorithm is put forward that builds on the Apriori algorithm:

Method: input database A and the minimum support value min_z; output the frequent set B in A.

    B1 = find_frequent_1-itemsets(A);
    if (B1 ≠ ∅) {
        C2 = apriori_gen(B1, min_z);
        for each transaction t ∈ A {
            Ct = subset(C2, t);
            for each candidate c ∈ Ct
                c.count++;
        }
        if (c.count ≥ min_z) B2 = {c ∈ C2 | c.count ≥ min_z};
        if (c.count = 0) set Si = Sj = random;
        B = B1 ∪ B2;
        for (k = 3; Bk-1 ≠ ∅; k++) {
            Ck = apriori_gen(Bk-1, min_z);
            for each transaction t ∈ A {
                Ct = subset(Ck, t);
                for each candidate c ∈ Ct
                    c.count++;
            }
            Bk = {c ∈ Ck | c.count ≥ min_z};
            B = B ∪ Bk;
        }
    }
    return B;

In this algorithm, Si is set in two places with different functions. The first finds and records the explicit exclusive attributes: it identifies the 2-itemsets whose support value is 0, which reduces the time spent on pruning and related operations. The second finds the hidden exclusive attributes. Discovering both the explicit and the hidden exclusive attributes reduces the generation of k-itemsets whose support value is 0 and thus avoids a large number of pruning operations, which speeds up the algorithm and shortens the time needed to analyze the data and generate frequent item sets [13].
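The idea of recording exclusive attributes can be sketched in runnable form. The following is a simplified illustration and not the author's exact algorithm: it implements standard Apriori, records the 2-itemsets whose support is 0 as "exclusive pairs", and rejects any larger candidate containing such a pair before the usual subset-based pruning. The handling of Si and of hidden exclusive attributes is not reproduced. The names `count_support`, `apriori_with_exclusions`, `min_count`, and the sample transactions are assumptions made for the example.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Itemset = FrozenSet[str]


def count_support(candidates: Set[Itemset], transactions: List[Set[str]]) -> Dict[Itemset, int]:
    """Count, for each candidate itemset, the transactions that contain it."""
    counts = {c: 0 for c in candidates}
    for t in transactions:
        for c in candidates:
            if c <= t:
                counts[c] += 1
    return counts


def apriori_with_exclusions(
    transactions: List[Set[str]], min_count: int
) -> Tuple[Dict[Itemset, int], Set[Itemset]]:
    """Return (frequent itemsets with their counts, exclusive pairs with support 0)."""
    # Level 1: frequent single items.
    items = {i for t in transactions for i in t}
    counts = count_support({frozenset([i]) for i in items}, transactions)
    level = {c for c, n in counts.items() if n >= min_count}
    frequent = {c: counts[c] for c in level}

    # Level 2: count all pairs of frequent items and record pairs that never co-occur.
    pairs = {a | b for a, b in combinations(level, 2)}
    counts = count_support(pairs, transactions)
    exclusive = {c for c, n in counts.items() if n == 0}
    level = {c for c, n in counts.items() if n >= min_count}
    frequent.update({c: counts[c] for c in level})

    k = 3
    while level:
        candidates: Set[Itemset] = set()
        for a, b in combinations(level, 2):
            c = a | b
            if len(c) != k:
                continue
            # Cheap early rejection: a candidate containing an exclusive pair has support 0.
            if any(frozenset(p) in exclusive for p in combinations(c, 2)):
                continue
            # Standard Apriori pruning: every (k-1)-subset must itself be frequent.
            if all(frozenset(s) in level for s in combinations(c, k - 1)):
                candidates.add(c)
        counts = count_support(candidates, transactions)
        level = {c for c, n in counts.items() if n >= min_count}
        frequent.update({c: counts[c] for c in level})
        k += 1
    return frequent, exclusive


# Illustrative audit "transactions": sets of operations seen in one session.
transactions = [
    {"login", "select"},
    {"login", "select", "update"},
    {"login", "update"},
    {"login", "select", "update"},
    {"logout"},
    {"logout"},
]
frequent, exclusive = apriori_with_exclusions(transactions, min_count=2)
print(len(frequent), sorted(sorted(p) for p in exclusive))
```

In this simplified form the exclusive-pair test does not change the result, because the standard subset check would reject the same candidates; it only rejects them earlier and more cheaply, which is the kind of saving the text attributes to the discovery of exclusive attributes.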
5.3 Similarity of rules
By comparing the similarity and degree of association between newly found knowledge and prior knowledge, the importance and novelty of newly discovered semantic knowledge can be judged. Because new data mining results or active semantic knowledge must be expressed through rules, this article calculates the similarity among rules to analyze the association between newly found and prior knowledge. Suppose an object $m = \{m_1, m_2, m_3, \cdots, m_n\}$, where $m_i$ $(1 \le i \le n)$ is an attribute value. $m$ is a point in the $n$-dimensional attribute space $T = T_1 \times T_2 \times T_3 \times \cdots \times T_n$, with $m_i \in D_i$. For two objects $m$ and $z$, the distance with respect to the attribute set $F$ is

$$ d_F(m, z) = \Big[ \sum_{a_j \in F} \big( \Phi(m_{a_j}, z_{a_j}) \big)^{r} \Big]^{1/r} \tag{1} $$
$$ \Phi(m_i, z_i) = \begin{cases} |m_i - z_i|, & D_i \text{ continuous} \\ 0, & D_i \text{ discrete},\ m_i = z_i \\ 1, & D_i \text{ discrete},\ m_i \neq z_i \end{cases} \tag{2} $$
The distance between objects $m$ and $z$ is a comprehensive, quantitative measure. It takes into account the importance of the attributes that $m$ and $z$ contain, the numerical distance between the attribute values of $m$ and $z$, and the effect of attribute differences on the distance between $m$ and $z$. The degree of attention usually differs for the distances between different objects. Under normal circumstances, the attention paid to an attribute is expressed by its importance: the more important the attribute, the more attention it receives. If the key attributes of $m$ and $z$ are the same and their values are the same, then $m$ and $z$ are basically the same. Subdividing the attributes $a_i$, $a_j$ of $m$ and $z$ and taking their importance into account, the following definition is obtained:

$$ d_F(m, z) = \Big[ \sum_{a_j \in F} \big( \Phi(m_{a_j}, z_{a_j}) \big)^{r} \Big]^{1/r} \tag{3} $$

Here, the similarity $S_F(m, z)$ increases as the distance $d_F(m, z)$ becomes smaller.
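The distance of Eqs. (1) and (2) can be computed directly, as the following minimal sketch shows. The paper does not give a formula for $S_F$, so the mapping `similarity(d) = 1 / (1 + d)` is purely an assumption chosen because it decreases as the distance grows. The function names, the attribute names, and the sample rules are hypothetical, and continuous attributes are assumed to be already on comparable scales.

```python
from typing import Dict, Iterable, Union

Value = Union[float, str]


def phi(m_val: Value, z_val: Value, continuous: bool) -> float:
    """Per-attribute difference as in Eq. (2)."""
    if continuous:
        return abs(float(m_val) - float(z_val))
    return 0.0 if m_val == z_val else 1.0


def distance(
    m: Dict[str, Value],
    z: Dict[str, Value],
    attrs: Iterable[str],
    continuous: Dict[str, bool],
    r: float = 2.0,
) -> float:
    """Distance over the attribute set F (here `attrs`) as in Eq. (1)."""
    total = sum(phi(m[a], z[a], continuous[a]) ** r for a in attrs)
    return total ** (1.0 / r)


def similarity(d: float) -> float:
    """Hypothetical similarity, not from the paper: any decreasing function of d would do."""
    return 1.0 / (1.0 + d)


# Which attributes have a continuous domain D_i (illustrative).
continuous = {"duration": True, "failed_logins": True, "protocol": False}
F = ["duration", "failed_logins", "protocol"]

rule_a = {"duration": 3.0, "failed_logins": 4.0, "protocol": "tcp"}
rule_b = {"duration": 0.0, "failed_logins": 0.0, "protocol": "tcp"}
rule_c = {"duration": 3.0, "failed_logins": 4.0, "protocol": "udp"}

d_ab = distance(rule_a, rule_b, F, continuous)   # sqrt(3^2 + 4^2 + 0^2)
d_ac = distance(rule_a, rule_c, F, continuous)   # only the discrete attribute differs
print(d_ab, d_ac, similarity(d_ab), similarity(d_ac))
```

With r = 2 the measure reduces to a Euclidean distance in which every mismatch on a discrete attribute contributes 1; the closer pair (rule_a, rule_c) therefore receives the higher similarity.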
5.4 Updated rules
After new active semantics of the system have been identified and obtained, the rules must be updated to reflect them. To keep the rule set from growing steadily and to eliminate redundant rules during updating, the active rules in the rule set (library) should be fully understood before deciding which rules need to be modified or added. By comparing the similarity and degree of association between newly found knowledge and prior knowledge, the importance and novelty of the newly discovered semantic knowledge can be judged, and it can then be determined which active rules should be updated. The semantics newly found by data mining fall basically into three kinds:
(1) A rule in the rule base is verified in the new background, which consolidates or complements its original condition or conclusion.
(2) Because some conditions or results differ, some original rules in the rule base fail.
(3) Compared with reality, the importance and novelty of some original rules are low, and they should no longer exist in the rule base.
To improve the efficiency of rule analysis, data mining technology can also be used first to classify the active rules in the active database, so that a large rule set is divided into a number of small rule sets, each limited to a different range, forming a "tree" structure. Classifying the rules in this way before analyzing their similarity gives a better analysis result [14].
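The three kinds of update can be sketched as one pass over the rule base. This is a deliberately simplified illustration and not the paper's procedure: rules are matched by identical conditions rather than by the similarity measure, and the fields `confirmations` and `last_seen`, the parameter `max_age`, and all rule contents are hypothetical stand-ins for "importance" and "fresh degree".

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass
class Rule:
    condition: FrozenSet[str]
    conclusion: str
    confirmations: int = 1   # hypothetical importance counter
    last_seen: int = 0       # hypothetical freshness: last mining round that confirmed the rule


def update_rule_base(base: List[Rule], mined: List[Rule], round_no: int, max_age: int = 3) -> Dict[str, int]:
    """Apply newly mined rules to `base` in place and report what happened."""
    stats = {"consolidated": 0, "replaced": 0, "added": 0, "removed": 0}
    by_condition = {r.condition: r for r in base}
    for new in mined:
        old = by_condition.get(new.condition)
        if old is None:                               # unseen condition: add the rule
            new.last_seen = round_no
            base.append(new)
            by_condition[new.condition] = new
            stats["added"] += 1
        elif old.conclusion == new.conclusion:        # kind (1): rule verified again
            old.confirmations += 1
            old.last_seen = round_no
            stats["consolidated"] += 1
        else:                                         # kind (2): original rule fails
            old.conclusion = new.conclusion
            old.confirmations = 1
            old.last_seen = round_no
            stats["replaced"] += 1
    # Kind (3): rules not confirmed for more than max_age rounds are dropped.
    kept = [r for r in base if round_no - r.last_seen <= max_age]
    stats["removed"] = len(base) - len(kept)
    base[:] = kept
    return stats


base = [
    Rule(frozenset({"failed_logins>5"}), "brute_force", last_seen=4),
    Rule(frozenset({"bulk_delete"}), "normal", last_seen=4),
    Rule(frozenset({"old_pattern"}), "scan", last_seen=0),
]
mined = [
    Rule(frozenset({"failed_logins>5"}), "brute_force"),
    Rule(frozenset({"bulk_delete"}), "intrusion"),
    Rule(frozenset({"night_export"}), "intrusion"),
]
stats = update_rule_base(base, mined, round_no=5)
print(stats, len(base))
```

Replacing the exact-condition lookup with a nearest-rule search under a distance threshold would bring the sketch closer to the similarity-based comparison described in the text.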
6 Results and Analysis
The experiment was conducted on a network system that we constructed ourselves, in accordance with the complete rule base structure. The kernel of the rule library was initialized from part of the intrusion detection attack scenario data sets provided by the MIT Lincoln Laboratory, and the 30 attack scenarios with the highest frequency and the most typical rules were selected and added to the rule base. The numbers of rules generated by cyclic learning in the rule base system are shown in the table. In this experimental environment, five experiments were conducted to check whether the rule-based system generates rules accurately and faithfully. Because the rules are constantly updated, the completeness of the rule base approaches the full set of rules present in the intrusion data scenarios; that is, in a relatively stable external environment, the rule base can provide stable network protection. When the parameters of network behavior are set, the description number of DNA rises basically along an exponential saturation curve and stabilizes within a controllable range, and it can be adjusted by the user to suit custom conditions.
7 Conclusions
The data rule base of a data-mining-based IDS was improved, and an improved IDS with a self-renewing rule base was designed. The solutions for the two kinds of intrusion and for the data rule bases were optimized. Given the rapid development of networks in recent years, this paper provides a data-mining-based rule base for the IDS database so that the system can update its rule library automatically, which plays an important role in many tasks and practical applications.
References
[1] M. Dai, Y. L. Huang, W. Wang, Trojan Horse Detection Model Based on File’s Static Attributes, Computer Engineering, 32 (2006), 198 – 200.
[2] M. Dai, Y. L. Huang, Presenting association rules by hierarchy, Journal of Computer Applications, 26 (2006), 207 – 209.
[3] R. Z. Zhao, C. Li, Y. Y. Zhang, Study on fault knowledge processing modes in intelligent diagnosis based on theory of rough set, Journal of Vibration and Shock, 26 (2007), 71 – 74.
[4] X. Wu, J. Cheng, R. Q. Li, State monitoring and fault diagnosis of equipments based on data mining, Journal of Vibration and Shock, 22 (2004), 70 – 75.
[5] Q. Y. Li, X. L. Wang, The Study of Different SVM Algorithm Applied to Intrusion Detection, Journal of Gansu Normal Colleges, 16 (2011), 35 – 37.
[6] Z. Ren, X. W. Hu, H. Zhang, Improved FCM Algorithm in Network Intrusion Detection Research, Journal of Gansu Normal Colleges, 16 (2011), 42 – 44.
[7] X. Li, Intrusion analysis, Journal of Daqing Normal University, 31 (2011), 18 – 21.
[8] Z. X. Jiang, H. Jing, BI-PaaS: parallel-based business intelligence system, Journal of Computer Application, 32 (2012), 595 – 598.
[9] M. H. Zhang, Comparison of the two intrusion detection methods based on abnormal network, Telecommunications Technology, (2011), 21 – 23.
[10] J. Cui, L. Xu, X. D. Wang, H. Xiao, Study on Intrusion Detection System in Wireless Sensor Networks, Electronic Science and Technology, 24 (2011), 144 – 146.
[11] J. S. Zhou, X. Y. Dai, C. Y. Yi, J. J. Chen, Automatic Recognition of Chinese Organization Name Based on Cascaded Conditional Random Fields, Acta Electronica Sinica, 34 (2006), 55 – 58.
[12] S. Y. Wu, J. Yu, X. P. Fan, Improved SVM co-training based intrusion detection, Journal of Computer Applications, 31 (2011), 3337 – 3339.
[13] W. P. Wang, J. Huang, Improved algorithm of BM for pattern matching, Computer Engineering and Applications, 47 (2011), 108 – 111.
[14] G. Zhang, Y. Liu, Wavelet-based detection method for low-rate TCP-targeted DoS, Computer Engineering and Applications, 47 (2011), 115 – 117.