A Partial Weighted Utility Measure for Fuzzy Association Rule Mining

Background/Objectives: Association rule mining generates association rules from frequent item sets, so the way frequent item sets are generated has a great impact on decision making. The objective of this work is to introduce a new measure, the Skill Utility Factor (SUF), for extracting meaningful hidden item sets, and to develop a hybrid algorithm, Fuzzy Partial Weighted Utility Mining (FPWUM), for decision making. Methods/Statistical Analysis: The traditional measures, support and confidence, are augmented with SUF, which can help human resource personnel predict the calibre of the workforce in an organization. Combining association rule mining, fuzzy logic and weighted utility mining makes the prediction of attribute relations faster and more efficient. Findings: FPWUM extracts additional hidden frequent item sets, from which many new and interesting rules are generated. Because attribute weights are applied selectively and the improvising factor is used only for hidden item sets, the processing time of the model is reduced. Integrating the conventional measures with SUF is a distinctive feature of the approach. The approach works well on a real-time dataset compared with conventional models, and the comparative results demonstrate the algorithm's capability. Improvements/Applications: The algorithm uses a predefined weighting scheme. It can be enhanced by using a dynamic, intelligent weighting factor.


Data Mining
Large volumes of raw data are of little use for data processing or decision making on their own. Data mining is an intelligent task that discovers the data and their relationships and turns large volumes of raw data into knowledge. The mining process employs various database techniques, statistical concepts, machine learning tools and artificial intelligence algorithms [1]. Although the emergence of new techniques and applications is welcome, many open issues remain concerning the discovery of new algorithms and their performance. Robust and scalable algorithms are always of particular interest [2].

Association Rule Mining
A standard association rule [3] is a rule of the form X→Y, which says that if X is true of an instance in a database, then Y is also true of the same instance, with a certain level of significance as measured by two indicators, support and confidence [4]. Let $I = \{i_1, i_2, \ldots, i_n\}$ be the set of items (attributes) and $D = \{t_1, t_2, \ldots, t_m\}$ be the database, a set of transactions. Each transaction in D has a unique ID and contains a subset of the items in I. A rule is defined as X→Y, where X, Y ⊆ I and X ∩ Y = Ø. X and Y are called the antecedent and the consequent of the rule, respectively [4]. Many rules can be generated in this way. To choose the most interesting rules from all possible generated rules, measures are needed that capture the interestingness and significance of a rule. The widely accepted measures for this purpose are support and confidence [4].
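As a minimal illustration of the two measures, the sketch below computes support and confidence for a rule X→Y over a small transaction list. The function names and the toy transactions are assumptions made for this example only; they are not taken from the dataset used in this work.

```python
from typing import FrozenSet, Iterable, List


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """support(X ∪ Y) / support(X) for the rule X -> Y (0.0 if X never occurs)."""
    x = frozenset(antecedent)
    y = frozenset(consequent)
    base = support(x, transactions)
    if base == 0:
        return 0.0
    return support(x | y, transactions) / base


# Hypothetical toy database with items a, b, c.
transactions = [
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
    frozenset({"a", "b", "c"}),
    frozenset({"b"}),
]

rule_support = support({"a", "b"}, transactions)          # 2 of 4 transactions
rule_confidence = confidence({"a"}, {"b"}, transactions)  # 2 of the 3 with "a"
```

Here the rule a→b holds in two of the four transactions (support 0.5) and in two of the three transactions that contain a (confidence about 0.67).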

Fuzzy Set and Domain Partitioning
The use of the fuzzy approach in our research is discussed in detail in [5]. Integrating association rules with fuzzy theory has long been popular in big data research. A fuzzy set is a class of objects with grades of membership in the interval [0, 1]. Domain partitioning is a technique used in quantitative association rule mining. This type of partitioning gives rise to the sharp boundary problem, which is explained in [5]. To resolve this problem, a flexible setting of the interval boundaries is desirable, and therefore fuzzy logic is applied [6].

Fuzzification
The process of converting a scalar value into a fuzzy value is called fuzzification. A variety of fuzzifiers, called membership functions, are used for fuzzification. Some well-known membership functions are trapezoidal, triangular, Gaussian and bell-shaped. For a value u, knowing x, y, z (start, peak, end) of each partition, the membership of u in all the assumed partitions can be calculated [5,7]. For example, the attribute MG is divided into four fuzzy sets on its domain, NS(MG), S(MG), G(MG) and VG(MG), with a trapezoidal membership function as shown in Figure 1. This mapping of values (quantitative to linguistic) helps in mining the most interesting fuzzy association rules. Once the fuzzification process is complete, the database is populated with multiple sub-attributes for each basic attribute (CS, MG) [5]. The database after fuzzification is shown in Table 2.
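The following sketch shows how a trapezoidal membership function can map one quantitative value to membership degrees in the four fuzzy sets of MG. The breakpoints and the 0 to 100 scale are hypothetical values chosen for illustration; the actual partitions are those of Figure 1. A triangular function is the special case in which the two shoulder points coincide.

```python
from typing import Dict, Tuple


def trapezoid(u: float, a: float, b: float, c: float, d: float) -> float:
    """Membership of u in a trapezoid with feet a, d and shoulders b, c (a <= b <= c <= d)."""
    if b <= u <= c:          # flat top, also covers open-ended shoulders (a == b or c == d)
        return 1.0
    if u <= a or u >= d:     # outside the support of the fuzzy set
        return 0.0
    if u < b:                # rising edge
        return (u - a) / (b - a)
    return (d - u) / (d - c)  # falling edge


# Hypothetical partitions of the MG domain, assumed to lie on a 0-100 scale.
MG_PARTITIONS: Dict[str, Tuple[float, float, float, float]] = {
    "NS(MG)": (0, 0, 30, 45),
    "S(MG)": (30, 45, 55, 70),
    "G(MG)": (55, 70, 80, 90),
    "VG(MG)": (80, 90, 100, 100),
}


def fuzzify(u: float, partitions: Dict[str, Tuple[float, float, float, float]]) -> Dict[str, float]:
    """Return the membership degree of u in every fuzzy set of one attribute."""
    return {name: trapezoid(u, *points) for name, points in partitions.items()}


memberships = fuzzify(62, MG_PARTITIONS)
```

With these assumed breakpoints, a score of 62 falls in the overlap between S(MG) and G(MG): it belongs to S(MG) with degree 8/15 and to G(MG) with degree 7/15, and has zero membership in the other two sets. This overlap is what softens the sharp boundary between adjacent partitions.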

Utility Mining
Utility mining is an extension of frequent item set mining. It depends not only on the frequency of an item set but also on the utility associated with it. Here, utility refers to the importance of the item set in the transactions in terms of user-specific preferences, rather than the frequency of the items alone [1]. The resulting item sets include both frequent and rare item sets. In many application domains, rare item sets play a major role in decision making, and high-profit rare item sets have been found to be very useful. For example, in medical applications, a rare combination of symptoms can provide useful insights for doctors [8][9]. In our application, the frequent item sets mined using the Fuzzy Partial Weighted Utility Mining (FPWUM) algorithm help to find different combinations of capabilities, which in turn helps to identify a better leader without missing any skill of the workforce.

Fuzzy Partial Weighted Utility Mining (FPWUM)
The frequent k-item sets are generated using candidate generation, with each k-item set generated from the frequent (k-1)-item sets. The minimum Fuzzy Item Set Support Value (FISV) is set to 30%. For frequent set mining, not only FISV but also SUF is considered, so an infrequent item set with a high skill utility factor can still be included in rule mining. When an item set satisfies the minimum FISV, it is added to the frequent item set list and there is no need to calculate its SUF. If it does not satisfy the minimum FISV, its SUF is calculated. If the calculated SUF satisfies min_SUF, the item set is added to the frequent item set list; otherwise it is pruned. SUF is needed because FISV may be set too high or too low, which leads to rule over-fitting or rule under-fitting, respectively. Combining FISV with SUF prevents such situations and ensures that attributes with higher precedence are not missed. Some of the k-item sets are given in Table 4. FISV is set to 30% and min_SUF is set to 0.7 for the calculation.

Skill Utility Factor (SUF) Calculation
Consider Tables 3, 4 and 5 for the transactions, item sets and weights, respectively. Item set 3, {CS(G), AS(VG)}, has an FISV of 20%, which is below the specified minimum FISV, but it has a high SUF. Similarly, item set 5 has a high SUF. Item set 2, however, has both a low FISV and a low SUF. Item set 2 is therefore pruned, whereas item sets 3 and 5 are retained. This example illustrates that frequent item set mining alone may not always meet the goal, and that skill utility attributes should be considered as well. The comparative results are shown in Section 4.

Algorithm
• Identify each candidate item set $I_i$, $i = 1, \ldots, n$.
• If it satisfies the min_support threshold (positive item set), add it to the frequent item set list (freq(I) = $I_i$).
• If it does not satisfy the min_support threshold (negative item set), find its SUF.
• To find the SUF, sum the products of each item of the item set and its corresponding weight over all the transactions in which the item set occurs, and divide by the number of occurrences of the item set.
• If SUF >= min_SUF, add the item set to the frequent item set list; otherwise prune the item set.
The algorithmic flow is shown in Figure 2.
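The selection step above can be sketched as follows. This is one literal reading of the steps, not a definitive implementation: the fuzzy support is assumed here to be the mean over all transactions of the minimum membership of the items in the item set, an occurrence is assumed to be a transaction in which every item has non-zero membership, and the SUF is taken as the sum of membership times weight over all occurrences divided by the number of occurrences. The transactions, weights and helper names are hypothetical; the real values are those of Tables 3, 4 and 5.

```python
from typing import Dict, List, Sequence, Tuple

Transaction = Dict[str, float]  # fuzzy item -> membership degree in [0, 1]


def fuzzy_support(itemset: Sequence[str], db: List[Transaction]) -> float:
    """Assumed FISV: mean over all transactions of the minimum item membership."""
    if not db:
        return 0.0
    return sum(min(t.get(i, 0.0) for i in itemset) for t in db) / len(db)


def suf(itemset: Sequence[str], db: List[Transaction], weights: Dict[str, float]) -> float:
    """Assumed SUF: sum of membership * weight over occurrences / number of occurrences."""
    occurrences = [t for t in db if all(t.get(i, 0.0) > 0 for i in itemset)]
    if not occurrences:
        return 0.0
    total = sum(t[i] * weights[i] for t in occurrences for i in itemset)
    return total / len(occurrences)


def select(candidates: List[Tuple[str, ...]], db: List[Transaction],
           weights: Dict[str, float], min_support: float = 0.30,
           min_suf: float = 0.7) -> List[Tuple[Tuple[str, ...], str]]:
    """Keep item sets that pass the support test; weigh only those that fail it."""
    kept = []
    for itemset in candidates:
        if fuzzy_support(itemset, db) >= min_support:
            kept.append((itemset, "support"))   # positive item set, SUF not computed
        elif suf(itemset, db, weights) >= min_suf:
            kept.append((itemset, "suf"))       # hidden item set rescued by SUF
        # otherwise the item set is pruned
    return kept


# Hypothetical fuzzified transactions and skill weights.
db = [
    {"CS(G)": 0.8, "AS(VG)": 0.9},
    {"CS(G)": 0.8, "MG(S)": 0.9},
    {"MG(S)": 0.5, "AS(NS)": 0.4},
    {"CS(G)": 0.9, "MG(S)": 0.9},
    {"MG(S)": 0.6, "AS(NS)": 0.2},
]
weights = {"CS(G)": 0.5, "AS(VG)": 0.6, "MG(S)": 0.3, "AS(NS)": 0.1}
candidates = [("CS(G)", "MG(S)"), ("CS(G)", "AS(VG)"), ("MG(S)", "AS(NS)")]

kept = select(candidates, db, weights)
```

In this toy data, {CS(G), MG(S)} passes on support alone, {CS(G), AS(VG)} falls below the support threshold but is retained because of its high SUF, and {MG(S), AS(NS)} fails both tests and is pruned. Because the SUF is computed only in the second branch, item sets that already satisfy the support threshold incur no weighting cost.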

Weighting Scheme
The weight and threshold values are specified by the user according to the significance of the personnel skills. Applying weights only to item sets that do not satisfy the support threshold has two important consequences.
• Applying weights to item sets above the support threshold serves no purpose, because such item sets need no improvising factor. Skipping them reduces the processing time of the model and avoids unnecessary weighting.
• The goal of using SUF is to use the weights in the mining process to prioritize the selection of hidden item sets (item sets below the support threshold) according to the significant skills required, rather than frequency alone, while nowhere violating the Downward Closure Property (DCP).

Fuzzy Association Rule Generation
Mining fuzzy association rules is the discovery of association rules using fuzzy set concepts so that quantitative attributes can be handled. Here we view each attribute as a linguistic variable, and each variable is divided into various linguistic terms. Fuzzy association rules are expressed in the form: If X is A then Y is B. The semantics of the fuzzy rule is that when the antecedent "X is A" is satisfied, we can infer that "Y is B" is also satisfied. This means that there are sufficient records contributing their counts to the attribute fuzzy set pairs, such that the sum of these counts is greater than the specified threshold (FISV), or the SUF of the fuzzy set pairs formed by FPWUM is greater than min_SUF.
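To illustrate how rules of this form can be derived from one frequent fuzzy item set, the sketch below enumerates every antecedent and consequent split and keeps the rules whose fuzzy confidence reaches a threshold. The fuzzy support definition (mean of the minimum membership), the `min_conf` threshold and the sample transactions are assumptions for this example; they are not specified in this work.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Transaction = Dict[str, float]  # fuzzy item -> membership degree in [0, 1]
Rule = Tuple[Tuple[str, ...], Tuple[str, ...], float]  # (antecedent, consequent, confidence)


def fuzzy_support(itemset: Sequence[str], db: List[Transaction]) -> float:
    """Assumed definition: mean over all transactions of the minimum item membership."""
    if not db:
        return 0.0
    return sum(min(t.get(i, 0.0) for i in itemset) for t in db) / len(db)


def rules_from(itemset: Sequence[str], db: List[Transaction], min_conf: float) -> List[Rule]:
    """Generate rules 'If X then Y' from one item set, keeping those with conf >= min_conf."""
    items = tuple(itemset)
    whole = fuzzy_support(items, db)
    rules: List[Rule] = []
    for size in range(1, len(items)):
        for antecedent in combinations(items, size):
            base = fuzzy_support(antecedent, db)
            if base == 0:
                continue  # antecedent never occurs, confidence undefined
            conf = whole / base
            if conf >= min_conf:
                consequent = tuple(i for i in items if i not in antecedent)
                rules.append((antecedent, consequent, conf))
    return rules


# Hypothetical fuzzified transactions.
db = [
    {"CS(G)": 0.8, "MG(S)": 0.9},
    {"CS(G)": 0.9, "MG(S)": 0.9},
    {"MG(S)": 0.5},
    {"MG(S)": 0.6},
]

rules = rules_from(("CS(G)", "MG(S)"), db, min_conf=0.8)
```

With this data only the rule "If CS is G then MG is S" is kept: every record with a good CS score also has a satisfactory MG score, whereas the reverse rule has a confidence of roughly 0.59 and is dropped.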

Frequent Item Set Generated with FISV
The frequent item sets generated by FISV are shown in Figure 3. Very few frequent item sets are generated, and many hidden interesting item sets are missed compared with the FPWUM method, which is shown in Figure 4.

Frequent Item Set Generated by FPWUM
In addition to the frequent item sets formed by FISV, FPWUM generates other interesting item sets, part of which are shown in Figure 4, that play a vital role in determining the skill set for workforce leadership prediction. The grey boxes represent the new item sets generated by FPWUM and the white boxes represent the item sets that satisfy min_support. This increases the number of interesting fuzzy association rules and reveals more hidden relationships among the attributes.

Comparative Results of FISV Method and FPWUM
Two sets of experiments were carried out with two different algorithms, namely FISV and FPWUM. Figure 5 shows the number of frequent item sets generated by both algorithms. Figure 6 shows the number of fuzzy association rules generated. Figure 7 shows the execution time taken to generate the rules. The results show that the proposed algorithm produces better results, because it also uses the interesting hidden item sets among the so-called infrequent item sets and generates fuzzy association rules without violating the Downward Closure Property (DCP).

Conclusion
The algorithm first takes all the positive item sets according to the widely accepted support measure and then applies utility weighting to the remaining item sets. We addressed the challenge of using weights in the iterative process of generating large meaningful item sets, and thereby produced good fuzzy association rules. As a result, more hidden correlations among an individual's skills are found than with the traditional FISV method, and the leadership capabilities of a workforce can therefore be interpreted clearly and accurately by FPWUM.