Though K-means is very popular for general clustering, its performance, which generally converges to numerous local minima, depends highly on initial cluster centers. In this paper a novel initialization scheme to select initial cluster centers for K-means clustering is proposed. This algorithm is based on reverse nearest neighbor (RNN) search which retrieves all points in a given data set whose nearest neighbor is a given query point. The initial cluster centers computed using this methodology are found to be very close to the desired cluster centers for iterative clustering algorithms. This procedure is applicable to clustering algorithms for continuous data. The application of the proposed algorithm to K-means clustering algorithm is demonstrated. An experiment is carried out on several popular datasets and the results show the advantages of the proposed method.
It has very realistic significance for improving the quality of users' accessing information to filter and selectively retrieve the large number of information on the Internet. On the basis of analyzing the existing users' interest models and some basic questions of users' interest (representation, derivation and identification of users' interest), a Bayesian network based users' interest model is given. In this model, the users' interest reduction algorithm based on Markov Blanket model is used to reduce the interest noise, and then users' interested and not interested documents are used to train the Bayesian network. Compared to the simple model, this model has the following advantages like small space requirements, simple reasoning method and high recognition rate. The experiment result shows this model can more appropriately reflect the user's interest, and has higher performance and good usability.
This paper proposes a method of data-flow testing for Web services composition. Firstly, to facilitate data flow analysis and constraints collecting, the existing model representation of business process execution language (BPEL) is modified in company with the analysis of data dependency and an exact representation of dead path elimination (DPE) is proposed, which over-comes the difficulties brought to dataflow analysis. Then defining and using information based on data flow rules is collected by parsing BPEL and Web services description language (WSDL) documents and the def-use annotated control flow graph is created. Based on this model, data-flow anomalies which indicate potential errors can be discovered by traversing the paths of graph, and all-du-paths used in dynamic data flow testing for Web services composition are automatically generated, then testers can design the test cases according to the collected constraints for each path selected.