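English translation: Data mining is the process of discovering implicit, novel knowledge and rules of potential value for decision-making from databases, and it has been widely applied in many fields.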

1 answer

  • Data mining is the process of discovering implicit, novel knowledge and rules of potential value for decision-making from databases, and it has been widely applied in many fields. Cluster analysis is one of the most important techniques in data mining: it is the process of grouping a collection of physical or abstract objects into multiple clusters of similar objects. Each resulting cluster is a set of objects that are similar to one another and dissimilar to the objects in other clusters. Among the many clustering algorithms, K-means is the most classic.

    K-means is a typical partition-based clustering algorithm. Its idea is simple, it is efficient and scalable when mining large-scale data, and its time complexity is close to linear. It also has weaknesses: it is sensitive to the initial values; with random initial values its results are unstable; it easily falls into a local minimum; it generally finds only spherical clusters; and the number of clusters K must be given in advance.
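The behaviour described above can be illustrated with a minimal sketch of standard K-means using random initialization. The function name `kmeans` and its parameters (`max_iter`, `seed`) are assumptions made for this illustration, not taken from the paper.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=None):
    """Plain K-means on a list of equal-length numeric tuples (illustrative)."""
    rng = random.Random(seed)
    # Random initialization: pick k distinct data points as starting centers.
    # This is the step that makes the result depend on the initial values.
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: a (possibly local) optimum
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster went empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels
```

Calling `kmeans(points, k)` several times with different `seed` values on less well-separated data can return different clusterings, which is the instability mentioned above. Each pass over the data costs on the order of n·k distance computations, which is why the running time is close to linear in the number of points when k and the number of iterations are small.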

    This paper introduces and analyzes the traditional K-means clustering algorithm, examines its advantages and disadvantages, and finally improves it. The improvement targets the algorithm's dependence on its initial values: the initial points are chosen by dedicated selection algorithms, which overcomes the instability of K-means and makes the clustering results more accurate.

    The main content and research results are as follows:

    1. The K-means clustering algorithm is introduced and analyzed, and the idea of the algorithm is implemented. Its advantages and disadvantages are then examined on some data.

    2. The shortcomings of the K-means clustering algorithm are addressed, mainly its dependence on the initial values. Two improved methods are adopted, one drawing on the Huffman idea and the other on the greedy idea of Kruskal's algorithm.
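The text does not spell out how either improved method works. As one possible reading of a Huffman-inspired selection of initial points, the sketch below repeatedly merges the two closest groups into their weighted mean, much as a Huffman tree combines its two lightest nodes, until K groups remain. The function name `merge_init_centers` and the whole procedure are assumptions made for illustration, not the paper's actual method.

```python
import math


def merge_init_centers(points, k):
    """Greedy bottom-up choice of k initial centers (hypothetical sketch)."""
    # Each group is (centroid, size); start with one group per point.
    groups = [(tuple(float(c) for c in p), 1) for p in points]
    while len(groups) > k:
        # Greedy step: find the two groups whose centroids are closest.
        i, j = min(
            ((a, b) for a in range(len(groups)) for b in range(a + 1, len(groups))),
            key=lambda ab: math.dist(groups[ab[0]][0], groups[ab[1]][0]),
        )
        (ci, ni), (cj, nj) = groups[i], groups[j]
        # Replace the pair with their size-weighted mean, the way a Huffman
        # tree replaces its two lightest nodes with a combined parent.
        merged = tuple((x * ni + y * nj) / (ni + nj) for x, y in zip(ci, cj))
        groups = [g for idx, g in enumerate(groups) if idx not in (i, j)]
        groups.append((merged, ni + nj))
    return [c for c, _ in groups]
```

Because this selection is deterministic, the same data always yields the same starting centers, which removes the run-to-run variation of random initialization. The naive pair search shown here is roughly cubic in the number of points, so it is suitable only for small data sets or samples.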