This paper proposes a new text similarity method based on concept similarity for Chinese text processing. The method first converts a text into a word vector space model and then splits each word into a set of concepts. It obtains the similarity between words by computing the inner products between their concepts, and finally computes the similarity between texts from the word similarities. The contributions of the paper are: 1) a new formula for computing the similarity between words; 2) a new text similarity method based on word similarity; 3) a successful application of the method to computing the similarity of Web news; and 4) a demonstration of the method's validity through extensive experiments.
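The pipeline described in the abstract (words, then concepts, then inner products, then text similarity) can be illustrated with a minimal sketch. The abstract does not give the actual formulas, so everything here is an assumption made for illustration: the toy `CONCEPTS` lexicon and its weights, the use of a normalized inner product (cosine) for word similarity, and the symmetric best-match average for text similarity. The names `word_similarity` and `text_similarity` are hypothetical and are not taken from the paper.

```python
import math

# Hypothetical concept lexicon: each word maps to weighted concepts.
# A real system would draw these from a concept dictionary; the
# entries and weights here are invented purely for illustration.
CONCEPTS = {
    "car": {"vehicle": 1.0, "road": 0.5},
    "truck": {"vehicle": 1.0, "cargo": 0.5},
    "banana": {"fruit": 1.0, "food": 0.5},
}


def word_similarity(w1, w2, lexicon=CONCEPTS):
    """Normalized inner product of two words' concept vectors.

    This is an assumed formula, not the one proposed in the paper.
    """
    c1 = lexicon.get(w1, {})
    c2 = lexicon.get(w2, {})
    dot = sum(weight * c2.get(concept, 0.0) for concept, weight in c1.items())
    norm1 = math.sqrt(sum(weight * weight for weight in c1.values()))
    norm2 = math.sqrt(sum(weight * weight for weight in c2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        # Words without concepts only match themselves.
        return 1.0 if w1 == w2 else 0.0
    return dot / (norm1 * norm2)


def text_similarity(words_a, words_b, lexicon=CONCEPTS):
    """Symmetric average of each word's best match in the other text."""
    if not words_a or not words_b:
        return 0.0

    def directed(src, dst):
        best = (max(word_similarity(s, d, lexicon) for d in dst) for s in src)
        return sum(best) / len(src)

    return 0.5 * (directed(words_a, words_b) + directed(words_b, words_a))


if __name__ == "__main__":
    print(word_similarity("car", "truck"))
    print(text_similarity(["car", "banana"], ["truck"]))
```

Under these assumptions, two words that share a concept ("car" and "truck" both map to "vehicle") receive a nonzero similarity even though the surface forms differ, which is the property that motivates moving from a plain word vector space to concepts.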
PENG Jing, YANG DongQing, TANG ShiWei, WANG TengJiao, GAO Jun
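Inspired by the overlapped expression of gene segments in biology, this paper proposes a new evolutionary algorithm based on overlapped expression: MEOE (Multigene Evolutionary algorithm based on Overlapped Expression). The paper describes the gene expression structure of MEOE and the corresponding algorithm in detail. Unlike existing work, the genes that serve as genetic material in MEOE have a probability of being expressed repeatedly, and the algorithm incorporates the concentration computation technique from immune algorithms. The paper gives a fairly comprehensive analysis of MEOE, discusses its characteristics in terms of expression space, expressibility, and trait inheritance, and compares it with traditional algorithms. Extensive experiments show that MEOE is 2.5 to 9.4 times as fast as GEP. On high-order function discovery problems, the success rate of MEOE is at least one order of magnitude higher than that of GEP. In addition, experiments show that the density-based probabilistic selection function has a certain advantage on high-order function discovery problems.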