Abstract

With the spread of computers and the rapid development of the Internet, the numbers of users and websites are growing quickly, and so is the amount of information on the Web. How to deal with so much information has become a challenge. Traditional information retrieval methods based on string matching can no longer meet current needs, so semantic-based information processing has emerged. Semantic similarity calculation is fundamental to natural language processing, intelligent retrieval, text clustering, and related fields. There are two main ways to calculate word similarity. One is based on knowledge structures built by linguists, such as semantic dictionaries or semantic networks, and is called the subjective method. The other is based on large-scale corpora and is called the objective method. The knowledge-based method requires linguists to define the information of each word and then calculates similarity from the characteristics of that information; the corpus-based method calculates similarity with statistical techniques. This thesis studies word similarity algorithms based on a semantic knowledge base and on large-scale corpora, and proposes an improved word semantic similarity algorithm that combines the subjective and objective approaches. In the calculation process the algorithm eliminates interference factors, so that the result conforms to both the subjective concept and the objective semantic environment.

Text is one of the most important information carriers, and text similarity calculation is the basis of text classification and text clustering. This thesis proposes a dual-level text similarity algorithm. A text is divided into two levels, title information and content information, and the text similarity is composed of these two parts. In the calculation process the improved combined word semantic similarity algorithm is used, so that the result again conforms to both the subjective concept and the objective semantic environment.
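To make the two contributions concrete, the following is a minimal sketch, not the thesis's actual algorithm, of how a combined word similarity and a dual-level text similarity could be wired together. The component functions (knowledge_sim, statistic_sim), the linear weighting scheme, and the parameters alpha and beta are illustrative assumptions, not values taken from the thesis.

```python
def word_similarity(w1, w2, knowledge_sim, statistic_sim, alpha=0.6):
    """Combine a knowledge-based (subjective) score with a corpus-based
    (objective) score using an assumed linear weighting."""
    return alpha * knowledge_sim(w1, w2) + (1.0 - alpha) * statistic_sim(w1, w2)


def text_similarity(title_sim, content_sim, beta=0.4):
    """Dual-level text similarity: a weighted sum of the title-level and
    content-level similarity scores (weights are illustrative)."""
    return beta * title_sim + (1.0 - beta) * content_sim


if __name__ == "__main__":
    # Stand-in scoring functions (placeholders, not the thesis's components):
    def knowledge_sim(a, b):
        return 0.8  # e.g. a semantic-dictionary or semantic-network score

    def statistic_sim(a, b):
        return 0.6  # e.g. cosine similarity of corpus-derived word vectors

    print(word_similarity("bank", "shore", knowledge_sim, statistic_sim))
    print(text_similarity(title_sim=0.7, content_sim=0.5))
```

In this sketch the same combined word-level score would feed both the title and content comparisons, which mirrors the abstract's statement that the improved word similarity algorithm is reused inside the dual-level text similarity calculation.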