Lazy Query Expansion*

Alexander Gelbukh
Center for Computing Research (CIC), National Polytechnic Institute (IPN), Av. Juan de Dios Bátiz s/n esq. Mendizábal, Col. Zacatenco, 07738, Mexico
gelbukh@cic.ipn.mx

Abstract

An information retrieval or document base system has to deal somehow with various kinds of equivalence between strings. These include lowercase versus uppercase matching, morphological inflection, derivation, and synonymy of words: e.g., given the query "computer", find "Computers", "computing", "workstation". The latter problems are especially important in languages with richer morphology and less stable terminology than English. Also, much better recall is achieved by matching hyponyms and hypernyms using a thesaurus: e.g., given the query "computers", also find "mainframe", "machine", "device", "processor", "UNIX", etc.

Technically, this can be handled at indexing time, by reducing related strings to a common form, or at query-processing time, by expanding the query with the whole set of related forms. We argue that the latter approach allows greater flexibility and easier maintenance, while being more affordable than is usually thought. We propose to expand the query with only those words that actually appear in the document base. Our experiments with a thesaurus-based information retrieval system we are developing for the Senate of the Mexican Republic show only an insignificant average increase in the size of real user queries over the Senate's 200-megabyte document base, despite the highly inflective Spanish language.

Keywords: full-text database, information retrieval, query expansion, natural language.

* An extended version of the paper "Lazy Query Enrichment: A Simple Method of Indexing Large Specialized Document Bases", in Proc. DEXA-2000, 11th International Conference and Workshop on Database and Expert
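The core idea (expand each query term at query time, then keep only the forms that really occur in the document base) can be illustrated with a minimal sketch. It assumes a plain inverted index over raw, unnormalized tokens. The thesaurus THESAURUS, the two toy suffix rules in surface_forms, and all function names are hypothetical stand-ins for the real morphological module and thesaurus of the Senate system, which the abstract does not describe.

```python
import re
from collections import defaultdict

# Hypothetical toy thesaurus: lemma -> related lemmas (synonyms, hyponyms,
# hypernyms). A real system would load a full domain thesaurus.
THESAURUS = {
    "computer": {"workstation", "mainframe", "machine", "device",
                 "processor", "unix"},
}


def surface_forms(lemma):
    """Hypothetical stand-in for a morphological generator (inflection and
    derivation). Only two toy suffix rules are applied here."""
    forms = {lemma, lemma + "s"}               # plural
    if lemma.endswith("er"):
        forms.add(lemma[:-2] + "ing")          # computer -> computing
    return forms


def case_variants(word):
    """Lowercase, capitalized and uppercase spellings of a word."""
    return {word, word.lower(), word.capitalize(), word.upper()}


def build_index(docs):
    """Inverted index over raw tokens: no lowercasing or stemming at indexing time."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for token in re.findall(r"\w+", text):
            index[token].add(doc_id)
    return index


def candidate_forms(term):
    """Every string the term might correspond to (the full, eager expansion)."""
    lemmas = {term.lower()} | THESAURUS.get(term.lower(), set())
    forms = set()
    for lemma in lemmas:
        for form in surface_forms(lemma):
            forms |= case_variants(form)
    return forms


def lazy_expand(term, index):
    """Keep only the candidate strings that actually occur in the document base.
    Membership tests on a defaultdict do not insert keys."""
    return {form for form in candidate_forms(term) if form in index}


def search(query, index):
    """AND across query terms; each term matches the OR of its lazy expansion."""
    result = None
    for term in query.split():
        hits = set()
        for form in lazy_expand(term, index):
            hits |= index[form]
        result = hits if result is None else result & hits
    return result or set()


docs = [
    "Computers are cheap.",
    "We use a workstation for computing.",
    "The UNIX machine crashed.",
    "Cats sleep.",
]
index = build_index(docs)
print(sorted(lazy_expand("computer", index)))
# ['Computers', 'UNIX', 'computing', 'machine', 'workstation']
print(sorted(search("computer", index)))        # [0, 1, 2]
print(sorted(search("computer cheap", index)))  # [0]
```

In this toy run, the 45 candidate strings generated for "computer" shrink to the five that occur in the index, so the executed query grows only by forms the document base really contains. Because no normalization is baked into the index, editing THESAURUS or surface_forms takes effect without reindexing, which is the flexibility argument made above.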