arXiv: [] 8 May 2007

IDF Revisited: A Simple New Derivation within the Robertson-Spärck Jones Probabilistic Model

Lillian Lee
Dept. of Computer Science, Cornell University
Ithaca, NY 14853-7501 USA
.edu/home/llee
******@

ABSTRACT

There have been a number of prior attempts to theoretically justify the effectiveness of the inverse document frequency (IDF). Those that take as their starting point Robertson and Spärck Jones’s probabilistic model are based on strong or complex assumptions. We show that a more intuitively plausible assumption suffices. Moreover, the new assumption, while conceptually very simple, provides a solution to an estimation problem that had been deemed intractable by Robertson and Walker (1997).

Categories and Subject Descriptors: [Information Search and Retrieval]: Retrieval models

General Terms: Theory, Algorithms

Keywords: inverse document frequency, IDF, probabilistic model, term weighting

1. INTRODUCTION

The inverse document frequency (IDF) [12] has been “incorporated in (probably) all information retrieval systems” ([6], pg. 77). Attempts to theoretically explain its empirical successes abound ([2, 14, 1, 11, 5, 8, 4, 3], inter alia). Our focus here is on explanations based on Robertson and Spärck Jones’s probabilistic-model (RSJ-PM) paradigm of information retrieval [10], not because of any prejudice against other paradigms, but