下载此文档

Efficient algorithms for substring near neighbor problem:为子串近邻问题的高效算法.ppt


文档分类:行业资料 | 页数:约25页 举报非法文档有奖
1/25
下载提示
  • 1.该资料是网友上传的,本站提供全文预览,预览什么样,下载就什么样。
  • 2.下载该文档所得收入归上传者、原创者。
  • 3.下载的文档,不会出现我们的网址水印。
1/25 下载此文档
文档列表 文档介绍
1 What ’ s SNN? ? SNN ≈ Text Indexing with mismatches ? Text Indexing: ? Construct a data structure on a text T[1..n], . ? Given query P[1..m], finds occurrences of P in T ? Text indexing with mismatches: ? Given P, find the substrings of T that are equal to P except ≤ R chars. ? Motivation: ., computational bio (BLAST) T= G AGTA ACTCAATA P= AGTA T= G AGTA ACTC AA TA 2 Outline ? General approach ? View: Near Neighbor in Hamming ? Focus: reducing space ? Background ? Locality-Sensitive Hashing (LSH) ? Solution ? Reducing query & preprocessing ? Redesign LSH ? Concluding remarks 3 Approach (Or, why SNN?) ? SNN = a near neighbor problem in Hamming metric with m dimensions: ? Construct data structure on D={all substrings of T of length m}, . ? Given P, find a point in D that is at distance ≤ R from P ?? Use a NN data structure for Hamming D={GAGT, AGTA , GTAA, ….AA TA } T= G AGTA ACTC AA TA P= AGTA 4 Approximate NN ? Exact NN problem seems hard (., hard w/o exponential space or O(n) query time) ? Approximate NN is easier ? Defined for approximation c=1+ ε as ? OK to report a point at distance ≤ cR (when there is a point at distance ≤ R) Query Space [KOR98, IM98] poly(log n, m) n O(1/ ε^2) LSH [IM98] n 1/c +m n 1+1/c R cR q5 Our contribution ? Problem: need m in advance for NN ? Have to construct a data structure for each m ≤M ? Here: approx SNN data structure for unknown m ? Without degradation in space or query time ? Our algorithm for SNN based on LSH: ? Supports patterns of length m ≤M ? Optimal * space: n 1+1/c ? Optimal * query time: n 1/c ? Slightly worse preprocessing time if c>3 ?(* Optimal . LSH, modulo subpoly factors) ? Also extends to l 1 6 Outline ? General approach ? View: Near Neighbor in Hamming ? Focus: reducing space ? Background ? Locality-Sensitive Hashing (LSH) ? Solution ? Reducing query & preprocessing ? Redesign LSH ? Concluding remarks 7 Locality-Sensitive Hashing ? Based on a family of hash functions {

Efficient algorithms for substring near neighbor problem:为子串近邻问题的高效算法 来自淘豆网m.daumloan.com转载请标明出处.

相关文档 更多>>
非法内容举报中心
文档信息
  • 页数25
  • 收藏数0 收藏
  • 顶次数0
  • 上传人薄荷牛奶
  • 文件大小0 KB
  • 时间2016-04-17
最近更新