USC Computer Science Technical Reports, no. 813 (2004)
SWAM: A Family of Access Methods for Similarity Search in Querical Data Networks

Farnoush Banaei-Kashani and Cyrus Shahabi*

Abstract

Querical Data Networks (QDNs), e.g., peer-to-peer and sensor networks, are large-scale, self-organizing, distributed query processing systems. We formalize the problem of similarity search in QDNs and propose a family of distributed access methods, termed Small-World Access Methods (SWAM), which, unlike LH* and (more recently) DHTs, does not control the assignment of data objects to QDN nodes. We propose a Voronoi-based instance of SWAM that indexes multi-attribute objects and, for a QDN with $N$ nodes, has query time, communication cost, and computation cost of $O(\log N)$ for exact-match queries, and $O(\log N + sN)$ and $O(\log N + k)$ for range queries (with selectivity $s$) and kNN queries, respectively.

1 Introduction

Recently, a family of large-scale, self-organizing, distributed query processing systems has emerged. We term these systems Querical Data Networks (QDNs). A QDN is a federation of a dynamic set of peer, autonomous nodes communicating through a transient-topology interconnection. Data is naturally distributed among the QDN nodes in extra-fine grain, where a few data items are dynamically created, collected, and/or stored at each node. Therefore, the network scales linearly with the size of the dataset. With a dynamic dataset, node-set, and topology, QDNs should be considered as the new generation of distributed database systems with significantly less constraining assumptions as compared to their ancestors. Peer-to-peer networks and sensor networks are well-known examples of QDNs.

Efficient execution of similarity search queries (such as exact-match, range, and k-nearest-neighbor queries) in QDNs is an important but challenging requirement. Traditional distributed databases employ hierarchical distributed directories as access methods to execute similarity queries.
These access methods often assume small-scale networks with a static set of nodes that maintain the object directory. Besides, they fail to distribute the query execution load fairly among peer nodes. This is because with a hierarchical index topology of the network, nodes at the higher levels of the hierarchy (e.g., the root of the tree) inevitably receive more queries to process. Moreover, hierarchical network topologies are loop-free and intolerant to failures and/or autonomous presence of the QDN nodes. On the other hand, hash-based distributed data structures such as LH* [13], and more recently, DHTs [16, 19] assume large-scale networks and construct distributed index structures with non-hierarchical topologies. However, these access methods enforce the location of the content within the network and hinder natural replication of the objects, which are both in conflict with the autonomy of the QDN nodes in maintaining their own content. Besides, since object assignment to the nodes is performed irrespective of the object distribution, these access methods fail to adapt to the distribution of the objects and may suffer from unbalanced query load. Finally, they can only support exact-match queries on a single attribute.

* Farnoush Banaei-Kashani and Cyrus Shahabi are with the Integrated Media Systems Center, Computer Science Department, University of Southern California, Los Angeles, California 90089; Email: [banaeika,shahabi]@usc.edu; Tel: (213) 740 3816; Fax: (213) 740 5807

Learning from the principles of the small-world models, which are proposed to explain the phenomenon of efficient communication in social networks [20, 12], in this paper we define a family of access methods, termed Small-World Access Methods (SWAM), for efficient similarity search in QDNs.
First, in Section 2 we formalize the problem of similarity search in QDNs by 1) modelling the problem, 2) defining a set of metrics to evaluate the efficiency of the QDN access methods, and 3) introducing a basic access method to set a lower efficiency bound for similarity search. In Section 3, we generalize the efficiency principles of the small-world models to define three properties that characterize the SWAM family. These properties ensure that the index structure effectively partitions the data space for efficient filtering, closely co-locates the nodes with similar content for batch content-retrieval, and properly interconnects the nodes for fast traversal of the data space. To show the effectiveness of the SWAM properties, we introduce a Voronoi-based instance of SWAM, termed SWAM-V, which for a QDN with $N$ nodes has query time, communication cost, and computation cost of $O(\log N)$ for exact-match queries, and $O(\log N + sN)$ and $O(\log N + k)$ for range queries (with selectivity $s$) and kNN queries, respectively. SWAM-V proposes a non-hierarchical distributed index structure that indexes QDNs with multi-attribute objects. SWAM-V also respects the autonomy of the QDN nodes and self-configures the topology of the QDN (i.e., the index structure) based on the nodes' own content. Consequently, it avoids unnecessary content migration and replacement, supports object replication, and adapts to the object distribution as new nodes join the QDN with new content. In Section 4, we perform a comparative study via simulation to verify the efficiency of SWAM-V versus our basic access method, as well as a version of CAN [16] that we extended over its original DHT to support range and kNN queries. Our experiments show that, unlike the basic access method, SWAM-V achieves logarithmic query time with limited resource usage, and consistently outperforms CAN in query time and communication cost. Finally, a summary of the related work and our future directions conclude the paper.

2 Formal Definition of the Problem

In this section, we state and model the problem of similarity search in QDNs. We also describe a naive access method, which essentially scans the QDN to resolve the similarity queries, as the basic similarity search mechanism that sets the lower performance bound for more efficient access methods.

2.1 Model

We assume a relational data model for the content of a QDN. A set of (possibly duplicate) tuples with the same schema is distributed among the nodes of the QDN (for multi-schema QDNs, we rely on schema reconciliation techniques such as [8]). Tuples are uniquely identified by a set of $d$ attributes, the key of the schema. Hereafter, we use the terms tuple and key interchangeably wherever the meaning is clear. A similarity query is originated at a QDN node and is answered by locating at least one replica of all the tuple(s) with key similar to the query key. A QDN access method is a mechanism that defines 1) how to organize the QDN topology (interconnection) into an index-like structure, and 2) how to use the index structure to process the similarity queries. We are interested in access methods for efficient processing of similarity queries in QDNs.

We model the QDN key space as a Hilbert space $(V, L_p)$. $V = V_1 \times V_2 \times \ldots \times V_d$ is a $d$-dimensional vector space, where $V_i$, the domain of the attribute $a_i$ for the key $\vec{k} = \langle a_1, a_2, \ldots, a_d \rangle$ in $V$, is a contiguous and finite interval of $\mathbb{R}$. The $L_p$ norm with $p \in \mathbb{Z}^+$ is the distance function that measures the dissimilarity (or equivalently, similarity) between two keys $\vec{k}_1$ and $\vec{k}_2$ as $L_p(\vec{k}_1 - \vec{k}_2)$, where $L_p(\vec{x}) = \left( \sum_{i=1}^{d} |x_i|^p \right)^{1/p}$.

We are interested in content-based access methods, i.e., access methods that organize the QDN topology based on the content of the QDN nodes. In general, each QDN node may include more than one tuple. To explain our content-based access methods, we assume a QDN model where each node stores one and only one tuple.
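The $L_p$ distance defined above is the only computation the model requires of a node. As a minimal sketch (the function name `lp_distance` and the tuple representation of keys are illustrative assumptions, not part of the report), it can be written as:

```python
from typing import Sequence


def lp_distance(k1: Sequence[float], k2: Sequence[float], p: int = 2) -> float:
    """L_p distance between two keys with the same number d of attributes.

    Hypothetical helper: keys are plain sequences of numbers and p is a
    positive integer, following the definition L_p(x) = (sum |x_i|^p)^(1/p).
    """
    if len(k1) != len(k2):
        raise ValueError("keys must have the same number of attributes")
    if p < 1:
        raise ValueError("p must be a positive integer")
    return sum(abs(a - b) ** p for a, b in zip(k1, k2)) ** (1.0 / p)


# Two 2-attribute keys compared under L_2 (Euclidean) and L_1 (Manhattan).
euclidean = lp_distance((0, 0), (3, 4), p=2)
manhattan = lp_distance((0, 0), (3, 4), p=1)
```

Smaller values mean more similar keys; an exact match corresponds to distance zero.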

Here, we show how to reduce the general QDN model to our assumed QDN model.

[Figure 1: Reducing the general QDN model (Steps I, II, and III)]

Consider $K$ as the set of keys (tuples) available in the QDN and $N$ as the set of QDN nodes. Assuming a general QDN model, we define a one-to-many mapping $M : N \to K$ that maps each QDN node to the set of keys stored at the node (Figure 1, Step I). Each key is considered as a virtual node embedded in $V$. Note that since tuples are replicated, there might be several virtual nodes with the same key. A content-based access method defines how to organize the set of virtual nodes corresponding to all nodes in $N$ into a virtual QDN with a particular topology, and how to process the queries in the virtual QDN (Figure 1, Step II). Finally, the topology of the actual QDN is deduced by inverse mapping from the topology of the virtual QDN: a QDN node $n$ is connected to a node $m$ if and only if at the virtual QDN some virtual node in $M(n)$ is connected to some other virtual node in $M(m)$ (Figure 1, Step III). Also, the semantics of the query processing at the actual QDN nodes is defined by the query processing semantics at the corresponding virtual nodes, such that the flow of the query at the actual QDN is logically identical to that of the virtual QDN. With this approach, the mapping and inverse mapping steps (Steps I and III) are independent of the access method used in Step II, and each access method for virtual QDNs (which are QDNs with only one tuple per node) defines an access method with similar characteristics for general QDNs. Hereafter, we assume the reduced model for QDNs and characterize the primitives of an access method to construct the topology/index and process the queries in such a QDN.

The topology of a QDN can be modelled as a directed graph $G(N, E)$, where the edge $e(n, m) \in E$ represents an asymmetric neighborhood relationship in which node $m$ is a neighbor of node $n$.
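The reduction (Steps I and III) can be illustrated with a short sketch. The names `to_virtual` and `actual_topology`, the node labels, and the example edges are hypothetical; Step II (the access method itself) is represented only by a hand-written edge list, and dropping links between two virtual nodes of the same actual node is an assumption made here for simplicity.

```python
from itertools import count


def to_virtual(node_keys):
    """Step I: create one virtual node per stored key.

    node_keys maps an actual node to the list of keys it stores.
    Returns {virtual_id: (owner_node, key)}.
    """
    ids = count()
    return {next(ids): (node, key)
            for node, keys in node_keys.items() for key in keys}


def actual_topology(virtual, virtual_edges):
    """Step III: n -> m iff some virtual node of n links to one of m."""
    edges = set()
    for u, v in virtual_edges:
        n, m = virtual[u][0], virtual[v][0]
        if n != m:  # assumption: links inside one actual node are dropped
            edges.add((n, m))
    return edges


# Three actual nodes; key (3, 2) is replicated at nodes A and C.
node_keys = {"A": [(1, 4), (3, 2)], "B": [(5, 1)], "C": [(6, 4), (3, 2)]}
virtual = to_virtual(node_keys)
# Step II would be produced by the access method; here it is a toy edge list.
virtual_edges = [(0, 1), (1, 2), (4, 1)]
topology = actual_topology(virtual, virtual_edges)
```

Because Steps I and III never inspect how `virtual_edges` was produced, any access method for one-tuple-per-node QDNs can be plugged in at Step II.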
Schematically, we depict this relationship by drawing an arrow from node $n$ to node $m$. $A(n)$ is the set of neighbors of the node $n$. To achieve scalability, a node only maintains a limited amount of information about its neighbors, which includes the keys of the tuples maintained at the neighbors and the physical addresses of the neighbors. A node can directly communicate with its neighbors. To construct the QDN index, an access method defines the join primitive¹ (similar to the insert operation of traditional database access methods), which is used by the new node $n$ to delineate $A(n)$ as it joins the existing QDN. We assume that the physical address of at least one node in the existing QDN (if any) is available to $n$ as it joins the QDN. As new nodes join the QDN, its topology incrementally converges to the intended index structure. Similarly, an access method defines the leave operation (equivalent to the delete primitive of traditional access methods).

¹ This join is different from the join operation in the relational algebra.

We are interested in the following types of similarity queries:

• Exact-Match Query: Given the query key $\vec{q}$, return the tuple $t$ with key $\vec{k}$ such that $\vec{k} = \vec{q}$.

• Range Query: Given the query key $\vec{q}$ and the range $r$, return all tuples $t$ with key $\vec{k}$ such that $L_p(\vec{k} - \vec{q}) \le r$.

• k-Nearest-Neighbor (kNN) Query: Given the query key $\vec{q}$ and the number $k$, return the k-tuple $(t_1, t_2, \ldots, t_k)$ such that $\vec{k}_i$, the key of $t_i$, is the $i$-th nearest neighbor of the key $\vec{q}$.

A similarity query can originate from any QDN node at the $T_0$-th time slot ($\forall\, T_0 \in \mathbb{Z}$), assuming a discrete wall-clock time with a fixed time unit. A node that originates a query or receives the query from other nodes at the $(T_0 + i)$-th time slot ($\forall\, i \in \mathbb{Z}^+ \cup \{0\}$) can process the query locally and/or forward zero or more processed replicas of the query to its immediate neighbors at the $(T_0 + i + 1)$-th time slot. The collective processing of the query by the QDN nodes is completed when all expected tuples in the relevant result set of the query are visited by at least one of the replicas of the query. Besides the join and leave primitives, an access method defines the forward primitive for query processing based on the constructed QDN index. The forward primitive can only use the information at the local node to process the query and to make forwarding decisions. During query processing, the $L_p$ distance between the query key $\vec{q}$ and the local key is computed to verify whether the local tuple satisfies the query condition. Also, with content-based access methods the forward primitive may measure the $L_p$ distances between the query key $\vec{q}$ and the neighbor keys to guide the query.

Finally, we define metrics to evaluate the efficiency of a QDN access method. An access method can be evaluated based on its construction cost, and/or based on its query processing cost and performance. Unless the set of nodes participating in the QDN is extremely dynamic, the computation (CPU time) and communication costs of constructing and maintaining the index structure are negligible as compared to those of the query processing. On the other hand, the space required to construct the distributed index is proportional to:

$S = \sum_{\forall n \in N} |A(n)|$

Assuming a peer-to-peer model for a QDN, a uniform distribution of the space $S$ (or equivalently, uniform distribution of the node connectivity) among all the nodes is favorable. We define three other metrics to measure the efficiency of a QDN access method for query processing. The first two metrics evaluate the cost of query processing in terms of the required system resources, whereas the last one measures the system performance from the user perspective:

1. Communication cost ($C_1$): Average number of query replicas forwarded to complete the processing of a query.
2. Computation cost ($C_2$): Average number of $L_p$ distance computations to complete the processing of a query.
3.
Query time ($T$): Average response time of a query. If processing of a query starts at time slot $T_0$ and completes at time slot $T_1$, the response time of the query is equal to $T_1 - T_0$.

2.2 Basic Access Method

To set a lower bound for the efficiency of the QDN access methods, we consider an access method that naively scans the QDN nodes to resolve the similarity queries. Optimally, with scanning, $C_1$ and $C_2$ are $\Theta(|N|)$, and $T$ is $\Theta(1)$. With QDNs, since queries can be replicated, scanning is not necessarily sequential; hence, $T$ can in fact be independent of $|N|$. With any connected topology, simple flooding of the query ensures a complete scan. However, various topologies balance the efficiency metrics differently. A star topology is theoretically optimal, but a star is not a realistic topology for QDNs because it fails to distribute $S$ uniformly. Other examples of topologies that allow a more uniform distribution of $S$ are the ring, spanning tree, and random graph, which are all near optimal in terms of the $C_1$ and $C_2$ metrics, and $O(|N|)$, $O(\log |N|)$, and $O(\log |N|)$ in terms of $T$, respectively.

We consider an access method with a random graph index topology and flood-based query processing as the benchmark. The random graph $G_{N,p}$ is a graph with $|N|$ nodes, where each pair of nodes is connected with probability $p$. Such a graph has a Poisson connectivity distribution with average connectivity $p|N|$. Thus, the space requirement $S$ of the index is fairly distributed among the nodes. Also, it is shown that for $p > \frac{\log |N|}{|N|}$ the graph is connected with high probability [5]. Such a graph has $|E| = |N| \log |N|$ edges and the average distance between any two nodes is $O(\log |N|)$; hence, $C_1$ is $O(|N| \log |N|)$, $C_2$ is $O(|N|)$, and $T$ is $O(\log |N|)$. Random graphs are frequently used to model large networks (such as the Internet), and since they are defined probabilistically, as compared to loop-free spanning trees with a fixed number of edges $|N| - 1$, they are more appropriate for modelling the dynamic autonomous QDNs.

[Figure 2: The small-world model. a. Hybrid small-world graph; b. Small-world as QDN index (2-dimensional grid of keys from (-2,-2) to (2,2))]

3 SWAM: A Family of Small-World-based Access Methods

We define a family of efficient access methods for QDNs, termed Small-World Access Methods (SWAM), which is designed based on the principles borrowed from the small-world models. Here, after a general overview of the useful properties of the small-world model, we define the SWAM family and characterize its properties. Also, as an example we introduce SWAM-V, a Voronoi-based instance of SWAM, which satisfies the SWAM properties and achieves query time, communication cost, and computation cost logarithmic to the size of the network for all types of similarity queries.

3.1 Small-World Model as an Index Structure

The small-world model is a network topology proposed to explain the small-world phenomenon, the fact that two individuals in a society (i.e., a social network) can efficiently locate each other through a short chain of acquaintances logarithmic to the size of the network [20, 12]. The small-world graph is a hybrid graph, a superimposition of a regular grid and a dilute random graph ($p \ll 1$), inheriting both their properties (see Figure 2-a). It inherits the average node-to-node path length $O(\log |N|)$ from the random graph component, and the high clustering property from the grid. A graph is clustered if the neighbors of a node are more probably the neighbors of each other rather than the neighbors of the other nodes in the network. For a node $n$, clustering is measured by the clustering coefficient $C(n)$, which is the realized fraction of all possible edges among the neighbors of $n$:

$C(n) = l \Big/ \binom{|A(n)|}{2}$    (1)

where $l$ is the number of existing edges among the neighbors of $n$. The clustering coefficient of a graph is the average of the clustering coefficients of its nodes.
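Equation 1 translates directly into a few lines of code. The sketch below is illustrative only: the adjacency representation (a dictionary of neighbor sets), the function names, and the convention of returning 0 for nodes with fewer than two neighbors are assumptions not stated in the report.

```python
from itertools import combinations


def clustering_coefficient(adj, n):
    """C(n) = l / C(|A(n)|, 2), with l the number of edges among A(n)."""
    neighbors = adj[n]
    degree = len(neighbors)
    if degree < 2:
        return 0.0  # assumed convention: no neighbor pair exists
    # Count neighbor pairs that are linked in at least one direction.
    links = sum(1 for u, v in combinations(neighbors, 2)
                if v in adj[u] or u in adj[v])
    return links / (degree * (degree - 1) / 2)


def graph_clustering(adj):
    """Average of the per-node clustering coefficients."""
    return sum(clustering_coefficient(adj, n) for n in adj) / len(adj)


# A triangle (0, 1, 2) with one pendant node 3 attached to node 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
c0 = clustering_coefficient(adj, 0)   # one of three neighbor pairs is linked
cg = graph_clustering(adj)
```

Here node 0 has three neighbors but only the pair (1, 2) is connected, so its coefficient is 1/3, while nodes 1 and 2 each reach the maximum value of 1.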
For a complete graph, a grid, and a dilute random graph $G_{N,p}$, the clustering coefficients are $1$, $\simeq \frac{3}{4}$, and $p \ll 1$, respectively.

To demonstrate a direct application of the small-world graph as an index structure for a QDN, we consider the following simple QDN. Assume the key space $V$ is a subspace of $\mathbb{Z}^d$ rather than $\mathbb{R}^d$, and also assume all possible keys in $V$ are available within the QDN, one key per QDN node. We can organize the topology of this QDN based on a small-world graph with a $d$-dimensional underlying grid as follows:

1. Grid component: The node storing the key $\vec{k} = \langle a_1, a_2, \ldots, a_d \rangle$ is a neighbor of all nodes with keys $\vec{k}'$ where $L_p(\vec{k} - \vec{k}') \le b$ ($b \in \mathbb{Z}^+$); and
2. Random graph component: The node $n_k$ storing the key $\vec{k} = \langle a_1, a_2, \ldots, a_d \rangle$ is a neighbor of one other node $n_{k'}$ with key $\vec{k}'$ selected probabilistically such that if $L_p(\vec{k} - \vec{k}') = x$, the probability of selecting $n_{k'}$ as the neighbor of $n_k$ is proportional to $x^{-d}$ (i.e., a power-law distribution).

See Figure 2-b for an example with a 2-dimensional key space, $L_1$ as the distance measure, and neighborhood boundary parameter $b = 1$. In [12], it is shown that with a greedy forwarding primitive, on average an exact-match query is resolved with $T$, $C_1$, and $C_2$ all in $O(\log |N|)$. With greedy forwarding, node $n$ forwards a query $\vec{q}$ only to the one of its neighbors with key $\vec{k}$ such that $L_p(\vec{k} - \vec{q})$ is minimum among all neighbors in $A(n)$, i.e., the neighbor with the most similar key to the query key $\vec{q}$ is selected to receive the query. It is easy to see that the underlying grid topology ensures that when a node with key $\vec{k}$ receives a query $\vec{q}$, always either $\vec{k} = \vec{q}$ or the node has at least one neighbor with the key $\vec{k}'$ such that $L_p(\vec{k}' - \vec{q}) < L_p(\vec{k} - \vec{q})$. Therefore, along the forwarding path of the query, the distance between the key at the current node and the target key $\vec{q}$ is monotonically decreasing as the query is forwarded.
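Greedy forwarding on the simple QDN of Figure 2-b (2-dimensional integer keys, $L_1$ distance, $b = 1$) can be sketched as follows. The function names, the grid bounds, and the optional `long_links` dictionary standing in for the random graph component are illustrative assumptions; the query key is assumed to lie inside the grid.

```python
def l1(a, b):
    """L_1 distance between two integer keys."""
    return sum(abs(x - y) for x, y in zip(a, b))


def grid_neighbors(key, lo, hi, b=1):
    """Grid component in 2-D: keys within L_1 distance b inside [lo, hi]^2."""
    x, y = key
    return [(x + dx, y + dy)
            for dx in range(-b, b + 1) for dy in range(-b, b + 1)
            if 0 < abs(dx) + abs(dy) <= b
            and lo <= x + dx <= hi and lo <= y + dy <= hi]


def greedy_route(start, q, lo=-2, hi=2, long_links=None):
    """Forward q to the neighbor nearest to q until the local key equals q."""
    long_links = long_links or {}
    path = [start]
    current = start
    while current != q:
        candidates = grid_neighbors(current, lo, hi) + long_links.get(current, [])
        # Some grid neighbor is always strictly closer, so the loop terminates.
        current = min(candidates, key=lambda k: l1(k, q))
        path.append(current)
    return path


grid_only = greedy_route((-2, -2), (2, 1))
with_jump = greedy_route((-2, -2), (2, 1), long_links={(-2, -2): [(1, 1)]})
```

Without long-range links the query needs as many hops as the $L_1$ distance between the originator and the target; a single well-placed long link shortens the path considerably.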

Besides, the probabilistically selected neighbors act as long jumps that ensure an exponential decrease of this distance on average. Thus, the average forwarding path length is logarithmic to the size of the network.

The way we defined the neighborhood relationship between the QDN nodes based on the distance between their keys, together with the clustering property of the resulting small-world topology, allows for the effective execution of other types of similarity queries as well. On one hand, we defined the neighborhood relationship such that neighbors of a node have keys closely similar to the key of the node, and consequently, similar to each other. On the other hand, due to the clustering property of the generated small-world graph, neighbors of a node are closely connected in terms of the hop-count in the network (i.e., the number of edges on the path between each pair of nodes). Therefore, a locality of tightly connected nodes with closely similar keys is created at the neighborhood of each node in the network.

[Figure 3: Partitioning of the key space. a. Recursive partitioning; b. Recursive partitioning example: GNA; c. Flat partitioning]

With a topology constructed out of such localities, range and kNN queries can be executed efficiently in two phases: first, by an exact-match query to locate the locality of the query key $\vec{q}$, and second, by flooding the query throughout the locality of $\vec{q}$. With a localized topology, flooding at the locality of the query key is efficient. We can locate all the keys relevant to the range and kNN queries in a limited number of hops $h$ away from $\vec{q}$, where $h$ is independent of the size of the network $|N|$.
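The two-phase execution can be sketched on the same simple QDN (2-dimensional integer grid, $L_1$ distance, grid component only). The function names and grid bounds are assumptions, and the range condition uses $\le r$ as in the query definition of Section 2.1.

```python
from collections import deque


def l1(a, b):
    """L_1 distance between two integer keys."""
    return sum(abs(x - y) for x, y in zip(a, b))


def neighbors(key, lo, hi):
    """Grid neighbors at L_1 distance 1 inside [lo, hi]^2."""
    x, y = key
    steps = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    return [(x + dx, y + dy) for dx, dy in steps
            if lo <= x + dx <= hi and lo <= y + dy <= hi]


def range_query(start, q, r, lo=-2, hi=2):
    """Phase 1: greedy walk to q. Phase 2: flood only within distance r of q."""
    current, hops = start, 0
    while current != q:
        current = min(neighbors(current, lo, hi), key=lambda k: l1(k, q))
        hops += 1
    result = {current}
    frontier = deque([current])
    while frontier:
        node = frontier.popleft()
        for m in neighbors(node, lo, hi):
            if m not in result and l1(m, q) <= r:
                result.add(m)
                frontier.append(m)
    return result, hops


found, phase1_hops = range_query((-2, -2), (0, 0), r=1)
```

The flood in Phase 2 never leaves the query sphere, so its cost depends on $r$ rather than on the size of the grid.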
With our simple QDN example, for range and kNN queries all the relevant keys (and almost only relevant keys) are visited within $h = O(r)$ and $h = O(\lceil k^{1/d} \rceil)$ hops from $\vec{q}$, respectively. Therefore, for both types of queries, $T$ is $O(\log |N| + h)$, $C_1$ is $O(\log |N| + h^d)$, and $C_2$ is $O(d \log |N| + h^d)$.

With an inclusive key space $V \subset \mathbb{Z}^d$, the simple QDN example considered here is only of illustrative significance. We, however, use the same properties to develop SWAM, which applies to more general QDN models.

3.2 SWAM Family

Almost all the traditional access methods for database systems are based on one core idea to reduce the search space for efficient access (see the unified model in [7]). They recursively partition the key space into a set of disjoint similarity classes². An index is then constructed as a hierarchy of the class representatives at successive levels (see Figure 3-a). The hierarchical index allows filtering out (i.e., dismissing without inspection) the irrelevant/dissimilar classes while the query is directed from the root of the hierarchy toward the similarity class of the query key. The average query time is logarithmic to the size of the database.

² The generic mathematical term for similarity class is equivalence class. Here, the equivalence relations that partition the space are based on the distance (or similarity) between the keys.

By mapping each node of the hierarchy to a QDN node, the same idea can be directly applied to index QDNs, although as we show later the resulting distributed hierarchical index structure is not appropriate for QDNs. Consider $K$ as the set of keys available in a QDN. Any similarity-based relation can be used to partition the key space. For example, in Figure 3-b, $V$ is recursively partitioned based on the GNA approach [6]. Starting from $V$ as the global similarity class, at each level the parent similarity class $c$ with the class representative $\vec{k} \in K$ is partitioned into a set of $h$ disjoint subclasses $c_i$ with representative keys $\vec{k}_i \in K$ ($i \in I_h = [1..h]$) such that $c_i = \{ \vec{k}' \in V \mid L_p(\vec{k}' - \vec{k}_i) < L_p(\vec{k}' - \vec{k}_j), \forall j \ne i \}$. Considering that in a QDN each key $\vec{k}$ resides at a QDN node $n_k$, the GNA-tree corresponding to such a space partitioning is a distributed GNA-tree in which $A(n_k) = \{ n \in N \mid n = n_{k_i}, i \in I_h \}$. Query processing with such a distributed index tree is similar to that of its corresponding centralized counterpart, with the query actually traversing a physically constructed tree rather than a tree structure in memory. Although this indexing approach may seem appealing, it is inappropriate for QDNs because the load is not balanced among its nodes. The unbalanced load is evident by observing that nodes which represent larger similarity classes (i.e., nodes at the higher levels of the hierarchy) receive more queries to process. In the extreme case, the root of the hierarchy processes all queries. Besides, as mentioned in Section 2.2, hierarchical structures are loop-free and intolerant to failures and/or autonomous behaviors of the QDN nodes.

SWAM also employs the space partitioning idea; however, to avoid the problems with hierarchies, instead of recursive partitioning it assumes a flat partitioning (see Figure 3-c). Each key $\vec{k} \in K$ (or $n_k \in N$) represents its own similarity class $c_k \subseteq V$, and the set of $|K|$ similarity classes are collectively exhaustive, $V = \bigcup_{k \in K} c_k$, and mutually exclusive, $c_k \cap c_{k'} = \emptyset$ for $\vec{k} \ne \vec{k}'$. An uncharacteristic case is where two or more nodes store replicas of the same key $\vec{k}$. We assume all such nodes represent the same class $c_k$ redundantly. Such a partitioning scheme can potentially balance the query processing load among QDN nodes. With hierarchies, the neighborhood relationship between a pair of nodes is directly derived from the parent-child relationship between their corresponding similarity classes, to reflect the similarity between their classes.
Similarly, with flat partitioning we define the neighborhood relationship based on the adjacency relationship between the similarity classes: $A(n_k) = \{ n_{k'} \in N \mid c_k \text{ and } c_{k'} \text{ are adjacent}, k' \in K \}$. The resulting index structure is a graph instead of a loop-free tree. Besides, processing of the query can start from any node (e.g., the actual query originator) rather than exclusively from a unique node, the root.

The challenge is to define the similarity-based partitioning relation such that the resulting graph-based index structure bears indexing characteristics similar to those of the hierarchical index structures. Particularly, it should allow filtering of (i.e., avoid visiting) the irrelevant classes effectively as the query is directed from a query originator toward the similarity class of the query key. Moreover, to support range and kNN similarity queries effectively, as in hierarchical index structures, similar classes should be in proximity of each other in terms of the hop-count in the index topology. Finally, the $O(\log N)$ expected query time achieved by the hierarchies is also desirable with the graph-based index structure. As outlined in Section 3.1, these requirements are addressed by the properties of a basic small-world graph. A SWAM index structure is a general graph-based index structure that satisfies a generalization of the same properties as follows:

Property 1: Monotonic approach toward query key. When a node with key $\vec{k}$ receives a query $\vec{q}$, always either $\vec{q} \in c_k$, or the node has at least one neighbor with a key $\vec{k}'$ such that $L_p(\vec{k}' - \vec{q}) < L_p(\vec{k} - \vec{q})$. Consequently, if the node $n_k$ receives the query $\vec{q}$, it is guaranteed that for all $\vec{k}'' \in \{ \vec{j} \in K \mid L_p(\vec{j} - \vec{q}) \ge L_p(\vec{k} - \vec{q}) \}$ the node $n_{k''}$ will never be visited in the future during the greedy forwarding, and the similarity class $c_{k''}$ is filtered.
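Property 1 is what makes greedy forwarding terminate without revisiting nodes. A minimal sketch of a forwarding step that relies on it is shown below; the graph, the key values, and the names `next_hop` and `greedy_path` are hypothetical, and the 1-dimensional chain is chosen only because adjacency of similarity classes is obvious there.

```python
def lp(a, b, p=2):
    """L_p distance between two keys given as tuples."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def next_hop(keys, adj, n, q, p=2):
    """Neighbor of n nearest to q, or None if no neighbor is strictly closer."""
    if not adj[n]:
        return None
    best = min(adj[n], key=lambda m: lp(keys[m], q, p))
    if lp(keys[best], q, p) >= lp(keys[n], q, p):
        return None  # under Property 1 this means q lies in the class of n
    return best


def greedy_path(keys, adj, start, q, p=2):
    """Follow strictly closer neighbors; distances decrease, so no revisits."""
    path = [start]
    nxt = next_hop(keys, adj, start, q, p)
    while nxt is not None:
        path.append(nxt)
        nxt = next_hop(keys, adj, nxt, q, p)
    return path


# Four keys on a line; adjacent similarity classes are neighbors.
keys = {"a": (0.0,), "b": (2.0,), "c": (5.0,), "d": (9.0,)}
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
path = greedy_path(keys, adj, "a", (8.0,))
```

Every node whose key is at least as far from the query as the current key is skipped for the rest of the walk, which is the filtering effect described above.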

Property 2: Localized index topology. With a localized index, for each node $n_k$ the set of nodes at its neighborhood $A(n_k)$ are tightly connected and store keys closely similar to $\vec{k}$. We measure these two characteristics with the two metrics Clustering Coefficient (CC) and Neighbor Distance Distribution (NDD), respectively. For a node $n$, $CC_n = C(n)$ is defined by Equation 1. For a graph $G$, $CC_G = \frac{1}{|N|} \sum_{\forall n \in N} CC_n$. Also, NDD is the probability distribution function of the random variable $X = L_p(\vec{k}' - \vec{k})$, $\forall n_k \in N$, $\forall n_{k'} \in A(n_k)$. As we discussed in Section 3.1, a localized topology allows efficient processing of the range and kNN similarity queries.

Property 3: Logarithmic forwarding-path length. For an exact-match query (processed by greedy forwarding), on average $T = O(\log N)$.

Any graph-based index structure that maintains these SWAM properties is a member of the SWAM family. In Section 3.3, we introduce an example SWAM index structure.

[Figure 4: SWAM-V index structure. a. Voronoi diagram and dual Delaunay graph; b. SWAM-V topology (Delaunay component and random graph component)]

3.3 SWAM-V: A Voronoi-based SWAM

SWAM-V partitions the key space $V$ into a Voronoi diagram [15] (see Figure 4-a). For each key $\vec{k}_i \in K$ ($i \in I_{|K|}$), $n_{k_i} \in N$ represents the similarity class $c_{k_i} = \{ \vec{k} \in V \mid L_p(\vec{k} - \vec{k}_i) < L_p(\vec{k} - \vec{k}_j), \forall j \ne i \}$, which is the Voronoi cell of $n_{k_i}$. Accordingly, the neighborhood of the node $n_{k_i}$ is defined as $A(n_{k_i}) = \{ n_{k_j} \in N \mid c_{k_i} \text{ and } c_{k_j} \text{ are adjacent}, \forall j \in I_{|K|} \}$. Nodes that store replicas of the same key share the same neighborhood; i.e., if $\vec{k}_i = \vec{k}_j$, then $A(n_{k_i}) = A(n_{k_j})$. The resulting graph is the dual Delaunay graph of the Voronoi diagram and is unique for each diagram (see Figure 4-a). Since the neighborhood relationship is symmetric, the Delaunay graph is depicted as an undirected graph.
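For a small 2-dimensional key set under the Euclidean distance ($p = 2$), the Delaunay neighborhoods can be computed by brute force with the empty-circumcircle test. This is only an illustrative centralized sketch with hypothetical function names; it assumes keys in general position, takes time $O(|K|^4)$, and is unrelated to the distributed construction used by SWAM-V.

```python
from itertools import combinations


def circumcircle(a, b, c):
    """Center and squared radius of the circle through a, b, c (None if collinear)."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    sa, sb, sc = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (sa * (by - cy) + sb * (cy - ay) + sc * (ay - by)) / d
    uy = (sa * (cx - bx) + sb * (ax - cx) + sc * (bx - ax)) / d
    return (ux, uy), (ax - ux) ** 2 + (ay - uy) ** 2


def delaunay_neighbors(keys):
    """A(n_i) for every key index i: endpoints of Delaunay triangle edges."""
    adj = {i: set() for i in range(len(keys))}
    for i, j, k in combinations(range(len(keys)), 3):
        circle = circumcircle(keys[i], keys[j], keys[k])
        if circle is None:
            continue
        (ux, uy), r2 = circle
        # A triangle is Delaunay iff no other key lies inside its circumcircle.
        if all((x - ux) ** 2 + (y - uy) ** 2 >= r2 - 1e-9
               for m, (x, y) in enumerate(keys) if m not in (i, j, k)):
            for u, v in ((i, j), (j, k), (i, k)):
                adj[u].add(v)
                adj[v].add(u)
    return adj


keys = [(0, 0), (6, 0), (3, 1), (3, -1)]
adj = delaunay_neighbors(keys)
```

In this example the cells of keys 0 and 1 do not touch because the cells of keys 2 and 3 lie between them, so nodes 0 and 1 are not neighbors, and the adjacency is symmetric as stated above.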
The SWAM-V topology consists of a random graph component (identical to that of the small-world graph) that is superimposed over the Delaunay graph (see Figure 4-b).

Theorem 1 The SWAM-V index structure satisfies the SWAM Property 1.

Proof: The (extended) boundary between the cells of two neighboring nodes $\vec{k}_i$ and $\vec{k}_j$ is defined as $B(\vec{k}_i, \vec{k}_j) = \{ \vec{k} \in V \mid L_p(\vec{k} - \vec{k}_i) = L_p(\vec{k} - \vec{k}_j) \}$. The boundary $B(\vec{k}_i, \vec{k}_j)$ bisects the space into two half-spaces, where $H(\vec{k}_i, \vec{k}_j) = \{ \vec{k} \in V \mid L_p(\vec{k} - \vec{k}_i) < L_p(\vec{k} - \vec{k}_j) \}$ is the similarity-dominance space of $\vec{k}_i$ over $\vec{k}_j$, and vice versa. The similarity class of $\vec{k}_i$ is alternatively defined as $c_{k_i} = \bigcap_{j \in I_{|K|} \setminus \{i\}} H(\vec{k}_i, \vec{k}_j)$. Assume node $n_{k_i}$ receives a query $\vec{q}$. If $\vec{q} \notin c_{k_i}$, then $\vec{q} \notin \bigcap_{j \in I_{|K|} \setminus \{i\}} H(\vec{k}_i, \vec{k}_j)$. Hence, $\vec{q} \in \bigcup_{j \in I_{|K|} \setminus \{i\}} H(\vec{k}_j, \vec{k}_i)$. Therefore, $\exists j \in I_{|K|} \setminus \{i\}$ such that $L_p(\vec{q} - \vec{k}_j) < L_p(\vec{q} - \vec{k}_i)$.

The SWAM-V index structure also satisfies Properties 2 and 3. In Section 4, we verify Property 2 by measuring the metrics NDD and CC. Property 3 follows from the same property of the small-world graph [12]. Below, we describe the primitives of the SWAM-V access method.

    // Join Delaunay graph
    n_{k_i} ← Exact-Match(k_h);
    n_next ← n_{k_i};
    repeat {
        m ← n_next;
        n_next ← Update(m);
        A(n_{k_h}) ← A(n_{k_h}) ∪ {m};
    } until (n_next = n_{k_i});
    // Join random graph
    A(n_{k_h}) ← A(n_{k_h}) ∪ {n_random};

Figure 5: SWAM-V join primitive

3.3.1 Index Construction

As QDN nodes join the network, the SWAM-V index structure is incrementally constructed. Consider a QDN of $h - 1$ nodes $n_{k_1}$ to $n_{k_{h-1}}$. The new node $n_{k_h}$ executes the join primitive shown in Figure 5 to join the two components of the SWAM-V index structure, i.e., the Delaunay graph and the random graph. We assume that $n_{k_h}$ has access to at least one of the nodes $n_{k_1}$ to $n_{k_{h-1}}$, say $n_{k_a}$.
Through $n_{k_a}$, $n_{k_h}$ issues an exact-match query³ (see Section 3.3.2.1) for key $\vec{q} = \vec{k}_h$ to locate the node $n_{k_i}$ such that $\vec{k}_h \in c_{k_i}$. Thereafter, $n_{k_h}$ constructs its cell $c_{k_h}$ one border at a time, starting from the border with $n_{k_i}$. Again through $n_{k_a}$, $n_{k_h}$ sends an update request to $n_{k_i}$. The update request is forwarded like an exact-match query with $\vec{q} = \vec{k}_i$ to reach $n_{k_i}$. Upon receiving the update request, $n_{k_i}$ calculates the bisector $B(\vec{k}_i, \vec{k}_h)$, and updates its neighborhood $A(n_{k_i})$ based on its new divided cell. In response, $n_{k_i}$ sends the address of its neighbor $n_{next}$ to $n_{k_h}$ (via $n_{k_a}$), where $n_{next}$ is a neighbor of $n_{k_i}$ such that $B(\vec{k}_i, \vec{k}_h) \cap B(\vec{k}_i, \vec{k}_{next}) \ne \emptyset$. Since a Voronoi cell is a convex hull, there are at least two such neighbors for $n_{k_i}$. After receiving the update response, the new node $n_{k_h}$ updates its neighborhood by $A(n_{k_h}) \leftarrow A(n_{k_h}) \cup \{n_{k_i}\}$ and repeats the same update procedure with $n_{next}$. The update is completed when all borders of the Voronoi cell $c_{k_h}$ are found, i.e., when $n_{k_h}$ receives $n_{k_i}$ as the response of an update request. An exception to the procedure described above is when $\vec{k}_h = \vec{k}_i$. In such a case, the new key is a new replica of an existing key and $c_{k_h} = c_{k_i}$.

³ In Section 2.1, we defined the query result as a tuple set. Here, equivalently we consider the address of the node(s) storing the resulting tuple(s) as the query result to explain the implementation of the access method primitives.

To complete the construction of the SWAM-V index structure, in addition to the Delaunay graph the new node $n_{k_h}$ must also join the random graph component by selecting a node $n_{random}$ such that $L_p(\vec{k}_h - \vec{k}_{random})$ follows the power-law distribution. We piggy-back this step on the previous step by having $n_{k_h}$ select $n_{random}$ among all the nodes that were previously visited by the update requests. This step completes the execution of the join primitive.
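The last step of the join, choosing $n_{random}$ among the visited nodes with power-law probability, can be sketched as follows. The function names, the dictionary of visited nodes, and the decision to exclude candidates at distance zero (replicas of the new key) are assumptions; the exponent $-d$ follows the random graph component described in Section 3.1.

```python
import random


def lp(a, b, p=2):
    """L_p distance between two keys given as tuples."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def link_probabilities(new_key, visited, d, p=2):
    """Probability of each visited node, proportional to distance^(-d)."""
    weights = {}
    for node, key in visited.items():
        x = lp(new_key, key, p)
        if x > 0:  # assumption: replicas of the new key are not candidates
            weights[node] = x ** (-d)
    total = sum(weights.values())
    return {node: w / total for node, w in weights.items()}


def pick_random_neighbor(new_key, visited, d, p=2, rng=random):
    """Draw n_random from the visited nodes according to the power law."""
    probs = link_probabilities(new_key, visited, d, p)
    if not probs:
        return None
    nodes = list(probs)
    return rng.choices(nodes, weights=[probs[n] for n in nodes], k=1)[0]


visited = {"a": (1.0, 0.0), "b": (0.0, 2.0), "c": (0.0, 0.0)}
probs = link_probabilities((0.0, 0.0), visited, d=2)
chosen = pick_random_neighbor((0.0, 0.0), visited, d=2, rng=random.Random(0))
```

Nearby nodes are favored but distant ones keep a nonzero chance, which is what provides the occasional long jump during greedy forwarding.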
The leave primitive implements the reverse procedure and is trivial. We omit description of the leave primitive due to lack of space.

3.3.2 Query Processing

Here, we describe three forwarding primitives to process various types of queries using the SWAM-V index structure.

3.3.2.1 Exact-Match Query

An exact-match query is executed by greedy forwarding. When node $n_k$ receives a query $\vec{q}$, if $L_p(\vec{k} - \vec{q}) < \min_{n_{k_i} \in A(n_k)} L_p(\vec{k}_i - \vec{q})$, then $\vec{q} \in c_k$. Therefore, either $\vec{q} = \vec{k}$ or $\vec{q} \notin K$, where in both cases the query is terminated with the result sets $R = \{n_k\}$ and $R = \emptyset$, respectively. Otherwise, $n_k$ continues forwarding the query by sending it to one of its neighbors $n_{k_m}$ such that $L_p(\vec{k}_m - \vec{q}) = \min_{n_{k_i} \in A(n_k)} L_p(\vec{k}_i - \vec{q})$. From Theorem 1, we know that in the worst case the greedy forwarding terminates in $|N|$ hops.

3.3.2.2 Range Query

A range query with query key $\vec{q}$ and range $r$ is executed in two successive phases: 1) to locate the locality (or similarity class) of the query key, and 2) to explore the nodes located within the query sphere with radius $r$ centered at the locality of the query key (see Figure 6). At Phase 1, the query is interpreted as an exact-match query with query key $\vec{q}$. This phase terminates when the query reaches $n_k$ such that $\vec{q} \in c_k$. At Phase 2, starting from $n_k$, each node $n$ that receives the query for the first time forwards it to every neighbor $m_{k'} \in A(n)$ for which $L_p(\vec{k}' - \vec{q}) < r$, and to no others. Theorem 2 ensures that when the selective flooding is terminated, all nodes $n_{k''} \in N$ with $L_p(\vec{k}'' - \vec{q}) < r$ are visited by some query replica.

Theorem 2 SWAM-V answers all range queries without any false dismissal.

Proof: Assume $n_k$ is the nearest neighbor to the query key $\vec{q}$; i.e., $\vec{q} \in c_k$. We prove the theorem by contradiction. Suppose $\exists\, n_{k''} \in N$ such that $L_p(\vec{k}'' - \vec{q}) < r$, but $n_{k''}$ is not visited.
From Theorem 1, we know that between $n_{k''}$ and $n_k$ there exists a path $P_{k''}$, a sequence of neighboring nodes $n_{k_h = k''}, n_{k_{h-1}}, \ldots, n_{k_2}, n_{k_1 = k}$, such that $L_p(\vec{k}_{i-1} - \vec{q}) < L_p(\vec{k}_i - \vec{q})\ \forall i \in [2..h]$. Therefore, $\forall i \in [1..h-1],\ L_p(\vec{k}_i - \vec{q}) < r$. However, the node $n_{k = k_1}$ receives the query at the end of Phase 1. Thus, based on the condition for selective flooding, $\forall i \in [2..h]$, $n_{k_i}$ must also receive the query. Hence, $n_{k'' = k_h}$ receives the query, contradicting our assumption.

Figure 6: SWAM-V range query (the figure labels the query key q, the querier, Phase 1, and Phase 2)

With Property 2, the flooding time is proportional to the range $r$ and independent of the size of the network $|N|$ (see Section 3.3.3).

3.3.2.3 kNN Query

Similar to range queries, a kNN query is executed in two phases. As with range queries, Phase 1 is equivalent to an exact-match query for the query key $\vec{q}$ to locate $n_k$ such that $\vec{q} \in c_k$. Since $\forall k' \in K \setminus \{k\}$, $L_p(\vec{k} - \vec{q}) < L_p(\vec{k}' - \vec{q})$, the node $n_k$ is the 1st nearest neighbor $n_{k_{1NN}}$ of the query $\vec{q}$. At Phase 2, the rest of the $k$ nearest neighbors, $n_{k_{2NN}}, n_{k_{3NN}}, \ldots, n_{k_{kNN}}$, are located following Theorem 3.

Theorem 3 The $k$-th nearest neighbor to $\vec{q}$ is a neighbor of one of the nearer neighbors of $\vec{q}$; i.e., $\forall k \in [2..|K|]$, $n_{k_{kNN}} \in \bigcup_{i \in [1..k-1]} A(n_{k_{iNN}})$.

Proof: We prove the theorem by contradiction. Suppose $n_{k_{kNN}} \notin \bigcup_{i \in [1..k-1]} A(n_{k_{iNN}})$. From Theorem 1, we know $\exists\, m_{k'} \in A(n_{k_{kNN}})$ such that $L_p(\vec{k}' - \vec{q}) < L_p(\vec{k}_{kNN} - \vec{q})$. Therefore, $m_{k'} \in \{n_{k_{iNN}} \mid i \in [1..k-1]\}$. Hence, $n_{k_{kNN}} \in \bigcup_{i \in [1..k-1]} A(n_{k_{iNN}})$.

Thus, at Phase 2, starting from $n_{k_{1NN}} = n_k$, the query is forwarded from $n_{k_{iNN}}$ to $n_{k_{(i+1)NN}}$, $i \in [1..k-1]$, until it visits all $k$ nearest neighbors of $\vec{q}$. When the $i$-th nearest neighbor $n_{k_{iNN}}$ receives the query, it locates the next node $n_{k_{(i+1)NN}}$ as follows. The $(i+1)$-th nearest neighbor is $n_{k_{(i+1)NN}} \in \bigcup_{j \in [1..i]} A(n_{k_{jNN}})$ such that

$$L_p(\vec{k}_{(i+1)NN} - \vec{q}) = \min_{m_{k'} \in \bigcup_{j \in [1..i]} A(n_{k_{jNN}})} L_p(\vec{k}' - \vec{q}),$$

where the minimum is understood to range over candidates other than the $i$ nearest neighbors already found. $n_{k_{iNN}}$ forwards the query, which includes the set $\bigcup_{j \in [1..i-1]} A(n_{k_{jNN}})$ received from the previous node piggy-backed with its own neighborhood $A(n_{k_{iNN}})$, to the next node $n_{k_{(i+1)NN}}$. Phase 2 terminates in $k$ hops.

3.3.3 Analysis

We evaluate the efficiency metrics for SWAM-V as follows:

• Space: From [18], we have:

$$S = \begin{cases} \sum_{i=1}^{s} \frac{|N|}{i} \binom{|N|-i-1}{i-1} \binom{i}{2-i} & d + 1 = 2s \\ \sum_{i=1}^{s} \frac{3}{i+1} \binom{|N|-i-1}{i} \binom{i+1}{2-i} & d = 2s \end{cases}$$

$S$ is uniformly distributed among $n \in N$ and, e.g., with a 2-dimensional space $|A(n)|_{average} = S/|N| \simeq 6$.

• Query time: For the exact-match query, with Property 1 in the worst case $T = O(|N|)$, and with Property 3 on average $T = O(\log|N|)$. Also, on average for the two-phase range queries (with selectivity $s$) and kNN queries, $T$ is $O(\log|N| + sN)$ and $O(\log|N| + k)$, respectively.

• Communication cost: For the exact-match query on average $C_1 = O(\log|N|)$. Also, on average for range and kNN queries, $C_1$ is $O(\log|N| + sN|A(n)|_{average})$ and $O(\log|N| + k)$, respectively.

• Computation cost: Similarly, for the exact-match query on average $C_2 = O(|A(n)|_{average} \log|N|)$. Also, on average for range and kNN queries, $C_2$ is $O(|A(n)|_{average}(\log|N| + sN))$ and $O(|A(n)|_{average}(\log|N| + k))$, respectively.

4 Experimental Analysis

We conducted a set of experiments via simulation to verify the results of our study. We implemented a multi-thread simulator in C and used two Enterprise 250 Sun servers to perform the experiments. Based on the efficiency metrics introduced in Section 2.1, we compare SWAM-V versus our basic access method (see Section 2.2) as well as CAN [16], which is a multi-dimensional DHT.

4.1 Experimental Methodology

Our simulation consists of a set of "runs".
We set up each run by 1) generating a dataset, 2) distributing the dataset among a set of QDN nodes, 3) indexing the QDN with the three access methods SWAM-V, BASIC, and CAN, and 4) running 1000 queries (all of which are either exact-match, range, or kNN) and recording the average $T$, $C_1$, and $C_2$ for each of the index structures as the result of the run. Each result data-point reported in Section 4.2 is the average result of 10 runs. Below, we explain the detail of each setup.

We consider the Hilbert spaces $(V, L_1)$, $(V, L_2)$, and $(V, L_\infty)$ as the key/data space with $V = V_1 \times V_2 \times \ldots \times V_d$ as a $d$-dimensional hypercube, where $V_i = [-1, 1]$. We generate a dataset of $|K|$ keys by selecting each attribute $a_i \in V_i$ of a key $\vec{k} = \langle a_1, a_2, \ldots, a_d \rangle$ following either a uniform distribution, or a normal distribution (see footnote 4) with expected value $\mu = 0$ and standard deviation $\sigma = 0.33$. We distribute the set of keys $K$ among a set of nodes $N$, where $|N| = c^{-1}|K|$ with $c = 2$ as the data replication coefficient. With SWAM-V and BASIC, first keys are randomly assigned to the nodes and then the index structures (i.e., QDN topologies) are constructed based on the actual content of the nodes. For the comparison to be fair, we select $p$ for the BASIC random graph $G_{N,p}$ such that $S_{BASIC} = S_{SWAM-V}$.

Footnote 4: $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2 / 2\sigma^2}$

Table 1: Comparative study results with d = 5 and N = 5000

| Access method | Exact-Match T | Exact-Match C_1 | Exact-Match C_2 | Range (s = 1%) T | Range (s = 1%) C_1 | Range (s = 1%) C_2 | kNN (k = 5) T | kNN (k = 5) C_1 | kNN (k = 5) C_2 |
| BASIC | 3.77 | 16835.53 | 5000 | 3.77 | 16835.53 | 5000 | 3.77 | 16835.53 | 5000 |
| SWAM-V | 3.82 | 3.82 | 68.51 | 6.84 | 57.27 | 125.3 | 9.24 | 9.24 | 171.35 |
| CAN | 5.67 | 5.67 | 47.73 | 13.38 | 78.36 | 111.28 | 10.9 | 51.37 | 93.72 |

On the other hand, with CAN keys cannot be assigned to the nodes randomly. The CAN index structure is first constructed based on the identifiers of the nodes (randomly selected from $V$) rather than their content. Each key is then assigned to a particular node with similar identifier.

We described query processing with SWAM-V and BASIC in Sections 3.3.2 and 2.2.
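As a reminder of how the SWAM-V exact-match primitive of Section 3.3.2.1 behaves, greedy forwarding can be sketched in a few lines of Python. This is a centralized, simplified sketch rather than the authors' C simulator: the dictionaries `keys` and `neighbors`, the function names, and the returned tuple are assumptions introduced for illustration. One deliberate deviation is that the loop stops when no neighbor is strictly closer, so a tie terminates the query instead of forwarding it.

```python
def lp_distance(a, b, p=2.0):
    """L_p distance between two equal-length key vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def greedy_exact_match(start, query, keys, neighbors, p=2.0):
    """Forward `query` greedily, starting at node `start`.

    keys:      dict node id -> key vector held by that node
    neighbors: dict node id -> list of neighbor node ids, i.e. A(n)
    Returns (node, hops, found): the node where forwarding stops, the
    number of hops taken, and whether that node's key equals the query.
    """
    current, hops = start, 0
    while True:
        here = lp_distance(keys[current], query, p)
        best = min(neighbors[current],
                   key=lambda m: lp_distance(keys[m], query, p),
                   default=None)
        # Stop when no neighbor is strictly closer to the query key.
        if best is None or here <= lp_distance(keys[best], query, p):
            return current, hops, here == 0.0
        current, hops = best, hops + 1


# Five nodes on a line, each linked to its immediate neighbors.
keys = {i: (float(i),) for i in range(5)}
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < 5] for i in range(5)}
print(greedy_exact_match(0, (3.0,), keys, neighbors))
```

Because every hop strictly reduces the distance to the query key, the loop always terminates. On a topology that does not guarantee a closer neighbor whenever the query lies outside the current cell, the same loop can stop at a node that is only a local minimum.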
The original CAN access method only supports exact-match queries via greedy forwarding. Even for the exact-match queries, since CAN does not satisfy the SWAM Property 1, during query forwarding the distance from the current node to the query key $\vec{q}$ may not be monotonically decreasing. Therefore, with greedy forwarding a query may be trapped at a node that does not have any neighbor closer than itself to $\vec{q}$, and consequently, the query may never reach the target. We exclude the CAN queries that result in false dismissal from our comparative study. Also, to implement range and kNN queries with CAN we adopt the 2-phase query processing scheme from SWAM-V. However, since Theorems 2 and 3 do not hold for CAN, for both range and kNN queries we use naive, non-selective flooding (similar to query processing with BASIC) in the second phase. We apply scope-limited flooding and only consider the time $T$ and costs $C_1$ and $C_2$ required to complete the query.

For each query, we select the node that originates the query and the query key $\vec{q}$ randomly from $N$ and $K$, respectively. For range queries, the range $r$ is selected such that the selectivity $s$ of the query varies from 0.01% to 1%. Finally, for kNN queries we consider queries with $k = 2, 5, 10$.

4.2 Experimental Results

The difference between the trend of the results for uniform and normal data distributions, as well as the results for various distance functions $L_1$, $L_2$, and $L_\infty$, is insignificant. Here, we report the results for uniform data distribution with $L_2$ as the distance function of the space. Table 1 shows a typical set of comparative results for a QDN with $d = 5$ dimensions and $N = 5000$ nodes. To achieve logarithmic query time, BASIC spends the communication and computation resources of the QDN without limit. As illustrated, SWAM-V and CAN are both comparable to BASIC in terms of the query time, and invest the resources much more efficiently. SWAM-V consistently outperforms CAN in terms of $T$ and $C_1$. We attribute this advantage to the SWAM properties.
Particularly, with Properties 1 and 2, the index structure accurately partitions the space and co-locates the similar content, enabling better filtering and less redundant query processing. Also, Property 3 allows more efficient traversal of the index structure. The cost of more accurate partitioning is higher connectivity of the nodes, and consequently, higher computation cost $C_2$.

Figure 7 depicts the scaling properties of SWAM-V as compared to those of CAN. Figure 7-a illustrates how the communication cost of the kNN queries with $k = 10$ in a QDN of $N = 10000$ nodes scales as dimensionality of the data space increases. With SWAM-V, $C_1$ remains almost unchanged at different dimensions, while CAN is less efficient at high-dimensional spaces. Theorem 3 ensures finding each next nearest-neighbor in one hop, irrespective of the dimensionality of the space. On the other hand, with inaccurate space partitioning CAN cannot benefit from such a property. With blind flooding, the communication cost of the query is proportional to the connectivity of nodes, which grows as dimensionality of the space increases. Figure 7-b shows how the query time of the range queries with selectivity $s = 0.1\%$ scales as the QDN grows in size. The dimensionality of the data space is fixed to $d = 10$. The random graph component of SWAM-V satisfies Property 3, and similar to BASIC, enables SWAM-V to maintain the logarithmic query processing time as the network size scales. On the other hand, since CAN only consists of a grid-like component, the traversal of the index slows sublinearly with increase in the size of the network.

Finally, we calculated the locality measures $CC_G$ and NDD for 1000 SWAM-V index structures with size and data dimensionality randomly selected from $\{500, 1000, 5000, 10000\}$ and $\{2, 5, 10, 20\}$, respectively. On average, we found $CC_G \simeq 0.54$, which is comparable to $CC_G = 0.75$ for the small-world graph. Also, NDD is a sharp normal distribution with standard deviation $\sigma = 0.02$, which ensures high content locality with the SWAM-V index topology.
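A clustering measure of this kind can be computed directly from the adjacency lists of an index structure. The Python sketch below assumes that $CC_G$ corresponds to the average local clustering coefficient in the sense of Watts and Strogatz [20]; the exact definitions of $CC_G$ and NDD are given elsewhere in the paper and may differ. The function name and the input format are hypothetical.

```python
from itertools import combinations


def average_clustering(adjacency):
    """Average local clustering coefficient of an undirected graph.

    adjacency: dict node -> set of neighbor nodes (symmetric, no self-loops).
    For a node with at least two neighbors, the local coefficient is the
    fraction of neighbor pairs that are themselves connected. Nodes with
    fewer than two neighbors contribute 0 to the average.
    """
    if not adjacency:
        return 0.0
    total = 0.0
    for node, nbrs in adjacency.items():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adjacency[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adjacency)


# A triangle is fully clustered; a three-node path is not clustered at all.
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(average_clustering(triangle), average_clustering(path))
```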
Figure 7: The scaling results. (a) Dimensionality scaling: communication cost ($C_1$) versus dimensionality ($d$) for SWAM-V and CAN. (b) Network size scaling: query time ($T$) versus network size ($N$) for BASIC, SWAM-V, and CAN.

5 Related Work

Multidimensional access methods for database systems can be categorized into two classes [9]: hierarchical access methods and hash-based access methods. With the advent of the distributed and networked database systems, and most recently QDNs, access methods from each of these classes are extended to support efficient access in distributed environments. In [21], a 2-level hierarchy with a distributed root architecture is proposed to index a QDN. With two levels in the hierarchy, the filtering power of this access method is limited. In [1], a prefix search tree is used as an index structure for QDNs. Also, in [14] a Voronoi-based heuristic is applied to develop a search tree. As we discussed in Section 3.2, hierarchical index structures fail to distribute the query execution load fairly among QDN nodes. On the other hand, in [11] and [4] DHTs are assumed as the fundamental data access mechanism for in-network query processing with QDNs. Also, in [10] a range-caching scheme is developed based on the DHT indexing to support range queries. Distributed hash-based access methods such as LH* [13] and DHTs [16, 19] assume a QDN is a distributed storage system, where users are indifferent to the placement of the data within the data network and only insist on data availability [17]. Such a service model is inconsistent with the typical usage of QDNs as content-sharing federations of autonomous nodes that maintain their own content. SWAM-V respects autonomy of the QDN nodes.

6 Conclusion and Future Work

In this paper, first we defined a formal framework to study the problem of similarity search in QDNs. Subsequently, we proposed a set of properties to generate efficient index structures (i.e., QDN topologies) for processing similarity queries in QDNs.
These properties characterize a family of access methods, the SWAM family. We introduced SWAM-V, a member of the SWAM family, which supports exact-match, range, and kNN queries for QDNs with multi-attribute objects. Leveraging the SWAM properties, SWAM-V achieves query time, communication cost, and computation cost logarithmic to the size of the network. We verified these results via both analysis and simulation. Moreover, since unlike DHTs, SWAM-V does not enforce the placement of the objects within the network, it avoids unnecessary content replacement, supports object replication, and adapts to the object distribution.

We intend to extend this study by investigating other members of the SWAM family that, as compared to SWAM-V, enforce less constraining assumptions and support QDN applications with specific restrictions/requirements. With some QDN applications, strict enforcement of the neighbor selection rules to construct the index structure is either impossible or inefficient. Currently, we are studying SWAM-P, an access method with probabilistic index topology and flexible neighbor selection policies that allow QDN nodes to exercise their autonomy in selecting their neighbors as they join the QDN. Our initial results show that as the QDN nodes exercise more autonomy, the efficiency of SWAM-P gracefully degrades from that of SWAM-V to the efficiency of the basic access method [3]. Also, with SWAM-V we assume queries are much more frequent than updates. However, with some QDN applications, datasets and/or node-sets are extremely dynamic such that the overhead of maintaining the index structure exceeds the benefit of using the access method. With such applications, efficient but simple scan-based access methods may outperform the access methods with complex index structures. In [2], we present initial results of our effort to develop such an access method.

References

[1] K. Aberer, P. Cudré-Mauroux, A. Datta, Z. Despotovic, M. Hauswirth, M. Punceva, and R. Schmidt. P-Grid: A self-organizing structured P2P system. SIGMOD Record, 32(2), 2003.
[2] F. Banaei-Kashani and C. Shahabi. Efficient flooding in power-law networks. In Proceedings of Twenty-Second ACM Symposium on Principles of Distributed Computing (PODC'03), July 2003.
[3] F. Banaei-Kashani and C. Shahabi. Searchable querical data networks. In Proceedings of the International Workshop on Databases, Information Systems and Peer-to-Peer Computing in conjunction with VLDB'03, September 2003.
[4] M. Bawa, G. Manku, and P. Raghavan. SETS: Search enhanced by topic-segmentation. In Proceedings of the 26th Annual International Conference on Research and Development in Information Retrieval (SIGIR'03), August 2003.
[5] B. Bollobás. Random Graphs. Academic Press, New York, 1985.
[6] S. Brin. Near neighbor search in large metric spaces. In Proceedings of the 21st International Conference on Very Large Data Bases (VLDB'95), September 1995.
[7] E. Chavez, G. Navarro, R. A. Baeza-Yates, and J. L. Marroquin. Searching in metric spaces. ACM Computing Surveys, 33(3), 2001.
[8] A. Doan, P. Domingos, and A. Halevy. Reconciling schemas of disparate data sources: A machine-learning approach. In Proceedings of ACM International Conference on Management of Data (SIGMOD'01), November 2001.
[9] V. Gaede and O. Günther. Multidimensional access methods. ACM Computing Surveys, 30(2), 1997.
[10] A. Gupta, D. Agrawal, and A. El Abbadi. Approximate range selection queries in peer-to-peer systems. In Proceedings of the First Biennial Conference on Innovative Data Systems Research, January 2003.
[11] R. Huebsch, N. Lanham, B. Loo, J. Hellerstein, S. Shenker, and I. Stoica. Querying the Internet with PIER. In Proceedings of 29th International Conference on Very Large Data Bases (VLDB'03), September 2003.
[12] J. Kleinberg. The small-world phenomenon: an algorithmic perspective. In Proceedings of the 32nd ACM Symposium on Theory of Computing, May 2000.
[13] W. Litwin, M. Neimat, and D. Schneider. LH*: A scalable, distributed data structure. ACM Transactions on Database Systems, 21(4), 1996.
[14] G. Navarro. Searching in metric spaces by spatial approximation. The Very Large Databases Journal (VLDBJ), 11(1), 2002.
[15] A. Okabe, B. Boots, K. Sugihara, and S. Chiu. Spatial Tessellations: Concepts and Applications of Voronoi Diagrams. John Wiley, 2nd edition, 2000.
[16] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker. A scalable content-addressable network. In Proceedings of ACM SIGCOMM '01, August 2001.
[17] S. Ratnasamy, B. Karp, S. Shenker, D. Estrin, R. Govindan, L. Yin, and F. Yu. Data-centric storage in sensornets with GHT, a Geographic Hash Table. MONET Journal: Special Issue on Algorithmic Solutions for Wireless, Mobile, Ad Hoc and Sensor Networks, 8(4), 2003.
[18] R. Seidel. Exact upper bounds for the number of faces in d-dimensional Voronoi diagrams, DIMACS Series, volume 4. American Mathematical Society, 1991.
[19] I. Stoica, R. Morris, D. Karger, M. Kaashoek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. In Proceedings of ACM SIGCOMM '01, August 2001.
[20] D. Watts and S. Strogatz. Collective dynamics of small world networks. Nature, (393):440-442, 1998.
[21] B. Yang and H. Garcia-Molina. Designing a super-peer network. In Proceedings of the 19th International Conference on Data Engineering (ICDE'03), March 2003.
Description
Farnoush Banaei-Kashani (banaeika@usc.edu), Cyrus Shahabi (shahabi@usc.edu). "SWAM: A family of access methods for similarity search in querical data networks." Computer Science Technical Reports (Los Angeles, California, USA: University of Southern California. Department of Computer Science) no. 813 (2004).
Asset Metadata
Creator
Banaei-Kashani, Farnoush
(author),
Shahabi, Cyrus
(author)
Core Title
USC Computer Science Technical Reports, no. 813 (2004)
Alternative Title
SWAM: A family of access methods for similarity search in querical data networks (
title
)
Publisher
Department of Computer Science,USC Viterbi School of Engineering, University of Southern California, 3650 McClintock Avenue, Los Angeles, California, 90089, USA
(publisher)
Tag
OAI-PMH Harvest
Format
10 pages
(extent),
technical reports
(aat)
Language
English
Unique identifier
UC16269493
Identifier
04-813 SWAM A Family of Access Methods for Similarity Search in Querical Data Networks (filename)
Legacy Identifier
usc-cstr-04-813
Rights
Department of Computer Science (University of Southern California) and the author(s).
Internet Media Type
application/pdf
Copyright
In copyright - Non-commercial use permitted (https://rightsstatements.org/vocab/InC-NC/1.0/
Source
20180426-rozan-cstechreports-shoaf
(batch),
Computer Science Technical Report Archive
(collection),
University of Southern California. Department of Computer Science. Technical Reports
(series)
Access Conditions
The author(s) retain rights to their work according to U.S. copyright law. Electronic access is being provided by the USC Libraries, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright.
Repository Name
USC Viterbi School of Engineering Department of Computer Science
Repository Location
Department of Computer Science. USC Viterbi School of Engineering. Los Angeles\, CA\, 90089
Repository Email
csdept@usc.edu
Inherited Values
Title
Computer Science Technical Report Archive
Description
Archive of computer science technical reports published by the USC Department of Computer Science from 1991 - 2017.
Coverage Temporal
1991/2017