|Title:||Cloud based publish/subscribe model for Top-k matching over continuous data streams|
|Abstract:||Publish/subscribe systems are widely used to process continuous queries over data streams and are augmented by algorithms from the field of data stream processing. The functions that match publications to subscriptions in state-of-the-art publish/subscribe systems are stateless: they provide only a Boolean decision on whether a given publication should be delivered to a relevant subscriber. In such systems, a large volume of delivered publications may be perceived as a form of spam, while a system that delivers too few publications may be perceived as not working. In this study, we propose an advanced publish/subscribe matching model that controls the otherwise unpredictable number of publications delivered over a continuous data stream: at any given time t, the model limits the number of delivered publications to a parameter k and ranks them within a sliding window of size w. A general scoring mechanism is used in which publications are scored against personalized user subscription spaces according to their relevance. We adopt an inverted-list data structure to index the subscription space and improve the efficiency of the matching process. We also address the problem of selecting the k most diverse items from a relevant result set in a dynamic setting where Top-k results change over time. We formalize this problem of continuous k-diversity as MAXDIVREL, which maps to the independent dominating set problem in graph theory, an NP-hard problem. To diversify Top-k results continuously, we propose an incremental indexing mechanism for streaming publications based on Locality Sensitive Hashing (LSH). Our prototype is implemented in a cloud-based message broker system and is designed to scale on Amazon Web Services (AWS), a scalable cloud-service provider. We explore the natural behavior of ranked publications, modeled mathematically by the Zipf property.
Based on experiments across many diversity methods, MAXDIVREL exhibits the strongest natural behavior. In addition, the proposed LSH indexing mechanism produces a MAXDIVREL-diverse result set with 70% accuracy relative to the naive optimal method. Finally, we report experimental results on the performance and efficiency of the proposed indexing mechanisms on a variety of synthetic datasets.|
|Appears in Collections:||SCS Individual Project - Final Thesis (2014)|
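The windowed Top-k matching described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation: the class name TopKWindowMatcher, its method names, the additive term-weight scoring and the count-based window are all assumptions chosen for illustration. The sketch shows only how an inverted list over subscriptions lets each incoming publication be scored against the subscribers that share a term with it, and how delivery is capped at k items drawn from the w most recent publications. Diversification (MAXDIVREL, LSH) is not covered.

```python
import heapq
from collections import defaultdict, deque


class TopKWindowMatcher:
    """Illustrative Top-k matcher over a count-based sliding window.

    Hypothetical sketch: the names and the scoring rule are assumptions,
    not taken from the thesis.
    """

    def __init__(self, k, w):
        self.k = k                      # maximum publications delivered per subscriber
        self.w = w                      # sliding-window size, counted in publications
        self.index = defaultdict(dict)  # inverted list: term -> {subscriber: weight}
        self.window = deque(maxlen=w)   # keeps only the w most recent publications

    def subscribe(self, subscriber, weighted_terms):
        # Register each term of the subscription in the inverted list.
        # Publications already in the window are not rescored.
        for term, weight in weighted_terms.items():
            self.index[term][subscriber] = weight

    def publish(self, pub_id, terms):
        # Walk only the posting lists of the publication's terms, so
        # subscribers with no matching term are never touched.
        scores = defaultdict(float)
        for term in set(terms):
            for subscriber, weight in self.index.get(term, {}).items():
                scores[subscriber] += weight
        # Appending to a full deque evicts the oldest publication.
        self.window.append((pub_id, dict(scores)))

    def top_k(self, subscriber):
        # Rank the matching publications currently in the window.
        candidates = [
            (scores[subscriber], pub_id)
            for pub_id, scores in self.window
            if subscriber in scores
        ]
        return [pub_id for _, pub_id in heapq.nlargest(self.k, candidates)]


matcher = TopKWindowMatcher(k=2, w=3)
matcher.subscribe("alice", {"cloud": 1.0, "stream": 0.5})
matcher.publish("p1", ["cloud"])
matcher.publish("p2", ["stream"])
matcher.publish("p3", ["cloud", "stream"])
first = matcher.top_k("alice")
matcher.publish("p4", ["sports"])   # evicts p1 from the window
second = matcher.top_k("alice")
```

In this usage, a subscriber never receives more than k publications regardless of how many match, which is the contrast with a purely Boolean matcher drawn in the abstract.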
Files in This Item:
|2.97 MB||Adobe PDF|
Items in UCSC Digital Library are protected by copyright, with all rights reserved, unless otherwise indicated.