Projects‎ > ‎

ProSem: Internet-Scale Publish/Subscribe Unifying Data Processing and Dissemination

Supported by National Science Foundation Award IIS-0713498, III-COR: Scalable Publish/Subscribe: Unifying Data Processing and Dissemination (abstract)

Principle investigators: Jun Yang and Pankaj K. Agarwal, Duke University

People

Alumni:

  • Badrish Chandramouli (PhD, 2008; now at Microsoft Research)
  • Jeff M. Phillips (PhD, 2009; now at University of Utah)
  • Weiping Zhang (Undergraduate Student)

Introduction

The Digital Age has brought about unprecedented growth in the amount of data being generated, the number of data consumers, and the diversity of their interests and locations. Traditionally, users poll sources for information, but for many applications, polling is hardly scalable and may miss important events. The alternative offered by publish/subscribe systems is to push notifications to users with matching interests. This approach suits many applications, ranging from personal, commercial, medical, to environmental, military, and security. However, traditional publish/subscribe systems are becoming inadequate for advanced applications, where users want to receive information that has been filtered, joined, and summarized, and only when certain conditions are met.

This project aims at building a next-generation publish/subscribe system to face the new challenges. We are developing an end-to-end solution consisting of techniques from subscription processing and indexing to dissemination network design, which work together to support efficient and powerful subscription functionalities, allowing users to control precisely what they want and when they want it.

One main feature distinguishing our approach from previous work is joint consideration of subscription processing and notification dissemination. Traditionally, these problems are considered separately by database and networking communities. However, there exists a wide spectrum of interesting alternatives for interfacing processing with dissemination. We propose a promising novel approach called reforumulation that allows complex, stateful subscriptions to be handled by simple, stateless dissemination mechanisms, with a clean system design that is easy to implement and scale. A cost-based optimizer, inspired by database query optimization, chooses the best processing and dissemination strategies jointly and dynamically.

Besides system building, this project tackles many new algorithmic challenges, including, e.g., scalably processing a large number of complex subscriptions; exploiting event and subscription characteristics to combat worst-case complexity; balancing semantic similarity and network proximity in dissemination network design; and efficiently maintaining statistics for high-dimensional events and subscriptions.

Progress

In the first year of this project, we made progress on the following specific research problems: (1) ProSem system development and demonstration; (2) scalable processing and dissemination of select-join subscriptions; (3) dissemination network design for wide-area publish/subscribe; (4) scalable processing and dissemination of value-based notification conditions; (5) input-sensitive scalable continuous join query processing; (6) querying uncertain data; (7) maintaining data summaries. A detailed description of our contributions can be found below in our 2007-2008 project report.

In the second year of this project, we continued to make progress on (1), (3), (5), (6), and (7). In addition, we worked on the following problems: (8) a generator for wide-area content-based publish/subscribe workloads, which we are making public; (9) extending our ideas and framework beyond relational data and queries to information retrieval and extraction context; (10) data aggregation and scheduling over a network. A detailed description of our contributions can be found below in our 2008-2009 project report.

In the third year of this project, we wrapped up our work on (8), and embarked on the following problems: (11) dissemination network design for wide-area publish/subscribe; (12) scalable support for range top-k subscriptions; (13) scalable support for top-k preference subscriptions. A detailed description of our contributions can be found below in our 2009-2010 project report.

In the fourth and final year of this project, we wrapped up our work on (11), (12), (13), and further studied (14) subscriptions that arise in the context of computational journalism; (15) finding one-of-the-few patterns. A detailed description of these efforts can be found below in our final project report.

We have made a series of contributions to the area of publish/subscribe, its related areas of continuous query processing, as well as application domains that require queries beyond traditional database queries. In particular, we offered unique perspectives on the integration of database processing and network dissemination, and on the use of geometric frameworks for developing our solutions. Many contributions have been published in premier conferences: ISAAC 2005, DASFAA 2006, SIGMOD 2006, SoCG 2006, VLDB 2006, SoCG 2007, VLDB 2007, ACM-GIS 2007, SIGMOD 2008, SoCG 2008, VLDB 2008, SODA 2009, ICDE 2009, DCOSS 2009, SoCG 2009, PODS 2009, SIGMOD 2009, NetDB 2009, TODS (2009), SODA 2010, SoCG 2010, INFOCOM 2010, MILCOM 2010, ESA 2010, ACM-GIS 2010, CIDR 2011, ICDT 2011, SODA 2011, INFOCOM 2011, ICDE 2011, ACM-GIS 2011, SODA 2012, ICDE 2012. For detailed descriptions of these contributions, please refer to our project reports and publications below.

  • Project report (PDF) for academic year 2007-2008.
  • Project report (PDF) for academic year 2008-2009.
  • Project report (PDF) for academic year 2009-2010.
  • Final project report (PDF) in 2011.

Publications

  • Albert Yu, Pankaj K. Agarwal, and Jun Yang. "Processing and Notifying Range Top-k Subscriptions." In Proceedings of the 28th International Conference on Data Engineering (ICDE '12), Washington, DC, USA, April 2012.
    Available for download: paper and technical report.
  • Albert Yu, Pankaj K. Agarwal, and Jun Yang. "Subscriber Assignment for Wide-Area Content-Based Publish/Subscribe." In Proceedings of the 27th International Conference on Data Engineering (ICDE '11), Hannover, Germany, April 2011.
    Available for download: paper and technical report.
  • Sarah Cohen, Chengkai Li, Jun Yang, and Cong Yu. "Computational Journalism: A Call to Arms to Database Researchers." In Proceedings of the 5th Biennial Conference on Innovative Data Systems Research (CIDR '11), Asilomar, California, USA, January 2011. Acceptance rate: 52.2%.
    Available for download: paper.
  • Pankaj K. Agarwal, Junyi Xie, Jun Yang, and Hai Yu. "Input-Sensitive Scalable Continuous Join Query Processing." ACM Transactions on Database Systems (TODS), 34(3):1-41, August 2009. Results in this paper subsume those in the paper titled "Scalable Continuous Query Processing by Tracking Hotspots" in the 32nd International Conference on Very Large Data Bases (VLDB '06).
    Available for download: paper.
  • Albert Yu, Pankaj K. Agarwal, and Jun Yang. "Generating Wide-Area Content-Based Publish/Subscribe Workloads." In Proceedings of the 5th International Workshop on Networking Meets Databases (NetDB '09), Big Sky, Montana, USA, October 2009.
    Available for download: paper.
  • Fei Chen, Byron Gao, AnHai Doan, Jun Yang, and Raghu Ramakrishnan. "Optimizing Complex Extraction Programs over Evolving Text Data." In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data (SIGMOD '09), Providence, Rhode Island, USA, June 2009. Acceptance rate: 15.9%.
  • Risi Thonangi, Hao He, AnHai Doan, Haixun Wang, and Jun Yang. "Weighted Proximity Best-Joins for Information Retrieval." In Proceedings of the 25th International Conference on Data Engineering (ICDE '09), Shanghai, China, March 2009. Acceptance rate: 16.8%.
    Available for download: paper.
  • Badrish Chandramouli and Jun Yang. "End-to-End Support for Joins in Large-Scale Publish/Subscribe Systems." In Proceedings of the 34th International Conference on Very Large Data Bases (VLDB '08), Auckland, New Zealand, August 2008. Acceptance rate: 16.5%.
    Available for download: paper.
  • Badrish Chandramouli, Jun Yang, Pankaj K. Agarwal, Albert Yu, and Ying Zheng. "ProSem: Scalable Wide-Area Publish/Subscribe." In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (SIGMOD '08), Vancouver, Canada, June 2008. System demonstration description. Acceptance rate: 31.9%.
    Available for download: paper.
  • Badrish Chandramouli, Jeff M. Phillips, and Jun Yang. "Value-Based Notification Conditions in Large-Scale Publish/Subscribe Systems." In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB '07), Vienna, Austria, September 2007. Acceptance rate: 16.4%.
    Available for download: paper.
  • Pankaj K. Agarwal, Junyi Xie, Jun Yang, and Hai Yu. "Scalable Continuous Query Processing by Tracking Hotspots." In Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB '06), Seoul, Korea, September 2006. Acceptance rate: 13.8%. Results in this paper are subsumed by those in the paper titled "Input-Sensitive Scalable Continuous Join Query Processing" in ACM Transactions on Database Systems (TODS) 2009.
    Available for download: paper and technical report.
  • Badrish Chandramouli, Junyi Xie, and Jun Yang. "On the Database/Network Interface in Large-Scale Publish/Subscribe Systems." In Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data (SIGMOD '06), Chicago, Illinois, USA, June 2006. Acceptance rate: 13.0%.
    Available for download: paper and technical report.
  • Badrish Chandramouli, Jun Yang, and Amin Vahdat. "Distributed Network Querying with Bounded Approximate Caching." In Proceedings of the 11th International Conference on Database Systems for Advanced Applications (DASFAA '06), Singapore, April 2006. Acceptance rate: 24.5%.
    Available for download: paper and technical report.
  • Pankaj K. Agarwal, Junyi Xie, Jun Yang, and Hai Yu. "Monitoring Continuous Band-Join Queries over Dynamic Data." In Proceedings of the 16th Annual International Symposium on Algorithms and Computation (ISAAC '05), Sanya, Hainan, China, December 2005.
    Available for download: paper.
  • P. K. Agarwal, T. Mølhave, B. Sadri, "I/O-efficient contour queries on terrains", Proc. 22nd Annual ACM-SIAM Symposium on Discrete Algorithms, pages 268-284, 2011.
  • P. Afshani, P. K. Agarwal, L. Arge, K. D. Larsen, and J. M. Phillips, "(Approximate) uncertain skylines," Fourteenth International Conference on Database Theory, pages 186-196, 2011.
  • P. K. Agarwal, R. Sharathkumar, "Streaming algorithms for extent problems in high dimensions," Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, 2010.
  • P.K. Agarwal, S. Bereg, O. Daescu, H. Kaplan, S. Ntafos, M. Sharir, B.H. Zhu, "Guarding a Terrain by Two Watchtowers", Algorithmica, p. 352, vol. 58, 2010.
  • P.K. Agarwal, S. Har-Peled, M. Sharir, and Y. Wang, "Hausdorff distance under translation for points and balls", ACM Transactions on Algorithms, 6, 2010.
  • P. K. Agarwal, J. Phillips, and B. Sadri, "Lipschitz unimodel and isotonic regression on paths and trees", Ninth Annual Latin American Theoretical Informatics Symposium, 2010.
  • P. K. Agarwal, S. Sankararaman, A. Efrat, and S. Ramasubramanian, "On channel-discontinuity-constraint routing in wireless networks", IEEE INFOCOM, 2010.
  • P. K. Agarwal, B. Aronov, M. van Kreveld, M. Löffler, and R. I. Silveira, "Computing similarity of piecewise-linear functions", Twenty-Sixth Annual Symposium on Computational Geometry, 2010.
  • P. K. Agarwal, "An improved algorithm for computing the volume of the union of cubes", Twenty-Sixth Annual Symposium on Computational Geometry, 2010.
  • P. K. Agarwal, R. Ben Avraham and M. Sharir, "The 2-center problem in three dimensions", Twenty-Sixth Annual Symposium on Computational Geometry, 2010.
  • P. K. Agarwal, J. Gao, L. Guibas, H. Kaplan, V. Koltun, N. Rubin, and M. Sharir, "Kinetic stable Delaunay graphs", Twenty-Sixth Annual Symposium on Computational Geometry, 2010.
  • P. K. Agarwal, T. Mølhave, L. Arge, and M. Revsbæk, "Scalable algorithms for large high-resolution terrain data", First International Conference on Computing for Geospatial Research and Applications, 2010.
  • P. K. Agarwal, J. Phillips and H. Yu, "Stability of epsilon-kernels", Eighteenth European Symposium on Algorithms, 2010.
  • P. K. Agarwal, A. Efrat, S. K. Ganjugunte, D. Hay, S. Sankararaman, G. Zussman, "A probabilistic model for network vulnerability under physical attack", Military Communications Conference, 2010.
  • A. Beutel, T. Mølhave, and P. K. Agarwal, "Natural neighbor interpolation based grid DEM construction using a GPU", Proc. Nineteenth ACM Symposium on Advances in Geographic Information Systems, pages 172-181, 2010.
  • Pankaj K. Agarwal and Ke Yi, "Indexing Uncertain Data", Proceedings of the 2009 ACM Symposium on Principles of Database Systems (PODS '09), Providence, Rhode Island, USA, June 2009.
  • Pankaj K. Agarwal, Esther Ezra, and Shashidhar Gujgunte, "Efficient Sensor Placement for Surveillance Problems", Proceedings of the IEEE 2009 Conference on Distributed Computing in Sensor Systems (DCOSS '09), Marina Del Rey, California, USA, June 2009.
  • P. K. Agarwal, S. Har-Peled, and H. Yu, "Robust shape fitting via peeling and grating coresets", Discrete Comput Geome, p. 38, vol. 39, 2008.
  • P.K. Agarwal, R. Poreddy, K. Varadarajan, and H. Yu, "Practical methods for shape fitting and kinetic data structures using coresets", Algorithmica, 2008.
  • P. K. Agarwal, L. Arge, T. Mølhave, and B. Sadri, "I/O efficient algorithms for computing contour lines in a terrain", Proc. 24th Annu. ACM Sympos. Comput. Geom., College Park, Maryland, June 2008.
  • P.K. Agarwal, S. Har-Peled and H. Yu, "Embeddings of surfaces, curves, and moving points in Euclidean space", Proc. 23rd Annu. ACM Sympos. Comput. Geom. Gyeongju, South Korea, June 2007.
  • P.K. Agarwal and H. Yu, "A space-optimal data-stream algorithm for coresets in the plane", Proc. 23rd Annu. ACM Sympos. Comput. Geom. Gyeongju, South Korea, June 2007.
  • A. Danner, K. Yi, T. Molhave, P. K. Agarwal, L. Arge, H. Mitasova, "From elevation data to watershed hierarchies", Proc. ACM-GIS, Seattle, Washington, November 2007.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.