Supported by National Science Foundation Award IIS-0713498, III-COR: Scalable Publish/Subscribe: Unifying Data Processing and Dissemination (abstract)
Principle investigators: Jun Yang and Pankaj K. Agarwal, Duke University People
Alumni:
- Badrish Chandramouli (PhD, 2008; now at Microsoft Research)
- Jeff M. Phillips (PhD, 2009; now at University of Utah)
- Weiping Zhang (Undergraduate Student)
Introduction
The Digital Age has brought about unprecedented growth in the amount
of data being generated, the number of data consumers, and the diversity
of their interests and locations. Traditionally, users poll
sources for information, but for many applications, polling is hardly
scalable and may miss important events. The alternative offered by
publish/subscribe systems is to push notifications to
users with matching interests. This approach suits many applications,
ranging from personal, commercial, medical, to environmental, military,
and security. However, traditional publish/subscribe systems are
becoming inadequate for advanced applications, where users want to
receive information that has been filtered, joined, and summarized, and
only when certain conditions are met.
This project aims at building a next-generation
publish/subscribe system to face the new challenges. We are developing
an end-to-end solution consisting of techniques from subscription
processing and indexing to dissemination network design, which work
together to support efficient and powerful subscription functionalities,
allowing users to control precisely what they want and when they want it.
One main feature distinguishing our approach from previous work is joint consideration of subscription processing and notification dissemination.
Traditionally, these problems are considered separately by database and
networking communities. However, there exists a wide spectrum of
interesting alternatives for interfacing processing with dissemination.
We propose a promising novel approach called reforumulation
that allows complex, stateful subscriptions to be handled by simple,
stateless dissemination mechanisms, with a clean system design that is
easy to implement and scale. A cost-based optimizer, inspired by
database query optimization, chooses the best processing and
dissemination strategies jointly and dynamically.
Besides system building, this project tackles many new
algorithmic challenges, including, e.g., scalably processing a large
number of complex subscriptions; exploiting event and subscription
characteristics to combat worst-case complexity; balancing semantic
similarity and network proximity in dissemination network design; and
efficiently maintaining statistics for high-dimensional events and
subscriptions. Progress
In the first year of this project, we made progress on the following specific research problems: (1) ProSem
system development and demonstration; (2) scalable processing and
dissemination of select-join subscriptions; (3) dissemination network
design for wide-area publish/subscribe; (4) scalable processing and
dissemination of value-based notification conditions; (5)
input-sensitive scalable continuous join query processing; (6) querying
uncertain data; (7) maintaining data summaries. A detailed description
of our contributions can be found below in our 2007-2008 project report.
In the second year of this project, we continued to
make progress on (1), (3), (5), (6), and (7). In addition, we worked on
the following problems: (8) a generator for wide-area content-based
publish/subscribe workloads, which we are making public; (9) extending
our ideas and framework beyond relational data and queries to
information retrieval and extraction context; (10) data aggregation and
scheduling over a network. A detailed description of our contributions
can be found below in our 2008-2009 project report.
In the third year of this project, we wrapped up our
work on (8), and embarked on the following problems: (11) dissemination
network design for wide-area publish/subscribe; (12) scalable support
for range top-k subscriptions; (13) scalable support for top-k
preference subscriptions. A detailed description of our contributions
can be found below in our 2009-2010 project report.
In the fourth and final year of this project, we
wrapped up our work on (11), (12), (13), and further studied (14)
subscriptions that arise in the context of computational journalism;
(15) finding one-of-the-few patterns. A detailed description of these
efforts can be found below in our final project report.
We have made a series of contributions to the area of
publish/subscribe, its related areas of continuous query processing, as
well as application domains that require queries beyond traditional
database queries. In particular, we offered unique perspectives on the
integration of database processing and network dissemination, and on the
use of geometric frameworks for developing our solutions. Many
contributions have been published in premier conferences: ISAAC 2005,
DASFAA 2006, SIGMOD 2006, SoCG 2006, VLDB 2006, SoCG 2007, VLDB 2007, ACM-GIS 2007, SIGMOD 2008, SoCG 2008, VLDB 2008, SODA 2009, ICDE 2009, DCOSS 2009, SoCG 2009, PODS 2009, SIGMOD 2009, NetDB 2009, TODS (2009), SODA 2010, SoCG
2010, INFOCOM 2010, MILCOM 2010, ESA 2010, ACM-GIS 2010, CIDR 2011,
ICDT 2011, SODA 2011, INFOCOM 2011, ICDE 2011, ACM-GIS 2011, SODA 2012,
ICDE 2012. For detailed descriptions of these contributions, please
refer to our project reports and publications below.
- Project report (PDF) for academic year 2007-2008.
- Project report (PDF) for academic year 2008-2009.
- Project report (PDF) for academic year 2009-2010.
- Final project report (PDF) in 2011.
Publications
- Albert Yu, Pankaj K. Agarwal, and Jun Yang. "Processing and Notifying Range Top-k Subscriptions." In Proceedings of the 28th International Conference on Data Engineering (ICDE '12), Washington, DC, USA, April 2012.
Available for download: paper and technical report.
- Albert Yu, Pankaj K. Agarwal, and Jun Yang. "Subscriber Assignment for Wide-Area Content-Based Publish/Subscribe." In Proceedings of the 27th International Conference on Data Engineering (ICDE '11), Hannover, Germany, April 2011.
Available for download: paper and technical report.
- Sarah Cohen, Chengkai Li, Jun Yang, and Cong Yu. "Computational Journalism: A Call to Arms to Database Researchers." In Proceedings of the 5th Biennial Conference on Innovative Data Systems Research (CIDR '11), Asilomar, California, USA, January 2011. Acceptance rate: 52.2%.
Available for download: paper.
- Pankaj K. Agarwal, Junyi Xie, Jun Yang, and Hai Yu. "Input-Sensitive Scalable Continuous Join Query Processing." ACM Transactions on Database Systems (TODS),
34(3):1-41, August 2009. Results in this paper subsume those in the
paper titled "Scalable Continuous Query Processing by Tracking Hotspots"
in the 32nd International Conference on Very Large Data Bases (VLDB
'06).
Available for download: paper.
- Albert Yu, Pankaj K. Agarwal, and Jun Yang. "Generating Wide-Area Content-Based Publish/Subscribe Workloads." In Proceedings of the 5th International Workshop on Networking Meets Databases (NetDB '09), Big Sky, Montana, USA, October 2009.
Available for download: paper.
- Fei Chen, Byron Gao, AnHai Doan, Jun Yang, and Raghu
Ramakrishnan. "Optimizing Complex Extraction Programs over Evolving Text
Data." In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data (SIGMOD '09), Providence, Rhode Island, USA, June 2009. Acceptance rate: 15.9%.
- Risi Thonangi, Hao He, AnHai Doan, Haixun Wang, and Jun Yang. "Weighted Proximity Best-Joins for Information Retrieval." In Proceedings of the 25th International Conference on Data Engineering (ICDE '09), Shanghai, China, March 2009. Acceptance rate: 16.8%.
Available for download: paper.
- Badrish Chandramouli and Jun Yang. "End-to-End Support for Joins in Large-Scale Publish/Subscribe Systems." In Proceedings of the 34th International Conference on Very Large Data Bases (VLDB '08), Auckland, New Zealand, August 2008. Acceptance rate: 16.5%.
Available for download: paper.
- Badrish Chandramouli, Jun Yang, Pankaj K. Agarwal, Albert Yu, and
Ying Zheng. "ProSem: Scalable Wide-Area Publish/Subscribe." In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (SIGMOD '08), Vancouver, Canada, June 2008. System demonstration description. Acceptance rate: 31.9%.
Available for download: paper.
- Badrish Chandramouli, Jeff M. Phillips, and Jun Yang.
"Value-Based Notification Conditions in Large-Scale Publish/Subscribe
Systems." In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB '07), Vienna, Austria, September 2007. Acceptance rate: 16.4%.
Available for download: paper.
- Pankaj K. Agarwal, Junyi Xie, Jun Yang, and Hai Yu. "Scalable Continuous Query Processing by Tracking Hotspots." In Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB '06), Seoul, Korea, September 2006. Acceptance rate: 13.8%. Results in this paper are subsumed by those in the paper titled "Input-Sensitive Scalable Continuous Join Query Processing"
in ACM Transactions on Database Systems (TODS) 2009.
Available for download: paper and technical report.
- Badrish Chandramouli, Junyi Xie, and Jun Yang. "On the Database/Network Interface in Large-Scale Publish/Subscribe Systems." In Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data (SIGMOD '06), Chicago, Illinois, USA, June 2006. Acceptance rate: 13.0%.
Available for download: paper and technical report.
- Badrish Chandramouli, Jun Yang, and Amin Vahdat. "Distributed Network Querying with Bounded Approximate Caching." In Proceedings of the 11th International Conference on Database Systems for Advanced Applications (DASFAA '06), Singapore, April 2006. Acceptance rate: 24.5%.
Available for download: paper and technical report.
- Pankaj K. Agarwal, Junyi Xie, Jun Yang, and Hai Yu. "Monitoring Continuous Band-Join Queries over Dynamic Data." In Proceedings of the 16th Annual International Symposium on Algorithms and Computation (ISAAC '05), Sanya, Hainan, China, December 2005.
Available for download: paper. - P. K. Agarwal, T. Mølhave, B. Sadri, "I/O-efficient contour
queries on terrains", Proc. 22nd Annual ACM-SIAM Symposium on
Discrete Algorithms, pages 268-284, 2011.
- P. Afshani, P. K. Agarwal, L. Arge, K. D. Larsen, and
J. M. Phillips, "(Approximate) uncertain skylines," Fourteenth
International Conference on Database Theory, pages 186-196,
2011.
- P. K. Agarwal, R. Sharathkumar, "Streaming algorithms for extent
problems in high dimensions," Twenty-First Annual ACM-SIAM
Symposium on Discrete Algorithms, 2010.
- P.K. Agarwal, S. Bereg, O. Daescu, H. Kaplan, S. Ntafos,
M. Sharir, B.H. Zhu, "Guarding a Terrain by Two Watchtowers",
Algorithmica, p. 352, vol. 58, 2010.
- P.K. Agarwal, S. Har-Peled, M. Sharir, and Y. Wang, "Hausdorff
distance under translation for points and balls", ACM
Transactions on Algorithms, 6, 2010.
- P. K. Agarwal, J. Phillips, and B. Sadri, "Lipschitz unimodel
and isotonic regression on paths and trees", Ninth Annual Latin
American Theoretical Informatics Symposium, 2010.
- P. K. Agarwal, S. Sankararaman, A. Efrat, and
S. Ramasubramanian, "On channel-discontinuity-constraint routing
in wireless networks", IEEE INFOCOM, 2010.
- P. K. Agarwal, B. Aronov, M. van Kreveld, M. Löffler, and
R. I. Silveira, "Computing similarity of piecewise-linear
functions", Twenty-Sixth Annual Symposium on Computational
Geometry, 2010.
- P. K. Agarwal, "An improved algorithm for computing the volume
of the union of cubes", Twenty-Sixth Annual Symposium on
Computational Geometry, 2010.
- P. K. Agarwal, R. Ben Avraham and M. Sharir, "The 2-center
problem in three dimensions", Twenty-Sixth Annual Symposium on
Computational Geometry, 2010.
- P. K. Agarwal, J. Gao, L. Guibas, H. Kaplan, V. Koltun,
N. Rubin, and M. Sharir, "Kinetic stable Delaunay
graphs", Twenty-Sixth Annual Symposium on Computational
Geometry, 2010.
- P. K. Agarwal, T. Mølhave, L. Arge, and M. Revsbæk, "Scalable
algorithms for large high-resolution terrain data", First
International Conference on Computing for Geospatial Research and
Applications, 2010.
- P. K. Agarwal, J. Phillips and H. Yu, "Stability of
epsilon-kernels", Eighteenth European Symposium on
Algorithms, 2010.
- P. K. Agarwal, A. Efrat, S. K. Ganjugunte, D. Hay,
S. Sankararaman, G. Zussman, "A probabilistic model for network
vulnerability under physical attack", Military Communications
Conference, 2010.
- A. Beutel, T. Mølhave, and P. K. Agarwal, "Natural neighbor
interpolation based grid DEM construction using a
GPU", Proc. Nineteenth ACM Symposium on Advances in Geographic
Information Systems, pages 172-181, 2010.
- Pankaj K. Agarwal and Ke Yi, "Indexing Uncertain
Data", Proceedings of the 2009 ACM Symposium on Principles of
Database Systems (PODS '09), Providence, Rhode Island, USA,
June 2009.
- Pankaj K. Agarwal, Esther Ezra, and Shashidhar Gujgunte,
"Efficient Sensor Placement for Surveillance
Problems", Proceedings of the IEEE 2009 Conference on
Distributed Computing in Sensor Systems (DCOSS '09), Marina
Del Rey, California, USA, June 2009.
- P. K. Agarwal, S. Har-Peled, and H. Yu, "Robust shape fitting
via peeling and grating coresets", Discrete Comput Geome,
p. 38, vol. 39, 2008.
- P.K. Agarwal, R. Poreddy, K. Varadarajan, and H. Yu, "Practical
methods for shape fitting and kinetic data structures using
coresets", Algorithmica, 2008.
- P. K. Agarwal, L. Arge, T. Mølhave, and B. Sadri, "I/O efficient
algorithms for computing contour lines in a
terrain", Proc. 24th Annu. ACM Sympos. Comput. Geom.,
College Park, Maryland, June 2008.
- P.K. Agarwal, S. Har-Peled and H. Yu, "Embeddings of surfaces,
curves, and moving points in Euclidean space", Proc. 23rd
Annu. ACM Sympos. Comput. Geom. Gyeongju, South Korea, June
2007.
- P.K. Agarwal and H. Yu, "A space-optimal data-stream algorithm
for coresets in the plane", Proc. 23rd Annu. ACM
Sympos. Comput. Geom. Gyeongju, South Korea, June 2007.
- A. Danner, K. Yi, T. Molhave, P. K. Agarwal, L. Arge,
H. Mitasova, "From elevation data to watershed
hierarchies", Proc. ACM-GIS, Seattle, Washington, November
2007.
Any opinions, findings, and conclusions or
recommendations expressed in this material are those of the author(s)
and do not necessarily reflect the views of the National Science
Foundation.
|