Cloud Computing



Research

GoFFish — Graph-Oriented Framework for Foresight and Insight using Scalable Heuristics

Sensors and online instruments performing high-fidelity observations are a major contributor to the growing challenge of big data analytics. These datasets are distinctive in that they capture events, observations and activities that are related to one another even though they are recorded by independent data streams. Existing data processing frameworks such as MapReduce, which operate on file- or row-based data, do not lend themselves to scalable analytics over such an interconnected web of stream-based data.


We propose GoFFish, a scalable, graph-oriented analytics framework well suited to trawling through reservoirs of interconnected data fed by event streams. The framework will help design optimized graph algorithms that leverage GoFS, a specialized graph-oriented data store, and that build on Gopher, a proposed graph programming abstraction that analysts can use to compose graph and event analytics models intuitively and rapidly. The composed applications will perform data-parallel analytics at scales well beyond those of traditional MapReduce models, using a novel distributed data partitioning approach based on edge distance heuristics. This will allow unprecedented insight from stream data reservoirs for commanders performing causal graph analysis and strategic planning. Further, we propose to close the loop between insight and foresight by coupling event patterns, mined by graph analytics from historical stream reservoirs, with real-time event streams from sensors. Such an online stream analytics engine will give operational leaders augmented situational awareness and advance warning of impending conditions.
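The exact GoFFish partitioning heuristic is not described here. As a rough illustration of what "edge distance" based partitioning can mean, the sketch below grows partitions outward from seed vertices with a multi-source breadth-first search, so each vertex joins the partition whose seed is the fewest edges away, and then counts the edges that cross partitions (a proxy for the communication a distributed graph job pays). The seed-based scheme and the names `partition_by_hop_distance` and `cut_edges` are assumptions for illustration, not the GoFFish/GoFS partitioner.

```python
from collections import deque
from typing import Dict, Hashable, List, Optional

Graph = Dict[Hashable, List[Hashable]]

def partition_by_hop_distance(adjacency: Graph,
                              seeds: List[Hashable]) -> Dict[Hashable, Optional[int]]:
    """Assign every vertex to the partition of its nearest seed (fewest edges).

    Illustrative heuristic only. `adjacency` is assumed to be symmetric
    (undirected) and to list every vertex as a key; `seeds` holds one seed
    vertex per partition. Ties go to whichever seed's breadth-first frontier
    reaches the vertex first; unreachable vertices are left as None.
    """
    owner: Dict[Hashable, Optional[int]] = {v: None for v in adjacency}
    frontier = deque()
    for pid, seed in enumerate(seeds):
        owner[seed] = pid
        frontier.append(seed)
    # Multi-source BFS: the FIFO queue expands all seeds one hop at a time.
    while frontier:
        v = frontier.popleft()
        for w in adjacency[v]:
            if owner[w] is None:
                owner[w] = owner[v]
                frontier.append(w)
    return owner

def cut_edges(adjacency: Graph, owner: Dict[Hashable, Optional[int]]) -> int:
    """Count undirected edges whose endpoints fall in different partitions."""
    crossing = sum(1 for v in adjacency for w in adjacency[v] if owner[v] != owner[w])
    return crossing // 2  # each undirected edge appears in both adjacency lists
```

On a six-vertex path 0-1-2-3-4-5 with seeds 0 and 5, vertices 0 to 2 and 3 to 5 form the two partitions and a single edge (2-3) is cut.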

Floe — Adaptive Framework for Dynamic Applications


Traditional scientific workflows deal with static structures and process data in batch mode. Emerging applications, however, require continuous operation over dynamic data and changing application needs. This motivates data flow programming frameworks that can adapt to changes in application structure, data feeds and data rates, and latency requirements, with minimal interruption to the flow of results. In addition, the advent of elastic platforms such as Clouds requires the execution model of these frameworks to adapt to dynamism in the infrastructure. Floe is an adaptive data flow framework designed for such dynamic applications on Cloud platforms. Floe provides programming abstractions that support both traditional data flow and stream processing paradigms, while allowing dynamic application recomposition, changes to streaming data sources at runtime, and use of elastic Cloud platforms to optimize resource usage.
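Floe's actual programming interface is not described on this page. The sketch below only illustrates the idea of dynamic application recomposition: a linear data flow whose stages can be replaced while a stream is being processed, without restarting the flow. The class `DynamicPipeline` and its methods are hypothetical, and the sketch is single-threaded; a real framework would also have to handle distribution, parallel stages and messages already in flight.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional

class DynamicPipeline:
    """Hypothetical linear data flow whose stages can be replaced mid-stream."""

    def __init__(self, stages: Dict[str, Callable[[Any], Any]]) -> None:
        self._order: List[str] = list(stages)  # dict insertion order = stage order
        self._stages = dict(stages)

    def replace_stage(self, name: str, fn: Callable[[Any], Any]) -> None:
        """Recompose the application: messages processed after this call use `fn`."""
        if name not in self._stages:
            raise KeyError(f"unknown stage: {name}")
        self._stages[name] = fn

    def process(self, message: Any) -> Any:
        # Pass one message through every stage in order.
        for name in self._order:
            message = self._stages[name](message)
        return message

    def run(self, source: Iterable[Any],
            on_message: Optional[Callable[[int, "DynamicPipeline"], None]] = None) -> List[Any]:
        """Stream messages through the current stages; `on_message` may recompose
        the pipeline before message i is processed, without stopping the stream."""
        results = []
        for i, message in enumerate(source):
            if on_message is not None:
                on_message(i, self)
            results.append(self.process(message))
        return results
```

For example, a pipeline `{"parse": float, "scale": lambda x: x * 2}` fed `["1", "2", "3", "4"]`, with the `scale` stage swapped for `lambda x: x * 10` just before the third message, yields `[2.0, 4.0, 30.0, 40.0]`.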

The many advantages of Clouds are offset by their limited support for resilient computing, a consequence of commodity hardware and multi-tenancy. We are investigating ways to plan the execution of Floe graphs prospectively, so that they can adapt to resilience exigencies at runtime while maximizing expected net utility on unreliable Clouds. We aim to achieve this through a combination of tunable application specification, distributed resource optimization and continuous adaptive recovery.
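The utility model used for Floe is not given here. As a hedged illustration of "maximizing expected net utility" on unreliable VMs, the sketch below assumes a simple model: a task is replicated on `replicas` VMs that fail independently with probability `failure_prob`, it delivers `value` if at least one replica survives, and each VM costs `cost_per_vm`. The `Plan` type and both functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass(frozen=True)
class Plan:
    replicas: int        # number of VMs running the same task (assumed model)
    cost_per_vm: float   # cost of one VM over the planning horizon
    failure_prob: float  # probability a single VM fails within the horizon

def expected_net_utility(plan: Plan, value: float) -> float:
    # Failures are assumed independent, so P(all replicas fail) = p ** r
    # and the task succeeds with probability 1 - p ** r.
    p_success = 1.0 - plan.failure_prob ** plan.replicas
    return value * p_success - plan.replicas * plan.cost_per_vm

def best_plan(plans: Iterable[Plan], value: float) -> Plan:
    """Pick the deployment plan with the highest expected net utility."""
    return max(plans, key=lambda p: expected_net_utility(p, value))
```

With `value = 100`, `cost_per_vm = 5` and `failure_prob = 0.2`, one, two and three replicas give expected net utilities of 75, 86 and 84.2, so two replicas is the best trade-off under this model: the third replica costs more than the reliability it adds.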

Floe2 is readily available on GitHub: [1]

Pregel.NET — Parallel Graph Processing using Cloud Platforms

The need to analyze large-scale graphs in parallel is growing with the rise of social networks and other scale-free networks. The Pillcrow project is exploring graph programming abstractions that are well suited to scaling on Cloud platforms. In our initial work, we are investigating the betweenness centrality algorithm, which is popular for finding key vertices in applications such as social networks, bioinformatics and distribution networks. Several parallel formulations suited to supercomputers and clusters already exist. We have studied betweenness centrality in the context of Microsoft Windows Azure and demonstrated scalable parallel performance. Key issues for a Cloud-based implementation include mitigating the penalties associated with VM failures and the impact of communication overheads in the Cloud. We use a combination of empirical and analytical evaluation on both synthetic small-world graphs and real-world social interaction graphs. Further, we are comparing such decoupled programming abstractions with loosely coupled ones such as MapReduce and Pregel to evaluate their suitability.
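For reference, the standard sequential formulation of betweenness centrality for unweighted graphs (Brandes' algorithm) is sketched below. This is not the Azure implementation studied here; it is included because its outer loop over source vertices is the part that parallel formulations commonly split across workers, since the contributions from different sources are simply summed. The function name and the `undirected` flag are choices made for this sketch.

```python
from collections import deque
from typing import Dict, Hashable, List

def betweenness_centrality(adjacency: Dict[Hashable, List[Hashable]],
                           undirected: bool = True) -> Dict[Hashable, float]:
    """Brandes' algorithm on an unweighted graph given as adjacency lists."""
    bc = {v: 0.0 for v in adjacency}
    # Each source contributes independently; workers could each take a subset
    # of sources and their partial `bc` dictionaries would then be summed.
    for s in adjacency:
        order = []                            # vertices in non-decreasing distance from s
        preds = {v: [] for v in adjacency}    # shortest-path predecessors
        sigma = dict.fromkeys(adjacency, 0)   # number of shortest s-v paths
        dist = dict.fromkeys(adjacency, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:               # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:    # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest vertices back toward s.
        delta = dict.fromkeys(adjacency, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    if undirected:
        # With symmetric adjacency every pair is counted from both endpoints.
        for v in bc:
            bc[v] /= 2.0
    return bc
```

On a star with one centre and three leaves, the centre scores 3.0 (it lies on the shortest path between each of the three leaf pairs) and every leaf scores 0.0.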

Cryptonite — Data Security and Privacy on Clouds (Dormant)

As Cloud platforms gain traction among scientific and business communities for outsourcing storage, computing and content delivery, there is growing concern about the associated loss of control over private data hosted in the Cloud. We present an architecture for a secure data repository service built on top of a public Cloud infrastructure, intended to support multidisciplinary scientific communities that handle personal and human-subject data, and motivated by the smart power grid domain. Our repository model allows users to securely store and share their data in the Cloud without revealing the plain text to unauthorized users, the Cloud storage provider or the repository itself. The system masks file names, user permissions and access patterns while providing auditing capabilities with provable data updates.
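How Cryptonite masks file names is not specified on this page. One standard technique for the purpose is a keyed hash: the client stores each object under an HMAC of its name, so the storage provider sees only an opaque digest, while a client holding the key can recompute the digest to locate the object. The sketch below assumes HMAC-SHA256 from Python's standard library; the function names are hypothetical, and the scheme is an illustration rather than the repository's actual design.

```python
import hashlib
import hmac

def mask_file_name(secret_key: bytes, file_name: str) -> str:
    """Return an opaque, deterministic storage name for `file_name`.

    Without `secret_key`, the provider cannot recover the plain-text name
    from the 64-character hex digest.
    """
    return hmac.new(secret_key, file_name.encode("utf-8"), hashlib.sha256).hexdigest()

def matches(secret_key: bytes, file_name: str, masked: str) -> bool:
    """Check a stored name against a candidate using a constant-time comparison."""
    return hmac.compare_digest(mask_file_name(secret_key, file_name), masked)
```

Because the masking is deterministic, identical names always map to the same digest, which lets authorized clients find files but also reveals when two requests touch the same name. Hiding access patterns, as the repository aims to do, needs additional mechanisms beyond this sketch.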

OpenPlanet — Scalable Machine Learning using MapReduce (Dormant)


The projected growth in smart meter deployment and data collection in a Smart Grid environment means that all applications, including machine learning for demand forecasting, will be data-intensive and will require scalable, reliable platforms. For example, the Los Angeles Power Grid, with over 1.4 million customers, will collect and analyze terabytes of smart meter data. This volume will grow further as the frequency of data collection increases and new information sources are added.

Power consumption forecasting is one of the analyses performed using machine learning models such as regression trees, and it is both compute- and data-intensive. The problem becomes intractable on a single machine even for 25,000 customers, with model training taking several days. In OpenPlanet, we are building scalable machine learning algorithms on the Hadoop MapReduce framework. Specifically, we study the tuning and performance issues of mapping this problem onto a Hadoop cluster, and we investigate incremental learning models that scale over time.
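The page does not describe OpenPlanet's internals. One common way to distribute regression-tree training with MapReduce is to have mappers compute, for every candidate split, sufficient statistics of the target (count, sum and sum of squares) over their shard of the data, and have a reducer combine them, so the split with the largest reduction in squared error can be chosen without moving raw records. The sketch below simulates that map, shuffle and reduce flow in plain Python; the record format, the candidate thresholds and all function names are assumptions for illustration.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[Dict[str, float], float]  # (feature values, target such as load)
SplitKey = Tuple[str, float]             # candidate split: feature <= threshold
Stats = Tuple[int, float, float]         # (count, sum of y, sum of y^2)

def map_split_stats(shard: Iterable[Record],
                    candidates: Dict[str, List[float]]) -> Iterator[Tuple[SplitKey, Stats]]:
    """Mapper: emit stats for each record on the left side of each candidate split."""
    for features, y in shard:
        for feature, thresholds in candidates.items():
            for t in thresholds:
                if features[feature] <= t:
                    yield (feature, t), (1, y, y * y)

def reduce_split_stats(pairs: Iterable[Tuple[SplitKey, Stats]]) -> Dict[SplitKey, List[float]]:
    """Reducer: sum the partial statistics emitted for each candidate split."""
    totals: Dict[SplitKey, List[float]] = defaultdict(lambda: [0, 0.0, 0.0])
    for key, (n, s, ss) in pairs:
        acc = totals[key]
        acc[0] += n
        acc[1] += s
        acc[2] += ss
    return totals

def sse(n: float, s: float, ss: float) -> float:
    """Sum of squared errors around the mean, from sufficient statistics."""
    return ss - s * s / n if n else 0.0

def best_split(shards: List[List[Record]], candidates: Dict[str, List[float]]):
    """Pick the candidate split with the largest reduction in squared error."""
    # Node totals would come from another aggregation pass; computed directly here.
    records = list(chain.from_iterable(shards))
    n = len(records)
    s = sum(y for _, y in records)
    ss = sum(y * y for _, y in records)
    # One mapper per shard; chaining their outputs stands in for the shuffle.
    left = reduce_split_stats(chain.from_iterable(
        map_split_stats(shard, candidates) for shard in shards))
    best, best_gain = None, 0.0
    for key, (ln, ls, lss) in left.items():
        rn, rs, rss = n - ln, s - ls, ss - lss
        if rn == 0:
            continue  # every record goes left, so this split separates nothing
        gain = sse(n, s, ss) - sse(ln, ls, lss) - sse(rn, rs, rss)
        if gain > best_gain:
            best, best_gain = key, gain
    return best, best_gain
```

With two shards holding readings at temperatures 10, 12, 30 and 32 (loads 1.0, 1.2, 5.0 and 5.2) and candidate thresholds 11, 20 and 31, the split `temp <= 20` wins with a squared-error reduction of 16.0, since it separates the two low-load readings from the two high-load ones.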

Recent Publications

  1. Efficient Extraction of High Centrality Vertices in Distributed Graphs, Alok Kumbhare, Marc Frincu, Cauligi Raghavendra, and Viktor Prasanna, 18th IEEE High Performance Extreme Computing Conference (HPEC), 2014 (candidate for the best paper award)
  2. Fast Parallel Algorithm for Unfolding of Communities in Large Graphs, Charith Wickramaarachchi, Marc Frincu, Patrick Small and Viktor Prasanna, 18th IEEE High Performance Extreme Computing Conference (HPEC), 2014
  3. GoFFish: A Sub-Graph Centric Framework for Large-Scale Graph Analytics, Yogesh Simmhan, Alok Kumbhare, Charith Wickramaarachchi, Soonil Nagarkar, Santosh Ravi, Cauligi Raghavendra and Viktor Prasanna, EuroPar Conference, 2014
  4. Cost-efficient and Resilient Job Life-cycle Management on Hybrid Clouds, Hsuan-Yi Chu, Yogesh Simmhan, 29th IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2014
  5. PLAStiCC: Predictive Look-Ahead Scheduling for Continuous dataflows on Clouds, Alok Gautam Kumbhare, Yogesh Simmhan and Viktor K. Prasanna, 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), 2014

Technical Documents and Software

Presentations

Technical Documents

Software

Schedules

Group Members

  • Yogesh Simmhan, Adjunct Research Asst Professor, Electrical Engineering
  • Marc Frincu, Postdoctoral Research Associate, Electrical Engineering
  • Alok Kumbhare, Ph.D. Candidate, Computer Science
  • Mark Redekopp, Ph.D. Candidate and Teaching Faculty, Electrical Engineering
  • Charith Wickramaarachchi, Ph.D. Student, Computer Science

Research Interns

Alumni

  • Naga Raju Bhanoori, M.S. Student, Computer Science
  • Santosh Sathyavijayanagaram Ravi, M.S. Student, Computer Science
  • Soonil Nagarkar, M.S. Student, Computer Science
  • Hsuan-Yi Chu, Ph.D. Student, Computer Science
  • Wei Yin, M.S. Student, Computer Science
  • Sreedhar Natarajan, M.S. Student, Computer Science
  • Baohua Cao, M.S. Student, Computer Science
  • Michail Giakkoupis, M.S. Student, Computer Science

Publications

  1. Efficient Extraction of High Centrality Vertices in Distributed Graphs, Alok Kumbhare, Marc Frincu, Cauligi Raghavendra, and Viktor Prasanna, 18th IEEE High Performance Extreme Computing Conference (HPEC), 2014 (candidate for the best paper award)
  2. Fast Parallel Algorithm for Unfolding of Communities in Large Graphs, Charith Wickramaarachchi, Marc Frincu, Patrick Small and Viktor Prasanna, 18th IEEE High Performance Extreme Computing Conference (HPEC), 2014
  3. GoFFish: A Sub-Graph Centric Framework for Large-Scale Graph Analytics, Yogesh Simmhan, Alok Kumbhare, Charith Wickramaarachchi, Soonil Nagarkar, Santosh Ravi, Cauligi Raghavendra and Viktor Prasanna, EuroPar Conference, 2014
  4. Cost-efficient and Resilient Job Life-cycle Management on Hybrid Clouds, Hsuan-Yi Chu, Yogesh Simmhan, 29th IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2014
  5. PLAStiCC: Predictive Look-Ahead Scheduling for Continuous dataflows on Clouds, Alok Gautam Kumbhare, Yogesh Simmhan and Viktor K. Prasanna, 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), 2014
  6. Exploiting Application Dynamism and Cloud Elasticity for Continuous Dataflows, Alok Kumbhare, Yogesh Simmhan, and Viktor K. Prasanna, International Conference for High Performance Computing, Networking, Storage and Analysis (SC13), 2013
  7. Cryptonite: A Secure and Performant Data Repository on Public Clouds, Alok Kumbhare, Yogesh Simmhan and Viktor Prasanna, International Cloud Computing Conference (CLOUD), 2012
  8. Scalable, Secure Analysis of Social Sciences Data on the Azure Platform, Yogesh Simmhan, Litao Den, Alok Kumbhare, Mark Redekopp and Viktor Prasanna, Cloud Futures Workshop, 2012
  9. Adaptive Energy Forecasting and Information Diffusion for Smart Power Grids, Yogesh Simmhan, Vaibhav Agarwal, Saima Aman, Alok Kumbhare, Sreedhar Natarajan, Nikhil Rajguru, Ian Robinson, Samuel Stevens, Wei Yin, Qunzhi Zhou and Viktor Prasanna, IEEE International Scalable Computing Challenge (SCALE), 2012 (first prize winner)
  10. Scalable Regression Tree Learning on Hadoop using OpenPlanet, Wei Yin, Yogesh Simmhan and Viktor Prasanna, International Workshop on MapReduce and its Applications (MAPREDUCE), 2012
  11. Performance Analysis of Vertex-centric Graph Algorithms on the Azure Cloud Platform, Mark Redekopp, Yogesh Simmhan and Viktor K. Prasanna, Workshop on Parallel Algorithms and Software for Analysis of Massive Graphs (ParGraph), 2011
  12. Designing a Secure Storage Repository for Sharing Scientific Datasets using Public Clouds, Alok Kumbhare, Yogesh Simmhan and Viktor Prasanna, International Workshop on Data Intensive Computing in the Clouds (DataCloud-SC11), 2011
  13. An Analysis of Security and Privacy Issues in Smart Grid Software Architectures on Clouds, Yogesh Simmhan, Alok Kumbhare, Baohua Cao and Viktor K. Prasanna, International Cloud Computing Conference (CLOUD), 2011, IEEE.
  14. Adaptive rate stream processing for smart grid applications on clouds, Yogesh Simmhan, Baohua Cao, Michail Giakkoupis and Viktor K. Prasanna, International Workshop on Scientific Cloud Computing (ScienceCloud), 2011, pp. 33-38, ACM.
  15. On Using Cloud Platforms in a Software Architecture for Smart Energy Grids, Yogesh Simmhan, Michail Giakkoupis, Baohua Cao and Viktor K. Prasanna, International Conference on Cloud Computing Technology and Science (CloudCom), 2010, IEEE (Poster).
  16. Towards Reliable, Performant Workflows for Streaming-Applications on Cloud Platforms, Daniel Zinn, Quinn Hart, Timothy M. McPhillips, Bertram Ludäscher, Yogesh Simmhan, Michail Giakkoupis and Viktor K. Prasanna, International Symposium on Cluster, Cloud and Grid Computing (CCGRID), 2011, pp. 235-244, IEEE.
  17. Social Web-Scale Provenance in the Cloud, Yogesh Simmhan and Karthik Gomadam, International Provenance and Annotation Workshop (IPAW), 2010, vol. 6378, pp. 298-300, Springer Berlin / Heidelberg.