English [en], .pdf, 🚀/lgli/lgrs/nexusstc/scihub/zlib, 6.0MB, 📘 Book (non-fiction), nexusstc/Large-scale Graph Analysis: System, Algorithm and Optimization/92a5fb5b447b3221e20932d182a8b851.pdf
Large-scale Graph Analysis: System, Algorithm and Optimization (Big Data Management) 🔍
Springer Singapore, Imprint Springer, Big Data Management, Big Data Management, 1, 2020
Yingxia Shao, Bin Cui, Lei Chen 🔍
description
This book introduces readers to a workload-aware methodology for large-scale graph algorithm optimization in graph-computing systems, and proposes several optimization techniques that can enable these systems to handle advanced graph algorithms efficiently. More concretely, it proposes a workload-aware cost model to guide the development of high-performance algorithms. On the basis of the cost model, the book subsequently presents a system-level optimization resulting in a partition-aware graph-computing engine, PAGE. In addition, it presents three efficient and scalable advanced graph algorithms – the subgraph enumeration, cohesive subgraph detection, and graph extraction algorithms.
This book offers a valuable reference guide for junior researchers, covering the latest advances in large-scale graph analysis; and for senior researchers, sharing state-of-the-art solutions based on advanced graph algorithms. In addition, all readers will find a workload-aware methodology for designing efficient large-scale graph algorithms.
This book offers a valuable reference guide for junior researchers, covering the latest advances in large-scale graph analysis; and for senior researchers, sharing state-of-the-art solutions based on advanced graph algorithms. In addition, all readers will find a workload-aware methodology for designing efficient large-scale graph algorithms.
Alternative filename
lgrsnf/Large-scale Graph Analysis System, Algorithm and Optimization.pdf
Alternative filename
scihub/10.1007/978-981-15-3928-2.pdf
Alternative author
Shao, Yingxia, Cui, Bin, Chen, Lei
Alternative publisher
Springer Nature Singapore Pte Ltd Fka Springer Science + Business Media Singapore Pte Ltd
Alternative edition
Big data management, Singapore, 2020
Alternative edition
Springer Nature, Singapore, 2020
Alternative edition
1st ed. 2020, Singapore, 2020
Alternative edition
Singapore, Singapore
Alternative edition
Jul 02, 2020
metadata comments
lg2558488
metadata comments
{"container_title":"Big Data Management","edition":"1","isbns":["9789811539275","9789811539282","9811539278","9811539286"],"issns":["2522-0179","2522-0187"],"last_page":146,"publisher":"Springer","series":"Big Data Management"}
metadata comments
Source title: Large-scale Graph Analysis: System, Algorithm and Optimization (Big Data Management)
Alternative description
Preface
Acknowledgements
Contents
1 Introduction
1.1 Background
1.2 Graph Analysis Tasks
1.2.1 Subgraph Matching and Enumeration
1.2.2 Graph Extraction
1.2.3 Cohesive Subgraph Detection
1.3 The Research Issues
1.4 The Overview of the Book
References
2 Graph Computing Systems for Large-Scale Graph Analysis
2.1 Distributed Graph Computing Systems
2.2 Vertex Programming Abstraction
2.3 Gather–Apply–Scatter Programming Abstraction
2.4 Workload-Aware Cost Model for Optimizations
2.4.1 Cost Model Analysis
2.4.1.1 Cost Model of BSP-Based Systems
2.4.1.2 Cost Model in the Context of Graph
2.4.1.3 The Problem of Workload Balance
2.4.2 The Principles of Optimizations
2.4.2.1 Optimizations with Respect to the Workload Source
2.4.2.2 Optimizations with Respect to Workload Distribution
References
3 Partition-Aware Graph Computing System
3.1 Introduction
3.2 Message Processing in Pregel-Like Systems
3.2.1 The Influence of Graph Algorithms
3.2.2 The Total Cost of a Worker
3.3 PAGE: A Partition-Aware System for Large-Scale Graph Analysis
3.3.1 Graph Algorithm Execution in PAGE
3.3.2 Dual Concurrent Message Processor
3.3.3 Partition-Aware Concurrency Control Model
3.3.3.1 Mathematical Formulation of DCCM
3.3.3.2 Adaptiveness on Various Graph Partitions
3.3.3.3 Implementation of DCCM
3.4 Experiments
3.4.1 Experimental Setup
3.4.2 Evaluation on the Concurrency Control Model
3.4.2.1 Concurrency Determined by DCCM
3.4.2.2 Results by Manual Tuning
3.4.2.3 Adaptivity of DCCM
3.4.3 Comparison with Other Pregel-Like Systems
3.4.3.1 Advantage of PAGE
3.4.3.2 Performance on Various Graph Algorithms
3.4.3.3 Performance by Varying Numbers of Partitions
3.5 Summary
References
4 Efficient Parallel Subgraph Enumeration
4.1 Introduction
4.2 Problem Definition and Backgrounds
4.2.1 Problem and Notations
4.2.2 Partial Subgraph Instance
4.2.3 Automorphism of a Pattern Graph
4.3 Parallel Subgraph Enumeration Framework
4.3.1 Independence Property of Enumeration
4.3.2 PSgL Framework
4.3.3 Partial Subgraph Instance Expansion
4.3.4 Cost Analysis
4.4 The Optimizations of the Framework
4.4.1 Workload Distribution Strategy
4.4.1.1 Workload-Aware Distribution Strategy
4.4.1.2 Naive Distribution Strategies
4.4.2 Partial Subgraph Instance Reduction
4.4.2.1 Automorphism Breaking of the Pattern Graph
4.4.2.2 Initial Pattern Vertex Selection
4.4.2.3 Pruning Invalid Partial Subgraph Instance
4.4.3 Implementation Details
4.5 Experiments
4.5.1 Experimental Setup
4.5.2 Effects of Workload Distribution Strategies
4.5.3 Effects of Partial Subgraph Instance Reduction
4.5.3.1 Importance of the Initial Pattern Vertex
4.5.3.2 Efficiency of the Light-Weight Edge Index
4.5.4 Performance on Various Pattern Graphs
4.5.5 Scalability
4.5.5.1 Scalability on Large Graphs
4.5.5.2 Scalability on the Number of Workers
4.6 Summary
References
5 Efficient Parallel Graph Extraction
5.1 Introduction
5.2 Graph Extraction Problem
5.2.1 Preliminaries
5.2.2 Definition of Homogeneous Graph Extraction
5.2.3 The Characteristics of Graph Extraction
5.2.3.1 Path Enumeration and Pair-Wise Aggregation
5.2.3.2 The Hardness of Graph Extraction
5.3 Parallel Graph Extraction Framework
5.3.1 Primitive Pattern and Path Concatenation Plan
5.3.2 PCP Evaluation Algorithm
5.3.3 Cost Analysis
5.4 Aggregation in Homogeneous Graph Extraction
5.4.1 Distributive, Algebraic, and Holistic Aggregation
5.4.2 Optimization with Partial Aggregation
5.5 Path Concatenation Plan Selection
5.5.1 The Path Size Estimation for PCP
5.5.2 PCP Selection
5.5.2.1 Iteration Optimized Strategy
5.5.2.2 Path Optimized Strategy
5.5.2.3 Hybrid Strategy
5.6 Experiments
5.6.1 Experiment Settings
5.6.2 Effectiveness of Partial Aggregation Technique
5.6.3 Comparison of Different Plans
5.6.4 Comparison of Standalone Solution
5.6.5 Comparison of RPQ-Based Solution
5.6.6 Scalability
5.7 Summary
References
6 Efficient Parallel Cohesive Subgraph Detection
6.1 Introduction
6.2 Problem Definition
6.2.1 Preliminaries
6.2.2 Fundamental Operation
6.2.3 Parallel Computing Context
6.3 The Existing Parallel Algorithms
6.3.1 Improved L. Quick's Algorithm
6.3.2 Limitations Discussion
6.4 The Framework of PeTa
6.4.1 Subgraph-Oriented Model
6.4.2 Triangle Complete Subgraph
6.4.3 The Local Subgraph Algorithm in PeTa
6.4.4 Complexity Analysis of PeTa
6.4.4.1 Space Cost
6.4.4.2 Computation Complexity
6.4.4.3 Communication Cost
6.4.4.4 Number of Iterations
6.5 The Influence of Graph Partitions
6.5.1 Edge-support Law
6.5.2 Partition Influence on PeTa
6.5.3 Implementation Details
6.6 Experiments
6.6.1 Environment Setup
6.6.2 The Influence of Partition Schemes for PeTa
6.6.3 Performance Comparison
6.6.4 Scalability
6.7 Summary
References
7 Conclusions
References
Acknowledgements
Contents
1 Introduction
1.1 Background
1.2 Graph Analysis Tasks
1.2.1 Subgraph Matching and Enumeration
1.2.2 Graph Extraction
1.2.3 Cohesive Subgraph Detection
1.3 The Research Issues
1.4 The Overview of the Book
References
2 Graph Computing Systems for Large-Scale Graph Analysis
2.1 Distributed Graph Computing Systems
2.2 Vertex Programming Abstraction
2.3 Gather–Apply–Scatter Programming Abstraction
2.4 Workload-Aware Cost Model for Optimizations
2.4.1 Cost Model Analysis
2.4.1.1 Cost Model of BSP-Based Systems
2.4.1.2 Cost Model in the Context of Graph
2.4.1.3 The Problem of Workload Balance
2.4.2 The Principles of Optimizations
2.4.2.1 Optimizations with Respect to the Workload Source
2.4.2.2 Optimizations with Respect to Workload Distribution
References
3 Partition-Aware Graph Computing System
3.1 Introduction
3.2 Message Processing in Pregel-Like Systems
3.2.1 The Influence of Graph Algorithms
3.2.2 The Total Cost of a Worker
3.3 PAGE: A Partition-Aware System for Large-Scale Graph Analysis
3.3.1 Graph Algorithm Execution in PAGE
3.3.2 Dual Concurrent Message Processor
3.3.3 Partition-Aware Concurrency Control Model
3.3.3.1 Mathematical Formulation of DCCM
3.3.3.2 Adaptiveness on Various Graph Partitions
3.3.3.3 Implementation of DCCM
3.4 Experiments
3.4.1 Experimental Setup
3.4.2 Evaluation on the Concurrency Control Model
3.4.2.1 Concurrency Determined by DCCM
3.4.2.2 Results by Manual Tuning
3.4.2.3 Adaptivity of DCCM
3.4.3 Comparison with Other Pregel-Like Systems
3.4.3.1 Advantage of PAGE
3.4.3.2 Performance on Various Graph Algorithms
3.4.3.3 Performance by Varying Numbers of Partitions
3.5 Summary
References
4 Efficient Parallel Subgraph Enumeration
4.1 Introduction
4.2 Problem Definition and Backgrounds
4.2.1 Problem and Notations
4.2.2 Partial Subgraph Instance
4.2.3 Automorphism of a Pattern Graph
4.3 Parallel Subgraph Enumeration Framework
4.3.1 Independence Property of Enumeration
4.3.2 PSgL Framework
4.3.3 Partial Subgraph Instance Expansion
4.3.4 Cost Analysis
4.4 The Optimizations of the Framework
4.4.1 Workload Distribution Strategy
4.4.1.1 Workload-Aware Distribution Strategy
4.4.1.2 Naive Distribution Strategies
4.4.2 Partial Subgraph Instance Reduction
4.4.2.1 Automorphism Breaking of the Pattern Graph
4.4.2.2 Initial Pattern Vertex Selection
4.4.2.3 Pruning Invalid Partial Subgraph Instance
4.4.3 Implementation Details
4.5 Experiments
4.5.1 Experimental Setup
4.5.2 Effects of Workload Distribution Strategies
4.5.3 Effects of Partial Subgraph Instance Reduction
4.5.3.1 Importance of the Initial Pattern Vertex
4.5.3.2 Efficiency of the Light-Weight Edge Index
4.5.4 Performance on Various Pattern Graphs
4.5.5 Scalability
4.5.5.1 Scalability on Large Graphs
4.5.5.2 Scalability on the Number of Workers
4.6 Summary
References
5 Efficient Parallel Graph Extraction
5.1 Introduction
5.2 Graph Extraction Problem
5.2.1 Preliminaries
5.2.2 Definition of Homogeneous Graph Extraction
5.2.3 The Characteristics of Graph Extraction
5.2.3.1 Path Enumeration and Pair-Wise Aggregation
5.2.3.2 The Hardness of Graph Extraction
5.3 Parallel Graph Extraction Framework
5.3.1 Primitive Pattern and Path Concatenation Plan
5.3.2 PCP Evaluation Algorithm
5.3.3 Cost Analysis
5.4 Aggregation in Homogeneous Graph Extraction
5.4.1 Distributive, Algebraic, and Holistic Aggregation
5.4.2 Optimization with Partial Aggregation
5.5 Path Concatenation Plan Selection
5.5.1 The Path Size Estimation for PCP
5.5.2 PCP Selection
5.5.2.1 Iteration Optimized Strategy
5.5.2.2 Path Optimized Strategy
5.5.2.3 Hybrid Strategy
5.6 Experiments
5.6.1 Experiment Settings
5.6.2 Effectiveness of Partial Aggregation Technique
5.6.3 Comparison of Different Plans
5.6.4 Comparison of Standalone Solution
5.6.5 Comparison of RPQ-Based Solution
5.6.6 Scalability
5.7 Summary
References
6 Efficient Parallel Cohesive Subgraph Detection
6.1 Introduction
6.2 Problem Definition
6.2.1 Preliminaries
6.2.2 Fundamental Operation
6.2.3 Parallel Computing Context
6.3 The Existing Parallel Algorithms
6.3.1 Improved L. Quick's Algorithm
6.3.2 Limitations Discussion
6.4 The Framework of PeTa
6.4.1 Subgraph-Oriented Model
6.4.2 Triangle Complete Subgraph
6.4.3 The Local Subgraph Algorithm in PeTa
6.4.4 Complexity Analysis of PeTa
6.4.4.1 Space Cost
6.4.4.2 Computation Complexity
6.4.4.3 Communication Cost
6.4.4.4 Number of Iterations
6.5 The Influence of Graph Partitions
6.5.1 Edge-support Law
6.5.2 Partition Influence on PeTa
6.5.3 Implementation Details
6.6 Experiments
6.6.1 Environment Setup
6.6.2 The Influence of Partition Schemes for PeTa
6.6.3 Performance Comparison
6.6.4 Scalability
6.7 Summary
References
7 Conclusions
References
date open sourced
2020-07-01
🚀 Fast downloads
Become a member to support the long-term preservation of books, papers, and more. To show our gratitude for your support, you get fast downloads. ❤️
- Option #1: Fast Partner Server #1 (recommended) (open in viewer) (no redirect) (short filename) (no browser verification or waitlists)
- Option #2: Fast Partner Server #2 (open in viewer) (no redirect) (short filename)
- Option #3: Fast Partner Server #3 (open in viewer) (no redirect) (short filename)
- Option #4: Fast Partner Server #4 (open in viewer) (no redirect) (short filename)
- Option #5: Fast Partner Server #5 (open in viewer) (no redirect) (short filename)
🐢 Slow downloads
From trusted partners. More information in the FAQ. (might require browser verification — unlimited downloads!)
- Option #1: Slow Partner Server #1 (slightly faster but with waitlist)
- Option #2: Slow Partner Server #2 (slightly faster but with waitlist)
- Option #3: Slow Partner Server #3 (no waitlist, but can be very slow)
- After downloading: Open in our viewer
External downloads
All download options have the same file, and should be safe to use. That said, always be cautious when downloading files from the internet, especially from sites external to Anna’s Archive. For example, be sure to keep your devices updated.
-
For large files, we recommend using a download manager to prevent interruptions.
Recommended download managers: JDownloader -
You will need an ebook or PDF reader to open the file, depending on the file format.
Recommended ebook readers: Anna’s Archive online viewer, ReadEra, and Calibre -
Use online tools to convert between formats.
Recommended conversion tools: CloudConvert -
You can send both PDF and EPUB files to your Kindle or Kobo eReader.
Recommended tools: Amazon‘s “Send to Kindle” and djazz‘s “Send to Kobo/Kindle” -
Support authors and libraries
✍️ If you like this and can afford it, consider buying the original, or supporting the authors directly.
📚 If this is available at your local library, consider borrowing it for free there.
Total downloads:
A “file MD5” is a hash that gets computed from the file contents, and is reasonably unique based on that content. All shadow libraries that we have indexed on here primarily use MD5s to identify files.
A file might appear in multiple shadow libraries. For information about the various datasets that we have compiled, see the Datasets page.
For information about this particular file, check out its JSON file. Live/debug JSON version. Live/debug page.