I've been a fan of this series! The work by this team, as well as kuzudb (acquired by apple) and relational.ai, have similar vibes.
One area that has been especially interesting to me is identifying cases where new kinds of vector-friendly join operators are helpful . We've been doing a very different kind of oss gpu graph query language & engine (gfql), where we're solving how to turn declarative cypher property graph queries on big parquets / sql db results / etc -> query plans over scalable cpu/gpu dataframe operations that trounce neo4j etc at a fraction of the time & cost and without needing a DB, and these join algorithm results carry over quite enticingly despite not being datalog.
Also surfacing this comment I noticed which points out some pretty big caveats with the paper:
> As a heavy Datalog user, here's my opinion of the paper.
> Generally, my objection to this work would be that almost all "Datalog" programs they evaluate (7 programs in total) are either just SQL (i.e., they only have joins, no recursion) or they are transitive closure (i.e., they only have linear recursion). This is not Datalog. This is provably (one of the few things we can prove in complexity theory) a lower complexity class than Datalog. So, there is very strong reason to believe that these results do not translate to full Datalog. Transitive closure can be sped up arbitrarily much, compared to more complex recursion.
> The authors only evaluate one program (out of the 7) that has complex recursion. That one is "context-sensitive program analysis" (CSPA). However:
> a) the souffle version is not optimized for complex recursion, I strongly suspect it can be sped up by 2 orders of magnitude, so I really doubt the GPU speedup for this one
> b) despite the name, the analysis is not "context-sensitive", but let's just say this is an oversight
> c) it's still just a 10-line program, hardly representative of full Datalog.
> Methodologically, I cannot reproduce anything from this paper, at least not without talking to the authors directly. The artifact link in the paper (https://file.io/YZE8MKx12iqX) is broken already. In the repo, most of the large inputs are replaced with git lfs links and the github repo doesn't seem to serve git lfs, e.g., https://github.com/harp-lab/gdlog/blob/main/data/cspa/httpd/...
Modern Datalog engines (e.g., LogicBlox, Soufflé, ddlog) enable their users to write declarative queries which compute recursive deductions over extensional facts, leaving high-performance operationalization (query planning, semi-naïve evaluation, and parallelization) to the engine. Such engines form the backbone of modern high-throughput applications in static analysis, network monitoring, and social-media mining. In this paper, we present a methodology for implementing a modern in-memory Datalog engine on data center GPUs, allowing us to achieve significant (up to 45×) gains compared to Soufflé (a modern CPU-based engine) on context-sensitive points-to analysis of httpd. We present GPUlog, a Datalog engine backend that implements iterated relational algebra kernels over a novel range-indexed data structure we call the hash-indexed sorted array (HISA). HISA combines the algorithmic benefits of incremental range-indexed relations with the raw computation throughput of operations over dense data structures. Our experiments show that GPUlog is significantly faster than CPU-based Datalog engines, while achieving favorable memory footprint compared to contemporary GPU-based joins.
This summary was generated using automated tools and was not authored or reviewed by the article's author(s). It is provided to support discovery, help readers assess relevance, and assist readers from adjacent research areas in understanding the work. It is intended to complement the author-supplied abstract, which remains the primary summary of the paper. The full article remains the authoritative version of record. Click here to learn more.
Click here to comment on the accuracy, clarity, and usefulness of this summary. Doing so will help inform refinements and future regenerated versions.
To view this AI-generated plain language summary, you must have Premium access.
You can view the full content in the following formats:
[1]
Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of databases, volume 8. Addison-Wesley Reading, 1995.
[2]
AMD. Hip documentation, 2024.
[3]
Argonne Leadership Computing Facility. Aurora supercomputer, 2024.
[4]
Muhammad A Awad, Saman Ashkiani, Rob Johnson, Martín Farach-Colton, and John D Owens. Engineering a high-performance gpu b-tree. In Proceedings of the 24th symposium on principles and practice of parallel programming, pages 145--157, 2019.
[5]
George Balatsouras and Yannis Smaragdakis. Structure-sensitive points-to analysis for c and c++. In Static Analysis: 23rd International Symposium, SAS 2016, Edinburgh, UK, September 8--10, 2016, Proceedings 23, pages 84--104. Springer, 2016.
[6]
Martin Bravenboer and Yannis Smaragdakis. Strictly declarative specification of sophisticated points-to analyses. In Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications, pages 243--262, 2009.
[7]
Martin Bravenboer and Yannis Smaragdakis. Strictly declarative specification of sophisticated points-to analyses. In Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications, pages 243--262, 2009.
[8]
Mihai Budiu, Tej Chajed, Frank McSherry, Leonid Ryzhyk, and Val Tannen. Dbsp: Automatic incremental view maintenance for rich query languages. Proc. VLDB Endow., 16(7):1601--1614, mar 2023.
[9]
Timothy A. Davis and Yifan Hu. The university of florida sparse matrix collection. ACM Trans. Math. Softw., 38(1), dec 2011.
[10]
Ke Fan, Thomas Gilray, Valerio Pascucci, Xuan Huang, Kristopher Micinski, and Sidharth Kumar. Optimizing the bruck algorithm for non-uniform all-to-all communication. In Proceedings of the 31st International Symposium on High-Performance Parallel and Distributed Computing, pages 172--184, 2022.
[11]
Ke Fan, Steve Petruzza, Thomas Gilray, and Sidharth Kumar. Configurable algorithms for all-to-all collectives. In ISC High Performance 2024 Research Paper Proceedings (39th International Conference), pages 1--12. Prometeus GmbH, 2024.
[12]
Zhiwei Fan, Jianqiao Zhu, Zuyu Zhang, Aws Albarghouthi, Paraschos Koutris, and Jignesh M. Patel. Scaling-up in-memory datalog processing: Observations and techniques. Proc. VLDB Endow., 12(6):695--708, Feb 2019.
[13]
Antonio Flores-Montoya and Eric Schulte. Datalog disassembly. In 29th USENIX Security Symposium (USENIX Security 20), pages 1075--1092, 2020.
[14]
Thomas Gilray, Sidharth Kumar, and Kristopher Micinski. Compiling data-parallel datalog. In Proceedings of the 30th ACM SIGPLAN International Conference on Compiler Construction, pages 23--35, 2021.
[15]
Oded Green. Hashgraph-scalable hash tables using a sparse graph data structure. ACM Trans. Parallel Comput., 8(2), jul 2021.
[16]
Oded Green, Robert McColl, and David A Bader. Gpu merge path: a gpu merging algorithm. In Proceedings of the 26th ACM international conference on Supercomputing, pages 331--340, 2012.
[17]
Todd J Green, Shan Shan Huang, Boon Thau Loo,Wenchao Zhou, et al. Datalog and recursive query processing. Foundations and Trends® in Databases, 5(2):105--195, 2013.
[18]
Jiaqi Gu, Yugo H Watanabe, William A Mazza, Alexander Shkapsky, Mohan Yang, Ling Ding, and Carlo Zaniolo. Rasql: Greater power and performance for big data analytics with recursive-aggregate-sql on spark. In Proceedings of the 2019 International Conference on Management of Data, pages 467--484, 2019.
[19]
Intel. oneAPI Threading Building Blocks (oneTBB). https://github.com/oneapi-src/oneTBB, 2024.
[20]
JEDEC. High Bandwidth Memory (HBM) DRAM. https://www.jedec.org/document\_search?search\_api\_views\_fulltext=jesd235, 2021.
[21]
Herbert Jordan, Bernhard Scholz, and Pavle Subotić. Soufflé: On synthesis of program analyzers. In Computer Aided Verification: 28th International Conference, CAV 2016, Toronto, ON, Canada, July 17-23, 2016, Proceedings, Part II 28, pages 422--430. Springer, 2016.
[22]
Herbert Jordan, Pavle Subotić, David Zhao, and Bernhard Scholz. Brie: A specialized trie for concurrent datalog. In Proceedings of the 10th International Workshop on Programming Models and Applications for Multicores and Manycores, pages 31--40, 2019.
[23]
Herbert Jordan, Pavle Subotić, David Zhao, and Bernhard Scholz. A specialized b-tree for concurrent datalog evaluation. In Proceedings of the 24th symposium on principles and practice of parallel programming, pages 327--339, 2019.
[24]
Sidharth Kumar and Thomas Gilray. Distributed relational algebra at scale. In International Conference on High Performance Computing, Data, and Analytics (HiPC). IEEE, volume 1, 2019.
[25]
Sidharth Kumar and Thomas Gilray. Load-balancing parallel relational algebra. In High Performance Computing: 35th International Conference, ISC High Performance 2020, Frankfurt/Main, Germany, June 22-25, 2020, Proceedings 35, pages 288--308. Springer, 2020.
[26]
Monica S Lam, John Whaley, V Benjamin Livshits, Michael C Martin, Dzintars Avots, Michael Carbin, and Christopher Unkel. Context-sensitive program analysis as database queries. In Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 1--12, 2005.
[27]
Jure Leskovec and Andrej Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, June 2014.
[28]
Feifei Li, Dihan Cheng, Marios Hadjieleftheriou, George Kollios, and Shang-Hua Teng. On trip planning queries in spatial databases. In International symposium on spatial and temporal databases, pages 273--290. Springer, 2005.
[29]
Yuchen Li, Qiwei Zhu, Zheng Lyu, Zhongdong Huang, and Jianling Sun. Dycuckoo: dynamic hash tables on gpus. In 2021 IEEE 37th international conference on data engineering (ICDE), pages 744--755. IEEE, 2021.
[30]
David Maier, K Tuncay Tekle, Michael Kifer, and David S Warren. Datalog: concepts, history, and outlook. In Declarative Logic Programming: Theory, Systems, and Applications, pages 3--100. 2018.
[31]
Carlos Alberto Martínez-Angeles, Inês Dutra, Vítor Santos Costa, and Jorge Buenabad-Chávez. A datalog engine for gpus. In International Conference on Applications of Declarative Programming and Knowledge Management, pages 152--168. Springer, 2013.
[32]
Frank McSherry, Derek Gordon Murray, Rebecca Isaacs, and Michael Isard. Differential dataflow. In CIDR, 2013.
[33]
Raymond J Mooney. Inductive logic programming for natural language processing. In International conference on inductive logic programming, pages 1--22. Springer, 1996.
[34]
Derek G. Murray, Frank McSherry, Rebecca Isaacs, Michael Isard, Paul Barham, and Martín Abadi. Naiad: A timely dataflow system. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, SOSP '13, page 439--455, New York, NY, USA, 2013. Association for Computing Machinery.
[35]
Naeris Netterville, Ke Fan, Sidharth Kumar, and Thomas Gilray. A visual guide to mpi all-to-all. In 2022 IEEE 29th International Conference on High Performance Computing, Data and Analytics Workshop (HiPCW), pages 20--27. IEEE, 2022.
[36]
Nvida. Thrust: The C++ Parallel Algorithms Library. https://nvidia.github.io/cccl/thrust/, 2024.
[37]
NVIDIA. CUDA C Programming Guide: SIMT. https://docs.nvidia.com/cuda/cuda-c-programming-guide/#simt-architecture, 2024.
[38]
Oak Ridge National Laboratory. Aurora supercomputer, 2024.
[39]
RAPIDS. RMM: RAPIDS Memory Manager. https://github.com/rapidsai/rmm, 2024.
[40]
RAPIDS Development Team. cuDF: Gpu dataframe library. https://github.com/rapidsai/cudf, 2021.
[41]
Daniel Ritter and Till Westmann. Business network reconstruction using datalog. In Pablo Barceló and Reinhard Pichler, editors, Datalog in Academia and Industry, pages 148--152, Berlin, Heidelberg, 2012. Springer Berlin Heidelberg.
[42]
Arash Sahebolamri, Langston Barrett, Scott Moore, and Kristopher Micinski. Bring your own data structures to datalog. Proc. ACM Program. Lang., 7(OOPSLA2), oct 2023.
[43]
Bernhard Scholz, Herbert Jordan, Pavle Subotić, and Till Westmann. On fast large-scale program analysis in datalog. In Proceedings of the 25th International Conference on Compiler Construction, CC 2016, page 196--206, New York, NY, USA, 2016. ACM.
[44]
Jiwon Seo, Stephen Guo, and Monica S Lam. Socialite: Datalog extensions for efficient social network analysis. In 2013 IEEE 29th International Conference on Data Engineering (ICDE), pages 278--289. IEEE, 2013.
[45]
Alexander Shkapsky, Mohan Yang, Matteo Interlandi, Hsuan Chiu, Tyson Condie, and Carlo Zaniolo. Big data analytics with datalog queries on spark. In Proceedings of the 2016 International Conference on Management of Data, SIGMOD '16, page 1135--1149, New York, NY, USA, 2016. Association for Computing Machinery.
[46]
Ahmedur Rahman Shovon. Public github repository of GPUJoin. https://github.com/harp-lab/usenixATC23, 2023.
[47]
Ahmedur Rahman Shovon, Landon Richard Dyken, Oded Green, Thomas Gilray, and Sidharth Kumar. Accelerating datalog applications with cudf. In 2022 IEEE/ACM Workshop on Irregular Applications: Architectures and Algorithms (IA3), pages 41--45. IEEE, 2022.
[48]
Ahmedur Rahman Shovon, Thomas Gilray, Kristopher Micinski, and Sidharth Kumar. Towards iterative relational algebra on the {GPU}. In 2023 USENIX Annual Technical Conference (USENIX ATC 23), pages 1009--1016, 2023.
[49]
Evgeny Skvortsov, Yilin Xia, and Bertram Ludäscher. Logica: Declarative data science for mere mortals. In EDBT, pages 842--845, 2024.
[50]
Pavle Subotić, Herbert Jordan, Lijun Chang, Alan Fekete, and Bernhard Scholz. Automatic index selection for large-scale datalog computation. Proc. VLDB Endow., 12(2):141--153, oct 2018.
[51]
Yihao Sun, Sidharth Kumar, Thomas Gilray, and Kristopher Micinski. Communication-avoiding recursive aggregation. In 2023 IEEE International Conference on Cluster Computing (CLUSTER), pages 197--208. IEEE, 2023.
[52]
Todd L Veldhuizen. Leapfrog triejoin: A simple, worst-case optimal join algorithm. In Proc. International Conference on Database Theory, 2014.
[53]
Kai Wang, Aftab Hussain, Zhiqiang Zuo, Guoqing Harry Xu, and Ardalan Amiri Sani. Graspan: A single-machine disk-based graph system for interprocedural static analyses of large-scale systems code. Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017.
[54]
Yisu Remy Wang, Max Willsey, and Dan Suciu. Free join: Unifying worst-case optimal and traditional joins. Proc. ACM Manag. Data, 1(2), jun 2023.
[55]
Haicheng Wu, Gregory Diamos, Tim Sheard, Molham Aref, Sean Baxter, Michael Garland, and Sudhakar Yalamanchili. Red fox: An execution environment for relational query processing on gpus. In Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, pages 44--54, 2014.
[56]
Jiacheng Wu, Jin Wang, and Carlo Zaniolo. Optimizing parallel recursive datalog evaluation on multicore machines. In Proceedings of the 2022 International Conference on Management of Data, pages 1433--1446, 2022.
[57]
Hangdong Zhao, Shaleen Deep, Paraschos Koutris, Sudeepa Roy, and Val Tannen. Evaluating datalog over semirings: A grounding-based approach. Proceedings of the ACM on Management of Data, 2(2):1--26, 2024.