CHACE-KO: A Connected, Hybrid, Accommodating, Contained, and Evolving Knowledge-Ocean

Abstract

A knowledge ocean aims to support problem solving with multi-model knowledge at all times and all over the world, synergizing knowledge graphs with large language models and performing bidirectional reasoning driven by both data and knowledge. We have constructed such a large knowledge ocean, entitled CHACE-KO (a Connected, Hybrid, Accommodating, Contained, and Evolving Knowledge-Ocean), that contains the largest knowledge graph in the world with 488 million entities and 2.24 billion relations (https://ko.zhonghuapu.com/EN). In this talk, we will present each of the CHACE dimensions of the CHACE-KO design, and illustrate how "big", "dynamic" and "sparkling" applications are implemented by these CHACE characteristics and the HAO intelligence that integrates human intelligence, artificial intelligence and organizational intelligence.

Biography

Xindong Wu is Director and Professor of the Key Laboratory of Knowledge Engineering with Big Data (the Ministry of Education of China), Hefei University of Technology, China. He is also a Senior Research Scientist at Zhejiang Lab, China. His research interests include big data analytics, data mining and knowledge engineering. He received his Bachelor's and Master's degrees in Computer Science from the Hefei University of Technology, China, and his Ph.D. degree in Artificial Intelligence from the University of Edinburgh, Britain. He is a Foreign Member of the Russian Academy of Engineering, and a Fellow of IEEE and the AAAS (American Association for the Advancement of Science).

Dr. Wu is the Steering Committee Chair of the IEEE International Conference on Data Mining (ICDM), and the Editor-in-Chief of Knowledge and Information Systems (KAIS, by Springer). He was the Editor-in-Chief of the IEEE Transactions on Knowledge and Data Engineering (TKDE) between 2005 and 2008 and Co-Editor-in-Chief of the ACM Transactions on Knowledge Discovery from Data between 2017 and 2020. He served as a program committee chair/co-chair for ICDM 2003 (the 3rd IEEE International Conference on Data Mining), KDD 2007 (the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining), CIKM 2010 (the 19th ACM Conference on Information and Knowledge Management), and ICBK 2017 (the 8th IEEE International Conference on Big Knowledge). One of his completed projects is Knowledge Engineering With Big Data (BigKE), which was a 54-month, 45-million RMB, 15-institution national grand project, as described in detail at https://ieeexplore.ieee.org/abstract/document/7948800.

Cardinality Estimation of Queries in Database Systems - Where are we now?

Abstract

To process a query in database systems, the query optimizer selects the most efficient plan among the possible execution plans of the query. Without query optimization, database systems would be highly inefficient. Since the cost of a plan is estimated from the result size of each operator in the plan, accurate cardinality estimation of subqueries is essential to produce an optimal execution plan for a query. Thus, there has been extensive work using histograms, wavelet synopses and locality-sensitive hashing techniques for cardinality estimation of queries. Since deep learning models can reflect the underlying patterns and correlations of data well, they have recently been investigated and shown to outperform the traditional methods for cardinality estimation of queries. In my talk, I will present an overview of the traditional as well as deep learning methods developed for cardinality estimation of queries in database systems.
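As a rough illustration of the histogram-based estimation mentioned in the abstract, the following minimal sketch estimates the result size of a range predicate from an equi-width histogram. It is not taken from the talk: the function names `build_histogram` and `estimate_range`, the bucket count, and the assumption that values are uniformly distributed within each bucket are all choices made here for illustration.

```python
def build_histogram(values, num_buckets, lo, hi):
    """Count values per equal-width bucket over [lo, hi] (hypothetical helper)."""
    width = (hi - lo) / num_buckets
    counts = [0] * num_buckets
    for v in values:
        # Clamp so that a value equal to hi lands in the last bucket.
        idx = min(int((v - lo) / width), num_buckets - 1)
        counts[idx] += 1
    return counts, width


def estimate_range(counts, width, lo, a, b):
    """Estimate the number of values v with a <= v < b.

    Assumes values are spread uniformly inside each bucket, so a bucket
    contributes its count scaled by the fraction of the bucket that
    overlaps [a, b).
    """
    total = 0.0
    for i, count in enumerate(counts):
        bucket_lo = lo + i * width
        bucket_hi = bucket_lo + width
        overlap = max(0.0, min(b, bucket_hi) - max(a, bucket_lo))
        total += count * overlap / width
    return total


if __name__ == "__main__":
    # Uniform data: the estimate for 20 <= v < 45 should be close to the true count.
    uniform = list(range(100))
    counts, width = build_histogram(uniform, num_buckets=10, lo=0, hi=100)
    print(estimate_range(counts, width, 0, 20, 45))

    # Skewed data: the uniformity assumption inside a bucket breaks down.
    skewed = [1] * 90 + list(range(90, 100))
    counts, width = build_histogram(skewed, num_buckets=10, lo=0, hi=100)
    print(estimate_range(counts, width, 0, 0, 5))
```

On the skewed input, all 90 copies of the value 1 satisfy the predicate, yet the sketch attributes only half of the first bucket to the range. Errors of this kind on skewed or correlated data are one reason finer-grained synopses and learned estimators are studied.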

Biography

Kyuseok Shim is a Professor in the Department of Electrical and Computer Engineering at Seoul National University, Korea. Before that, he was an Assistant Professor in the Computer Science Department at KAIST (Korea), a member of technical staff at Bell Laboratories (Murray Hill) and a member of the Quest Data Mining project at IBM Almaden Research Center. He is currently an Editor-in-Chief of the VLDB Journal and was previously an Associate Editor for the IEEE TKDE, VLDB as well as PVLDB journals. He also served as a Program Co-chair for the PAKDD 2003, WWW 2014, ICDE 2015, APWeb 2016, BigComp 2019 and ICDM 2019 conferences and has been serving on the Program Committees of the leading database as well as data mining conferences including SIGMOD, SIGKDD, ICDE, ICDM, EDBT, VLDB, WWW and CIKM. He became an ACM Fellow and an IEEE Fellow for his contributions to scalable data mining and query processing. He was previously a member of the VLDB Endowment Board of Trustees and is currently a steering committee member of the PAKDD as well as DASFAA conferences. He served as the president of the Korean Institute of Information Scientists and Engineers (KIISE) in 2022 and became a member of the National Academy of Engineering of Korea in 2023. He has been working in the areas of data mining, machine learning, privacy preservation, query processing, query optimization, data warehousing, semi-structured data (XML), stream data and histograms.

Abstract

Large language models (LLMs) can achieve state-of-the-art performance on a range of natural language processing tasks, such as language translation, text classification, and question answering. As LLMs can be used to capture semantic and contextual information, it is natural to explore how they can be used for the task of entity linking, which is an important task to identify and match entities (such as people, organizations, or products) across different datasets. Entity matching as a database research topic is conducted among structured data (i.e., records in database tables). In this talk, we will present a review of entity matching research, from the early work using rule-based or statistical approaches, to recent approaches based on deep learning and pre-trained models with fine-tuning, and the current approaches using foundation models. We will identify the research trends and gaps in this new area, and present our work on using LLMs for database entity matching research.

Biography

Professor Xiaofang Zhou is Otto Poon Professor of Engineering and Chair Professor of Computer Science and Engineering at The Hong Kong University of Science and Technology. Currently, he is Head of the Department of Computer Science and Engineering and Co-Director of the Big Data Institute. He is the founding director of the HKUST-HKPC Joint Lab on Industrial AI and Robotics Research, the HKUST-China Unicom Joint Lab on Smart Society, and the JC STEM Lab on Data Science Foundations. He has been working in data science, spatiotemporal databases, data mining, data quality management, high-performance query processing, big data analytics, and machine learning, and has co-authored over 500 research papers. He received Best Paper Awards from WISE 2012 and 2013, ICDE 2015 and 2019, DASFAA 2016 and ADC 2019. He was Program Committee Chair of the IEEE International Conference on Data Engineering (ICDE 2013), the ACM International Conference on Information and Knowledge Management (CIKM 2016), and the International Conference on Very Large Databases (PVLDB 2020). Professor Zhou is a Global STEM Scholar of Hong Kong and an IEEE Fellow.

Crowdsourcing and its Data-Driving Methodology

Abstract

As the popularity of mobile internet and smart devices increases, crowdsourcing, a human-centric paradigm for performing tasks, has drawn rising attention. Crowdsourcing is a collaborative approach that utilizes the collective efforts of individuals to accomplish complex jobs. Crowdsourcing scenarios are often modeled as combinatorial optimization problems, which pose challenges in algorithm design and suffer from high computational complexity. In recent years, data-driven approaches to crowdsourcing problem solving have become popular. This talk provides an overview of crowdsourcing, highlighting a series of achievements in crowdsourcing problem-solving methods and the research progress made by our team.

Biography

Guihai Chen is a distinguished professor at Shanghai Jiao Tong University. He is an IEEE Fellow and a CCF Fellow. He earned his Ph.D. degree in computer science from the University of Hong Kong in 1997. He has been invited as a visiting professor by Kyushu Institute of Technology in Japan, the University of Queensland in Australia and Wayne State University in the USA. He has a wide range of research interests with a focus on parallel computing, wireless networks, data engineering and AI technology. He has published more than 700 peer-reviewed papers, including more than 100 ACM/IEEE Transactions papers. He has won 14 best paper awards. His papers have been cited more than 21,000 times.

Distributed Machine Learning System for Big Models

Abstract

Machine/Deep learning (ML/DL) systems are important foundations for artificial intelligence and have attracted a lot of attention in academia and industry in recent years. The increasing scale of deep learning models (e.g., ChatGPT) and data brings severe challenges to existing systems, and distributed deep learning systems are becoming more and more important. At the intersection of ML/DL and systems, it is necessary to pay attention not only to the data characteristics, model structures, training methods, and optimization algorithms, but also to the execution problems in the computing, storage, communication, scheduling, and hardware of the system. In this talk, I will introduce the current development of "big models" and then share our efforts on system optimizations for distributed training of big models, as well as our explorations of automated parallel training. Based on these efforts, I will also briefly present our open-sourced system -- Hetu, a new distributed deep learning system for large-scale model training.

Biography

Bin Cui is a professor and Vice Dean in the School of CS at Peking University. His research interests include database systems, big data management and analytics, and ML systems. He has regularly served on the Technical Program Committees of various international conferences including SIGMOD, VLDB and KDD. He is the Editor-in-Chief of Data Science and Engineering and serves on the Editorial Boards of Distributed and Parallel Databases, Journal of Computer Science and Technology, and SCIENCE CHINA Information Sciences. He was an associate editor of IEEE TKDE and the VLDB Journal, and a Trustee Board Member of the VLDB Endowment. He is serving as Vice Chair of the Technical Committee on Database (CCF). He was awarded the Microsoft Young Professorship award (MSRA 2008), the CCF Young Scientist award (2009), and the Second Prize of the Natural Science Award of MOE China (2014), and was appointed Cheung Kong distinguished Professor by MOE China in 2016.

ADMA 2023 - International Conference on Advanced Data Mining and Applications, Shenyang, China