
Highlights on Large Language Models at KDD 2023

By Gabriel Moreira, Towards Data Science

A few weeks ago I had the opportunity to attend ACM SIGKDD (KDD for short) for the first time. KDD 2023 took place in Long Beach, CA. It is the oldest and one of the most important academic conferences in the data mining field, having pioneered many topics related to data science and big data. It spanned 5 days and was attended by over 2,200 people, with a strong presence from industry. I was impressed by the diversity of topics covered, but the hottest ones from my perspective were Large Language Models (LLMs) and Graph Learning. I also found lots of content on RecSys, a topic I pay special attention to. In this post, I summarize my highlights on LLMs from the workshops, tutorials, and paper presentations that I attended and liked, with links to online resources for additional information. Warning: long post full of resource links ahead!

Ed H. Chi, a Distinguished Scientist and director at Google, presented a much-awaited keynote on The LLM Revolution. He reflected on the tech revolutions we have been through, from the internet, through mobile devices and the rise of deep learning, to now LLMs, which he considers by far the most mind-blowing. He discussed what makes human intelligence different from ML: (1) learning from a few examples, (2) explaining predictions and decisions, and (3) strong out-of-distribution generalization abilities. LLMs, he argued, can finally start filling this gap.

He then talked about the techniques that are making LLMs able to perform some reasoning: (1) chain-of-thought prompting, (2) self-consistency, (3) least-to-most prompting, and (4) instruction fine-tuning. More on this in the talk by Denny Zhou on the LLM Day (covered below). Finally, he shared his vision of the next challenges for LLMs: (1) responsibility and safety; (2) factuality, grounding, and attribution; (3) the human-AI content loop and ecosystem; and (4) personalization and user memory.

KDD devoted a special day to LLMs, with 5 researchers giving longer talks about how Microsoft, Google DeepMind, Meta, Zhipu AI, and OpenAI have been pushing the technology forward, the challenges involved, and what they foresee as the future evolution of this area. The presentation slides are available and are highly recommended.

The Microsoft talk covered different research and applied topics: LLM quality issues (e.g. how to deal with low-resource languages), efficient training on the cloud, Retrieval-Augmented Generation (RAG) as a sustainable way to leverage private Knowledge Bases (KBs), differential privacy for fine-tuning, good practices for prompt engineering, and chat log analysis.

Denny Zhou, from Google DeepMind, focused on the holy grail for ML, reasoning, as a way to learn from only a few examples, and summarized some of the core techniques that make LLMs so powerful, including the prompting strategies mentioned above.
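To make these prompting ideas concrete, here is a minimal sketch of my own (not taken from any of the talks) of chain-of-thought prompting combined with self-consistency: the model is shown an exemplar with intermediate reasoning, several answers are sampled, and the final answer is chosen by majority vote. The generate function is a placeholder for whatever LLM API or local model you use.

```python
from collections import Counter

# Few-shot chain-of-thought prompt: the exemplar shows intermediate reasoning steps.
COT_PROMPT = """Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls each. How many balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11. The answer is 11.

Q: {question}
A:"""


def generate(prompt: str, temperature: float = 0.7) -> str:
    """Placeholder for an LLM call. It should return a sampled completion that
    ends with a 'The answer is X.' sentence."""
    raise NotImplementedError("Plug in your favorite LLM client here.")


def extract_answer(completion: str) -> str:
    """Pull the final answer out of a 'The answer is X.' pattern."""
    marker = "The answer is"
    return completion.rsplit(marker, 1)[-1].strip(" .\n") if marker in completion else completion.strip()


def self_consistent_answer(question: str, n_samples: int = 5) -> str:
    """Sample several reasoning paths and return the majority-vote answer."""
    prompt = COT_PROMPT.format(question=question)
    answers = [extract_answer(generate(prompt)) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```

Least-to-most prompting follows the same pattern, except the model is first asked to decompose the question into simpler sub-questions and then answers them in sequence.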
The Meta talk covered their journey training the Llama foundation models and fine-tuning them with instructions, using 27k high-quality collected SFT samples. Their reward model was trained on 1M collected samples. The speaker also described their iterative fine-tuning with RLHF and their evaluation process (human and safety evaluations), and finished by discussing the challenges ahead in training and deploying LLMs.

I also got to know Zhipu AI, a company that is challenging OpenAI in the Chinese language space. They had a strong presence at KDD as the Diamond Sponsor and delivered a keynote at the Banquet Celebration. Zhipu presented results showing that their LLM is the best for Chinese on many tasks, even better than GPT-4. They described how they developed ChatGLM and VisualGLM on top of their base model (GLM-130B), and they have open-sourced ChatGLM-6B on Hugging Face.

The OpenAI talk was very grounded, covering the scaling laws behind the current state of LLMs and the emergent abilities (including reasoning) that can be observed when models exceed roughly 100B parameters. It also covered reasoning via prompting techniques: chain-of-thought and least-to-most prompting.

I think the LLM-AI workshop was the most sought-after one at the conference. I literally could not join it in the morning, as a crowd had completely filled the small room right after the KDD morning keynote. Fortunately, I found a seat right after the coffee break and could attend a few sessions.

One speaker described Salesforce's XGen LLM, built with their in-house JaxFormer library; it follows LLaMA-7B, is instruction-tuned with WizardLM, and can answer questions based on unstructured and structured data (e.g. Spark and SQL databases). He also presented some techniques they use for reasoning preparation: breaking questions down with chain-of-thought, and selecting the most relevant knowledge base by training a model for Adaptive Query Generation with LoRA on natural sentences, SPARQL, and SQL. That process generates a query for each reasoning step, which is then executed on the knowledge source.

Another talk introduced IBM's foundation models: (1) Sandstone, an encoder-decoder architecture well suited for fine-tuning on specific tasks; (2) Granite, a decoder-only, GPT-like model for generative tasks; and (3) Obsidian, a new modular architecture providing high inference efficiency and strong performance across a variety of tasks. The speaker also described some challenges they have faced with LLMs and presented ModuleFormer, which addresses them with a Sparse Mixture of Experts (SMoE): it activates only a subset of its modules for each input and is more immune to catastrophic forgetting than dense LLMs. Fine-tuning ModuleFormer can specialize a subset of the modules, and the task-unrelated modules can be pruned for lightweight deployment.
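ModuleFormer itself is described in the paper; purely as an illustration of the sparse Mixture-of-Experts idea it builds on, here is a toy PyTorch sketch of my own (not IBM's code) of a layer that routes each token to only its top-k expert modules:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToySparseMoE(nn.Module):
    """A minimal top-k sparse Mixture-of-Experts feed-forward layer."""

    def __init__(self, d_model: int, d_hidden: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:       # x: (tokens, d_model)
        gate_logits = self.router(x)                           # (tokens, n_experts)
        weights, idx = gate_logits.topk(self.top_k, dim=-1)    # keep only top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                       # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


# Example: route 4 token embeddings of size 16 through the layer.
layer = ToySparseMoE(d_model=16, d_hidden=32)
print(layer(torch.randn(4, 16)).shape)  # torch.Size([4, 16])
```

Production SMoE layers add load-balancing losses and batched expert dispatch, which are omitted here for brevity.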
Two tutorials I wanted to attend were presented at the same time, so I had to split my time to catch a bit of both. Fortunately, their great and very detailed slides were made available.

The first was a very comprehensive tutorial on intelligent assistants that are multi-modal and can leverage as context the user's location and what the user can hear and see (e.g. using Google Glass or Meta Quest 2). The tutorial described how the different modules are connected: ASR, CV, NLU, dialog state tracking, NLG, TTS, knowledge bases, personalization/recommendation, and privacy preservation, among others.

The second covered advances in pre-trained language models, compared them on traditional NLU tasks, and described how LLMs can be used to extract entities and hierarchical relations, discover topics, and understand documents. A good insight I got from this tutorial was using NLU techniques to evaluate whether a generated answer actually addresses the question.

Here is a short list of some NLP / LLM papers I enjoyed.

One great paper combines lexical and semantic retrieval systems. The authors build their solution on top of lexical retrievers by proposing a Term Weighting BERT (TW-BERT) model, which learns to predict the weight of individual n-gram (e.g. uni-gram and bi-gram) query input terms. These inferred weights and terms can be used directly by a retrieval system to perform a query search, and the learned weights can be easily utilized by standard lexical retrievers (e.g. BM25) and by other retrieval techniques such as query expansion.

Another interesting proposal unifies dense-vector and lexicon-based retrieval in one model with a dual-representation capability. It is trained with a two-stage self-learning pipeline and improves upon state-of-the-art lexical and dense retrieval models.

Typically, in multi-turn conversations, the historical queries are used to expand the current query. However, not all previous queries are related to, or useful for expanding, the current question. One paper proposes a method for selecting the historical queries that are relevant to the current query: a pseudo-labeling mechanism annotates the relevant historical queries, and a selection model is trained jointly with the retriever.

Another paper describes how noisy private KBs can be used in RAG-based dialog systems. The authors propose a novel evaluation method that lets humans converse with multiple deployed bots simultaneously and compare their performance implicitly, instead of explicitly rating them with multidimensional metrics.

Finally, a paper from Home Depot describes how they are improving semantic search for e-commerce by using cluster-specific language models instead of the typical single bi-encoder architecture. Their method first maps the user query to a cluster using K-Means and then uses the selected cluster-specific language model for retrieval.
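To make that last routing idea concrete, here is a small hypothetical sketch of my own (not Home Depot's implementation): queries are embedded, assigned to a K-Means cluster, and then encoded with a model dedicated to that cluster. Both embed_query and the per-cluster encoders are placeholders for real trained models.

```python
import numpy as np
from sklearn.cluster import KMeans


def embed_query(query: str) -> np.ndarray:
    """Stand-in for a real query embedding model (deterministic toy embedding)."""
    rng = np.random.default_rng(sum(map(ord, query)))
    return rng.normal(size=64)


class ClusterRoutedEncoder:
    """Route each query to a cluster-specific encoder chosen via K-Means."""

    def __init__(self, encoders, n_clusters: int = 8):
        self.encoders = list(encoders)                 # one encoder (callable) per cluster
        self.kmeans = KMeans(n_clusters=n_clusters, n_init=10)

    def fit(self, historical_queries):
        embeddings = np.stack([embed_query(q) for q in historical_queries])
        self.kmeans.fit(embeddings)                    # learn query clusters offline
        return self

    def encode(self, query: str):
        cluster = int(self.kmeans.predict(embed_query(query)[None, :])[0])
        return self.encoders[cluster](query)           # cluster-specific language model
```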



These were my highlights on LLMs from KDD 2023. I hope you can find some useful information and inspiration in this summary and the resources I compiled. “Sorry for the long [post]. If I had more time, I would have written a shorter [one]” :)

Gabriel Moreira is a PhD, GDE, and Senior Research Scientist at NVIDIA, working on the intersection of deep learning and recommender systems.
